1. Introduction
Synthetic Aperture Radar (SAR) is an active Earth observation system. Unlike optical Earth observation systems, SAR can observe the Earth day and night and in all weather conditions, which makes it valuable for military reconnaissance, resource surveying, and disaster warning [1, 2, 3].
Ship target detection is one of the most important applications of SAR imagery. Early methods relied on constant false alarm rate (CFAR) detection [4, 5, 6], which models the background clutter of the image with a statistical distribution and flags pixels that exceed a threshold derived from that model. However, CFAR depends on manually selected parameters, and its detection results are often unsatisfactory.
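To make the CFAR idea concrete, the following is a minimal sketch of the cell-averaging variant on a one-dimensional signal. It is an illustration only, not the detector used in [4, 5, 6]: the function name `ca_cfar_1d`, the window sizes, and the threshold factor `scale` are all assumptions chosen for the toy example, and they are exactly the kind of manually selected parameters mentioned above.

```python
from statistics import mean

def ca_cfar_1d(power, num_train=4, num_guard=1, scale=3.0):
    """Cell-averaging CFAR over a 1-D list of power values (illustrative).

    For each cell under test, the clutter level is estimated as the mean of
    `num_train` training cells on each side, skipping `num_guard` guard cells
    next to the cell. A cell is flagged when it exceeds scale * clutter level.
    Cells too close to the edges to have a full window are never flagged.
    """
    half = num_train + num_guard
    detections = [False] * len(power)
    for i in range(half, len(power) - half):
        left = power[i - half : i - num_guard]           # training cells before the guard band
        right = power[i + num_guard + 1 : i + half + 1]  # training cells after the guard band
        clutter = mean(left + right)
        detections[i] = power[i] > scale * clutter
    return detections

# Toy clutter with one bright "ship" return at index 8 (hypothetical data).
signal = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.1, 1.0, 9.0,
          1.2, 0.9, 1.0, 1.1, 1.0, 0.9, 1.2, 1.0]
hits = [i for i, flagged in enumerate(ca_cfar_1d(signal)) if flagged]
print(hits)  # prints [8]
```

Changing `scale`, `num_train`, or `num_guard` changes which cells are flagged, which illustrates why results are sensitive to parameter selection.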
The emergence of neural network algorithms has led to significant breakthroughs in areas such as target detection. AlexNet [7] pioneered the use of deep Convolutional Neural Networks (CNNs) for large-scale image recognition, and its model won the 2012 ImageNet image recognition competition. Subsequent architectures such as ResNet [8] and DenseNet [9] addressed network degradation during training through residual connections and dense feature concatenation, respectively. Additionally, CSPNet [10] reduced training costs by eliminating repeated gradient computations. Target detection algorithms can generally be classified into two categories: single-stage and two-stage. Two-stage algorithms, represented by the R-CNN family [11, 12, 13], first generate a large number of candidate boxes in an image and then classify and refine each candidate. In contrast, single-stage algorithms, such as YOLO [14, 15, 16], SSD [17], and RetinaNet [18], predict boxes and classes in a single pass over the whole image, which makes them faster and more efficient. Although two-stage models initially outperformed single-stage models in terms of generalization ability, single-stage models gradually surpassed them as the YOLO family was continuously updated and iterated.
In the evolution of the YOLO series of algorithms, several modules have been added to enhance the model’s performance. The FPN [19] structure fuses features across scales by propagating information from the top of the feature hierarchy to the bottom. This is useful because, during feature extraction, high-level feature maps carry stronger semantic information but lose small targets, whereas low-level feature maps preserve small targets but carry weaker semantic information. The PAN [20] structure adds a bottom-up path to the FPN, which further improves the model’s robustness and detection ability.
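The top-down and bottom-up paths can be sketched on toy one-dimensional feature maps. This is a simplified illustration under stated assumptions, not the architecture of [19] or [20]: the learned lateral and smoothing convolutions are omitted, nearest-neighbour upsampling and stride-2 max pooling stand in for the learned resampling layers, features are merged by addition, and all function names are hypothetical.

```python
def upsample2(x):
    """Nearest-neighbour upsampling by a factor of 2."""
    return [v for v in x for _ in range(2)]

def downsample2(x):
    """Stride-2 max pooling (a stand-in for a learned stride-2 convolution)."""
    return [max(x[i], x[i + 1]) for i in range(0, len(x), 2)]

def add(a, b):
    """Element-wise sum of two equally sized feature maps."""
    return [u + v for u, v in zip(a, b)]

def fpn_top_down(c3, c4, c5):
    """Pass coarse, semantically strong features down to the finer maps."""
    p5 = c5
    p4 = add(c4, upsample2(p5))
    p3 = add(c3, upsample2(p4))
    return p3, p4, p5

def pan_bottom_up(p3, p4, p5):
    """Pass fine, well-localised features back up to the coarser maps."""
    n3 = p3
    n4 = add(p4, downsample2(n3))
    n5 = add(p5, downsample2(n4))
    return n3, n4, n5

c3 = [0, 0, 5, 0, 0, 0, 0, 0]  # fine map: a small target at position 2
c4 = [0, 0, 0, 0]              # coarser map: the small target is already lost
c5 = [1, 0]                    # coarsest map: a semantic response on the left half

p3, p4, p5 = fpn_top_down(c3, c4, c5)
n3, n4, n5 = pan_bottom_up(p3, p4, p5)
print(p3, n4, n5)
```

In this toy case the top-down pass gives `p3 = [1, 1, 6, 1, 0, 0, 0, 0]`, so the fine map gains the coarse semantic response, and the bottom-up pass gives `n4 = [2, 7, 0, 0]` and `n5 = [8, 0]`, so the small target that was absent from the coarser maps reappears there.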
In addition to improving the network structure, target detection algorithms can also optimise performance through data augmentation [21], loss function design [22], and post-processing. Data augmentation increases the diversity of samples and improves the generalization ability of the model by applying operations such as rotation, scaling and translation to the training data. In terms of loss function design, Focal Loss [18] mitigates the imbalance between positive and negative samples in target detection by introducing a modulating factor that down-weights easy examples, which improves the ability to detect small targets. Post-processing methods, such as non-maximum suppression (NMS), remove redundant overlapping detections and thereby improve the accuracy and efficiency of detection.
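Both the focal loss and NMS are short enough to sketch directly. The code below is a minimal illustration rather than the implementation used in any cited detector: the defaults `alpha=0.25` and `gamma=2.0` are commonly used values and are assumed here, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the IoU threshold of 0.5 is an arbitrary choice for the example.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one predicted probability p in (0, 1), label y in {0, 1}.

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified (easy)
    examples, so abundant easy negatives no longer dominate training.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap any already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# An easy negative contributes far less loss than a hard positive.
easy_negative = focal_loss(0.1, 0)
hard_positive = focal_loss(0.1, 1)

# Two heavily overlapping detections and one separate detection (toy data).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # prints [0, 2]
```

Setting `gamma=0.0` removes the modulating factor and leaves a class-weighted cross-entropy, which is a convenient way to check the effect of the factor in isolation.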
To improve the generalization ability of target detection models on SAR maritime ship targets, Guo et al. [21] proposed a SAR ship detection model called Masked Efficient Adaptive Network (MEA-Net), which is lightweight and highly accurate on unbalanced datasets. Tang et al. [23] designed a Pyramid Mixed Attention Module (PPAM) to mitigate the effect of background noise on ship detection, while its parallel component facilitates the handling of ships of multiple sizes. In addition, Hu et al. [24] proposed attention mechanisms in the spatial and channel dimensions to adaptively weight the importance of features at different scales.
However, the research on improving generalization ability mentioned above mainly trains and tests on the same dataset, and few existing studies address cross-domain detection. Among recent studies, Huang et al. [25] divided the target detection model into off-the-shelf and adaptation layers to dynamically analyze the cross-domain capability of each module, and proposed a method that uses multi-source data for domain adaptation to reduce the difference in feature distribution between the source and target domains. Tang et al. [26] proposed a DETR-based cross-domain weakly supervised target detection (CDWSOD) method, which aims to adapt the detector from the source domain to the target domain through weak supervision.
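One simple way to quantify a difference in feature distribution between two domains is the squared distance between their mean feature vectors, which corresponds to the maximum mean discrepancy with a linear kernel. The sketch below is a generic illustration and is not the measure used in [25] or [26]; the function names and the toy feature values are assumptions.

```python
def mean_vector(features):
    """Per-dimension mean of a list of equally sized feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]

def linear_mmd(source, target):
    """Squared Euclidean distance between the mean feature vectors of two domains."""
    mu_s, mu_t = mean_vector(source), mean_vector(target)
    return sum((s - t) ** 2 for s, t in zip(mu_s, mu_t))

# Hypothetical 2-D features extracted from source-domain and target-domain images.
source = [[1.0, 2.0], [3.0, 4.0]]
target = [[2.0, 2.0], [4.0, 6.0]]
gap = linear_mmd(source, target)
print(gap)  # prints 2.0
```

A domain adaptation method can be read as trying to drive a quantity of this kind towards zero while keeping the detector accurate on the source domain.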
The aforementioned studies have enhanced the capability of CNNs in SAR target detection to some extent. However, they rarely take the following aspects into account: 1. The trained networks generalize well only when training and prediction use the same dataset, and they do not possess good cross-domain generalization ability. 2. The learning of image features is limited to single-polarisation SAR images. When the training data contains full-polarisation data, the correlation between different polarisations is often ignored, so the learned feature information is limited and generalization is difficult to improve beyond a certain point. 3. The coupling of the classification and localization tasks in single-stage target detection renders the model vulnerable to interference from complex backgrounds.
This paper proposes a multipolarisation fusion cross-domain adaptive network that is adapted to complex backgrounds. The network implements end-to-end transfer learning, which enables it to adapt to different scenarios. Additionally, the network makes effective use of existing SAR image resources to fully extract the latent features of the images.
The main contributions of this paper are as follows:
A method for achieving deep domain adaptation in SAR ship target detection through cross-domain adversarial learning is proposed;
A channel fusion module is proposed to combine SAR image features from four polarisations, enriching the feature information and the associations between features. Figure 2 shows the four polarised images of a single scene;
An anti-interference head is proposed to improve the generalization ability of the model under complex backgrounds.
The remainder of this article is organised as follows. Section 2 introduces related work, Section 3 describes the materials and methods, Section 4 reports the experimental process and results, and Section 5 discusses the experimental results and outlines future research directions.