1. Introduction
Synthetic aperture radar (SAR) is a microwave remote sensing system that provides high-resolution images with day-and-night and all-weather operating capability, and it has been widely used in various military and civilian fields [1,2,3]. Automatic target recognition (ATR) is a fundamental yet challenging task in the SAR domain [4]. It consists of two key procedures, i.e., feature extraction and target classification, which are independent of each other in traditional SAR ATR methods. Moreover, traditional methods rely on hand-crafted features, which hinders the development of SAR ATR.
With the prosperous development and successful application of deep learning technologies in the field of remote sensing, studies on SAR ATR have achieved significant breakthroughs [5]. Numerous deep-learning based SAR ATR methods have emerged over the past few years and demonstrated their superiority to traditional methods. To name just a few, Chen et al. [6] were among the first to apply deep convolutional neural networks (CNNs) to SAR ATR tasks, laying a foundation for follow-up studies in this field. Kechagias-Stamatis and Aouf [7] proposed a SAR ATR method fusing deep learning and sparse coding, which achieves excellent recognition performance under different situations. Zhang et al. [8] proposed a semi-supervised multi-view classification method for SAR target recognition. Pei et al. [9] designed a two-stage algorithm based on contrastive learning for SAR image classification. Zhang et al. [10] proposed a separability measure-based CNN for SAR ATR, which can quantitatively analyze the interpretability of feature maps.
One of the biggest challenges for most deep-learning based methods is that they are data-hungry, often requiring hundreds or thousands of training samples to achieve state-of-the-art accuracy [11]. However, in real SAR ATR scenarios, the scarcity of labeled samples is a common problem due to the imaging mechanism of SAR. When only a few labeled SAR images are available, a situation termed the few-shot problem, most existing deep-learning based SAR ATR methods suffer a severe performance decline.
In the face of this challenge, a variety of few-shot learning (FSL) methods have been proposed in the past few years. Among them, the prototypical network (ProtoNet) [12], relation network (RelationNet) [13], transductive propagation network (TPN) [14], cross attention network (CAN) and its transductive variant [15], graph neural network (GNN) [16], and edge-labeling GNN [17] are representatives in the field of computer vision. Subsequently, some FSL methods were proposed specifically for SAR ATR under few-shot conditions [18,19,20,21,22]. For instance, Liu et al. [23] put forward a bi-similarity prototypical network with capsule-based embedding (BSCapNet) to solve the problem of few-shot SAR target recognition; experiments on the moving and stationary target acquisition and recognition (MSTAR) dataset show its effectiveness and superiority to several state-of-the-art methods. Bi et al. [24] proposed a contrastive domain adaptation based SAR target classification method to address the problem of insufficient samples, and experimental results on the MSTAR dataset demonstrate its effectiveness. Fu et al. [25] proposed a meta-learning framework for few-shot SAR ATR (MSAR). Yang et al. [26] proposed a mixed-loss graph attention network (MGANet) for few-shot SAR target classification. Wang et al. [27] presented a multitask representation learning network (MTRLN) for few-shot SAR ATR. Yu et al. [28] presented a transductive prototypical attention network (TPAN). Ren et al. [29] proposed an adaptive convolutional subspace reasoning network (ACSRNet). Liao et al. [30] put forward a model-agnostic meta-learning (MAML) method for few-shot image classification. Although significant achievements have been made, studies on few-shot SAR ATR are still in their infancy, and considerable potential remains to be explored.
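Many of the methods above, as well as ours, build on the prototypical network. As a reference point, ProtoNet [12] computes each class prototype as the mean embedding of that class's support samples and classifies a query by a softmax over negative distances to the prototypes:

$$\mathbf{c}_k=\frac{1}{|S_k|}\sum_{(\mathbf{x}_i,y_i)\in S_k}f_\phi(\mathbf{x}_i),\qquad p_\phi(y=k\mid\mathbf{x})=\frac{\exp\left(-d\left(f_\phi(\mathbf{x}),\mathbf{c}_k\right)\right)}{\sum_{k'}\exp\left(-d\left(f_\phi(\mathbf{x}),\mathbf{c}_{k'}\right)\right)},$$

where $S_k$ is the support set of class $k$, $f_\phi$ is the embedding network, and $d(\cdot,\cdot)$ is a distance measure, typically the squared Euclidean distance.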
Our goal in this paper is to build on these achievements and further improve recognition performance by proposing a new method named enhanced prototypical network with customized region-aware convolution (CRCEPN). Extensive evaluation experiments on both the MSTAR and the OpenSARship datasets verify the effectiveness of the proposed method and its superiority over several state-of-the-art few-shot SAR ATR methods. The main contributions of this paper are summarized as follows.
A feature extraction network based on a customized region-aware convolution (CRConv) is developed, which can adaptively adjust convolutional kernels and their receptive fields according to each sample's own characteristics and the semantic similarity among spatial regions. Consequently, CRConv adapts better to diverse SAR images and is more robust to variations in radar view, which strengthens its capacity to extract more informative and discriminative features and thereby greatly improves recognition performance, especially under few-shot conditions (a schematic sketch is given after this list).
To achieve accurate and robust target identity prediction for few-shot SAR ATR, we propose an enhanced prototypical network, which effectively strengthens the representation ability of the class prototypes by utilizing both support and query samples, thereby raising the classification accuracy (an illustrative update rule is sketched after this list).
We propose a new loss function, namely the aggregation loss, to reduce intra-class variation. Under the joint optimization of the aggregation loss and the cross-entropy loss, not only are the inter-class differences enlarged but the intra-class variations are also reduced in the feature space. Highly discriminative features can thus be obtained for few-shot SAR ATR, improving the recognition performance, as supported by the experimental results (a plausible form is sketched after this list).
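To make the first contribution concrete, below is a minimal PyTorch sketch of one plausible form of region-aware convolution: a lightweight routing branch softly assigns every spatial location to one of K candidate kernels, so that regions with different semantics are filtered differently. The class name CRConvSketch and all hyperparameters here are illustrative placeholders rather than the exact CRConv design, which is specified in Section 2.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CRConvSketch(nn.Module):
    """Illustrative region-aware convolution: K candidate kernels are
    blended per spatial location by a lightweight routing branch.
    A hypothetical sketch, not the exact CRConv of Section 2."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, num_kernels: int = 4):
        super().__init__()
        # K candidate convolution kernels with identical shapes
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for _ in range(num_kernels)
        )
        # 1x1 routing branch: per-pixel soft assignment over the K kernels
        self.route = nn.Conv2d(in_ch, num_kernels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = F.softmax(self.route(x), dim=1)                    # (B, K, H, W)
        outs = torch.stack([b(x) for b in self.branches], 1)   # (B, K, C_out, H, W)
        return (w.unsqueeze(2) * outs).sum(dim=1)              # (B, C_out, H, W)

# usage: y = CRConvSketch(1, 64)(torch.randn(4, 1, 128, 128))  # -> (4, 64, 128, 128)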
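For the second contribution, a common generic way to enhance prototypes with unlabeled query samples is a soft-assignment update, shown here only for intuition; the exact enhancement rule of the proposed network is defined in Section 2:

$$\tilde{\mathbf{c}}_k=\frac{\sum_{\mathbf{x}_i\in S_k}f_\phi(\mathbf{x}_i)+\sum_{\mathbf{q}_j\in Q}w_{jk}\,f_\phi(\mathbf{q}_j)}{|S_k|+\sum_{\mathbf{q}_j\in Q}w_{jk}},\qquad w_{jk}=\frac{\exp\left(-d\left(f_\phi(\mathbf{q}_j),\mathbf{c}_k\right)\right)}{\sum_{k'}\exp\left(-d\left(f_\phi(\mathbf{q}_j),\mathbf{c}_{k'}\right)\right)},$$

where $Q$ is the query set and $\mathbf{c}_k$ is the initial support-only prototype.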
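Similarly, for the third contribution, a plausible generic form of an aggregation loss (the precise definition is given in Section 2) penalizes the distance of each feature to its class prototype and is optimized jointly with the cross-entropy loss $\mathcal{L}_{\mathrm{ce}}$:

$$\mathcal{L}_{\mathrm{agg}}=\frac{1}{N}\sum_{i=1}^{N}\left\|f_\phi(\mathbf{x}_i)-\mathbf{c}_{y_i}\right\|_2^2,\qquad \mathcal{L}=\mathcal{L}_{\mathrm{ce}}+\lambda\,\mathcal{L}_{\mathrm{agg}},$$

where $N$ is the number of samples in an episode, $\mathbf{c}_{y_i}$ is the prototype of the class of sample $i$, and $\lambda$ is an assumed balance hyperparameter.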
The rest of this paper is organized as follows.
Section 2 details the framework and each key component of the proposed method. Section 3 presents extensive experiments on both the MSTAR and the OpenSARship datasets and analyzes the experimental results in detail.
Section 4 concludes this work.