Heterogeneous Deep Model Fusion for Automatic Modulation Classification

Abstract: Deep learning has recently attracted much attention due to its excellent performance in


Introduction
Communication signal recognition is of great significance for many daily applications, such as operator regulation, signal feature map generation, and user identification. One of the main objectives of signal recognition is to detect the communication resources, which ensures the reliability of communications. To achieve this objective, automatic modulation classification (AMC) is indispensable because it helps users identify the modulation mode within a frequency band, which benefits communication reconfiguration and electromagnetic environment analysis.
AMC plays an essential role in obtaining digital baseband information from a signal when only limited knowledge about its parameters is available. Such a technique is widely used in both military and civilian applications, e.g., intelligent cognitive radio and anomaly detection [1]-[2], and has attracted much attention from researchers in the past decades.
Basically, existing AMC algorithms can be divided into two main categories [3], namely, likelihood-based (LB) methods and feature-based (FB) methods. LB methods calculate the likelihood function of the received signals for all modulation modes and then make decisions according to a maximum likelihood ratio test [3]. LB methods usually generate accurate classification results but suffer from heavy computational cost. Alternatively, a traditional FB method consists of two parts, namely, a feature extractor and a classifier, where the classifier identifies digital modulation modes according to the effective feature vectors extracted from the signals.
In contrast to the LB methods, the FB methods are computationally light but may not be theoretically optimal. To date, several FB methods have been shown to be effective for the AMC problem. For instance, features such as the cyclic spectrum [4], high-order cumulants [6], and wavelet coefficients have been successfully extracted from various time-domain waveforms. Afterwards, a classifier performs the final classification based on the features mentioned above. With the development of learning algorithms, the performance has improved, for example moving from shallow neural networks [7] and decision trees to the support vector machine (SVM). Recently, deep learning has been widely applied to audio, image, and video processing, facilitating applications such as face recognition and voice discrimination [8]. However, only a few works based on deep learning exist in the field of communication.
Although researchers have developed various algorithms to implement AMC of digital signals, these methods are suitable for simple communication equipment and struggle in real-world applications where more complicated equipment is in use, because: 1) they cannot handle complex disturbances from other sources; 2) they usually separate the feature extraction and classification processes, so information loss is inevitable; and 3) they must use distributed receivers to collect in-phase and quadrature signals, which costs additional storage space and bandwidth. In this paper, we propose to realize AMC using convolutional neural networks (CNNs) [9], long short-term memory (LSTM) [10], and their fusion model to directly process time-domain waveform data. The temporal property of the data is important for AMC applications. As a variant of the recurrent neural network (RNN), LSTM uses gate structures to transfer information through the network along the time sequence, reflecting depth in the time series. Therefore, LSTM has a strong capacity to process time series data. This paper proposes a heterogeneous deep model fusion (HDMF) method to solve the AMC problem in a unified framework, shown in Figure 1. Different from conventional methods, our AMC does not rely on other methods to extract features. In addition, the modulation modes can be obtained directly from the trained model. This improvement helps the communication system overcome the shortcoming of separate feature extraction and classification and enhances classification accuracy. We use CNNs and LSTM to process the time-domain waveforms of the modulation signals. Eleven types of single-carrier modulation signal samples (e.g., MASK, MFSK, MPSK, and MQAM), passed through a fading channel and additive white Gaussian noise (AWGN), are generated under various signal-to-noise ratios (SNRs) based on an actual geographical environment. Two kinds of HDMFs, based on serial and parallel modes, are proposed to increase the classification accuracy. The results show that the HDMFs achieve much better results than the single CNN or LSTM when the SNR is in the range of 0 dB to 20 dB. In summary, the contributions are as follows:

1) CNNs and LSTM are fused in serial and parallel modes to solve the AMC problem, leading to two HDMFs. Both are trained end-to-end, learning features and making classifications in a unified framework.
2) The experimental results show that the performance of the fusion models is significantly improved compared with the independent networks and the traditional wavelet+SVM method. The serial version of HDMF achieves much better performance than the parallel version.
3) We collect communication signal data sets that approximate the wireless transmission channel in an actual geographical environment. Such datasets are very useful for training networks such as CNNs and LSTM.
The rest of this paper is organized as follows. Section II briefly introduces the related works. Section III introduces the principles of digital modulation signals and the deep learning classification methods. Section IV presents the experiments and analysis. Section V concludes the paper.

Related Works
AMC is a typical multi-classification problem in the field of communication. This section briefly introduces several feature extraction and classification methods used in traditional AMC systems. The CNN and LSTM models are also presented.

Conventional works based on separate features and classifiers
Traditionally, the feature extractor and the classifier are built separately for an AMC system. For example, the envelope amplitude of the signal, the power spectral variance of the signal, and the mean of the absolute value of the signal frequency were extracted in [11] to describe the signal from several different aspects.
Yang and Soliman used the phase probability density function for AMC [12]. Meanwhile, traditional methods usually combine instantaneous and statistical characteristics. Shermeh used the fusion of high-order moments and cumulants with instantaneous characteristics for AMC [13]-[14].
These features can describe the signals at both absolute and relative levels. In addition, high-order characteristics can suppress the effects of noise. The sixth- and eighth-order statistics are widely used in several methods.
Classical algorithms have been widely used in AMC systems. Panagiotou et al. considered AMC as a multiple-hypothesis test problem and used decision theory to obtain the results [15].
They assumed that the phase of the AWGN was random and treated the signals as random variables with known probability distributions. Finally, the generalized likelihood ratio test or the average likelihood ratio test was used to obtain the classification results by thresholding. Classifiers were then introduced into AMC systems. In [16], shallow neural networks and SVM were used as classifiers. In [17]-[18], modulation modes were classified using CNNs with high-level abstract learning capabilities.
However, the traditional classifiers either need preprocessing to extract features or rely on detailed prior information, which negatively affects the classification performance.

CNN-based methods
The advantage of CNNs comes from local connections and tied weights, followed by some form of pooling, which results in translation-invariant features. Furthermore, they have far fewer parameters than fully connected networks with the same number of hidden units. In [9], the authors treat the communication signal as two-dimensional data similar to an image and feed it as a matrix into a narrow 2D CNN for AMC. They study the adaptation of CNNs to time-domain IQ data. A 3D CNN was used in [19]-[20] to process video information. The results showed that multi-frame CNNs were considerably more suitable than single-frame networks for video cognition. In [21], Luan et al. propose Gabor Convolutional Networks, which combine Gabor filters with a CNN model to enhance the robustness of deep learned features to orientation and scale changes. Recently, Zhang et al. applied a one-two-one network to compression artifact reduction in remote sensing [22]. These works motivate us to solve the AMC problem with deep models.

LSTM-based methods
Various models have been used to process sequential signals, such as hidden semi-Markov models [23], conditional random fields [24], and finite-state machines [25]. Recently, RNNs became well known with the development of deep learning. As a special RNN, LSTM has been widely used in the fields of voice and video because of its ability to handle the vanishing gradient of traditional RNNs. It makes fewer conditional independence assumptions than the previous models and integrates easily with other deep learning networks. Researchers have recently combined spatial/optical-flow CNN features with vanilla LSTM models for global temporal modeling of videos [26]-[30]. These studies have demonstrated that deep learning models have a significant effect on action recognition [27], [29], [31] and video description [30], [32]. However, to the best of our knowledge, the fusion of CNN and LSTM has never been investigated for the AMC problem.

Digital modulation signal description
The received signal in the communication system can be expressed as follows:

r(t) = x(t) * c(t) + n(t),

where x(t) is the effective signal from the transmitter, c(t) represents the wireless transmission channel based on the actual geographical environment, * denotes convolution, and n(t) denotes the AWGN. The digital modulation signal x(t) can be expressed as follows:

x(t) = A_c g(t) cos(2π f t + θ) − A_s g(t) sin(2π f t + θ),

where A_c and A_s are the amplitudes of the in-phase and quadrature channels, respectively; f stands for the center frequency; θ is the initial phase of the carrier; and g(t) represents the digital sampling pulse signal. As one of the most common types of noise, AWGN is present in the communication system whether or not a signal is transmitted. Its power spectral density is constant at all frequencies, and its amplitude obeys the Gaussian distribution.
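As an illustration of this signal model, the following sketch generates a BPSK waveform with AWGN at a chosen SNR. The sample rate, carrier frequency, and symbol rate are hypothetical values for illustration only, not the paper's dataset parameters:

```python
import numpy as np

def awgn(signal, snr_db, rng):
    """Add white Gaussian noise at the requested SNR (in dB)."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

fs, f_c, baud = 8000, 1000, 250          # hypothetical sample rate, carrier, symbol rate
sps = fs // baud                          # samples per symbol
rng = np.random.default_rng(42)
bits = rng.integers(0, 2, size=16)

# BPSK: A_c in {-1, +1}, A_s = 0, rectangular pulse g(t)
a_c = np.repeat(2 * bits - 1, sps).astype(float)
t = np.arange(a_c.size) / fs
x = a_c * np.cos(2 * np.pi * f_c * t)     # x(t) = A_c g(t) cos(2 pi f t)
r = awgn(x, snr_db=10, rng=rng)           # received waveform at 10 dB SNR
```

A fading channel c(t) would additionally convolve x with a channel impulse response before the noise is added.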

CNNs
CNNs are hierarchical neural networks that contain convolution, activation, and pooling layers. In this study, the input of the CNN model is the signal time-domain waveform data. The differences among the classes of modulation modes are deeply characterized by stacking multiple convolutional layers and nonlinear activations. Different from CNN models in the image domain, we use a series of one-dimensional convolution kernels to process the signals.
Each convolution layer is composed of a number of kernels of the same size. Each convolution kernel is shared across a sample; thus, each kernel can be regarded as a feature extraction unit. This parameter sharing effectively reduces the number of learned parameters.
Moreover, the features extracted by convolution remain in their original signal positions, which preserves the temporal relationships within the signal well. In this paper, ReLU is used as the activation function. We do not use pooling layers for dimensionality reduction because the amount of signal information is relatively small.
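The one-dimensional convolution with shared kernels and ReLU activation described above can be sketched as follows (the two 1x3 kernels are hypothetical hand-picked filters, not learned weights):

```python
import numpy as np

def conv1d(signal, kernels, stride=1):
    """Valid 1D convolution: each kernel is shared across all positions."""
    k = kernels.shape[1]                      # kernel length
    n_out = (len(signal) - k) // stride + 1   # output length per kernel
    out = np.empty((kernels.shape[0], n_out))
    for i in range(n_out):
        window = signal[i * stride : i * stride + k]
        out[:, i] = kernels @ window          # one dot product per kernel
    return out

def relu(x):
    return np.maximum(x, 0.0)

# Toy waveform and two hypothetical kernels (an edge detector and a smoother).
x = np.array([0.0, 1.0, 0.0, -1.0, 0.0, 1.0])
K = np.array([[1.0, 0.0, -1.0],
              [1/3, 1/3, 1/3]])
features = relu(conv1d(x, K))   # shape (2, 4): 2 feature maps, 4 positions
```

Because each output position is computed from a window at that position, the feature maps keep the temporal ordering of the input, which is the property the text emphasizes.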

LSTM
Traditional RNNs are unable to connect information as the gap between relevant time steps grows. The vanishing gradient can be interpreted as the forgetting of the human brain. LSTM overcomes this drawback using gate structures that optimize the information transfer among memory cells. The particular structures in a memory cell include the input, output, and forget gates. An LSTM memory cell is shown in Figure 2. The iterating equations are as follows:

i_t = sigmoid(W_i · [h_{t−1}, x_t] + b_i)
f_t = sigmoid(W_f · [h_{t−1}, x_t] + b_f)
o_t = sigmoid(W_o · [h_{t−1}, x_t] + b_o)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t
h_t = o_t ⊙ tanh(C_t)

where W are the weight matrices; b are the bias vectors; i, f, and o are the outputs of the input, forget, and output gates, respectively; C and h are the cell activation and cell output vectors, respectively; ⊙ denotes elementwise multiplication; and sigmoid and tanh are nonlinear activation functions.
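A standard LSTM step with these gates can be sketched in NumPy. The dimensions and random weights below are illustrative only (not a trained model), and the four gates are stacked into one weight matrix for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step. W maps the concatenated [h_prev, x_t] to the
    stacked pre-activations of the input, forget, output, and candidate gates."""
    d = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b
    i = sigmoid(z[0:d])            # input gate
    f = sigmoid(z[d:2*d])          # forget gate
    o = sigmoid(z[2*d:3*d])        # output gate
    C_tilde = np.tanh(z[3*d:4*d])  # candidate cell state
    C = f * C_prev + i * C_tilde   # new cell state
    h = o * np.tanh(C)             # new cell output
    return h, C

rng = np.random.default_rng(0)
d_in, d_hid = 2, 3                 # hypothetical sizes
W = rng.standard_normal((4 * d_hid, d_hid + d_in)) * 0.1
b = np.zeros(4 * d_hid)
h, C = lstm_step(rng.standard_normal(d_in), np.zeros(d_hid), np.zeros(d_hid), W, b)
```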
A standard LSTM models the temporal data in only one direction and ignores the reverse direction, which also has a positive impact on the results. In this paper, a method based on bidirectional LSTM (Bi-LSTM) is exploited to realize AMC. The core concept is to use a forward and a backward LSTM to train on a sample simultaneously, so the Bi-LSTM architecture models the time-domain waveform from both the past and the future.

Fusion model based on CNN and LSTM
The HDMFs are established by fusing the models in serial and parallel ways to enhance the classification performance. The specific structure of the fusion model is shown in Figure 3. The serial fusion method (HDMF) is similar to the encoder-decoder framework. In this study, the encoding process is implemented by CNNs, after which the LSTM decodes the corresponding information. The features are extracted by the two networks, from simple representations to complex concepts. The upper convolutional layers extract features locally. Then, the Bi-LSTM layers learn temporal characteristics from these representations.
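The serial data flow, a convolutional encoder feeding an LSTM decoder topped by a softmax layer, can be sketched with random untrained weights. All sizes here are hypothetical; only the shape of the pipeline follows the description above:

```python
import numpy as np

def relu(x): return np.maximum(x, 0.0)
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
T, d_hid, n_classes = 64, 8, 11          # hypothetical sizes; 11 modulation modes

# Encoder: one shared bank of 1x5 convolution kernels over the raw waveform.
x = rng.standard_normal(T + 4)
K = rng.standard_normal((d_hid, 5)) * 0.1
feats = relu(np.stack([K @ x[i:i + 5] for i in range(T)]))   # (T, d_hid)

# Decoder: a single LSTM pass over the convolutional features.
W = rng.standard_normal((4 * d_hid, 2 * d_hid)) * 0.1
b = np.zeros(4 * d_hid)
h = C = np.zeros(d_hid)
for x_t in feats:
    z = W @ np.concatenate([h, x_t]) + b
    i, f, o = (sigmoid(z[k * d_hid:(k + 1) * d_hid]) for k in range(3))
    C = f * C + i * np.tanh(z[3 * d_hid:])
    h = o * np.tanh(C)

# Softmax layer on the final LSTM state gives class probabilities.
W_out = rng.standard_normal((n_classes, d_hid)) * 0.1
p = softmax(W_out @ h)
```

In the trained models, these weights are learned jointly end-to-end rather than drawn at random.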
For both kinds of fusion models, the final feature vector is the probabilistic output of the softmax layer. The fusion models are trained in an end-to-end way even though different neural networks are combined to address the AMC problem. The geographic simulation environment is shown in Figure 4, based on which we collect our datasets. We captured an unmanned aerial vehicle communication signal data set, which we developed based on STK, Visual Studio, and MATLAB. We use TensorFlow [33] to implement our deep models. The Adam method [34] is used to train them with a learning rate of 0.001. The iterations are as follows:

Implementation details and backpropagation
m_t = μ m_{t−1} + (1 − μ) g_t
n_t = ν n_{t−1} + (1 − ν) g_t²
m̂_t = m_t / (1 − μ^t),  n̂_t = n_t / (1 − ν^t)
Δθ = −η m̂_t / (√n̂_t + ε)

where m_t and n_t are the first and second moment estimates of the gradient g_t, and m̂_t and n̂_t are their bias-corrected versions, which can be regarded as unbiased estimates of the expectations; Δθ is the dynamic constraint on the learning rate; and μ, ν, ε, and η are constants.
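The Adam update can be sketched directly as follows (a single-parameter-vector version for illustration, not the TensorFlow optimizer itself; the toy objective f(θ) = θ² is an assumption):

```python
import numpy as np

def adam_step(theta, g, m, n, t, eta=0.001, mu=0.9, nu=0.999, eps=1e-8):
    """One Adam update: moment estimates, bias correction, scaled step."""
    m = mu * m + (1 - mu) * g            # first moment estimate
    n = nu * n + (1 - nu) * g ** 2       # second moment estimate
    m_hat = m / (1 - mu ** t)            # bias-corrected first moment
    n_hat = n / (1 - nu ** t)            # bias-corrected second moment
    theta = theta - eta * m_hat / (np.sqrt(n_hat) + eps)
    return theta, m, n

# Minimize f(theta) = theta^2 starting from theta = 1.0.
theta = np.array([1.0])
m = n = np.zeros(1)
for t in range(1, 2001):
    g = 2 * theta                        # gradient of theta^2
    theta, m, n = adam_step(theta, g, m, n, t)
```

The bias correction matters early on, when m and n are still close to their zero initialization.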
The fundamental loss and the softmax functions are defined as follows:

L(x, y) = − Σ_i y_i log(softmax(z_i)),  softmax(z_i) = e^{z_i} / Σ_j e^{z_j},

where x is the input, y is the corresponding ground-truth label, and z_i is the i-th input to the softmax layer.
The gradient used in backpropagation is then calculated as follows:

∂L/∂z_i = softmax(z_i) − y_i.
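This closed-form gradient can be checked numerically against central finite differences (the logits and one-hot label below are illustrative values):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())              # shift for numerical stability
    return e / e.sum()

def cross_entropy(z, y):
    return -np.sum(y * np.log(softmax(z)))

z = np.array([2.0, 1.0, 0.1])
y = np.array([1.0, 0.0, 0.0])            # one-hot ground-truth label

analytic = softmax(z) - y                # dL/dz from the closed form

# Central finite differences to verify the closed-form gradient.
numeric = np.zeros_like(z)
h = 1e-6
for i in range(z.size):
    dz = np.zeros_like(z); dz[i] = h
    numeric[i] = (cross_entropy(z + dz, y) - cross_entropy(z - dz, y)) / (2 * h)
```

Note that the gradient components sum to zero, since the softmax probabilities and the one-hot label each sum to one.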

Classification accuracy of CNN and LSTM models
When CNNs and LSTM are applied separately to the AMC problem, the classification accuracies of the CNNs are reported with the convolution layer depth varying from 1 to 4, the number of convolution kernels from 8 to 64, and the kernel size from 10 to 40. The Bi-LSTM model is more suitable for AMC than the CNN model: the average classification accuracy of Bi-LSTM is 77.5%, which is 1.5% higher than that of the CNN model. The performance is better with 32 to 128 memory cells than with other settings. The Bi-LSTM models with more than two hidden layers have essentially the same classification accuracy.

Comparison of classification accuracy between the deep learning models and the traditional method
We have compared five methods, including both traditional and deep learning methods, on the same data sets. The classification performance is as follows. The classification accuracy of the parallel fusion model is 2% higher than that of the CNN model and 1% higher than that of the Bi-LSTM model. Moreover, the average classification accuracy of the serial fusion model is 99% without noise, which is 6% higher than that of the parallel fusion model. In fact, the fusion methods benefit the classification accuracy even more when the SNR ranges from 0 dB to 20 dB than in the noise-free situation. There, the average classification accuracy of the serial fusion method is 91%, which is 11% higher than that of the parallel fusion method.
The performance of the classifiers shows that deep learning achieves high classification accuracy for AMC. Waveform local variation and temporal characteristics can both be used to identify modulation modes. In comparison with CNN and Bi-LSTM, the performance of the HDMF methods improves significantly because the classifiers can exploit the two kinds of features simultaneously.
However, the performance of the serial fusion is considerably higher than that of the parallel fusion because the parallel method belongs to decision-level fusion, which can be viewed as simple voting over the results. The serial method belongs to feature-level fusion, which combines the two kinds of feature information to obtain the classification results. QAM is prone to within-class misclassification, caused by the subtle differences among M-ary phase modes; moreover, the phase difference is only weakly reflected in the waveform amplitude.
Furthermore, QAM can be considered a combination of ASK and PSK in practice. The classifier can detect the two types of changes simultaneously even when the result is incorrect at low SNR. Therefore, only within-class misclassifications occur in the results.

Conclusions
In this study, we proposed methods based on deep learning to address the AMC problem in the field of communication. The classification methods are end-to-end processes, which remove the additional steps of extracting signal features required by traditional methods. First, the communication signal data set is developed based on an actual geographical environment to provide the basis for the related classification tasks. CNNs and LSTM are then used to solve the AMC problem and are compared with the traditional method. Furthermore, the modified classifiers based on the fusion models in serial and parallel modes greatly improve the classification accuracy with the SNR from 0 dB to 20 dB. The serial fusion mode achieves the best performance among all modes. The confusion matrices clearly reflect the remaining shortcomings of the classifiers in this study. We will overcome these shortcomings in future research on AMC.
CNNs exploit spatially local correlation by enforcing a local connectivity pattern between neurons of adjacent layers. The convolution kernels are also shared within each sample, which avoids the rapid expansion of parameters caused by a fully connected structure. Sample data remain in their original positions after convolution, so local features are well preserved. Despite their great advances in spatial feature extraction, CNNs cannot model changes in time series well.

Figure 1. Illustration of the traditional methods and the classifiers in this study for AMC. The traditional method needs to extract features as preprocessing and suffers from high computational complexity and loss of effective information. By contrast, the classifiers based on deep learning process the signal data directly in this study. AMC is implemented more efficiently with the heterogeneous deep model fusion (HDMF) method.
In the case of ASK, FSK, and PSK, A_s is zero. In accordance with the digital baseband information, ASK, FSK, and PSK vary A_c, f, and θ, respectively, over time among M discrete values. By contrast, QAM fully utilizes the orthogonality of the signal. After dividing the digital baseband into I and Q channels, the information is modulated onto two carriers of identical frequency with a phase difference of 90° using the ASK modulation mode, which significantly improves the bandwidth efficiency.
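The I/Q construction of QAM described above can be illustrated as follows (a 16-QAM sketch with hypothetical sample and symbol rates, not the paper's generator); the amplitude/phase view at the end shows how QAM combines ASK-like amplitude variation with PSK-like phase variation:

```python
import numpy as np

# 16-QAM: two 4-level ASK streams on orthogonal carriers (I and Q channels).
levels = np.array([-3.0, -1.0, 1.0, 3.0])
rng = np.random.default_rng(7)
a_c = rng.choice(levels, size=8)          # in-phase amplitudes
a_s = rng.choice(levels, size=8)          # quadrature amplitudes

fs, f_c, sps = 8000, 1000, 32             # hypothetical rates
t = np.arange(8 * sps) / fs
i_wave = np.repeat(a_c, sps) * np.cos(2 * np.pi * f_c * t)
q_wave = np.repeat(a_s, sps) * np.sin(2 * np.pi * f_c * t)
x = i_wave - q_wave                       # x(t) = A_c g cos(.) - A_s g sin(.)

# Equivalent amplitude/phase view of each symbol.
amplitude = np.hypot(a_c, a_s)
phase = np.arctan2(a_s, a_c)
```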

Figure 3. Fusion model structure of HDMF in parallel and serial modes. Note that the two HDMF models are used separately to solve the AMC problem.

Algorithm 1: Training HDMF (parallel)
1: Initialize the parameters θ_c in the CNN, θ_l in the LSTM, W and ω in the loss layer, the learning rate μ, and the iteration counter t := 0.
2: While the loss does not converge, do
3: …
5: Compute the backpropagation error.
…
9: End while
The loss of the parallel fusion model consists of two parts, which are balanced by the given parameters.

Figure 5. Classification accuracy of the CNN and LSTM models. (a) Classification accuracy of CNN with the number of convolution kernels from 8 to 64; (b) classification accuracy of CNN with the size of convolution kernels from 10 to 40; (c) classification accuracy of CNN with the number of convolution layers from 1 to 4; (d) classification accuracy of Bi-LSTM with the number of memory cells from 16 to 128; (e) classification accuracy of Bi-LSTM with the number of hidden layers from 1 to 3.

Figure 6. Comparison of classification accuracy between the deep learning models and the traditional method. (a) Classification accuracy of the different methods without noise; (b) classification accuracy of the different methods with SNR from 0 dB to 20 dB.

Figure 7. Confusion matrices of the serial fusion model. (a) Confusion matrix for 20 dB SNR; (b) confusion matrix for 10 dB SNR; (c) confusion matrix for 0 dB SNR.

Table 1. Classification accuracy of the different methods without noise.

Table 2. Classification accuracy of the different methods with SNR from 0 dB to 20 dB.