Deep Learning for Joint MIMO Detection and Channel Decoding

We propose a deep-learning approach for the joint MIMO detection and channel decoding problem. Conventional MIMO receivers adopt a model-based approach for MIMO detection and channel decoding in linear or iterative manners. However, due to the complex MIMO signal model, the optimal solution to the joint MIMO detection and channel decoding problem (i.e., the maximum likelihood decoding of the transmitted codewords from the received MIMO signals) is computationally infeasible. As a practical measure, the current model-based MIMO receivers all use suboptimal MIMO decoding methods with affordable computational complexities. This work applies the latest advances in deep learning for the design of MIMO receivers. In particular, we leverage deep neural networks (DNN) with supervised training to solve the joint MIMO detection and channel decoding problem. We show that DNN can be trained to give much better decoding performance than conventional MIMO receivers do. Our simulations show that a DNN implementation consisting of seven hidden layers can outperform conventional model-based linear or iterative receivers. This performance improvement points to a new direction for future MIMO receiver design.


I. INTRODUCTION
Multiple-antenna technology, also known as multiple-input multiple-output (MIMO), is one of the most important techniques for advanced wireless communications systems. It has already been incorporated into many wireless standards, e.g., 802.11n/ac [1] and LTE 4G [2]. It has also been shown theoretically that MIMO can increase spectrum efficiency linearly with the numbers of transmit and receive antennas [3]. Of much interest are low-complexity MIMO functional units that have good performance.
A MIMO transmitter transmits multiple data streams, one on each transmit antenna. A MIMO receiver receives a multiplexed copy of the multiple data streams plus noise on each receive antenna. A MIMO detector demultiplexes and decodes the multiplexed data on all the receive antennas into the originally transmitted multiple data streams plus noise and interference.
To achieve near-capacity performance, advanced channel coding schemes, such as LDPC and polar codes, have been suggested for 5G systems [4], [5]. These channel codes protect the data streams against channel fading, interference, and noise. The output of a MIMO detector consists of a noisy version of the codeword transmitted by the transmitter. The function of channel decoding is to map the noisy codeword to the original information bits at the transmitter.
T. Wang is with the College of Information Engineering, Shenzhen University, and the Department of Information Engineering, The Chinese University of Hong Kong (ttwang@szu.edu.cn). Lihao Zhang and Soung Chang Liew are with the Department of Information Engineering, The Chinese University of Hong Kong (zl018@ie.cuhk.edu.hk, soung@ie.cuhk.edu.hk).
For optimal MIMO decoding, MIMO detection and channel decoding need to be performed in a joint manner. The conventional MIMO decoding schemes all use a model-based approach. However, due to the complex MIMO signal model, the optimal solution to the joint MIMO detection and channel decoding problem (i.e., the maximum likelihood decoding of the transmitted codewords from the received MIMO signals) is computationally infeasible.
As a practical measure, the current model-based MIMO receivers all use suboptimal MIMO decoding methods with affordable computational complexities. For example, instead of joint MIMO detection and channel decoding, [6]-[8] proposed to perform MIMO detection and channel decoding sequentially and separately, where MIMO detection is realized by linear equalization with zero forcing (ZF) or minimum mean square error (MMSE) criteria. By contrast, [9]-[11] proposed to perform MIMO detection and channel decoding iteratively with soft information exchanges between the two components. Thus, MIMO detection and channel decoding are performed in a joint manner. However, to contain complexity, the original MIMO signal model has been relaxed and replaced by an approximate model (i.e., it separately models the MIMO signal and the channel code). As a result, the solutions are still suboptimal. This leaves a gap for further performance improvement with better MIMO decoder designs.
To narrow the performance gap, this work applies the latest advances in deep learning for the design of MIMO receivers. In particular, we leverage deep neural networks (DNN) with supervised training to solve the joint MIMO detection and channel decoding problem. We show that DNN can be trained to give much better decoding performance than conventional MIMO receivers do. Our simulations show that a DNN implementation consisting of seven hidden layers can outperform conventional model-based linear or iterative receivers.

A. Related Work
Many MIMO detection schemes have been proposed [12]. Linear MIMO detection can first be used to cancel multiple-antenna interference with low complexity; after that, channel decoding is performed [6]-[8]. In these schemes, linear MIMO detection and channel decoding operate in a sequential manner. Since linear MIMO detection introduces noise amplification and correlation, such sequential linear MIMO detection and channel decoding schemes typically result in large performance loss due to the mismatch between the noise models at the output of the MIMO detector and the input of the channel decoder.
To enhance the performance of MIMO detection, nonlinear MIMO detectors have also been proposed, e.g., MIMO detectors based on sphere decoding [13]-[15], semi-definite relaxation [16], [17], and lattice reduction [15], [18]. Unfortunately, these nonlinear MIMO detectors can only output hard estimates of channel symbols, making them incompatible with modern channel decoders that require soft input to achieve superior decoding performance.
Sphere decoding and list decoding algorithms were used for soft MIMO detection [9]-[11], [19] that produces soft output. This soft information can then be fed to a channel decoder. Moreover, information exchange can be performed iteratively between soft MIMO detection and channel decoding to improve the overall performance of MIMO decoding. Although these iterative MIMO decoding schemes have better performance than the sequential schemes, their solutions are still approximate and suboptimal, due to the mismatch between the noise model of the soft output of the MIMO detector and the assumed noise model at the input of the channel decoder. Furthermore, iterative information exchange introduces large decoding latencies.
Unlike the above model-based approaches, [20] proposed a deep learning approach for MIMO detection. Specifically, the method approximates MIMO detection using deep neural networks (DNN). The method progressively improves the approximation by adjusting the weights of a DNN based on a series of training MIMO signals. Compared with model-based MIMO detection, deep-learning MIMO detection achieves similar detection accuracy with faster detection speed. However, this deep-learning MIMO detection scheme can only perform hard MIMO detection and cannot be combined with a soft channel decoding scheme.
DNN was first used to perform channel decoding in [21], followed by further work in [22], [23]. It was shown that DNN channel decoding can approach the MAP performance with lower decoding latency than traditional channel decoding. The work in [24] employed a neural network constructed by unfolding the factor graph of linear codes to improve the performance of belief propagation decoding when the factor graph of the linear codes contains many small loops. The work in [25] investigated DNN-based joint equalization and channel decoding for non-MIMO systems. A survey on the applications of deep learning to wireless systems can be found in [26].
The remainder of this paper is organized as follows. Section II presents the system model of MIMO systems. Section III reviews the existing model-based MIMO receivers. Section IV presents our deep learning MIMO receiver. Section V provides the simulation results. Finally, Section VI concludes the paper.

II. SYSTEM MODEL
This section presents the system model of MIMO systems and the format of the received MIMO signals. Consider a MIMO system where the transmitter is equipped with $M_T$ antennas and the receiver is equipped with $M_R$ antennas. The channel between each transmit-receive antenna pair is assumed to incur frequency-flat fading, and the channel state remains constant within one transmitted packet. We assume $M_T \le M_R$ and that $M_T$ parallel data streams are transmitted, one on each transmit antenna. Figure 1 shows the block diagram of the MIMO transmitter. At the transmitter side, a vector of $K$ information bits, $b = [b_1, b_2, \cdots, b_K]^T$, is encoded into a vector of $N$ coded bits, $c = [c_1, c_2, \cdots, c_N]^T$, where $R = K/N$ is the code rate. The valid set of codewords is denoted by $\mathcal{C}$ and thus $c \in \mathcal{C}$. The coded bits in vector $c$ are modulated to a vector of $N/B$ complex data symbols, $\tilde{x}$, where $B$ is the number of code bits per complex data symbol. The modulation constellation is scaled so that the modulated symbols in $\tilde{x}$ have unit average power. Through serial-to-parallel conversion, the vector $\tilde{x}$ is partitioned into $L$ vectors of length $M_T$, which form the columns of the $M_T \times L$ data matrix $X_d$. Pilots are placed before the data, giving the $M_T \times (L_p + L)$ signal matrix $X = [X_p, X_d]$, where $X_p$ is the $M_T \times L_p$ pilot matrix that contains the pilot vectors. We assume $L_p \ge M_T$ to facilitate the channel estimation [27]. The signal matrix $X$ represents one transmitted packet. The $M_T$ symbols of the $t$-th column vector in the signal matrix $X$ are simultaneously transmitted on the $M_T$ transmit antennas in the $t$-th time slot.
At the receiver side, the received signals are written into an $M_R \times (L_p + L)$ matrix, $Y = [y_1, y_2, \cdots, y_{L_p + L}]$, where the $t$-th vector $y_t$ contains the received signals on the $M_R$ receive antennas in the $t$-th time slot. The received signal matrix can be written as
$$Y = HX + W, \qquad (1)$$
where $H$ is an $M_R \times M_T$ complex channel matrix with zero-mean and $\sigma^2$-variance independent complex Gaussian entries, and $W$ is the $M_R \times (L_p + L)$ additive white Gaussian noise (AWGN) matrix that has zero-mean and unit-variance independent complex Gaussian entries. We also divide the received signal matrix and the AWGN matrix into two sub-matrices each, $Y = [Y_p, Y_d]$ and $W = [W_p, W_d]$, corresponding to the pilot part and the data part of the packet, so that $Y_p = H X_p + W_p$ and $Y_d = H X_d + W_d$.

III. MODEL-BASED MIMO RECEIVERS
A symbol-wise optimal MIMO receiver decodes each information bit, $b_k$, from the received signal matrix $Y_d$ by minimizing the symbol error probability or equivalently maximizing the a posteriori probability (APP):
$$\hat{b}_k = \arg\max_{b_k \in \{0, 1\}} p(b_k \mid Y_d, H, \mathcal{C}), \qquad (2)$$
where $\hat{b}_k$ denotes the estimate of the information bit $b_k$, and $k \in \{1, 2, \cdots, K\}$. The problem as expressed in (2) is in fact a joint MIMO detection and channel decoding problem, since data symbol detection and channel decoding are implicitly performed in (2). We point out that joint MIMO detection and channel decoding as in (2) requires knowledge of the channel matrix $H$. In practice, the channel matrix is typically estimated from the received pilot signals $Y_p$; e.g., the least square (LS) estimate of the channel matrix is given by $\hat{H} = Y_p X_p^H (X_p X_p^H)^{-1}$ [27]. The channel matrix estimate $\hat{H}$ is then substituted back into (2) to replace the real channel matrix $H$.
Even with the above approximation, which replaces $H$ by $\hat{H}$, the exact computation of the APP, $p(b_k \mid Y_d, \hat{H}, \mathcal{C})$, is difficult and highly complex. The computational difficulty is due to: i) the correlation among the data symbols introduced by channel encoding; ii) the parallel signal interference caused by the MIMO channel. Therefore, suboptimal MIMO detection and channel decoding schemes with manageable implementation complexities are typically used in practice. We overview two suboptimal schemes in the following.
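Before turning to the suboptimal schemes, a minimal sketch may help show why the exact solution of (2) is costly. The code below computes the bit-wise APP by enumerating every codeword, so its cost grows as $2^K$. It is only an illustration under stated assumptions: the (4, 2) generator matrix `G`, the fixed 2 x 2 channel `H`, BPSK modulation, equiprobable messages, and the helper names `bpsk` and `app_decode` are all hypothetical and not taken from the paper.

```python
import itertools
import numpy as np

def bpsk(bits):
    """Map bits {0, 1} to unit-power BPSK symbols {+1, -1}."""
    return 1.0 - 2.0 * np.asarray(bits, dtype=float)

def app_decode(Y_d, H, info_bits, code_bits, M_T):
    """Bit-wise APP decoding of (2) by enumerating all 2^K codewords.

    info_bits: (2^K, K) messages; code_bits: (2^K, N) matching codewords.
    Assumes equiprobable messages and unit-variance complex Gaussian noise.
    """
    log_lik = np.empty(len(code_bits))
    for i, c in enumerate(code_bits):
        X_d = bpsk(c).reshape(-1, M_T).T              # serial-to-parallel: M_T x L
        log_lik[i] = -np.sum(np.abs(Y_d - H @ X_d) ** 2)
    post = np.exp(log_lik - log_lik.max())            # subtract max for stability
    post /= post.sum()                                # p(codeword | Y_d, H)
    p_one = post @ info_bits                          # p(b_k = 1 | Y_d, H, C)
    return (p_one > 0.5).astype(int), p_one

# Hypothetical (N, K) = (4, 2) toy code and a fixed 2 x 2 channel.
G = np.array([[1, 0, 1, 1],
              [0, 1, 1, 0]])
info_bits = np.array(list(itertools.product([0, 1], repeat=2)))
code_bits = (info_bits @ G) % 2
H = np.array([[2.0, 1.0j],
              [1.0j, 2.0]])

rng = np.random.default_rng(0)
X_true = bpsk(code_bits[2]).reshape(-1, 2).T
W_d = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
b_hat, p_one = app_decode(H @ X_true + W_d, H, info_bits, code_bits, M_T=2)
print("sent:", info_bits[2], "decoded:", b_hat, "p(b_k = 1):", np.round(p_one, 3))
```

The loop over `code_bits` is the expensive part: with $K = 16$ it already visits 65536 codewords per received packet, and each extra information bit doubles that count.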

A. Linear MIMO Receivers
One suboptimal MIMO detection and channel decoding approach is to first cancel the parallel signal interference with a linear MIMO detector and then perform channel decoding. We refer to this approach as linear MIMO receivers. For example, the zero-forcing (ZF) detection [6] removes the interference by
$$\tilde{Y}_d = \hat{H}^{\dagger} Y_d, \qquad (3)$$
where $\hat{H}^{\dagger} = (\hat{H}^H \hat{H})^{-1} \hat{H}^H$ is the pseudo-inverse of the channel matrix estimate, $\tilde{Y}_d$ is the post-cancellation signals and $\tilde{W}_d = \hat{H}^{\dagger} W_d$ is the transformed noise; when $\hat{H} = H$, we have $\tilde{Y}_d = X_d + \tilde{W}_d$. Since the parallel signal interference is already removed in (3), the post-cancellation signals, $\tilde{Y}_d$, can be fed to a traditional channel decoder to recover data symbols. Figure 2 shows the block diagram for this linear MIMO receiver.
There is no loss of information in (3) since one can get back $Y_d$ from $\tilde{Y}_d$. The suboptimality in the linear MIMO decoding arises from the fact that the traditional channel decoder assumes the transformed noise $\tilde{W}_d$ is white, whereas it is no longer white after the transformation in (3). Although the complexity of this linear MIMO receiver is low, its performance is far from optimal.
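The noise coloring caused by ZF detection can be checked numerically. The sketch below is an illustration only: it assumes a perfectly known 4 x 4 channel, BPSK data, and implements ZF with the pseudo-inverse; the names `crandn`, `cov_theory` and `cov_sample` are introduced here for the example. It compares the sample covariance of the post-ZF noise with the closed form $(H^H H)^{-1}$, which is not a scaled identity for a generic channel.

```python
import numpy as np

rng = np.random.default_rng(1)
M_R, M_T, L = 4, 4, 20000

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = crandn(M_R, M_T)                                   # channel, assumed known exactly
X_d = 1.0 - 2.0 * rng.integers(0, 2, size=(M_T, L))    # BPSK data matrix
W_d = crandn(M_R, L)                                   # white noise at the antennas
Y_d = H @ X_d + W_d

H_pinv = np.linalg.pinv(H)                             # (H^H H)^{-1} H^H
Y_tilde = H_pinv @ Y_d                                 # post-cancellation signals
W_tilde = Y_tilde - X_d                                # transformed noise

cov_theory = np.linalg.inv(H.conj().T @ H)             # E[w~ w~^H] = (H^H H)^{-1}
cov_sample = W_tilde @ W_tilde.conj().T / L
print(np.round(np.abs(cov_theory), 2))
print(np.round(np.abs(cov_sample), 2))
```

The nonzero off-diagonal entries and unequal diagonal entries of both matrices are exactly the correlation and amplification that a channel decoder designed for white noise ignores.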

B. Iterative MIMO Receivers
The second MIMO detection and channel decoding approach performs iterative soft-in soft-out MIMO detection and channel decoding. Using $\hat{H}$, $Y_d$ and the prior information about the data symbols, a soft MIMO detector computes the extrinsic information about the data symbols [9] and delivers the soft information to a soft channel decoder. The soft channel decoder then computes the new extrinsic information about the data symbols and sends it back to the soft MIMO detector for further iteration.
In the next round of iteration, the soft MIMO detector replaces the prior information about the data symbols with the information sent from the soft channel decoder and recomputes its extrinsic information about these data symbols. Several rounds of such iterations are performed to ensure the convergence of the overall MIMO detection and channel decoding process. We refer to such iterative MIMO detection and channel decoding schemes as iterative MIMO receivers. They yield an approximate solution to the joint MIMO detection and channel decoding problem expressed in (2). Figure 3 shows the block diagram for the iterative MIMO receiver. The soft MIMO detection often used is the sphere decoding algorithm [11], and the soft channel decoding often used is the belief propagation algorithm. The complexity of the iterative MIMO receiver is much higher than that of the linear MIMO receiver. Although the iterative MIMO receiver has better performance than the linear MIMO receiver does, there is still a large performance gap with respect to the optimal MIMO receiver. Moreover, the iterative information exchange introduces large decoding latency.

IV. DEEP-LEARNING MIMO RECEIVERS
We propose to employ deep neural networks (DNN) to solve the joint MIMO detection and channel decoding problem (2). We consider the training of the DNN at the MIMO receiver after the channel matrix estimate $\hat{H}$ has already been computed from the received pilot signals. Using this channel matrix estimate $\hat{H}$ at the MIMO receiver, we generate a set of training signals to train a DNN to solve the joint MIMO detection and channel decoding problem (2) under the framework of supervised learning. The training and deployment framework of DNN for MIMO is illustrated in Figure 4. We describe the associated procedures in the following.
The receiver generates the training data by calling a functional block that mimics the operation at the MIMO transmitter. Specifically, for training purposes, the receiver randomly generates many length-$K$ binary vectors, $b^{(i)}$, $i = 1, 2, \cdots, Z$. Each binary vector $b^{(i)}$ is transformed into a data matrix $X_d^{(i)}$ using the functional block of the MIMO transmitter as described in Section II. Then, with the channel matrix estimate $\hat{H}$ given by the channel estimator, the receiver generates a training signal by multiplying $\hat{H}$ with $X_d^{(i)}$ followed by adding AWGN: $Y_d^{(i)} = \hat{H} X_d^{(i)} + W_d^{(i)}$, where $Y_d^{(i)}$ is the $i$-th training signal and $W_d^{(i)}$ is the corresponding generated AWGN. The training set is given by $\{(Y_d^{(i)}, b^{(i)})\}_{i=1}^{Z}$. We emphasize that the training set is dependent on the channel matrix estimate $\hat{H}$.
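The training-set generation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a hypothetical systematic (8, 4) generator matrix `G` stands in for the channel encoder, modulation is BPSK, the noise has unit variance, and the way the real and imaginary parts of the training signal and of $\hat{H}$ are stacked into one real-valued feature vector is our own choice. The function name `make_training_set` is likewise hypothetical.

```python
import itertools
import numpy as np

def make_training_set(H_hat, G, M_T, seed=0):
    """Return (features, labels) for all Z = 2^K messages of the code G.

    G is a K x N binary generator matrix (a hypothetical stand-in for the
    channel encoder); BPSK modulation and unit-variance AWGN are assumed.
    """
    rng = np.random.default_rng(seed)
    K, N = G.shape
    M_R = H_hat.shape[0]
    L = N // M_T                                                   # columns of X_d
    labels = np.array(list(itertools.product([0, 1], repeat=K)))   # all b^(i)
    symbols = 1.0 - 2.0 * ((labels @ G) % 2)                       # BPSK, Z x N
    Z = len(labels)
    X_d = symbols.reshape(Z, L, M_T).transpose(0, 2, 1)            # Z x M_T x L
    W_d = (rng.standard_normal((Z, M_R, L))
           + 1j * rng.standard_normal((Z, M_R, L))) / np.sqrt(2)
    Y_d = H_hat @ X_d + W_d                                        # Z x M_R x L
    # Real-valued DNN input: Re/Im of each training signal, then Re/Im of H_hat.
    h_part = np.concatenate([H_hat.real.ravel(), H_hat.imag.ravel()])
    features = np.concatenate(
        [Y_d.real.reshape(Z, -1), Y_d.imag.reshape(Z, -1), np.tile(h_part, (Z, 1))],
        axis=1)
    return features, labels

# Hypothetical systematic (8, 4) code and a fixed 2 x 2 channel estimate.
P = np.array([[1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1], [1, 1, 1, 0]])
G = np.hstack([np.eye(4, dtype=int), P])
H_hat = np.array([[1.0, 0.5j], [-0.5j, 1.0]])
features, labels = make_training_set(H_hat, G, M_T=2)
print(features.shape, labels.shape)
```

Because every feature row is built from the same `H_hat`, a new channel estimate requires a new training set, which is the dependence emphasized above.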
We use the generated training set to train a DNN, $f_\theta(\cdot)$, that approximates the solution to problem (2), where $\theta$ is the set containing all the weights of the edges in the DNN. When we feed the training signals $Y_d^{(i)}$ to the inputs of the DNN, we also feed the channel matrix estimate $\hat{H}$ to the DNN (as illustrated in Figure 4). We optimize the DNN weights by minimizing the cross entropy loss function [28]:
$$\mathcal{L}(\theta) = -\frac{1}{Z} \sum_{i=1}^{Z} \sum_{k=1}^{K} \left[ b_k^{(i)} \log \hat{b}_k^{(i)} + \left(1 - b_k^{(i)}\right) \log\left(1 - \hat{b}_k^{(i)}\right) \right], \qquad (4)$$
where $\hat{b}_k^{(i)}$ is the estimate of $b_k^{(i)}$ given by the DNN. The training algorithm used to minimize (4) for the DNN is the so-called stochastic gradient descent (SGD) algorithm [28]. After the training is finished, the weights of the DNN are fixed to $\hat{\theta}$ and we can use the trained DNN $f_{\hat{\theta}}(\cdot)$ to decode the received signals as $\hat{b} = f_{\hat{\theta}}(Y_d)$. We have the following remarks on this DNN for MIMO:
• The variables of interest to the DNN are the data symbols in $X_d^{(i)}$. The size of the variable space is thus $2^K$, where $K$ is the length of $b^{(i)}$ (note that we have the one-to-one mapping $b^{(i)} \leftrightarrow X_d^{(i)}$). According to the results shown in [21], the decoding performance of the DNN is best when the DNN can see all possible codewords during training. Like the investigation in [21], we also adopt short codes and train the DNN with all different codewords.
• The training of the DNN is quite time-consuming. Therefore, the training procedure will introduce a large decoding latency, and the receiver cannot be deployed for applications with stringent latency requirements, such as voice transmissions; it is, however, suitable for data transmissions with relaxed latency requirements.
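To make the mapping $f_\theta(\cdot)$ and the cross-entropy loss concrete, the following sketch runs an untrained forward pass and evaluates the loss on random data. It is written in plain NumPy rather than a deep-learning toolkit so that it stays self-contained. Several details are assumptions made for illustration: the layer width of 64, the weight initialization, batch normalization without learned scale and shift, the normalization of the loss (summed over the $K$ bits, averaged over the batch), and all function names. The layer layout (a dense input layer with ReLU, six dense hidden layers with batch normalization before ReLU, a dense sigmoid output layer) follows the description given in the simulation section.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def batch_norm(x, eps=1e-3):
    """Normalize each feature over the batch (no learned scale or shift)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def init_dnn(n_in, n_out, width=64, n_hidden=6, seed=0):
    """Random weights for: input layer, n_hidden hidden layers, output layer."""
    rng = np.random.default_rng(seed)
    sizes = [n_in] + [width] * (n_hidden + 1) + [n_out]
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return one probability per information bit for each row of x."""
    W, c = params[0]
    h = relu(x @ W + c)                       # input layer: dense + ReLU
    for W, c in params[1:-1]:
        h = relu(batch_norm(h @ W + c))       # hidden layer: dense -> BN -> ReLU
    W, c = params[-1]
    return sigmoid(h @ W + c)                 # output layer: dense + sigmoid

def cross_entropy(b, b_hat, eps=1e-12):
    """Cross entropy summed over the K bits and averaged over the batch."""
    b_hat = np.clip(b_hat, eps, 1.0 - eps)
    per_bit = -(b * np.log(b_hat) + (1 - b) * np.log(1.0 - b_hat))
    return per_bit.sum(axis=1).mean()

rng = np.random.default_rng(1)
features = rng.standard_normal((32, 24))      # 32 hypothetical training signals
labels = rng.integers(0, 2, size=(32, 4))     # matching K = 4 bit labels
params = init_dnn(n_in=24, n_out=4)
probs = forward(params, features)
print("loss of the untrained network:", cross_entropy(labels, probs))
```

Training would adjust `params` to reduce this loss by gradient descent; a hard decision on each output probability (threshold 0.5) then gives the decoded bits.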

V. SIMULATION RESULTS
In this section, we present simulation results for the evaluation of the proposed DNN MIMO receiver. The modulations used are BPSK and QPSK. The channel code used is the polar code [5] with code rate 1/2. We assume that each packet consists of K = 16 information bits in the simulations. The adoption of the short packet length is due to the exponential training complexity when DNN is used to perform channel decoding [21]. Packets of short length are of interest in some practical systems such as the internet of things (IoT). After channel encoding and modulation, K = 16 information bits are transformed to 32 BPSK symbols or 16 QPSK symbols. Our simulations assume MIMO matrices of dimensions M R × M T = 2 × 2, 4 × 4 and 8 × 8.
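The packet dimensions implied by these settings follow from simple arithmetic, sketched below. The assumption that the modulated symbols are split evenly over the transmit antennas comes from the serial-to-parallel conversion in Section II; the variable names are ours.

```python
K, rate = 16, 0.5
N = int(K / rate)                          # coded bits per packet
bits_per_symbol = {"BPSK": 1, "QPSK": 2}

columns = {}                               # (modulation, M_T) -> columns of X_d
for name, B in bits_per_symbol.items():
    n_symbols = N // B                     # modulated symbols per packet
    for M_T in (2, 4, 8):
        columns[(name, M_T)] = n_symbols // M_T

n_codewords = 2 ** K                       # size of the full training set
for key, L in columns.items():
    print(key, "-> data matrix with", L, "columns")
print("codewords to enumerate:", n_codewords)
```

For example, with BPSK on the 8 × 8 channel the 32 symbols occupy only four time slots, while enumerating all codewords already requires 65536 training signals.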
We implement a DNN consisting of one input layer, six hidden layers and one output layer using the Keras deep-learning toolkit. The nonlinear activation function at the neurons of the input layer and the hidden layers is the rectified linear unit (ReLU) function [28]. The input layer is a densely-connected layer. Each hidden layer is a densely-connected layer with batch normalization (BN) operations before the ReLU operations. The output layer is a densely-connected layer with sigmoid activation functions. The architecture of the DNN is illustrated in Figure 5.
We train our DNN over several epochs. In each epoch, the gradient of the loss function is computed over the entire training set using Adam, a method for stochastic gradient descent optimization [29]. Our training set contains all $2^K$ different codewords, where $K$ is the number of information bits. Setting the number of learning epochs to $10^5$, we train the DNN with datasets of different training SNRs (from 0 dB to 6 dB). After the training is finished, the trained DNN is used to decode the received MIMO signals.
For comparison, we treat the following two traditional MIMO receivers as our benchmarks: i) the linear MIMO receiver that employs ZF MIMO detection followed by the MAP polar decoding of [5]; ii) the iterative MIMO receiver that iterates between the sphere MIMO detection of [11] and the MAP polar decoding of [5]. We investigate the performance of MIMO receivers with perfect knowledge as well as with imperfect knowledge of the channel matrix. For the latter, we assume LS estimation [27] is used to estimate the channel matrix. For a fixed SNR, we evaluate the average BER results of the MIMO receivers over 100 different MIMO channel realizations. Figure 6 and Figure 7 show the BER of the MIMO receivers with perfect knowledge of the MIMO channel matrix for BPSK and QPSK, respectively.
We can observe that our DNN MIMO receiver can indeed outperform the linear and iterative MIMO receivers in terms of BER. For example, the DNN MIMO receiver has around 1 dB and 3.5 dB SNR gain over the linear and iterative MIMO receivers, respectively, at the BER of $10^{-4}$ for BPSK and 8 × 8 MIMO channels. Figure 8 and Figure 9 show the BER of the MIMO receivers with imperfect knowledge of the MIMO channel matrix for BPSK and QPSK, respectively. For the channel matrix estimation, we place a Hadamard matrix at the beginning of the packets as pilots and use the LS estimation based on the received pilots to estimate the channel matrix at the receivers. In general, the performance trends for the cases of perfect and imperfect channel estimates are the same. The only difference between them is that for the cases of imperfect channel estimates, the gain obtained by our DNN MIMO receiver is even larger. For example, the DNN MIMO receiver now has around 2 dB and 10 dB SNR gain over the linear and iterative MIMO receivers, respectively, at the BER of $10^{-4}$ for BPSK and 8 × 8 MIMO channels.
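The pilot-based LS channel estimation used for the imperfect-knowledge results can be sketched as follows. This is an illustration under assumptions: the pilot length is taken equal to the number of transmit antennas, the Hadamard matrix is built with the Sylvester construction (so the size must be a power of two), the noise has unit variance, and the names `hadamard` and `ls_estimate` are ours. The pilot matrix is written `X_p` and the received pilot signals `Y_p`.

```python
import numpy as np

def hadamard(n):
    """Sylvester Hadamard matrix; n is assumed to be a power of two."""
    Hd = np.array([[1.0]])
    while Hd.shape[0] < n:
        Hd = np.block([[Hd, Hd], [Hd, -Hd]])
    return Hd

def ls_estimate(Y_p, X_p):
    """LS channel estimate H_hat = Y_p X_p^H (X_p X_p^H)^{-1}."""
    X_h = X_p.conj().T
    return Y_p @ X_h @ np.linalg.inv(X_p @ X_h)

rng = np.random.default_rng(2)
M_R = M_T = 4
X_p = hadamard(M_T)                        # pilot length taken equal to M_T
H = (rng.standard_normal((M_R, M_T)) + 1j * rng.standard_normal((M_R, M_T))) / np.sqrt(2)
W_p = (rng.standard_normal((M_R, M_T)) + 1j * rng.standard_normal((M_R, M_T))) / np.sqrt(2)
H_hat = ls_estimate(H @ X_p + W_p, X_p)
print("mean squared estimation error:", np.mean(np.abs(H_hat - H) ** 2))
```

Because the rows of a Hadamard matrix are orthogonal, the matrix inverse reduces to a division by the pilot length, and the estimation error is simply the pilot noise projected onto the pilots and scaled by that length.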

VI. CONCLUSIONS
This work used a deep-learning tool, deep neural network, to develop a new solution to the problem of joint MIMO detection and channel decoding. Conventional MIMO receivers perform MIMO detection and channel decoding in a sequential or an iterative manner. The algorithms of these conventional MIMO receivers relax the signal model of coded MIMO. As a result, they are suboptimal solutions to the joint MIMO detection and channel decoding problem, leaving the possibility for further improvement. Our deep learning solution uses a DNN for joint MIMO detection and channel decoding under the framework of supervised learning. The deep-learning MIMO receiver does not separate the MIMO detection and channel decoding into two parts and does not perform sequential or iterative operations on them. It treats the MIMO detection and channel decoding as a joint decoding process and employs a single DNN to approximate the joint decoding process. This joint process improves the overall decoding performance. In our simulations, we trained a DNN consisting of six hidden layers to decode MIMO signals. The simulation results demonstrate notable gains obtained by our deep-learning MIMO receiver over the conventional linear and iterative MIMO receivers.
A drawback of the currently proposed deep-learning MIMO receiver is that the DNN needs to be trained for each different channel matrix, introducing a large decoding latency. In general, to train the same DNN for MIMO decoding with