Recursive dictionary learning approach exploiting between-channel correlations for EEG signal reconstruction

In tele-monitoring, wireless body area networks (WBANs), sleep analysis and other applications involving electroencephalogram (EEG) signal, due to the high number of EEG recording channels, long recording time and several repetition of recordings to reach the highest signal-to-noise ratio, the amount of acquired data by the sensors is too large, demanding use of some compression procedure. Compressed sensing can be considered as one of the most effective compression methods in terms of compression ratio, which needs the underlying signal be sparse or have sparse representation in an appropriate domain. EEG signal is not sparse in time domain, therefore, in this paper correlation based weighted recursive least squares dictionary learning algorithm (CBW-RLS) is proposed that uses between-channel correlations to sparsify EEG signals. Due to the low-rank structure of EEG signals, exploiting between-channel correlations increase the sparsity level of the model while reducing the computational cost of dictionary learning procedure. This is done by merely updating the dictionary atoms which are involved in the sparse model of the previous data, reducing the total number of data used at each iteration and speeding up the dictionary learning algorithm. The simulation results show that the proposed method has better performance in terms of both quality of the EEG reconstruction and the computational cost compared to the other methods.


Introduction
Brain is the most complex part of the body and responsible for controlling all organs of the body.The electrical signals generated by the brain show brain functions as well as the body's overall status, thus, providing the motivation to apply advanced digital signal processing methods to brain function data including electroencephalogram. Understanding of neurophysiological properties of the brain along with mechanisms of generating signals and their recordings is essential for those who are involved with these signals in order to detect, diagnose and treat brain disorders and related diseases [1].
EEG is an electrophysiological monitoring method with high temporal resolution that records brain electrical activity.Studying EEG signals makes it possible to detect many neurological disorders and other abnormal conditions in the body and can be exploited to examine clinical problems such as monitoring alertness, coma, investigating epilepsy and sleep disorders and etc. EEG signal acquisition is vital for early diagnosis of various diseases related to brain.In clinical settings, the electrode caps are often used for multi-channel recordings with diverse number of electrodes attached to the cap where can be reached up to 128 ones.The problems that occur in acquiring the EEG signal are that, first, for obtaining the highest signal-to-noise ratio each trial that collected by diverse channels is repeated several times and second long-term records are taken from the patient in order to process the signal.One of the biggest challenges in collecting and analyzing EEG signal is the number of data needed to be stored and processed.For example, in tele-monitoring and other applications including EEG signals in WBANs, the amount of data obtained by sensors is usually too high, while the sensor resources like battery life and processing capacity are very limited.Therefore, it is usually necessary to use some compression method to reduce the amount of data.Proper compression of the signal reduces the number of samples required for storage and allows faster data transfer in the clinical environment and then accurate reconstruction for further processing.
The Shannon-Nyquist sampling theorem suggests that the signal can be accurately reconstructed which is sampled by the Nyquist frequency (at least twice greater than the highest frequency presented in the signal).However, due to the high sampling frequency of the EEG signals, applying the Shannon-Nyquist sampling theorem causes too many samples leading to the previous memory and storage problems.Leveraging the concept of transform coding, compressed sensing suggests that if the signal is sparse or has sparse representation in an appropriate domain, it can be accurately reconstructed with even far fewer samples than what the Shannon-Nyquist theorem needs.
Recursive dictionary learning approach exploiting between-channel correlations for EEG signal reconstruction Masoud Vazifehkhahi 1 , Tohid Yousefi Rezaii 2 Compressed sensing can be considered as one of the most effective compression methods in terms of compression ratio.The fundamental idea of the compressed sensing is that instead of sampling the signal at a high rate and then compressing the sampled data, it looks for the way to sense the signal in the most compressive form [2]. Compressed sensing theory is expressed as: where   Seeking to find the sparse representation of signal in various applications such as classification, compression, source separation, etc., has been widely considered in recent years [3][4][5][6][7], leading to dictionary learning problem.Dictionary is called as a set of atoms where each training data can be written as a linear combination of the atoms.Two types of dictionaries are used in sparse representation of signals.Structured dictionaries, such as Fourier, wavelet, Gabor and etc., do not have the ability to model natural phenomena with complexities.While, in trained dictionaries, atoms are adaptively learned from the data and lead to more precise model of the signal compared to structured dictionaries [8][9][10][11][12][13][14][15][16].
In dictionary learning, there are two parameters which need to be optimized, namely, the atoms of the dictionary and the sparse coefficients that relate the atoms of the dictionary to the training data set.Since the dictionary learning problem is NPhard, dictionary learning algorithms use alternating methods to optimize the parameters.In the first step, called sparse coding, the sparse coefficients are calculated by considering a predefined dictionary.The most conventional algorithms used as the first step are matching pursuit (MP), orthogonal matching pursuit (OMP) [17,18].In the second step, the sparse coefficients that are calculated in the previous step are used to update the atoms of the dictionary.These two steps are repeated until the dictionary learning algorithm converges.The most of the attention in dictionary learning problem is to improve the algorithms used in the second step.Some of the important algorithms that are used in this step are: method of optimal directions (MOD) [19], recursive least squares (RLS) dictionary learning [20], online dictionary learning (ODL) for sparse representation [12] and k-singular value decomposition (K-SVD) method [9].
In MOD, at each iteration of the algorithm, all of the dictionary atoms are updated at once using the whole training data based on least squares optimization method.In RLS-DLA, all of the dictionary atoms are updated at once using one sample of the training data at each iteration based on recursive least squares method.At each iteration of ODL algorithm, the whole dictionary atoms are updated atom by atom using one sample of the training data based on gradient descent method.K-SVD entirely updates the dictionary atom by atom using the whole training data based on singular value decomposition optimization method.
In [21], which is the first paper using compressed sensing theory for compression and reconstruction of EEG signals, the Gabor basis is used for sparsifying the signals and Gaussian basis and OMP algorithm are used in compression and reconstruction steps, respectively.In [22], EEG signals are considered as linear combination of independent components (ICs).Then, using independent component analysis, the ICs are extracted, followed by the algorithm of set partitioning in hierarchical trees (SPIHT) to compress the signals.In [23] by investigating the different channels of the EEG signal it is shown that there is considerable correlation between different EEG channels of a single subject.Therefore, stacking the sparse coefficients of the channels as columns of a matrix, the resulting matrix would have sparse rows.Then, the reconstruction problem changes into row-sparse recovery problem and Bregman algorithm is used to solve it.In [24], to exploit cosparsity and low rank structure of the multi-channel EEG signal recovery, the optimization problem based on and Schatten-0 norm is proposed in which two methods of convex optimization and alternating direction method of multipliers (ADMM) are used separately to solve the reconstruction problem.In [25], for multi-channel EEG, a new vector representation is presented which has a better block sparsity structure than the conventional methods.By calculating the DCT coefficients, this vector would have block sparse structure which results in better reconstruction error exploiting linear and nonlinear dependencies of the EEG data.
Considering the advantage of compressed sensing to Shannon-Nyquist Theorem in using far fewer samples for compression and accurate signal reconstruction, in this paper compressed sensing is used to compress and reconstruct the EEG signals.On the other hand, noticing that EEG signal is not sparse in time domain, so it is necessary to find an appropriate domain that signal has a sparse representation.Due to the superiority of the trained dictionaries on structured ones in the modeling of sparse signals, in this paper the trained dictionary is used to sparsify the EEG signal and this is done by dictionary learning and exploiting between-channel correlations of EEG.The rest of the paper is organized as follows: In Section 2, the dictionary learning problem is given.In Section 3, a brief review of RLS-DLA and generalized adaptive weighted recursive least squares dictionary learning (GAW-RLS) [26] algorithms are provided.The proposed CBW-RLS dictionary learning algorithm, is presented in Section 4. In Section 5 the proposed method is simulated and compared with the existing algorithms.Finally, the conclusion is drawn in Section 6.

Dictionary Learning Problem
Suppose we have the observation/training matrix Y m×L  , where m and L are the length and the number of training data, respectively.The purpose of the dictionary learning problem is to find an over-complete dictionary   where, each column of the dictionary, denoted by m j d  is called an atom of the dictionary and ( ) i j x is the jth entry of ith column of X.Since X is sparse, it has at most S non-zero entries in each column.Therefore, only S atoms of dictionary participate in representation of each data in the training data set.The sparsity level, S, represents the maximum number of nonzero elements in the columns of X and is defined as: where, 0 . denotes the 0 -norm.Considering the training data set Y, the sparsity level S, and the number of dictionary atoms n, the dictionary learning problem can be written as follows [27]: . is Frobenious norm and for a matrix A is defined as, ( ) . Since both D and X are unknown, one approach to solve (4) is using a two-step alternating minimization method.So, the optimization problem in ( 4) can be approximately solved by optimizing the following two subproblems.As the first step, called sparse coding step, the goal is to find X, while the dictionary is assumed known by minimizing the following cost function: As the second step, called dictionary update, the sparse coefficients obtained from the first step are used to update the atoms of the dictionary by minimizing the following cost function: Many approaches have been proposed to solve (6) and update the atoms of dictionary.

RLS-DLA and GAW-RLS
Unlike MOD algorithm that uses the least squares method to update the dictionary, the RLS-DLA uses a recursive least squares method to update the atoms of the dictionary in order to avoid the inversion of a large dimensional matrix.It also uses a forgetting factor 01   to improve the convergence properties by controlling the influence of the participating training data in the solution.The cost function of RLS-DLA is as follows [20]: At each iteration Y and X are related to their previous values as: where i y is the new (current) training data and i denotes the iteration number.In (7), the Frobenious norm part of the cost function acts as a memory and keeps the information of the previous data, where the 2 -norm part is related to the new training data.RLS-DLA needs to repeat the dictionary learning algorithm several times over the whole training data to be converged.On the other hand, due to the adaptive property of the algorithm, it has the ability of dealing with large number of training data since unlike the MOD algorithm which needs matrix inversion; the RLS algorithm needs a few multiplications of the matrices and vectors.
GAW-RLS DLA as its name implies, is a generalized form of RLS algorithm that in addition to the aforementioned forgetting factor, uses a correction weight  for the arrival data to adaptively control the relative consistency between the arrival data and dictionary estimate.It is shown that RLS-DLA is a special case of GAW-RLS when 1  = (  is the weight that is considered for the new training data).Therefore, the cost function in (7) changes to [26]: where at each iteration Y and X are related to their previous values as: The correction weight that is used in the proposed method is the inverse of the mismatch error between the arrival data and the existing solution which can be shown as follows: ( ) where i r is the sparse representation error of the ith data.It is show that weighting the new data increases the convergence rate of the algorithm.In this paper the following terms are defined and used in explaining the proposed method: • Iteration is defined as a stage that the new training data i y are arrived and ( ) between arrival data and the previous ones. • Step is defined as a stage that the columns of ( ) .To determine the correlated data, we identify those that have used at least one common atom in their sparse representation.We show these data by ( ) in which i y is the new data in iteration i and i x is its sparse representation, . is the inner product and ( ) is the index set of the signals which use common atoms with i y in their sparse representation, and ( ) is the subset of Y where i q L  .
i q changes at every iteration of the algorithm according to the number of correlated data.Now, the atoms participating in the formation of ( )  7) and (8) and omitting  , we will have: In (16), the current values of ( ) ( ) X ji y are related to their previous values as: Comparing ( 9) and ( 16), the role of 01   in RLS-DLA is modeled by ( ) Y i y that is formed using between-channel correlation in the proposed method.In RLS-DLA  is used to control the effect of the previous data while in the proposed method, between-channel correlation is used to control the influence of the previous data in which the previous data that are correlated with the new training data are used to form ( ) Y i y and participate in cost function.This will reduce the number of data needed to be processed in each iteration and lead to increase the speed of execution of algorithm and decrease the runtime.
We use the estimated error of each observation at jth step of ith iteration as the correction weight, which is defined as: Considering (17), in order to develop a recursive formulation for the least squares solution, we define: Considering the new data vector of ( ) Y i y , the updated dictionary will be given as: Using Woodbury matrix identity, the inverse of matrix C can be written as follow:  19) and ( 22) in (21), and defining the following variables: we will have: j+1  is a scalar parameter that controls the step size of the dictionary updating.Therefore, ( 22) can be written as: The algorithm continues until j equals i q , then the updated atoms of ( 25) are replaced into ( ) For the next iteration, ( 12)-( 26) are repeated until i reaches L. Algorithm 1 shows the summary of CBW-RLS dictionary learning algorithm.After designing the dictionary using the proposed method, it is necessary to choose an appropriate measurement matrix to efficiently compress the EEG signal so that we have an accurate reconstruction.In this paper, sparse binary matrix, which has only two nonzero elements in each column is used and shown that randomly selection of the nonzero elements of each column results in good compression performance.

Experiments
In this section, the simulation results of the proposed method are given based on two datasets.The first dataset is selected from the BCI Competition IV dataset, which is recorded from 7 healthy subjects by 59 channels at a sampling rate of 1000 Hz [28].The second dataset is recorded from 22 pediatric subject related to seizure data that is gathered from 23 EEG channels at a sampling rate of 256 Hz [29].Simulations are performed in MATLAB 2017b on a PC with 4 GB RAM, and a 2.4 GHz Core i5 Intel CPU.First of all, a random dictionary of size m=2000, n=2500 with i.i.d.entries selected from a Gaussian distribution of zero mean and unit variance is The atoms of dictionary are normalized so that they have unit l2-norm.In order to exploit between-lead correlation of the EEG signal in the proposed method, the channels of the EEG signal are stacked at the columns of the measurement matrix Y. Furthermore, a Gaussian white noise with signal to noise ratio level of 20 dB is added to the measurements.In the sparse coding step, we use OMP due to fast and simple implementation, for which the sparsity level is set to 20.At each iteration, is initialized with identity matrix.
In order to compare the proposed method with the existing methods, the number of updated atoms as well as used data in each iteration of the algorithm and also percentage-root-mean square difference (PRD) are computed.Fig. 1 shows the number of the updated atoms at each iteration for both MOD/RLS and CBW-RLS algorithms using the BCI Competition IV dataset.The comparison of the two graphs shows that in the proposed method, at the beginning, a small set of the atoms are updated (which equals to the sparsity level) and then with the addition of more data, this set becomes larger.Because of the low-rank property of multi-channel EEG signals (due to the correlation between different channels), non-zero elements of the sparse coefficient matrix are placed in the same rows, therefore, a few atoms contribute in the dictionary update stage.As Fig. 1 shows, after 59 iterations, just 45 atoms of the dictionary are updated, while in MOD/RLS, at every iteration all of the 2500 atoms participate in the dictionary update stage.
Figure 2 represents the number of data used at each iteration of CBW-RLS and a sample batch algorithm that is MOD, using the BCI Competition IV dataset.Batch methods such as MOD use the whole dataset to update the dictionary, while in the proposed method, at each iteration, only those EEG data which are correlated with the new one are used in the dictionary update stage.The red graph shows the number of data used at each iteration in CBW-RLS.Increasing the iteration causes a linear increase in the number of needed data.It is obviously seen that at each iteration, CBW-RLS uses much fewer training data to update the dictionary compared to the batch methods such as MOD.
Fig. 2 The number of data used per each iteration in both MOD and CBW-RLS algorithms.This plot is averaged over 10 trials (m=2000, n=2500, L=59, SNR=20) In order to evaluate the quality of the signal reconstruction, we compared the proposed method with RLS (1), RLS (10), ODL, GAW-RLS (1) and GAW-RLS (10).The term (1) means that the algorithm is executed just for one time over the whole dataset, while (10) means that it is executed for 10 times (The obtained dictionary at the end of the first cycle is used as an initial value for the second one and so on).Compression ratio is used to represent the amount reduction in the size of data representation and is expressed as: where m and k are the dimensions of the original and compressed signals, respectively.One commonly used criterion for assessing the quality of the signal reconstruction is PRD; that is the amount of distortion between the matrix of the original EEG signals Y and the matrix of the reconstructed signals , which is defined as: where Y μ is a vector containing the column-wise mean of the original matrix, Y.To compare the quality of the signal reconstruction, we used both of the aforementioned datasets and their results are plotted in figures 3 and 4, which show the PRD versus CR for BCI Competition IV and seizure datasets, respectively.In order to more extensively compare the compression performance of the proposed CBW-RLS algorithm to the existing methods including GAWRLS, ODL and RLS, the PRD versus CR are given in Tables 1 and 2, for the two datasets of BCI Competition IV and seizure, respectively.By investigating Tables 1 and 2, it can be observed that at low compression ratios, GAW-RLS (10) outperforms the other algorithms in term of reconstruction quality although at the cost of higher runtime.At high compression ratios, the proposed method CBW-RLS and ODL, both have the best performance in reconstructing compared to the others (the runtime of the proposed method is the lowest).

Conclusion
In this paper, we introduced a new recursive least squares approach in dictionary learning problem to efficiently sparsify data to be used in compressed sensing scheme.In order to design the dictionary, between-channel correlations of EEG signals are exploited, leading to fewer number of data participate in the dictionary update procedure, which increases the speed of the algorithm, while reducing the computational complexity.After sparsifying the EEG signals using the proposed method, the sparse binary matrix is used to compress the EEG signals and in the reconstruction stage, OMP algorithm is applied.By assessing the simulations results of applying the proposed method on BCI Competition IV and seizure datasets, it can be concluded that the proposed method has better performance in terms of both the quality of the reconstruction and the speed of the algorithm compared to the other algorithms.
DX .Each data in the training dataset can be expressed as a linear combination of dictionary columns as follows: Preprints (www.preprints.org)| NOT PEER-REVIEWED | Posted: 29 January 2020 doi:10.20944/preprints202001.0356.v1 the correlated data, one by one are used in the proposed method.Preprints (www.preprints.org)| NOT PEER-REVIEWED | Posted:

Algorithm 1 6 i
CBW-RLS dictionary learning algorithm 1. Initialize D and C 2. For (i=1: L) 3. Get the new training data i y 4. Find i x , sparse representation of i y , using OMP 5. Find ( ) i  y , indices of previous signals which use ..common atoms in their sparse representation with i y Dy , the subset of D which deal with ( )

Fig. 3
Fig. 3 PRD versus CR for BCI competition IV dataset × is the compressed form of data.If Y is not sparse then it should be sparse in an appropriate domain.Most of the signals in the nature are not sparse in time domain, therefore, it is necessary to be transform to the appropriate domain which have spare representation.

Table 1
PRD versus CR for BCI Competition IV dataset (The runtime of each algorithm is given as the last row)

Table 2
PRD versus CR for seizure dataset