Rolling Element Bearing Fault Time Series Prediction Using Optimized MCKD-LSTM Model

: This paper realizes early bearing fault warning through bearing fault time series prediction, and proposes a bearing fault time series prediction model based on optimized maximum correlation kurtosis deconvolution (MCKD) and long short-term memory (LSTM) recurrent neural network to ensure bearings operation reliability. The model is based on lifecycle vibration signal of the bearing, to begin, the cuckoo search (CS) is utilized to optimize the parameter filter length L and deconvolution period T of MCKD, taking into account the influence and periodicity of the bearing time series, the fault impact component of the optimized MCKD deconvolution time series is improved. Then select the LSTM learning rate α depending on deconvolution time series. Finally, the dataset obtained through various preprocessing approaches are used to train and predict the LSTM model. The average prediction accuracy of the optimized MCKD-LSTM model is 26 percent higher than that of the original time series, proving the efficiency of this method, and the prediction results track the real fault data well, according to the XI'AN JIAOTONG University XJTU-SY bearing dataset.


Introduction
Rolling element bearing is called "industrial joint", which is widely used in the fields of transmission and hoisting, wind power generation, aerospace and other areas [1 -3]. As the core component of the equipment, the frequency of bearing fault is extremely common [4][5][6], bearing performance has a direct impact on the reliability and safety of heavy machinery, as a result, reliable bearing fault time series prediction is extremely important for industrial production safety [7][8][9].
When bearing is used as the research object, the bearing outer ring performance monitoring data contains information about bearing reliability [10], but the main problem with relying on bearing performance monitoring data to complete the bearing fault time series prediction is figuring out how to create the bearing fault time series prediction model. Moreover, the bearing fault time series is created by the convolution of vibration signals and different noise signals in the signal transmission process, which will impact the accuracy of the prediction model after training, as a result, the first part of this study looks at how to perform the preprocessing of the original time series. For the research on signal preprocessing, to solve the filtering of nonlinear vibration signals, Dong et al. [11] presented a non iterative denoising method combining spectral wavelet transform and detrended fluctuation analysis. Yan et al. [12] explored the discrete convolution wavelet transform (DCWT) for signal decomposition and reconstruction in signal processing, when the signal changes swiftly, however, the choice of wavelet basis function is severely limited [13]. Many adaptive signal preprocessing approaches have been presented based on wavelet transform denoising, to demodulate the original vibration signal, Vikas et al. [14] employed the variational mode decomposition (VMD) to handle the multi-component modulated non-stationary vibration signal of the transmission. To discover the modal properties of engineering structures, Bagheri et al. [15] proposed a dynamic response decomposition based on variational mode decomposition. VMD was utilized by Mausam et al. [16] to partition the histogram of the input image into many band limited modes, and then the histogram was reconstructed using meaningful modes. Zhang et al. [17] investigated the fractal properties of rolling element bearing vibration signals and developed a bearing defect assessment and diagnosis. Furthermore, the decomposition mode parameter K and penalty coefficient η must be established in order to solve the differing parameter values have a significant impact on the decomposition effect. To calculate the ideal K for water pipe leakage location, Li et al. [18] used the correlation between energy loss coefficient and nearby blade disks as an evaluation index. Zhao et al. [19] employed the envelope nesting approach to direct the potential center frequency, and then used permutation entropy and orthogonality to calculate the K. To alleviate the mode mixing of complicated vibration signals, Zhao et al. [20] presented the based on singleobjective salp swarm algorithm to optimize the penalty coefficient η of the VMD. Feng et al. [21] used the whale optimization algorithm (WOA) to optimize VMD parameters in order to achieve adaptive decomposition and noise reduction of vibration signals. The decomposition parameters of VMD, on the other hand, must be set according to the properties of signal. Mode over-decomposition and under decomposition are caused by incorrect parameter selection [22]. McDonald et al. [23] presented a new signal preprocessing method called maximum correlated kurtosis deconvolution (MCKD), which is ideal for processing early bearing fault signals with low signal-to-noise ratio and periodic impact characteristics [24]. To complete composite fault diagnosis, Hong et al. [25] used adaptive MCKD to decouple fault information and noise reduction signal. Zhang et al. [26] suggested a signal noise reduction method based on the teager energy operator and the MCKD. Many scholars have optimised the filter length L and the order of shift M in MCKD on this basis. Lyu et al. [27] optimized the filter length and deconvolution period of MCKD for composite fault diagnosis of gear tooth wear and bearing outer ring fault using a quantum genetic algorithm (QGA). To complete the bearing composite fault diagnosis, Miao et al. [28] used the autocorrelation of the envelope signal to estimate the prior period T. To make MCKD obtain the best noise reduction performance, Yang et al. [29] employed permutation entropy as the measurement index for filter length L selection.
In order to develop a bearing fault time series prediction model, Pan et al. [30] calculates the upper and lower boundaries of unknown elevation on a terrain profile using a double multiplicative neuron (DMN) model and an modified particle swarm optimization (MPSO) technique. For time series prediction, Raubitzek et al. [31] presented a fractal interpolation method. For long term time series prediction, Liu et al. [32] proposed dualstage two-phase DSTP-based RNN (DSTP-RNN) and DSTP-RNN-Ⅱ. Human fall risk is predicted using the depth neural network model described by Savadkoohi et al. [33]. By input high-level abstract features into an LSTM network, Zhang et al. [34] proposed the CEEMD-PCA-LSTM hybrid prediction model to complete time series prediction. The 1d-CNN model was proposed by Che et al. [35] for regression analysis of time series samples in order to establish a performance deterioration model, finally, bidirectional long short memory (Bi-LSTM) is employed to predict performance decline over time. Dempster-Shafer regression technology was proposed by Niu et al. [36] to perform time series prediction problems.
Based on the previous study, this paper takes advantage of the MCKD strong noise reduction effect in periodic signals to denoise the bearing fault time series and acquire the deconvolution time series. After that, deconvolution time series are used to train the long short-term memory recurrent neural networks, and the optimized MCKD-LSTM prediction model is created to predict the bearing fault time series.

Maximum Correlated Kurtosis Deconvolution
Mcdonald et al. proposed maximum correlated kurtosis deconvolution (MCKD) in 2012 [23]. It successfully applies in gear flaking fault diagnosis by taking into account the impact and periodic characteristics of fault information. This algorithm assumes that y represents the impulse signal, h represents the response of y after transit through the transmission path, and x represents the signal convoluted from various signals on the transmission path, which represents the process as shown in formula (1): The essence of MCKD is to find a FIR filter to solve the input signal y through the output signal x, i.e.
Where f= [f1, f2, …, fL] T is the filter factor of length L. MCKD takes maximum correlation kurtosis as its evaluation criterion and calculates as follows: In order to obtain the optimal inverse filter coefficient f, the first derivative of the objective function is zero, such as formula (4): Therefore, the optimum filter coefficient can be obtained.
The specific steps to realize MCKD are as follows: 1) Determine filter length L, The order of shift M and period T of impact signal. The deconvolution signal y of the actual acquisition signal x can be obtained by substituting the obtained inverse filter coefficients (2).

Cuckoo Search
The cuckoo search [37] is a new heuristic search algorithm that integrates the Lévy flights theory with the parasitic behavior of cuckoos. It has the characteristics of few parameters and fast convergence. Three ideal rules are assumed by the cuckoo algorithm.
1) Each cuckoo lays only one egg at a time and places the eggs in a randomly selected nest, which is also known as a host nest.
2) The parasitic nest with the highest quality eggs will be retained for the next generation.
3) The number of possible nests is fixed, and the chance of discovering host eggs in a nest is p.
When the host bird discovers the host egg, it either throws it out or abandons the nest to establish a new one in a new site.
After randomly generating n nest placements, the Lévy flights search strategy illustrated creates a new nest location using formula (10): Where: Where, X i t+ and X i t signify the ith cuckoo's nest site in the T and T+1 generations, respectively, X b t is the optimal nest location currently searched, the value α0 is used to change the step size, α0=0.01 in this paper, µ and ν are random values generated using the normal distribution, H set to 0.5 be the default, Г is the normal gamma function.
The nest with the higher fitness value is kept when the new nest location is found using the Lévy flight search strategy, then, based on the discovery probability p, a portion of the nest positions are eliminated, and a new nest position is constructed using the preferred random walk search strategy shown in formula (12): Where, r is a random number between 0 and 1, and X j t and X k t are two candidate solutions chosen at random from the current population.

Long Short-term Memory Recurrent Neural Network
The gating mechanism is employed in the long short-term memory (LSTM) recurrent neural network [38], which is frequently used to process time series signals, and specific formula of LSTM is as follows:

Forward Calculation Method of LSTM
For a given time series signal x= (x1, x2, …, xt), and the hidden layer sequence ht-1= (h1, h2, …, ht-1), the candidate state value t c , input gate value it, forgetting gate value ft, output gate value ot, memory cell value ct, hidden layer sequence ht, and output sequence yt=(y1, y2, … , yt) at time t can be determined using the conventional LSTM model (as shown in Figure 1), which is: Where, W and U are weight matrices for time series (for example, Wc represents the candidate state weight matrix from the input layer to the hidden layer, and Uc represents the candidate state weight matrix from the hidden layer at time t-1 to the hidden layer at time t), the offset vector is b. (for example, bc represents the offset vector of the candidate state from the input layer to the hidden layer), σ is the sigmoid function, meanwhile, the sigmoid function is the activation function of gating unit, the tanh function is the activation function of candidate state, and the time is represented by the subscript t.

Reverse Computation Method of LSTM
The three steps of the LSTM training algorithm are as follows: 1) The output value of each neuron is calculated forward f(y)=f(w T x) 2) The cost function is the mean square deviation function J, and the error term δj value of each neuron is calculated inversely, as follows: (1 tanh( ) ) (1 )  (27) Where, α is the learning rate, which in this case is 0.01 in this study.

Metrics
The mean square error (MSE) is used as the measuring standard in this research to assess the accuracy of the prediction model. MSE may be calculated using the following formula:

Parameter Optimization Based on Cuckoo Search
When utilizing the MCKD to minimize the noise in a bearing series signal, the filter length L and deconvolution period T must be set first. The parameter combination [L, T] of the MCKD can be searched by the CS to perform parameter adaptive screening, taking into account the interaction between affecting parameters.
When using the CS to optimize parameters, it is important to choose the right fitness function based on the signal characteristics and take into account the periodic impact signal of the bearing signal, which can differ from the noise signal. In reference [39] proposes a dimensionless crest factor of envelope spectrum (Ec) index that takes into consideration the periodic properties of fault information in vibration signals. The mathematical expression of the index Ec, assuming the signal envelope spectrum amplitude X(j) (j = 1, 2,…, M), and the index Ec is as follows: Where, emax is the highest value in the range [n×fr, fs/2] of the envelope signal obtained after Hilbert Demodulation, fr is the bearing signal's frequency conversion, and fs is the sampling frequency. The effective value is erms, which is defined as the effective value of signal following Hilbert demodulation. In this research, n = 2 is chosen to avoid the influence of fr on Ec.
The envelope spectrum peak factor Ec of the envelope signal acquired by Hilbert demodulation is determined using the MCKD operation on the fault signal at any nest Xi location (i.e. the optimum parameters), and the Ec is the fitness value of the bird nest. The envelope spectrum peak factor Ec is significant when periodic impact occurs in the decomposition results, and the decomposition effect is optimal, on the other hand, if the envelope spectrum peak factor Ec is relatively small, the decomposition effect is not well. As a result, the optimization object is set at the greatest Ec. The flow of the fault diagnosis method proposed in this paper is shown in Figure 2.

Experimental Signal Analysis
The experimental data adopts the LDK UER204 rolling element bearings dataset of XJTU-SY bearing dataset [40] of XI'AN JIAOTONG UNIVERSITY, the bearing accelerated life test bed and outer ring crack of bearing are shown in Figure 3, the two unidirectional acceleration sensor models in the vertical and horizontal directions, PCB 352C33, collect the vibration signals through a DT9837 portable dynamic signals collector, the test sampling frequency is 25.6kHz, the sampling interval is 1min, the number of sampling points is 32769, and the sampling time is 1.28s, the horizontal vibration signals in dataset bear-ing1_1 are selected for analysis.

Data Preprocessing
The vibration signal of the 50th series in the horizontal direction of Bearing1_1 is chosen for analysis in order to anticipate the bearing fault time series, the time-frequency domain diagram of a vibration signal is shown in Figures 4 (a) and (b). In the temporal domain, there are many impact components to consider, and there is no rule to follow, the frequency spectrum shows the frequency conversion 34.38Hz and its frequency doubling components, and in the high frequency band, there are several resonance frequency bands, the frequency components are complex, and the bearing outer ring fault has no distinct frequency, as a result, denoising the original time series is required to retrieve the time series including more impact information.  The MCKD is used to preprocess the original time series, with the order of shift M set to 1 and the iteration termination times G set to 20, the CS is used to optimize the filter length L and deconvolution period T in MCKD, the parameters of CS are set as follows: the dimension of solution D set to 2, the population size N set to 15, the host bird with a probability P set to 0.1, the upper and lower bounds are searched based on L>2fs/fc and T=fs/fc [41], where fs is the sampling frequency and fc is the characteristic frequency, and setting the optimization range as L=[100, 1500] and T=[50, 1000]. Figure 5(a) depicts the results, the peak factor of the local maximum envelope spectrum converges to 10.2478 at the 7th iteration, and the optimization parameter combination [L, T] corresponding to the peak factor of the local maximum envelope spectrum is [600,235]. Denoise the original time series signal using MCKD parameters to obtain the deconvolution series signal and envelope spectrum, as illustrated in Figure 5 (b) and (c), it can be seen that the impact component intensity of the deconvolution series signal is increased in the time domain, the noise interference component is greatly reduced, and the frequency conversion component 34.38Hz, 108.6Hz, and its frequency doubling emerge in the envelope spectrum. This frequency is close to the theoretical value of the bearing outer ring crack characteristic frequency of 107.91Hz, resulting in a significant reduction in noise. Then, the deconvolution time series are taken as one-dimensional time series to train the bearing fault time series prediction model, all at once, one-dimensional vibration signals are selected based on the 50th original time series every ten series, six groups of time series are taken as the training set, and seven groups of time series are taken as the test set from 102 series every two sequences, the optimized MCKD is then utilized to denoise, after establishing the whole data set, it is input the LSTM model, which is used to train the model and predict bearing fault time series.
To demonstrate the benefits of optimized MCKD-LSTM in bearing fault time series prediction, this study denoises the original fault series using EMD and optimized VMD, then completes the LSTM model training and bearing fault time series prediction comparison with it. Because the impact signal will be included in the bearing lifecycle signal, the fault impact information of the bearing will be contained in some IMF components after the signal is processed by EMD, resulting in the kurtosis diagram of ten IMF components IMF1-IMF10 according to the kurtosis criterion shown in Figure 6. The five components with the highest kurtosis, IMF9, IMF6, IMF1, IMF2 and IMF10 are chosen for signal rearrangement as one-dimensional time series. Both the training and test sets are EMD processed at the same time. Similarly, VMD denoises the data set, but when it analyzes the signals, it must take into account the decomposition mode parameter K and the penalty term coefficient α, in general, the central frequency observation method [42] and EMD-VMD can be used to select the decomposition mode parameter K, and it is easy to ignore the relationship between K value and penalty term coefficient α. The CS can search the influence parameter combination [K, α] of VMD to perform adaptive parameter selection, taking into account the interaction between the influence parameters, finally, as shown in Figure 7(a), (b), and (c), the kurtosis diagrams corresponding to the three approaches of VMD-C, VMD-EMD, and VMD-CS are obtained, in Figure 7(a), the five components IMF8, IMF6, IMF5, IMF7, and IMF3 with the highest kurtosis are selected in order for signal reformation as onedimensional time series, in Figure 7(b), the five components IMF10, IMF9, IMF7, IMF6, and IMF4 with the highest kurtosis are selected in order for signal reformation as one-dimensional time series, in Figure 7(c), the three components IMF4, IMF3, and IMF2 with the highest kurtosis are selected one by one for signal reformation as one-dimensional time series. Finally, all data set have undergone noise reduction processing, and the data set listed in Table I have been   series, and to obtain the error loss and model accuracy of the LSTM model shown in Figure 8, the LSTM model shows an over fitting phenomenon at α set to 0.02, resulting in severe swings in prediction accuracy, but the prediction accuracy of the LSTM model is steady when the learning rate between 0.01 and 0.03, as seen in the picture. Table II shows a comparison of mean square error experimental data acquired at various learning rates. The mean square error of the LSTM model on test time series 1, 2, 3, 4, 5, 6, and 7 at α set to 0.01 is a respectively, and the prediction accuracy of each time series is greater than that α set to 0.03, hence this paper selects the learning rate α set to 0.01 as the training rate for the LSTM model.

Prediction Model
The dataset produced from various data preprocessing are input into the LSTM model for prediction, and the error loss changes under various models are calculated as shown in Figure 9(a), the error loss of the prediction result of the original time series input LSTM model is the minimum, as can be seen in the image, the loss obtained by the model is often greater than that acquired by the original time series after different preprocessing procedures are applied. In test time series 1, 3, 4, 5, and 6, the prediction accuracy of MSE obtained by optimizing MCKD-LSTM model is the highest (MSE is the lowest), as shown in the accuracy comparison results of prediction results under different models in Figure  9 Table Ⅲ: on test series 1, 3, 4, 5 and 6, the mean square error of the original time series is 0.02327, 0.02384, 0.01691, 0.0349 and 0.00287 respectively. The prediction results of the optimized MCKD-LSTM model are 0.01544, 0.02019, 0.00986, 0.01002 and 8.95153e-4 respectively, and the average prediction accuracy is improved by 26%.

Conclusion
Optimized MCKD-LSTM model for bearing series prediction is proposed in this research. The model combines optimizing MCKD preprocessing of the original series and time series prediction using deconvolution signals, the effectiveness of this method is verified by XJTU-SY bearing dataset of XI'AN JIAOTONG UNIVERSITY, and the conclusions are as follows: 1) When comparing the results of EMD, VMD-C, VMD-EMD, VMD-CS and MCKD on original time series, it can be seen that the impact component of the deconvolution time series obtained by optimizing MCKD is enhanced, and the fault characteristic frequency of the bearing outer ring is extracted. 2) The accuracy and loss change of the model are affected by the learning rate of the neural network. Overfitting difficulties will occur if the change rate is too high or too low, affecting efficiency and prediction ability of the model, as a result of the experimental investigation, the learning rate of the LSTM prediction model of bearing time series is determined to be α=0.01.

Model
3) After deciding that the learning rate α set to 0.01, the optimized MCKD-LSTM model has the highest prediction accuracy, which is 26% higher than the original time series prediction, and the prediction results track the real fault data well.

Conflicts of Interest:
The authors declare no conflict of interest.