A New Health Assessment Prediction Approach: Multi-Scale Ensemble Extreme Learning Machine

This work can be considered a first step toward designing a competitive data-driven approach for remaining useful life prediction of aircraft engines. The proposed approach is an ensemble of serially connected extreme learning machines. The predictions of the first networks are scaled and fed to the subsequent networks as additive features alongside the original inputs. This feature mapping increases the correlation of the training inputs with their targets by carrying new prior knowledge about the probable behavior of the target function. The proposed approach is evaluated on remaining useful life estimation using a set of "time-varying" data retrieved from the public C-MAPSS (Commercial Modular Aero-Propulsion System Simulation) dataset provided by NASA. The prediction performance is compared to that of the basic extreme learning machine and demonstrates the effectiveness of the proposed methodology.


Introduction
Estimating the Remaining Useful Life (RUL) of complex systems with conventional prediction paradigms such as physical modeling is a very difficult task, because it requires deep knowledge of the system's components and their internal interactions. Besides, most physical models are constructed under limited conditions, which explains their poor generalization [1].
Nowadays, the availability of historical data from operating systems, driven by the continuous growth of sensor technologies, has brought Machine Learning (ML) tools growing attention. Replacing classical prediction paradigms with trainable tools is an appropriate way to reduce complexity and human intervention.
Ensemble learning [2]-[5], hybrid algorithms [6]-[9], and deep learning families [10]-[13] are the most investigated ML approaches for RUL estimation. However, prediction algorithms based on older training schemes, such as backpropagation for neural networks and Lagrangian minimization for support vector machines, are computationally expensive and time-consuming.
In recent decades, a new fast and accurate training approach named Extreme Learning Machine (ELM) has spread widely to fit many prediction applications under different architectures [14]. ELM was first proposed to bridge the gap between biological learning and Artificial Neural Networks (ANNs) through a new tuning theory: the hidden-node parameters need not be tuned, and the output weights alone are responsible for universal approximation [15]. In this context, a new data-driven approach is constructed on top of ELM with the aim of producing a very fast and accurate prediction model.
The remainder of this work is organized as follows: Section 2 briefly describes the materials used in this study; Section 3 elaborates the proposed approach; experiments, results, and discussion are presented in Section 4; finally, Section 5 concludes the work.

Materials
C-MAPSS is a publicly provided benchmark dataset for constructing and evaluating health-state estimators of aircraft engines under operating conditions [16]. The data are divided into four subsets according to several failure modes and operating conditions. Each subset contains 100 different degradation profiles of the engine, defined by time-series measurements from 24 different sensors. Following the 2008 Prognostics and Health Management challenge [17], the target function for each life cycle describes a non-cumulative degradation as a piecewise stochastic linear function. Figure 1 gives an insight into the behavior of the sensor measurements and the RUL target function over one life cycle of the engine.
Figure 1. a) Sensor measurement behavior over one life cycle. b) RUL target function.
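The piecewise RUL target described above can be sketched in a few lines; note that the cap value of 125 cycles used here is an assumed constant commonly adopted for this benchmark in the literature, not a value stated in this text.

```python
import numpy as np

def piecewise_rul(n_cycles, max_rul=125):
    """Piecewise-linear RUL target for one engine life cycle:
    constant early in life (healthy phase), then a linear
    decline down to 0 at the failure cycle."""
    # True remaining cycles at each time step: n_cycles-1, ..., 1, 0
    linear = np.arange(n_cycles - 1, -1, -1)
    # Cap the early-life values at the assumed maximum RUL
    return np.minimum(linear, max_rul)

# One engine that fails after 200 cycles
target = piecewise_rul(200)
```

The flat segment reflects the assumption that degradation is negligible early in life, so the labels only start decreasing once the remaining life falls below the cap.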

Methods
In 2004 [15], ELM was proposed as a new training scheme for Single-hidden-Layer Feedforward Neural networks (SLFNs); it was later extended to fit a variety of architectures for different applications [14]. Unlike the backpropagation algorithm, the basic training rules of ELM can be summarized in three steps that minimize the objective function in equation (1),

minimize ||H β − T||²,   (1)

where β is the output-weight matrix, H is the feature mapping (hidden-layer output) of the entire training set, and T is the matrix of desired targets.
1. Generate the hidden-node parameters, input weights A and biases B, randomly from any probability distribution; these parameters are never tuned afterwards.
2. Compute and activate the hidden layer for a single batch of data with a chosen activation function G, as in equation (2), where X denotes the entire training input:

H = G(X A + B).   (2)

3. Determine the output weights β analytically from the Moore-Penrose pseudo-inverse H† of the matrix H, as in equation (3):

β = H† T.   (3)
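The three training steps above can be sketched in a few lines of NumPy; the synthetic regression task, hidden-node count, and random seed below are illustrative assumptions, not values taken from this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, seed=0):
    """Basic ELM training for a single-hidden-layer network."""
    rng = np.random.default_rng(seed)
    # Step 1: random hidden-node parameters (never tuned afterwards)
    A = rng.standard_normal((X.shape[1], n_hidden))  # input weights
    B = rng.standard_normal(n_hidden)                # biases
    # Step 2: hidden-layer activation for the whole training batch
    H = sigmoid(X @ A + B)
    # Step 3: output weights from the Moore-Penrose pseudo-inverse,
    # the least-squares minimizer of ||H beta - T||^2
    beta = np.linalg.pinv(H) @ T
    return A, B, beta

def elm_predict(X, A, B, beta):
    return sigmoid(X @ A + B) @ beta

# Tiny usage example on a synthetic smooth regression task
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 3))
T = np.sin(X.sum(axis=1))
A, B, beta = elm_train(X, T, n_hidden=50)
pred = elm_predict(X, A, B, beta)
print(np.mean((pred - T) ** 2))  # training MSE
```

The only learned quantity is β; everything before it is a fixed random projection, which is what makes the training a single linear solve.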
Theoretically, and without considering over-fitting and other ill-posed problems, ELM theory states that the more hidden nodes are used, the higher the accuracy the network may achieve [15]. Experimentally, however, in higher-dimensional problems such as the feature space of the studied dataset, more hidden nodes can slow the training process or exhaust computer memory.
In the proposed approach, and based on [18], we construct a new multilayer neural network architecture governed by ELM learning rules. The network, showcased in Figure 2, uses small ELM learners to map the training inputs accurately and achieve a faster and more satisfactory generalized approximation.
Our contribution attempts to prove the following claim: by scaling the predicted target of one network to the same range as the inputs and treating it as an additive feature alongside the original inputs, the reconstructed data at the final learner carries prior knowledge about the probable targets. The new feature mapping does not retain the hidden layers preceding the final feature mapping; the output of the last learner is the final decision of the training model.
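A minimal sketch of this serial ensemble idea is given below, assuming four small ELM learners with sigmoid activation and min-max scaling of each intermediate prediction; the class name and all hyper-parameters are hypothetical choices for illustration, not the paper's exact configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SerialELM:
    """Serially connected ELMs: each learner's scaled prediction is
    appended to the original inputs as an additive feature for the
    next learner; the last learner gives the final output."""

    def __init__(self, n_learners=4, n_hidden=25, seed=0):
        self.n_learners, self.n_hidden = n_learners, n_hidden
        self.rng = np.random.default_rng(seed)
        self.layers = []

    def _scale(self, y):
        # Min-max scale a prediction into [0, 1], like the inputs
        lo, hi = y.min(), y.max()
        return (y - lo) / (hi - lo + 1e-12)

    def fit(self, X, T):
        Z = X
        for _ in range(self.n_learners):
            A = self.rng.standard_normal((Z.shape[1], self.n_hidden))
            B = self.rng.standard_normal(self.n_hidden)
            H = sigmoid(Z @ A + B)
            beta = np.linalg.pinv(H) @ T   # analytic output weights
            self.layers.append((A, B, beta))
            # Scaled prediction becomes an additive feature next round
            Z = np.column_stack([X, self._scale(H @ beta)])
        return self

    def predict(self, X):
        Z = X
        for A, B, beta in self.layers:
            pred = sigmoid(Z @ A + B) @ beta
            Z = np.column_stack([X, self._scale(pred)])
        return pred
```

Scaling each intermediate prediction with the batch's own min-max is a simplification here; a deployed version would reuse the scaling bounds fitted on the training set.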

Experiments, results and discussion
In the present comparative study, the subset named FD001 of the C-MAPSS dataset is considered; it involves a single failure mode and a single operating condition. As a preprocessing step, the sensors with indices {7, 8, 9, 12, 14, 16, 17, 20, 25, 26} are selected, and their measurements are scaled with min-max normalization, following the literature [1].
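This preprocessing step can be sketched as follows; the shape of the raw data table and the convention of fitting the scaling bounds on the training set only are assumptions for illustration.

```python
import numpy as np

# Column indices selected in the study
SELECTED = [7, 8, 9, 12, 14, 16, 17, 20, 25, 26]

def minmax_fit(train):
    """Per-column min and max, computed on the training set only."""
    return train.min(axis=0), train.max(axis=0)

def minmax_apply(data, lo, hi):
    """Scale columns into [0, 1]; constant columns are left at 0."""
    span = np.where(hi > lo, hi - lo, 1.0)
    return (data - lo) / span

# Stand-in for the raw measurement table (50 time steps, 27 columns)
rng = np.random.default_rng(0)
raw = rng.uniform(0.0, 100.0, size=(50, 27))
train = raw[:, SELECTED]
lo, hi = minmax_fit(train)
scaled = minmax_apply(train, lo, hi)
```

Keeping the fitted bounds (`lo`, `hi`) around matters: applying them unchanged to test engines keeps training and test features on the same scale.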
The results of this comparative study between the basic ELM and the proposed MSE-ELM are obtained with the architectures listed in Table 1. To compare the algorithms under the same conditions, the number of hidden nodes in the SLFN architecture is set equal to the total number of hidden neurons across the entire MSE-ELM network.

Table 1. Network architecture settings: MSE-ELM, 4 learners, sigmoid activation.
A first advantage of the new architecture concerns over-fitting: training with basic ELM makes the SLFN highly sensitive to over-fitting before the expected accuracy is achieved. This weakness with respect to the empirical risk is illustrated in Figure 3, where the network trains well with basic ELM but over-fits early in the testing phase. Unlike basic ELM, the proposed architecture allows the network to grow while still gaining accuracy.

Conclusion
The present work attempts to prove that the proposed feature mapping, an ensemble of small neural networks combined with feature scaling, can enhance the prediction model in terms of accuracy and time consumption. The proposed approach was evaluated on higher-dimensional "time-varying" data obtained from the C-MAPSS simulation software, and the results strongly support the credibility of the new training scheme.
MSE-ELM is a simple offline training approach for neural networks, designed to study the accuracy of the proposed feature mapping. Future work will therefore focus on enhancing this algorithm by integrating online and adaptive learning behaviors, aiming to produce a data-driven approach competitive with other tools in the literature.