Submitted:
28 April 2026
Posted:
29 April 2026
Abstract
Keywords:
I. Introduction
- Probabilistic models can only indicate an anomaly, especially when the model is trained solely on healthy data because failure data are unavailable [8,9]. Physics-informed methods are much harder to make effective on realistic data because they oversimplify physical phenomena and rely on assumptions about them [10,11]. Meanwhile, ensemble models integrate several models and average their predictions to obtain a stronger model, so they also act as probabilistic models [12]. In other words, the lack of transparency of these models makes maintenance decision making difficult.
- XAI, on the other hand, can make a black-box machine learning model transparent. XAI approaches, however, risk giving false explanations if the developed model does not represent the data well. This can happen if the model is poorly trained or trained with insufficient data.
- To develop a transparent ML model that produces and explains multi-step-ahead, continuous RUL predictions for the GT.
- To validate that the prediction results and the variables affecting the failure match the thermodynamic modelling.
- A link is made between the generated XAI explanation and the thermodynamic performance modelling to identify the defective component, which remains unexplored in previous research.
- The work applies a combination of explanation techniques, from the ML model’s aleatoric and epistemic uncertainty behaviour to global and local Shapley value contributions, to provide a comprehensive explanation of the prediction quality, the model’s optimization level, and the overall and individual prognostics.
- The developed framework combines all the mentioned error-proof methods, which compensate for each other’s weaknesses and take advantage of each other’s explanation abilities.
II. Literature Review
III. Methodology
A. Data Description
| = 101.3 kPa | = 0.01 | = 0.06 |
| = 0.02 | = 0.03 | = 0.025 |
| Input | Output | Purpose |
| , , , | , , , and | Thermodynamic Modelling |
| , , , | ML Model Modelling |
B. Healthy Data Modelling
- The compressor power, compressor inlet temperature, pressure, humidity, and the power turbine speed were the inputs to the model.
- The values of compressor inlet flow, compressor rotational speed, gas generator turbine inlet temperature, and gas generator turbine pressure ratio were estimated.
C. Failure Data Creation
D. ML Models Development
- As shown in Figure 3(a), the first structure adds an aleatoric uncertainty (AU) layer to the LSTM output. AU is the uncertainty related to the input data, and it cannot be reduced with more data. The ‘Dense’ layer performs a nonlinear operation. Then the ‘Dense_1’ layer, a fully connected layer, takes the input from the LSTM layer and predicts the mean and standard deviation of the prediction. The mean and standard deviation outputs are later used to construct the prediction distribution. The AU layer then samples from the predicted distribution.
- As shown in Figure 3(b), the second structure adds an epistemic uncertainty (EU) layer to the LSTM output. EU is the uncertainty related to the model’s parameters, and it can be reduced with more data. The weights from the AU structure are transferred to a similar model without the AU layer. Nine copies of this model with similar parameters are created, forming an ensemble. The copies are initialized with random seeds, trained, and tested. EU is obtained by monitoring the discrepancies among the predictions of the 10 ensemble models (the model and its nine copies).
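The two uncertainty estimates described above can be illustrated with a minimal sketch. It assumes a Gaussian predictive distribution for the AU sampling step and uses the standard deviation across ensemble members as the EU measure; both choices, the function names, and all numeric values are illustrative assumptions rather than the authors' implementation.

```python
import random
import statistics


def sample_aleatoric(mean, std, n_samples=1000, seed=0):
    """Draw samples from a Gaussian built from a predicted mean and std.

    Assumption: the predictive distribution is Gaussian.
    """
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(n_samples)]


def epistemic_uncertainty(ensemble_predictions):
    """Return the ensemble mean and the spread (sample std) across members."""
    mean = statistics.fmean(ensemble_predictions)
    spread = statistics.stdev(ensemble_predictions)
    return mean, spread


# Hypothetical network outputs for one time step (not values from the paper).
au_samples = sample_aleatoric(mean=0.49, std=0.02)
au_mean = statistics.fmean(au_samples)
au_std = statistics.stdev(au_samples)

# Hypothetical point predictions from 10 ensemble members.
ensemble = [0.565, 0.563, 0.567, 0.566, 0.564,
            0.565, 0.568, 0.562, 0.566, 0.564]
eu_mean, eu_std = epistemic_uncertainty(ensemble)

print(f"AU: mean={au_mean:.3f}, std={au_std:.3f}")
print(f"EU: mean={eu_mean:.3f}, std={eu_std:.4f}")
```

A small spread across ensemble members suggests that retraining with a different initialization barely changes the prediction, which is the sense in which EU reflects the model's parameters.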
E. Models’ Training and Prediction
F. Multi-Step Ahead, Long Term Forecast
| MAPE (%) | Interpretation |
| <10 | Highly accurate prediction |
| 10–20 | Good prediction |
| 20–50 | Reasonable prediction |
| >50 | Inaccurate prediction |
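As a rough illustration of how the accuracy bands in the table above can be applied, the sketch below computes the mean absolute percentage error (MAPE) and maps it to a band. Reading the ranges as MAPE percentages, and assigning the shared boundaries 20 and 50 to the better band, are assumptions; the function names are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def interpret_mape(value):
    """Map a MAPE value (percent) to the qualitative bands of the table."""
    if value < 10:
        return "Highly accurate prediction"
    if value <= 20:  # assumption: the boundary 20 counts as "Good"
        return "Good prediction"
    if value <= 50:  # assumption: the boundary 50 counts as "Reasonable"
        return "Reasonable prediction"
    return "Inaccurate prediction"


# Hypothetical actual and predicted values.
example = mape([100.0, 200.0], [90.0, 220.0])
print(round(example, 2), interpret_mape(example))
```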
G. Explanation Mechanisms
- The uncertainty trend is used to explain the model’s confidence in a given prediction.
- Apart from the uncertainty behavior, a quantitative criterion is needed to support the prediction. For failure data, the uncertainty bounds at the 95% confidence interval (CI) need to lie strictly below the healthy data region for clear decision making. The upper and lower bounds at the 95% CI can be estimated using equations (22) and (23).
- Meanwhile, the SHAP explanation is used to highlight the features responsible for the RUL estimation.
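The decision rule on the 95% CI bounds can be sketched as follows. Equations (22) and (23) are not reproduced in this section, so the sketch assumes the usual Gaussian form, mean ± 1.96 standard deviations; the healthy-region lower limit and the predicted values are hypothetical.

```python
Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def confidence_bounds(mean_hi, std_hi, z=Z_95):
    """Return (lower, upper) bounds, assuming a Gaussian prediction."""
    return mean_hi - z * std_hi, mean_hi + z * std_hi


def is_clear_failure(mean_hi, std_hi, healthy_lower_hi):
    """True only if the whole 95% interval lies below the healthy region."""
    _, upper = confidence_bounds(mean_hi, std_hi)
    return upper < healthy_lower_hi


# Hypothetical health-index predictions and healthy-region limit.
print(confidence_bounds(0.30, 0.02))
print(is_clear_failure(0.30, 0.02, healthy_lower_hi=0.37))
print(is_clear_failure(0.36, 0.02, healthy_lower_hi=0.37))
```

Requiring the upper bound, rather than the mean, to fall below the healthy region makes the rule conservative: a prediction whose interval still overlaps the healthy range is not flagged as a clear failure.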
H. Shapley Value Explanation
| Input Data Hour | Input Data State | 14 Days Ahead Output Data Hour | Output Data State |
| 0 to 168 (Week 1) | Healthy and Failure | 336 to 504 (Week 3) | Healthy and Failure |
| 96 to 264 (Week 1 and Week 2) | Failure | 432 to 600 (Week 3 and Week 4) | Failure |
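The hour ranges in the table above follow from shifting a 168-hour input window forward by 14 days (336 hours). A minimal sketch of that indexing is given below, assuming hourly samples and a direct window-to-window mapping; the function names and the toy series are hypothetical.

```python
WINDOW_HOURS = 168   # one week of hourly samples
HORIZON_HOURS = 336  # 14 days ahead


def output_window(input_start, window=WINDOW_HOURS, horizon=HORIZON_HOURS):
    """Return the (start, end) hours of the output window for an input window."""
    return input_start + horizon, input_start + window + horizon


def make_pairs(series, window=WINDOW_HOURS, horizon=HORIZON_HOURS):
    """Slice a series into (input window, output window) pairs."""
    pairs = []
    last_start = len(series) - horizon - window
    for start in range(last_start + 1):
        x = series[start:start + window]
        y = series[start + horizon:start + horizon + window]
        pairs.append((x, y))
    return pairs


print(output_window(0))   # input hours 0 to 168
print(output_window(96))  # input hours 96 to 264

toy_series = list(range(600))  # stand-in for 600 hourly readings
pairs = make_pairs(toy_series)
print(len(pairs))
```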
IV. Results and Discussion
A. Thermodynamic Healthy Data Modelling
| Parameter | | | | | | Average |
| MAPE (%) | 0.71 | 1.32 | 2.13 | 0.40 | 1.12 | 1.14 |
| Work | Current Work | Hu et al. 2022 | Sankar et al. 2022 |
| MAPE (%) | 1.13 | 1.50 | 2.72 |
B. Anomaly Threshold Values
| Threshold | Equivalent HI |
| Lower = 12707.78 kW | 0.37 |
| Upper = 13882.48 kW | 0.78 |
C. Multi-Step Ahead, Long-Term Forecasting
| Scenario | Healthy Data with AU | Healthy Data with EU | Full Failure Data with AU |
| MAPE (%) | 18.04 | 4.37 | 18.13 |
| Work | Dataset | Maximum Prediction Horizon | MAPE (%) |
| Feng et al. 2020 | Wind Speed | 5 steps | 5-15 |
| Nguyen et al. 2021 | Reactor Coolant Pumps | 18 steps | 13-30 |
| Current Work | Gas Turbine | 336 steps | 18 |
D. Healthy Data Predictions with Uncertainty Explanation
| True HI | HI with AU | AU Level | HI with EU | EU Level |
| 0.591 | 0.487 | 0.022 | 0.565 | 0.002 |
- As can be seen from Figures 7(a) and 7(b), the predictions with AU and EU boundaries are within the healthy data range. The probability that the prediction corresponds to healthy data is thus high.
- At the 95% CI, the AU bounds show that the sampled values lie within ±0.07% of the mean HI. This low AU level implies that the testing data are similar to the training data and that the model is confident in its prediction.
- At the 95% CI, the EU bounds show that the sampled values lie within ±0.06% of the mean HI. The very low EU level indicates that the developed model represents the data well.
- Additionally, the low uncertainty levels imply that the prediction and explanation generated by this model can be trusted.
E. Failure Data Predictions with Uncertainty Explanation
| True HI | HI with AU for Partial Failure | AU Level for Partial Failure | HI with AU for Full Failure | AU Level for Full Failure |
| 0.591 | 0.431 | 0.146 | 0.474 | 0.147 |
- For partial failure, the prediction with AU boundaries is within the healthy data range from the beginning until around the 106th and 444th hours, respectively. This indicates that the probability of a healthy prediction is high initially, when the healthy part of the test data is used.
- The prediction then passes into the degradation range at around the 120th and 456th hours, respectively, and stays there until the end, pointing to a high probability of failure when the failure part of the test data is used.
- The predicted final degradation level in Figure 8(b) also approaches the true future values.
- The average AU is higher than the healthy data AU level of 0.022 in Table VIII, which points to an anomaly.
- At the 95% CI, the AU bounds show that the sampled values lie within ±0.51% of the mean HI. This low level implies that the testing data are similar to the training data and indicates good prediction confidence.
- The predictions are within the healthy data range from the beginning until around the 106th and 444th hours, respectively, and then descend rapidly into the anomaly data range at around the 120th and 456th hours, respectively, where they remain until the end, pointing to a high probability of failure.
- The predicted final degradation level in Figure 9(b) also approaches the true future values.
- For full failure, the average AU of 0.147 is higher than the healthy AU. This indicates that the failure data differ in nature from the healthy training data and that an anomaly has occurred.
- At the 95% CI, the AU bounds show that the sampled values lie within ±0.62% of the mean HI. This low level implies that the testing data are similar to the training data and indicates good prediction confidence.
- In Figure 9(a), the AU bounds at the 95% CI are almost always below the present true HI level. The probability of degradation is thus high.
F. Healthy and Failure Data SHAP Explanation
- , , and contribute to prediction according to order of contribution.
- The contributions are almost balanced in both directions, indicating roughly equal total forces pulling the prediction up (healthy side) and pushing it down (failure side).
- From 1st to 10th hour, and trade the most significant contribution spot. Then trades places with and for second and third contributors.
- From 11th to 166th hour, the system enters a period of high volatility where ,, and frequently swap the top two rankings. Meanwhile, remains the fifth contributor.
- The constant polarity shifts and contribution rank trades suggest a system in operational equilibrium which is typical of a healthy state.
- For both failure modes, contributes the most to prediction followed by , and . This explanation is thus different from the healthy data explanation which points to anomaly.
- For both failure modes, the contributions of the strongest variables are in the negative direction, which indicates degradation.
- For partial failure, and contributions level increase significantly as data evolves from healthy to failure. and contributions also increase slightly. This is because it takes more contributions to change the state from healthy to failure.
- For full failure, and contributions level decrease significantly as data is completely in failure status. and contributions also decrease slightly. This is because it does not take too much contributions to preserve the failure state.
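The signed contributions discussed above are Shapley values. As a minimal sketch of the underlying definition, the code below enumerates all feature subsets for a toy linear model and replaces absent features with a baseline value. The model, its weights, the zero baseline, and the input are hypothetical stand-ins, and this brute-force enumeration is not the approximation that SHAP libraries typically use for larger models.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_model(z):
    """Hypothetical linear stand-in for the trained predictor."""
    return 0.5 * z[0] - 0.3 * z[1] + 0.1 * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# Rank features by absolute contribution; the sign gives the direction.
ranking = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
print(phi, ranking)
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is why positive and negative values can be read as forces pulling the prediction up or pushing it down.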


- Before 96th hour, the 1st place position is where and +/ frequently trade places.
- and consistently occupy the 2nd and 3rd spots. Their polarities shift often, suggesting the system is attempting to balance fluctuations within the partial failure state.
- Throughout the entire 96-hour sequence, remains almost exclusively in the 5th place ranking.
- After the 96th hour, the contribution trend becomes more fixed and is defined by the consistent dominance of negative features.
- The 1st place position settles almost permanently into . Unlike before, there are far fewer instances where and challenge for the top spot.
- transitions from a rotating secondary feature to the nearly permanent 2nd place contributor. This duo of and - defines the late-cycle trend.
- , which was highly influential before 96th hour, begins to drop in the rankings. It moves from the top three to the 4th position and exhibits polarity flips, such as the shift to in the final hours.
- The trend before the 96th hour represents a fluctuating partial failure in which positive and negative influences from pressure and ambient temperature compete. After the 96th hour, the trend transitions into a consistent downward decline led by negative pressure and gas generator speed contributions, with other features losing their relative importance.
- From 1st to 167th hour, contributes the most with a dominant negative contribution. Then trades places with and for second and third contributors, though maintains a much more stable presence in the top three.
- Throughout the sequence, and trade third and fourth places with consistent negative contributions.
- The stable negative contributions across pressure, speed, and temperature consistently pull the prediction values lower as the system fails.
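The rank trading described above can be quantified by ranking the features at each hour by absolute contribution and counting how often the top contributor changes. The sketch below is one way to do this; the feature names and hourly values are hypothetical.

```python
def rank_features(contributions):
    """Return feature names ordered by descending absolute contribution."""
    return sorted(contributions,
                  key=lambda name: abs(contributions[name]),
                  reverse=True)


def count_top_changes(hourly_contributions):
    """Count how often the top-ranked feature differs from the previous hour."""
    tops = [rank_features(c)[0] for c in hourly_contributions]
    return sum(1 for prev, cur in zip(tops, tops[1:]) if prev != cur)


# Hypothetical local contributions for four consecutive hours.
hourly = [
    {"a": 0.30, "b": -0.20, "c": 0.10},
    {"a": 0.10, "b": -0.40, "c": 0.05},
    {"a": 0.10, "b": -0.50, "c": 0.05},
    {"a": 0.60, "b": -0.50, "c": 0.05},
]
print(rank_features(hourly[1]))
print(count_top_changes(hourly))
```

A high count over a window corresponds to the frequent rank swaps reported for the healthy and early partial-failure periods, and a count near zero to the settled ordering seen in full failure.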
G. Explanation Validation with Thermodynamic Equations
- , and contribute to prediction according to order of contribution. This explanation is compatible with the thermodynamic modelling. The contributions are almost all balanced in both directions.
- In terms of thermodynamic modelling equation, is proportionate to (7) and values which greatly influenced (8).
- is proportionate to (3) and compressor map) and which influenced .
- , influences and (3) and compressor map) which in turn influence and
- , on the other hand, is inversely proportionate to and (14) which is conditioned by .
- However, influence on is lesser compared to and as there is no direct relationship between and . is just a secondary product of and (7), thus its contribution to is weaker than those features.
- and contribute to prediction according to order of contribution. Again, the explanation is compatible with the thermodynamic modelling.
- The global explanation differs from that of the healthy data, and this difference points to an anomaly.
- The contributions of the strongest variables are in the negative direction, which also indicates an anomaly.
- It also implies that the compressor discharge health parameter, and as the most contributing anomaly features, while and contributions had fallen to 4th and 5th place. This implies an anomaly coming from the compressor.
- Since values decrease over time, the only plausible cause is the decrease of which increases and reduces over time (8).
- reduction, in turn, can only be caused in this case by reduction in compressor pressure ratio, (7), as compressor efficiency is not part of the ML modelling.
- reduction is of course proportionate to and reduction (3) and compressor map).
- Since is conditioned by , will also have to change (14).
- , while having influence on , does not change due to failure injection, thus the weaker contribution compared to and .
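The reasoning above links a falling compressor pressure ratio to changes in the compressor outlet conditions. The equations cited in the text are not reproduced in this section, so the sketch below uses the standard textbook relation for compression with an isentropic efficiency, which may differ in detail from the paper's equation (7). Constant gas properties and all numeric values are assumptions.

```python
def compressor_outlet_temperature(t_in_k, pressure_ratio, efficiency, gamma=1.4):
    """Outlet temperature (K) of a compressor with isentropic efficiency.

    T_out = T_in * (1 + (PR ** ((gamma - 1) / gamma) - 1) / efficiency)
    """
    ideal_rise = pressure_ratio ** ((gamma - 1.0) / gamma) - 1.0
    return t_in_k * (1.0 + ideal_rise / efficiency)


def compressor_specific_work(t_in_k, t_out_k, cp=1.005):
    """Specific work (kJ/kg), assuming constant cp in kJ/kg-K."""
    return cp * (t_out_k - t_in_k)


# Hypothetical operating points: same inlet and efficiency, lower pressure ratio.
t_in = 288.15
t_out_healthy = compressor_outlet_temperature(t_in, pressure_ratio=10.0, efficiency=0.85)
t_out_degraded = compressor_outlet_temperature(t_in, pressure_ratio=9.0, efficiency=0.85)

print(round(t_out_healthy, 1), round(t_out_degraded, 1))
print(round(compressor_specific_work(t_in, t_out_healthy), 1))
```

With inlet temperature and efficiency held fixed, the outlet temperature in this relation can only fall if the pressure ratio falls, which mirrors the elimination argument made in the list above.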
V. Conclusion
| Nomenclature | | |
| Symbol | Description | Unit |
| N | Rotational speed | RPM |
| P | Pressure | kPa |
| PW | Total power output | MW |
| T | Temperature | K |
| | Specific heat | kJ/kg-K |
| | Mass flow rate | kg/h |
| | Pressure loss | kPa |
| | Air bleed | kg/h |
| Greek Letter | | |
| | Efficiency | % |
| | Isentropic index | |
| Superscript | | |
| Bl | Blowoff | |
| C | Compressor | |
| CC | Combustion chamber | |
| Cor | Corrected | |
| Corc | Corrected using curve | |
| Ed | Exhaust duct | |
| GG | Gas generator | |
| GGT | Gas generator turbine | |
| PT | Power turbine | |
| amb | Ambient condition | |
| 1 | Compressor inlet | |
| 2 | Compressor outlet | |
| 3 | Combustion chamber outlet | |
| 4 | Gas generator outlet | |
| 5 | Power turbine outlet | |
References
- Yu, C.; et al. Study on the effects of nanobubble-enhanced diesel spray technology on the performance of heavy-duty gas turbines. Therm. Sci. Eng. Prog. 2026, vol. 73, 104677.
- Hwang, R.; Lee, J.; Kim, J.; Moon, I.; Oh, M. Autonomous Digital Twin Framework for gas turbine combined cycle control loops: Comparative study of proportional-integral control, reinforcement learning, and reinforcement learning with agents. Energy AI 2026, vol. 24, 100727.
- Mohd Irwan Shah, B. S.; Ishak, A. J.; Hassan, M. K.; Norsahperi, N. M. Revolutionizing gas turbine performance analysis with Deep Learning Powered Digital Twin. E-Prime – Nexus Electr. Electron. Intell. Eng. 2026, vol. 17, 201178.
- Zeng, J.; Liang, Z. Predictive group maintenance using probabilistic prognostics and deep reinforcement learning. Comput. & Ind. Eng. 2026, vol. 212, 111738.
- Ayman; Onsy, A.; Attallah, O.; Brooks, H.; Morsi, I. Feature learning for bearing prognostics: A comprehensive review of machine/Deep learning methods, challenges, and opportunities. Measurement 2025, vol. 245, 116589.
- Machlev, R. EV battery fault diagnostics and Prognostics using Deep Learning: Review, Challenges & Opportunities. J. Energy Storage 2024, vol. 83, 110614.
- Cheng, W.; et al. Diagnostics and Prognostics in power plants: A systematic review. Reliab. Eng. & Syst. Saf. 2025, vol. 255, 110663.
- Regazzoni, C.; Krayani, A.; Slavic, G.; Marcenaro, L. Probabilistic anomaly detection methods using learned models from time-series data for multimedia self-aware systems. In Advanced Methods and Deep Learning in Computer Vision; 2022; pp. 449–479.
- Schummer, P.; et al. Machine learning-based network anomaly detection: Design, implementation, and evaluation. AI 2024, vol. 5(no. 4), 2967–2983.
- Naser, M. Z. Fundamental flaws of physics-informed neural networks and explainability methods in Engineering Systems. Comput. & Ind. Eng. 2026, vol. 212, 111704.
- Yuan, X.; Bai, T.; Peng, C. Hybrid modeling method for reactor coolant loop combining data-driven and physics-based constraints. Energy 2026, vol. 345, 140177.
- Ensemble Modeling with a Bayesian Maximal Information Coefficient-Based Model of Bayesian Predictions on Uncertainty Data.
- Explainable AI for industrial fault diagnosis: A systematic review.
- Yang, W.; et al. Survey on Explainable AI: From Approaches, Limitations and Applications Aspects. Hum.-Cent. Intell. Syst. 2023, vol. 3(no. 3), 161–188.
- S. A. and S. R. A systematic review of Explainable Artificial Intelligence models and applications: Recent developments and future trends. Decis. Anal. J. 2023, vol. 7, 100230.
- Gandhudi, M.; P.J.A., A.; Srinivas, S.; G.R., G. Causal inference and explainable artificial intelligence based quantum deep learning for remaining useful lifetime prediction. Knowl.-Based Syst. 2026, vol. 340, 115669.
- Soualhi, M.; Nguyen, K. T. P.; Medjaher, K. Explainable RUL estimation of turbofan engines based on prognostic indicators and heterogeneous ensemble machine learning predictors. Eng. Appl. Artif. Intell. 2024, vol. 133, 108186.
- Qaid, H.; et al. Large language models for explainable fault diagnosis of machines; 2025.
- Forest, F.; Rombach, K.; Fink, O. Interpretable prognostics with concept bottleneck models. Inf. Fusion 2025, vol. 124, 103427.
- Priyadarshini. An explainable Autoencoder-based feature extraction combined with CNN-LSTM-PSO model for improved predictive maintenance. Comput. Mater. & Contin. 2025, vol. 83(no. 1), 635–659.
- Razak, M. Y. Industrial Gas Turbines: Performance and Operability; CRC Press: Boca Raton, Cambridge, England; Woodhead Pub, 2008.
- Lazzaretto; Toffolo, A. Analytical and Neural Network Models for Gas Turbine Design and Off-Design Simulation. Int. J. Thermodyn. 2001, vol. 4(no. 4), 173–182.
- Razmjooei, M.; Ommi, F.; Saboohi, Z. Experimental Analysis and modeling of gas turbine engine performance: Design Point and off-design insights through system of equations solutions; 2024.
- Zwebek; Pilidis, P. Degradation effects on combined cycle power plant performance: Part 1 - gas turbine cycle component degradation effects. In Cycle Innovations; Coal, Biomass and Alternative Fuels, Combustion and Fuels; Oil and Gas Applications, Jun 2001; Volume 2.
- Vivas; Allende-Cid, H.; Salas, R. A systematic review of statistical and Machine Learning Methods for electrical power forecasting with reported MAPE score. Entropy 2020, vol. 22(no. 12), 1412.
- Suradhaniwar, S.; Kar, S.; Durbha, S. S.; Jagarlapudi, A. Time series forecasting of Univariate Agrometeorological Data: A comparative performance evaluation via one-step and multi-step ahead forecasting strategies. Sensors 2021, vol. 21(no. 7), 2430.
- Dash, Ch. S.; Behera, A. K.; Dehuri, S.; Ghosh, A. An outliers detection and elimination framework in classification task of Data Mining. Decis. Anal. J. 2023, vol. 6, 100164.
- Sankar, B.; Shah, B. J.; Jana, S.; Satpathy, R. K.; Gouda, G. Modeling of degradation in gas turbine engine by modified off design simulation. Def. Sci. J. 2022, vol. 72(no. 2), 135–145.
- Hu, M.; et al. Digital Twin Model of gas turbine and its application in warning of performance fault. Chin. J. Aeronaut. 2023, vol. 36(no. 3), 449–470.
- Asif; et al. A deep learning model for remaining useful life prediction of aircraft turbofan engine on C-MAPSS dataset. IEEE Access 2022, vol. 10, 95425–95440.










Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).