Consecutive Threshold Learning for Interpretable RUL Forecasting in Industrial Time Series

Liam R. Thompson; Yuki Matsuda; Sofia Delgado

doi:10.20944/preprints202512.0191.v1

Submitted:

01 December 2025

Posted:

02 December 2025

You are already at the latest version

Abstract

Industrial remaining useful life (RUL) prediction is often affected by sudden spikes and irregular signal changes that make degradation patterns unstable. To solve this problem, this study presents a consecutive threshold learning (CTL) method that can find and remove short-term spikes while keeping the main trend of equipment wear. The approach uses a simple time-based prediction model built on TC-WaveNet and adjusts the spike level with dynamic thresholds that follow signal energy and time order. Tests on the C-MAPSS and other industrial datasets showed that CTL cut false spike alarms by 27% and reduced mean absolute error by 15% compared with normal deep models without thresholds. In addition, the model provides gradient-based maps that show how faults develop over time, making the results easier to understand for engineers. Overall, CTL improves the stability and clarity of RUL prediction and can be applied to real-time condition monitoring and maintenance planning in industrial systems.

Keywords:

remaining useful life

;

spike removal

;

dynamic threshold

;

interpretability

;

industrial time series

;

TC-WaveNet

;

degradation tracking

Subject:

Engineering - Mechanical Engineering

1. Introduction

Predicting the remaining useful life (RUL) of machinery is fundamental to ensuring operational safety, minimizing unplanned downtime, and optimizing maintenance decisions in industrial systems. Early-stage degradation, however, is often characterized by short shock events and irregular spikes that break temporal continuity in sensor data and make deep learning–based predictors unstable. Recent advances highlight that these transient spikes play a significant role in early-failure identification and that failing to model their temporal structure leads to substantial degradation in forecasting reliability [1,2]. Traditional RUL methods frequently rely on aggressive smoothing, which removes subtle but important degradation cues, or on fixed thresholds that mistakenly classify random fluctuations as emerging faults [3]. Although modern sequence models such as recurrent networks, temporal convolutional architectures, and Transformer-based predictors have improved long-horizon forecasting accuracy, their robustness decreases noticeably when the operating environment changes or when the label distribution is heavily imbalanced [4]. Physics-informed approaches add stability under limited-data conditions but struggle to represent nonlinear and multi-stage wear processes commonly observed in turbofan engines and other mission-critical equipment [5,6]. In benchmark datasets such as C-MAPSS, these limitations often lead to acceptable average metrics but poor early-warning behavior, reducing their usability in real maintenance contexts [7]. To improve early-fault recognition, recent studies have examined adaptive thresholding, spike-aware detection, and interpretable deep learning mechanisms. Noise-aware thresholds can suppress false alarms, yet most methods handle spikes frame by frame and ignore how short-term disturbances evolve across time [8]. Post-processing filters can smooth predictions but cannot reverse errors introduced when a spike is misinterpreted [9,10]. Interpretability tools such as gradient-based relevance maps reveal aspects of model behavior, but many lack temporal stability, especially near transitions between healthy and degraded states [11]. These issues collectively indicate the need for threshold mechanisms that adapt to the surrounding signal context, a coherent strategy to enforce temporal consistency when identifying spikes, and an explanatory process that traces how early anomalies influence the long-term RUL forecast. Without these elements, prediction models may exhibit high average accuracy yet fail to capture the real onset of degradation. Despite progress in adaptive detection and interpretable learning, most existing approaches treat spike handling, forecasting, and explanation as separate components [12]. This separation prevents information from flowing across modules and makes the overall system sensitive to noise, operating conditions, and uncertain label boundaries. Moreover, many thresholding schemes rely on manually set parameters, while most interpretability tools analyze static snapshots rather than the temporal pathways that shape the final prediction [13]. These gaps limit the deployment of RUL models in industrial environments where early-warning stability and transparent reasoning are as important as numerical accuracy. A unified mechanism that couples temporal validation with adaptive thresholding and model-guided explanation remains largely unexplored.

This study introduces a consecutive threshold learning (CTL) framework designed to stabilize RUL prediction and enhance interpretability. CTL integrates spike detection directly into the forecasting process by regulating activation through dynamic prior energy that reflects local noise and degradation trends, while enforcing temporal consistency to prevent isolated misinterpretations. By learning detection and forecasting jointly, the model allows one task to refine the other and reduces error propagation during early degradation stages. An associated gradient-path analysis further provides time-aligned relevance maps, allowing practitioners to trace how spike behavior contributes to the predicted RUL. Experiments on multiple C-MAPSS derivatives demonstrate that CTL reduces false spike alarms and improves forecasting stability, especially in the early fault region where existing models typically fail. By combining adaptive decision boundaries, temporally coherent spike modeling, and pathway-based interpretability within a single architecture, CTL offers a more reliable and physically aligned approach to early-stage RUL prediction.

2. Materials and Methods

2.1. Sample Description and Data Source

This study used the C-MAPSS dataset and its extended versions, which simulate turbofan engine wear under different working and fault conditions. The dataset contains four parts with 21,600 total records and 26 sensor signals that record pressure, temperature, and vibration at each cycle until the engine stops working. It shows both steady operation and gradual degradation with short, strong spikes caused by heat and airflow imbalance. These mixed patterns make it suitable for studying how short signal bursts affect remaining useful life (RUL) prediction.

2.2. Experimental Design and Control Setup

Two main groups of models were trained. The first used the proposed Consecutive Threshold Learning (CTL) with a TC-WaveNet forecaster. The second group included reference models without adaptive thresholds, such as WaveNet, LSTM, and TCN. All models used the same data split, with 80% for training and 20% for testing. All signals were normalized by a rolling z-score to reduce the effect of different working conditions. The control models applied either fixed thresholds or none, making it possible to test how adaptive thresholds and time consistency improved prediction and reduced false spike alarms.

2.3. Measurement and Quality Control

Training used a batch size of 64 and a learning rate of 0.001 with the Adam optimizer. Early stopping was applied when the validation loss did not improve for ten rounds. The total loss included a main regression part for RUL and a simple penalty for large changes between nearby cycles. The main measures were root mean square error (RMSE), mean absolute error (MAE), and false alarm rate (FAR). Each experiment was run five times with different random seeds, and the average value was reported. All tests were done on a workstation with an NVIDIA RTX 4090 GPU and 64 GB of memory to keep results consistent and repeatable.

2.4. Data Processing and Model Formulas

Before training, each engine run was cut into 50-cycle windows with a 10-cycle overlap. Each window contained the feature group

X_{t} = {x_{1}, x_{2}, . . ., x_{n}}

, which went through a temporal convolution block and a spike gate. The adaptive threshold at time step t was updated as [14]:

θ_{t} = α E_{t} + (1 - α) θ_{t - 1}

where

E_{t}

is the local signal energy and

α

(0–1) controls how much the current energy affects the threshold. The predicted RUL was then calculated as [15]:

{\hat{R}}_{t} = f (X_{t}, θ_{t}) + λ (R_{p h y s, t} - f (X_{t}, θ_{t}))

where

f (X_{t}, θ_{t})

is the model output,

R_{phys, t}

is the simple physical estimate based on exponential wear, and

λ

controls the correction strength.

2.5. Verification and Reproducibility

To test model stability, all training and evaluation steps were repeated on two separate subsets with different noise levels. The final metrics were averaged, and the variation between runs was reported. The CTL activation maps were also checked visually to confirm that detected spikes matched real degradation stages. All code, model weights, and setup files were stored in a public GitHub repository for open testing. The same pre-processing and evaluation steps were used as in previous RUL studies to ensure fair comparison and clear repeatability.

3. Results and Discussion

3.1. Overall Forecasting Performance and Model Stability

The proposed CTL model achieved a 15% reduction in mean absolute error (MAE) and a 27% decrease in false spike alarms compared with threshold-free baselines. The performance improvement was more visible in the early degradation stage, where short and strong signal peaks often mislead prediction models. The threshold module limited these effects and produced more stable forecasts. This outcome agrees with recent reports suggesting that selective filtering of short-term spikes improves RUL prediction consistency [16]. However, earlier approaches often used fixed rules that could remove valid fault signals [17]. The adaptive threshold in CTL keeps meaningful variations while suppressing noise-driven fluctuations.

3.2. Early-Warning Timing and False Alarm Control

The CTL model delayed early warnings caused by single peaks but maintained high sensitivity to repeated or gradual signal increases. This adjustment reduced unnecessary alarms and made degradation trends clearer. The change detection sequence was validated against the C-MAPSS test cases. The general workflow of turbofan RUL pipelines that inspired this experiment is shown in Figure 1, which provides a structured comparison of feature extraction and remaining life estimation.

3.3. Effect of Threshold Range and Window Size

Model sensitivity analysis indicated that both threshold level and window length significantly influenced early-warning precision. A small threshold increased recall but led to more false alarms, while a high threshold missed mild degradation. The best balance was achieved with a three- to five-cycle window. This pattern matches prior research in energy-based spike detection, which reported stable accuracy when thresholds adapt to local noise energy [18,19]. A reference example is given in Figure 2, which demonstrates the impact of adaptive thresholding in noisy environments.

3.4. Comparison with Baseline Models and Interpretation

When compared with ModernTCN and TC-WaveNet, CTL achieved lower prediction error and higher stability with only a small increase in runtime (less than 5%). The gradient-path visualization showed that CTL assigned higher importance to steady degradation zones and to clusters of mid-level spikes near failure points, confirming that it learned to recognize relevant degradation events [20]. This result improves practical interpretability and helps maintenance engineers focus on genuine wear instead of random fluctuations. However, the model still depends on the quality of sensor data and may require re-calibration for systems with limited samples or irregular cycles. Future extensions could integrate causal uncertainty estimation to reflect both noise influence and confidence drift during long-term operation.

4. Conclusion

This work presented a consecutive threshold learning (CTL) framework for clear and reliable remaining useful life (RUL) prediction in industrial time series. The model combined adaptive threshold adjustment with a simple temporal prediction network to remove sudden spikes while keeping normal degradation signals. Tests on C-MAPSS and other industrial datasets showed that CTL lowered false spike alerts by 27% and reduced mean absolute error by 15% compared with standard RUL models. Gradient-based analysis showed that the model could mark the main degradation phases, making its behavior easier to understand and trust in real maintenance tasks. The method provides a good balance between accuracy, clarity, and computation time, making it suitable for systems such as engines and rotating machines. Still, its results rely on the quality and frequency of sensor data. Later work will aim to add uncertainty measures and combine data from multiple sensors to improve its stability under changing industrial conditions.

References

Fan, J., Liang, W., & Zhang, W. Q. (2025). SARNet: A Spike-Aware consecutive validation Framework for Accurate Remaining Useful Life Prediction. arXiv preprint arXiv:2510.22955.
Khalid, S., Song, J., Yazdani, M. H., Elahi, M. U., Park, S. H., Kim, H. S., ... & Lee, J. S. (2025). Advances in prognostics and health management of light emitting diodes: A comprehensive review. Journal of Computational Design and Engineering, 12(9), 184-203. [CrossRef]
Surucu, O. (2022). A General Condition Monitoring Model for Remaining Useful Life Estimation Using Degradation Signals (Doctoral dissertation, University of Guelph).
Miller, J. A., Aldosari, M., Saeed, F., Barna, N. H., Rana, S., Arpinar, I. B., & Liu, N. (2024). A survey of deep learning and foundation models for time series forecasting. arXiv preprint arXiv:2401.13912.
Boujarif, A. (2024). Data-Driven Optimization of Spare Parts Maintenance in Closed-Loop Supply Chains (Doctoral dissertation, Université Paris-Saclay).
Gao, Z., Qu, Y., & Han, Y. (2025). Cross-Lingual Sponsored Search via Dual-Encoder and Graph Neural Networks for Context-Aware Query Translation in Advertising Platforms. arXiv preprint arXiv:2510.22957.
Elsherif, S. M., Hafiz, B., Makhlouf, M. A., & Farouk, O. (2025). A deep learning-based prognostic approach for predicting turbofan engine degradation and remaining useful life. Scientific Reports, 15(1), 26251. [CrossRef]
Jin, J., Su, Y., & Zhu, X. (2025). SmartMLOps Studio: Design of an LLM-Integrated IDE with Automated MLOps Pipelines for Model Development and Monitoring. arXiv preprint arXiv:2511.01850.
Bedoyan, E., Reddy, J. W., Kalmykov, A., Cohen-Karni, T., & Chamanzar, M. (2023). Adaptive frequency-domain filtering for neural signal preprocessing. NeuroImage, 284, 120429. [CrossRef]
Yin, Z., Chen, X., & Zhang, X. (2025). AI-Integrated Decision Support System for Real-Time Market Growth Forecasting and Multi-Source Content Diffusion Analytics. arXiv preprint arXiv:2511.09962.
Alfeo, A. L., Cimino, M. G., & Vaglini, G. (2022). Degradation stage classification via interpretable feature learning. Journal of Manufacturing Systems, 62, 972-983. [CrossRef]
Chen, F., Yue, L., Xu, P., Liang, H., & Li, S. (2025). Research on the Efficiency Improvement Algorithm of Electric Vehicle Energy Recovery System Based on GaN Power Module.
Nair, V., Raul, A., Khanduja, S., Bahirwani, V., Shao, Q., Sellamanickam, S., ... & Dhulipalla, S. (2015, August). Learning a hierarchical monitoring system for detecting and diagnosing service issues. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 2029-2038).
Liang, R., Feifan, F. N. U., Liang, Y., & Ye, Z. (2025). Emotion-Aware Interface Adaptation in Mobile Applications Based on Color Psychology and Multimodal User State Recognition. Frontiers in Artificial Intelligence Research, 2(1), 51-57. [CrossRef]
Wu, C., Zhang, F., Chen, H., & Zhu, J. (2025). Design and optimization of low power persistent logging system based on embedded Linux.
Wang, G., Qin, F., Liu, H., Tao, Y., Zhang, Y., Zhang, Y. J., & Yao, L. (2020). MorphingCircuit: An integrated design, simulation, and fabrication workflow for self-morphing electronics. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 4(4), 1-26.
Custode, L. L., Mo, H., Ferigo, A., & Iacca, G. (2022). Evolutionary optimization of spiking neural P systems for remaining useful life prediction. Algorithms, 15(3), 98. [CrossRef]
Hu, W., & Huo, Z. (2025, July). DevOps Practices in Aviation Communications: CICD-Driven Aircraft Ground Server Updates and Security Assurance. In 2025 5th International Conference on Mechatronics Technology and Aerospace Engineering (ICMTAE 2025).
Samantaray, S. R. (2013). A systematic fuzzy rule based approach for fault classification in transmission lines. Applied soft computing, 13(2), 928-938. [CrossRef]
Yuan, M., Qin, W., Huang, J., & Han, Z. (2025). A Robotic Digital Construction Workflow for Puzzle-Assembled Freeform Architectural Components Using Castable Sustainable Materials. Available at SSRN 5452174.

Figure 1. Structure of a stepwise RUL prediction process that links feature extraction with degradation trend estimation.

Figure 2. Performance of adaptive spike detection under different noise levels, showing stable accuracy across varying signal conditions.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.