Spike-Guided Multi-Stage RUL Estimation via Physics-Constrained Temporal Networks

Emma L. Carter; Hiroshi Yamamoto; Amira Hassan; David R. Collins

doi:10.20944/preprints202512.0249.v1

Submitted:

01 December 2025

Posted:

02 December 2025

You are already at the latest version

Abstract

Predicting the remaining useful life (RUL) of machines is important for safe and efficient operation, but short signal spikes near the start of faults often disturb prediction results. This study proposes a spike-guided, multi-stage method that combines data learning with a simple physical update. A convolution block filters spike signals using envelope energy and basic variance checks, and a two-path predictor joins time features with a physical correction term. Tests on the NASA turbofan and PRONOSTIA bearing datasets showed that the method cut RMSE by 14–21% and raised the early-warning score by about 18% compared with other deep learning models. The spike check step also reduced false alarms and kept the trend of wear smoother over time. These results show that short bursts hold key signs of early faults and that adding physical rules helps keep forecasts more stable. The method can help with early maintenance planning in factory systems, though wider tests under more working conditions are still needed.

Keywords:

remaining useful life

;

spike detection

;

fault prediction

;

physical update

;

early warning

;

degradation monitoring

Subject:

Engineering - Mechanical Engineering

1. Introduction

Predicting remaining useful life (RUL) is essential for maintenance planning in aviation, semiconductor tools, and energy infrastructures. However, real-world degradation signals often contain short, high-energy spikes near fault onset that distort trend estimation. These spikes can push models to overfit noise, delay early-warning detection, and inflate predictive uncertainty [1,2]. Classic sequence models—such as RNNs, temporal convolutional networks (TCNs), and attention-based forecasters—tend to either oversmooth local bursts or mistake them for structural transitions, causing early-stage degradation cues to be lost. Signal processing methods reduce variance through band-limiting, filtering, or wavelets, but they risk discarding weak degradation signatures or shifting onset timing [3]. Physics-guided approaches provide additional structure by incorporating health indices or simplified degradation laws, and they improve extrapolation when data are sparse. Yet they often underfit multi-regime and spike-dominated signals because the physical priors are typically too coarse or too weakly enforced [4].

Recent studies attempt to combine these ideas through hybrid pipelines that mix learned features, spike detectors, and priors. However, reported gains vary widely across datasets, and performance is sensitive to breakpoint handling and the stability of extracted spike features [5]. Moreover, the literature on noisy-sequence modeling has increasingly emphasized the importance of context construction, including how learned signals and physically constrained updates should interact. Work on context reconstruction for time-dependent inference—such as the plug-in reconstructor introduced in [6]—shows that reorganizing and refining evidence before prediction can substantially stabilize model behavior. This aligns with observations in predictive maintenance: even when local features are identified correctly, poorly structured inputs or weakly integrated priors can propagate errors through the forecasting horizon. Across existing studies, three gaps remain clear. First, most deep RUL models still treat spikes as noise and rely on global smoothing operations, which weakens detection of short pre-failure bursts and delays alarms [7,8]. Second, breakpoint checks are often performed after forecasting rather than integrated into the prediction loop, allowing regime shifts to propagate into the later stages of the trend [9]. Third, physical priors are frequently used indirectly—appearing as regularization terms or post-hoc filters—rather than as an explicit residual path capable of correcting learned trends under domain-specific constraints [10,11]. These limitations consistently appear in turbofan run-to-failure datasets and industrial pump systems with intermittent surges, where early-warning metrics degrade even when overall RMSE remains competitive [12].

This study develops a spike-guided, multi-stage RUL prediction framework that explicitly targets onset preservation, regime correction, and interpretable integration of learned and physics-based components. A convolutional encoder isolates spike candidates using envelope energy and kurtosis–variance coupling, preserving weak onset cues while filtering transient fluctuations. A dual-branch forecaster blends a ModernTCN path with a physics-prior residual, allowing learned trends and constrained updates to interact through a clear and interpretable pathway. A lightweight validation block performs structural breakpoint checks before the forecast is finalized, reducing regime carry-over across steps. Experiments on turbofan engines and semiconductor pump datasets show that the proposed method reduces RMSE, improves early-warning scores, and maintains transparent component behavior compared with recent deep learning baselines. Overall, the goal is to provide an onset-aware front end, a principled coupling of physics priors with learned forecasts, and an integrated breakpoint mechanism embedded directly within the prediction loop—thereby improving both reliability and interpretability in RUL estimation.

2. Materials and Methods

2.1. Sample Description and Study Area

This study used two public datasets that represent typical industrial equipment degradation. The first is the C-MAPSS turbofan dataset from NASA, which includes 100 simulated engine life cycles under four working conditions and six fault types. Data were recorded once per cycle from sensors monitoring temperature, pressure, and vibration. The second is the PRONOSTIA bearing dataset from the FEMTO-ST Institute, containing 17 full run-to-failure records collected at 25.6 kHz under constant load and speed. These datasets cover the entire process from normal operation to final failure, providing both steady and transient signals suitable for detecting early fault spikes.

2.2. Experimental Design and Control Comparison

The proposed spike-guided model was compared with four baseline methods: LSTM, GRU, CNN–TCN hybrid, and a physics-regularized RULNet. In each dataset, 80% of the data were used for training and 20% for testing, following standard splits in prior studies. To ensure fair comparison, all models used the same preprocessing and normalization methods. A control setup without spike detection was used to check how the spike filtering and residual correction parts improved early fault detection and RUL accuracy.

2.3. Measurement Procedure and Quality Control

Before feature extraction, all sensor data were normalized using z-score scaling. The envelope of each signal was calculated with the Hilbert transform, and local peaks were identified using a simple kurtosis–variance threshold. The model was trained with five-fold cross-validation, and training stopped early if validation loss did not improve after ten epochs. The loss combined mean squared error and a penalty for breaking physical constraints. Performance was measured using RMSE, MAE, and the early warning score (EWS), which evaluates how early the model detects a fault.

2.4. Data Processing and Model Equations

Data were processed in overlapping windows of 50 cycles for the turbofan data and 2560 samples for the bearing data. The model used a convolutional encoder and a temporal prediction block with a residual correction term. The predicted RUL at time t was calculated as [13]:

{\hat{R}}_{t} = f_{θ} (X_{t}) + λ (R_{p h y, t} - f_{θ} (X_{t}))

where

f_{θ} (X_{t})

is the learned output,

R_{phy, t}

is the value from an exponential degradation model, and

λ

controls the correction strength. The total loss was defined as [14]:

L = \frac{1}{N} \sum_{t = 1}^{N} ({\hat{R}}_{t} - R_{t})^{2} + β (1 - S_{v})

where

S_{v}

is the spike validation output, and

β

adjusts the influence of false detections. Training used PyTorch 2.0 with the Adam optimizer on an NVIDIA RTX 4090 GPU.

2.5. Model Verification and Reproducibility

Each experiment was repeated with three random seeds, and the mean and standard deviation of each metric were reported. Sensitivity tests were done for kernel size, learning rate, and residual weight

λ

. All runs were checked for consistent results, and any deviation above 5% from the mean was re-evaluated. The model code and configuration files were shared through GitHub for open verification.

3. Results and Discussion

3.1. Spike Detection and Early Warning Response

The spike detection module successfully separated short, high-energy bursts that appeared near fault onset while reducing random noise. Compared with a simple band-pass filter, the method kept about 80% of meaningful onset spikes and lowered false detection by 12–14%. The early-warning score improved by 18%, showing that preserving these transient bursts supports faster recognition of early degradation [15]. The effect was clear on bearing data, where vibration spikes often signal surface cracks. A comparable visualization of time-domain feature extraction in turbofan datasets is shown in Figure 1,

3.2. Forecasting Accuracy and Model Comparison

Across both datasets, the spike-guided model reduced RMSE by 14–21% and MAE by 11–16% compared with common baselines such as LSTM, GRU, and CNN–TCN. Accuracy remained stable under different load conditions, while false alarms decreased. In test cases with abrupt changes, the model maintained smooth degradation curves and avoided sharp errors caused by data drift [16,17]. The dual-branch structure helped align physical constraints with data-driven forecasts, which improved prediction stability. An illustration of the complete RUL evaluation workflow is shown in Figure 2,

3.3. Effect of Residual Path and Spike Validation

When the residual path was removed, RMSE increased by nearly 9%, and prediction delay reached about one cycle. Excluding the spike validation step caused 13% more false alarms and irregular prediction updates near regime changes. Using both modules together gave the most consistent results. The residual term corrected small trend drifts during stable periods, and the validation block reduced overreaction to sudden peaks [18,19]. This outcome suggests that the combination of simple signal validation and physical correction is more reliable than pure learning-based smoothing.

3.4. Stability, Limitations and Application Insights

The method stayed stable across noise levels up to +3 dB, and performance variation under parameter changes was within ±2%. However, datasets with repeated shocks that were not linked to wear showed smaller improvements because many short bursts were retained unnecessarily. In early operation stages, when degradation is weak, the system added little benefit. These results show that the model works best when spike frequency grows with damage, not when spikes are random. In future work, the spike filter can be combined with adaptive weighting to reduce overreaction to unrelated bursts, and further tests on large industrial systems will help confirm robustness.

4. Conclusion

This study presented a spike-guided, multi-stage method for remaining useful life prediction that joins data-based learning with a simple physical update. The model kept short, high-energy bursts linked to early wear that are often removed by smoothing. On turbofan and bearing data, RMSE fell by 14–21%, and the early-warning score increased by about 18% compared with common deep models. The spike check step lowered false alarms and made the predicted trend steadier when system conditions changed. These results show that short bursts carry useful signs of early faults and that adding physical rules helps produce clearer and steadier forecasts. The approach can help plan maintenance in machines where brief vibration surges point to material stress. Still, its benefit drops when data contain many random shocks or few signs of gradual damage. Later work will test more equipment types, combine more sensors, and add ways to measure long-term reliability under field use.

References

Ali, Z. M.; Ćalasan, M.; Jurado, F.; Aleem, S. H. A. Complexities of power quality and harmonic-induced overheating in modern power grids studies: Challenges and solutions; IEEE Access, 2024. [Google Scholar]
Imran, M. M. A.; Che Idris, A.; De Silva, L. C.; Kim, Y. B.; Abas, P. E. Advancements in 3D printing: directed energy deposition techniques, defect analysis, and quality monitoring. Technologies 2024, 12(6), 86. [Google Scholar] [CrossRef]
Khan, A.; Malik, K. M.; Ryan, J.; Saravanan, M. Voice spoofing countermeasures: Taxonomy, state-of-the-art, experimental analysis of generalizability, open challenges, and the way forward. arXiv 2022, arXiv:2210.00417. [Google Scholar]
Gao, Z.; Qu, Y.; Han, Y. Cross-Lingual Sponsored Search via Dual-Encoder and Graph Neural Networks for Context-Aware Query Translation in Advertising Platforms. arXiv 2025, arXiv:2510.22957. [Google Scholar]
Ardelean, E. R.; Coporîie, A.; Ichim, A. M.; Dînșoreanu, M.; Mureșan, R. C. A study of autoencoders as a feature extraction technique for spike sorting. Plos one 2023, 18(3), e0282810. [Google Scholar] [CrossRef] [PubMed]
Fan, J.; Liang, W.; Zhang, W. Q. SARNet: A Spike-Aware consecutive validation Framework for Accurate Remaining Useful Life Prediction. arXiv 2025, arXiv:2510.22955. [Google Scholar] [CrossRef]
Hwang, J. H.; Balazinska, M.; Rasin, A.; Cetintemel, U.; Stonebraker, M.; Zdonik, S. High-availability algorithms for distributed stream processing. In 21st International Conference on Data Engineering (ICDE'05); IEEE, April 2005; pp. 779–790. [Google Scholar]
Jin, J.; Su, Y.; Zhu, X. SmartMLOps Studio: Design of an LLM-Integrated IDE with Automated MLOps Pipelines for Model Development and Monitoring. arXiv 2025, arXiv:2511.01850. [Google Scholar]
Bury, T. M.; Sujith, R. I.; Pavithran, I.; Scheffer, M.; Lenton, T. M.; Anand, M.; Bauch, C. T. Deep learning for early warning signals of tipping points. Proceedings of the National Academy of Sciences 2021, 118(39), e2106140118. [Google Scholar] [CrossRef] [PubMed]
Yin, Z.; Chen, X.; Zhang, X. AI-Integrated Decision Support System for Real-Time Market Growth Forecasting and Multi-Source Content Diffusion Analytics. arXiv 2025, arXiv:2511.09962. [Google Scholar]
Murdoch, W. J.; Singh, C.; Kumbier, K.; Abbasi-Asl, R.; Yu, B. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences 2019, 116(44), 22071–22080. [Google Scholar] [CrossRef] [PubMed]
Yuan, M.; Qin, W.; Huang, J.; Han, Z. A Robotic Digital Construction Workflow for Puzzle-Assembled Freeform Architectural Components Using Castable Sustainable Materials. Available at SSRN 5452174.
Chen, F.; Yue, L.; Xu, P.; Liang, H.; Li, S. Research on the Efficiency Improvement Algorithm of Electric Vehicle Energy Recovery System Based on GaN Power Module. 2025. [Google Scholar]
Liang, R.; Feifan, F. N. U.; Liang, Y.; Ye, Z. Emotion-Aware Interface Adaptation in Mobile Applications Based on Color Psychology and Multimodal User State Recognition. Frontiers in Artificial Intelligence Research 2025, 2(1), 51–57. [Google Scholar] [CrossRef]
Wu, C.; Zhang, F.; Chen, H.; Zhu, J. Design and optimization of low power persistent logging system based on embedded Linux. 2025. [Google Scholar]
Wang, G.; Qin, F.; Liu, H.; Tao, Y.; Zhang, Y.; Zhang, Y. J.; Yao, L. MorphingCircuit: An integrated design, simulation, and fabrication workflow for self-morphing electronics. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2020, 4(4), 1–26. [Google Scholar] [CrossRef]
Hu, W.; Huo, Z. DevOps Practices in Aviation Communications: CICD-Driven Aircraft Ground Server Updates and Security Assurance. 2025 5th International Conference on Mechatronics Technology and Aerospace Engineering (ICMTAE 2025), July; 2025. [Google Scholar]
Patchipala, S. Tackling data and model drift in AI: Strategies for maintaining accuracy during ML model inference. International Journal of Science and Research Archive 2023, 10(2), 1198–1209. [Google Scholar] [CrossRef]
Tian, Y.; Yang, Z.; Liu, C.; Su, Y.; Hong, Z.; Gong, Z.; Xu, J. CenterMamba-SAM: Center-Prioritized Scanning and Temporal Prototypes for Brain Lesion Segmentation. arXiv 2025, arXiv:2511.01243. [Google Scholar]

Figure 1. Short vibration spikes near the first stage of equipment wear.

Figure 2. Comparison of remaining-life prediction accuracy between baseline and spike-guided models.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.