Submitted: 12 August 2025
Posted: 13 August 2025
Abstract
Keywords:
1. Introduction
- A synthetic KPI-based dataset generation pipeline tailored for modeling temporal degradation in TSN networks.
- A bidirectional LSTM model that accurately anticipates faults using time-series KPIs.
- A predictive FRER activation and deactivation mechanism integrated with the BiLSTM inference output.
- A complete runtime environment including a simulator and real-time dashboard for system state visualization and performance evaluation.
- A comprehensive evaluation of the framework in terms of redundancy efficiency, fault anticipation accuracy, and link utilization improvement.
2. Related Work
2.1. Frame Replication and Elimination for Reliability in TSN
2.2. Fault Detection and Anticipation in Industrial Networks
2.3. ML for TSN and Resilient Networking
2.4. Gaps in Literature
- Monitoring Module: Continuously collects real-time network KPIs, such as jitter, retransmission counts, latency, and congestion indicators, forming the basis for fault evaluation.
- Fault Score Module: Implements a trained bidirectional LSTM model that transforms historical KPI sequences into a fault likelihood score, estimating the probability of imminent failure on a given path.
- FRER Controller: Acts on the fault score to selectively activate or deactivate FRER for critical traffic streams, applying redundancy only when needed to reduce overhead during stable conditions.
- Safe Window Timer: Enforces a minimum protection duration once FRER is triggered, avoiding oscillations and ensuring redundancy persists through transient disturbances. Upon timer expiry and KPI recovery, FRER is withdrawn.
3. System Overview
4. Fault Score Modeling Using BiLSTM
4.1. KPI Feature Representation and Labeling
- normal: No degradation or fault signatures present.
- degraded: Pre-fault behavior caused by injected fault conditions.
- recovery: Post-fault window where KPIs return to nominal levels.
4.2. Time-Series Windowing and Sequence Construction
- The input is a sequence of KPI vectors, each comprising F features,
- The target is the class label of the final time step in the sequence, representing the next system state.
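The sequence construction described above can be sketched as a sliding window over the KPI trace. This is a minimal illustration, not the authors' pipeline: the function name `make_windows`, the stride of one step, and the integer encoding in `LABELS` are assumptions; only the three class names and the "label of the final time step" rule come from the text.

```python
from typing import List, Sequence, Tuple

# Integer encoding of the three classes (the ordering is an assumption).
LABELS = {"normal": 0, "degraded": 1, "recovery": 2}


def make_windows(
    kpis: Sequence[Sequence[float]],
    labels: Sequence[str],
    window: int,
) -> List[Tuple[List[List[float]], int]]:
    """Pair each run of `window` consecutive KPI vectors with the label
    of the final time step in that run (stride of one step)."""
    if len(kpis) != len(labels):
        raise ValueError("kpis and labels must have the same length")
    if window < 1:
        raise ValueError("window must be at least 1")
    samples = []
    for end in range(window, len(kpis) + 1):
        x = [list(v) for v in kpis[end - window:end]]
        y = LABELS[labels[end - 1]]
        samples.append((x, y))
    return samples


# Toy trace with F = 2 features per step (values are made up).
kpis = [[1.0, 0.1], [1.2, 0.1], [3.5, 0.9], [4.0, 1.1], [1.1, 0.2]]
labels = ["normal", "normal", "degraded", "degraded", "recovery"]
samples = make_windows(kpis, labels, window=3)
```

With five time steps and a window of three, this yields three samples, each a 3 x 2 sequence with a single class index.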
4.3. BiLSTM Model Architecture
- Stacked BiLSTM Layers: Two layers of bidirectional LSTM units with hidden size $H$. These layers process the input sequence in both forward and backward directions to capture anticipatory and residual fault patterns. Dropout is applied between layers to mitigate overfitting.
- Feature Aggregation: The final hidden states from both directions of the last BiLSTM layer are concatenated: $h = [\overrightarrow{h}\,;\,\overleftarrow{h}] \in \mathbb{R}^{2H}$
- Output Layer: A fully connected layer maps the aggregated representation to a 3-class logit vector: $z = W h + b$, where $W \in \mathbb{R}^{3 \times 2H}$ and $b \in \mathbb{R}^{3}$ are learnable parameters.
- Softmax Prediction: During inference, the predicted class is determined by: $\hat{y} = \arg\max_{c}\, \mathrm{softmax}(z)_{c}$
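The aggregation, output layer and softmax steps listed above can be illustrated with a small dependency-free sketch. It covers only the prediction head; the recurrent layers are not reproduced. The function names, the class ordering in `CLASSES`, and the toy weights are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Class ordering is an assumption; the text only names the three classes.
CLASSES = ("normal", "degraded", "recovery")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the maximum logit first)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(
    h_fwd: Sequence[float],
    h_bwd: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> Tuple[str, List[float]]:
    """Concatenate both final hidden states, apply the fully connected
    layer, and return the most probable class with all probabilities."""
    h = list(h_fwd) + list(h_bwd)
    logits = [
        sum(w * x for w, x in zip(row, h)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Toy example: hidden size 2 per direction, so 4 aggregated features.
weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 0.0],
]
label, probs = predict([1.0, 0.0], [0.0, 1.0], weights, [0.0, 0.0, 0.0])
```

One possible use of `probs` at runtime is to read the probability of the degraded class as a per-path fault likelihood score, although the exact scoring rule is not specified here.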
4.4. Training Procedure and Loss Weighting
4.5. Fault Scoring in Simulation
4.6. Runtime Inference Integration
5. Predictive FRER Control Logic
5.1. State Machine
- Normal (S1): The default state. No redundancy is applied; the system operates under nominal conditions.
- FRER Active (S2): Entered when sustained degradation is detected. Redundant transmission is enabled for critical traffic flows.
- Recovery Hold (S3): A transitional state entered during recovery, where FRER remains active for a fixed duration to ensure network stability before deactivation.
- Normal → FRER Active (T1): Triggered when the model predicts the degraded state for at least DEGRADATION_CONFIRM_WINDOW consecutive steps, confirming that the fault condition is persistent rather than transient.
- FRER Active → Recovery Hold (T2): Triggered when the model no longer predicts degraded. A fault_free_steps counter is started to monitor the duration of fault-free operation.
- Recovery Hold → Normal (T3): Triggered when both of the following conditions are met: (i) The system has remained in a fault-free state for at least RECOVERY_OVERRIDE_WINDOW steps, and (ii) FRER has remained active for at least SAFE_WINDOW steps. These conditions jointly ensure that deactivation occurs only after sufficient stability has been observed.
- Recovery Hold → FRER Active (fallback) (T4): If the model predicts degraded again before exiting Recovery Hold, the FSM immediately transitions back to FRER Active and resets all timers and counters.
- Any State → FRER Active (T5): If a fault occurs during a cooldown period or early in Recovery Hold, the system overrides the current state and re-enters FRER Active.
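The states and transitions above can be sketched as a small step-driven controller. This is a hedged illustration rather than the authors' implementation: the class and attribute names are hypothetical, the default window lengths mirror the values listed in the experimental setup, T3 uses the "at least" reading of both conditions, and the cooldown handling behind T5 is omitted because its exact semantics are not spelled out.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    NORMAL = "S1"
    FRER_ACTIVE = "S2"
    RECOVERY_HOLD = "S3"


@dataclass
class FrerFsm:
    # Window lengths in simulation steps.
    degradation_confirm_window: int = 3
    safe_window: int = 10
    recovery_override_window: int = 5
    state: State = State.NORMAL
    degraded_streak: int = 0   # consecutive degraded predictions in Normal
    fault_free_steps: int = 0  # consecutive non-degraded predictions since T2
    active_steps: int = 0      # steps elapsed since FRER was (re)activated

    @property
    def frer_enabled(self) -> bool:
        """FRER stays on in both FRER Active and Recovery Hold."""
        return self.state is not State.NORMAL

    def _activate(self) -> None:
        """Enter FRER Active and reset all timers and counters."""
        self.state = State.FRER_ACTIVE
        self.degraded_streak = 0
        self.fault_free_steps = 0
        self.active_steps = 0

    def step(self, degraded: bool) -> State:
        """Advance one step given whether the model predicts 'degraded'."""
        if self.state is State.NORMAL:
            self.degraded_streak = self.degraded_streak + 1 if degraded else 0
            if self.degraded_streak >= self.degradation_confirm_window:
                self._activate()  # T1
        elif self.state is State.FRER_ACTIVE:
            self.active_steps += 1
            if not degraded:
                self.state = State.RECOVERY_HOLD  # T2
                self.fault_free_steps = 1
        else:  # State.RECOVERY_HOLD
            self.active_steps += 1
            if degraded:
                self._activate()  # T4 (fallback)
            else:
                self.fault_free_steps += 1
                if (self.fault_free_steps >= self.recovery_override_window
                        and self.active_steps >= self.safe_window):
                    self.state = State.NORMAL  # T3
        return self.state


# Three degraded predictions followed by ten non-degraded ones.
fsm = FrerFsm()
trace = [True] * 3 + [False] * 10
states = [fsm.step(p).name for p in trace]
```

In this trace FRER turns on at the third degraded prediction. It is withdrawn only at the tenth non-degraded step, because the safe window (10 steps) is the binding condition rather than the recovery window (5 steps).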
5.2. Safe Window Timer
- The number of steps since FRER activation is at least SAFE_WINDOW, and
- The model has predicted a non-degraded state for at least RECOVERY_OVERRIDE_WINDOW consecutive steps.
5.3. Activation/Deactivation Thresholds
- Activation is conservative, requiring persistent signs of degradation.
- Deactivation is cautious, contingent on both elapsed time and continued recovery.
- Cooldown adds robustness, suppressing premature re-entry into redundancy mode.
6. Implementation and Runtime Integration
6.1. Python-Based Simulation
6.2. Runtime Inference Integration
6.3. Live Logging and Interpretability
- Fault anticipation accuracy (true positives vs. false alarms),
- FRER activation delay (latency between fault onset and redundancy enablement),
- Redundancy overhead vs. link utilization improvement.
7. Evaluation and Results
- FRER activation delay: Time elapsed between fault injection and redundancy activation.
- False positives and missed protections: Instances where FRER was unnecessarily triggered or failed to activate during a fault.
- Redundancy duty cycle: The percentage of simulation time during which FRER was active.
- Link utilization: Measured improvement in effective bandwidth usage due to adaptive redundancy control.
- Fault coverage: The proportion of injected faults that were successfully mitigated by timely FRER activation.
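Two of the metrics above, redundancy duty cycle and FRER activation delay, can be computed directly from a per-step trace of the FRER state. The sketch below is illustrative only: the function names and the boolean-trace representation are assumptions. The delay is measured to the start of the activation that covers the fault, so a negative value means FRER was enabled before fault onset.

```python
from typing import Optional, Sequence


def duty_cycle(frer_active: Sequence[bool]) -> float:
    """Percentage of simulation steps during which FRER was active."""
    if not frer_active:
        raise ValueError("trace must not be empty")
    return 100.0 * sum(frer_active) / len(frer_active)


def activation_delay(
    fault_onset: int, frer_active: Sequence[bool]
) -> Optional[int]:
    """Steps from fault onset to FRER activation.

    Negative or zero if FRER was already on at onset (measured back to
    the start of that active run), positive if it turned on later, and
    None if it never turned on after the onset.
    """
    if not 0 <= fault_onset < len(frer_active):
        raise ValueError("fault_onset is outside the trace")
    if frer_active[fault_onset]:
        start = fault_onset
        while start > 0 and frer_active[start - 1]:
            start -= 1
        return start - fault_onset
    for t in range(fault_onset + 1, len(frer_active)):
        if frer_active[t]:
            return t - fault_onset
    return None


# Toy trace of ten steps (values are made up).
trace = [False, False, True, True, True, False, False, False, True, False]
cycle = duty_cycle(trace)
early = activation_delay(3, trace)
late = activation_delay(6, trace)
```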
7.1. Experimental Setup
- Degradation confirmation window: 3 steps
- Safe activation window: 10 steps
- Recovery override window: 5 steps
- Cooldown period after FRER deactivation: 1 step
7.2. Evaluation Scenarios
7.3. Activation Timing and Protection Accuracy
- True Positives (TP): Correct FRER activation before or during a fault event.
- False Positives (FP): Redundancy activated in the absence of a fault.
- False Negatives (FN): Missed opportunities where no redundancy was applied during a fault.
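One way to count these outcomes from a simulation log is to compare fault intervals with FRER activation intervals. The sketch below is an assumption-laden illustration: intervals are half-open step ranges, a fault counts as protected when any activation overlaps it, and an activation counts as a false positive when it overlaps no fault. The normalization used to report percentages is not specified here, so only raw counts are returned.

```python
from typing import Dict, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in simulation steps


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one step."""
    return a[0] < b[1] and b[0] < a[1]


def classify(
    faults: Sequence[Interval], activations: Sequence[Interval]
) -> Dict[str, int]:
    """Count protected faults (TP), missed faults (FN), and activations
    that cover no fault at all (FP)."""
    tp = sum(1 for f in faults if any(overlaps(f, a) for a in activations))
    fn = len(faults) - tp
    fp = sum(
        1 for a in activations if not any(overlaps(a, f) for f in faults)
    )
    return {"TP": tp, "FP": fp, "FN": fn}


# Toy log: three faults, three activations (values are made up).
faults = [(10, 15), (40, 45), (70, 72)]
activations = [(9, 25), (41, 55), (60, 65)]
counts = classify(faults, activations)
```

Here the first activation starts one step before its fault and the second starts one step into it. Both count as correct protection, while the third fault is missed and the third activation is spurious.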
| Scenario | TP | FP | FN | Avg. activation time (steps relative to fault onset; negative = before) |
|---|---|---|---|---|
| No Faults | 0% | 2.3% | 0% | N/A |
| Rare Faults | 93.7% | 4.8% | 1.5% | -0.4 |
| Base Faults | 96.2% | 3.1% | 0.7% | -0.8 |
| Complex Faults | 95.4% | 4.2% | 0.9% | -0.9 |
7.4. Redundancy Efficiency and Link Utilization
7.5. Parameter Sensitivity
7.6. Summary of Findings
- Redundancy efficiency: Reduces the time during which FRER is active by up to 97.7% in fault-free scenarios, and by over 60% under moderate fault conditions, compared with static FRER.
- Proactive protection: Activates FRER with an average lead time of 0.8 simulation steps prior to fault injection in the Base Faults scenario, and between 0.4 and 0.9 steps across the fault scenarios, enabling preemptive fault mitigation.
- High fault coverage: Maintains >95% protection coverage across all degraded scenarios, with false negatives of at most 1.5%.
- Low false activation rate: Keeps false positives between 2.3% and 4.8% across scenarios, minimizing unnecessary redundancy even under transient or noisy conditions and demonstrating high model specificity.
- Improved link utilization: Significantly enhances bandwidth efficiency, particularly in environments with sparse or intermittent faults.
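One plausible reading of the 97.7% figure is sketched below. It rests on two assumptions: static FRER keeps redundancy enabled for the whole run, and the 2.3% false-activation share reported for the No Faults scenario equals the predictive duty cycle in that scenario. Under those assumptions, with $D$ denoting the duty cycle:

```latex
\[
\text{reduction}
  = \frac{D_{\text{static}} - D_{\text{pred}}}{D_{\text{static}}}
  = \frac{100\% - 2.3\%}{100\%}
  = 97.7\%
\]
```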
8. Discussion
8.1. Interpretation of Results
8.2. Trade-Off Analysis
8.3. Generalizability
8.4. Limitations
8.5. Future Considerations
9. Conclusion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| TSN | Time-Sensitive Networking |
| BiLSTM | Bidirectional Long Short-Term Memory |
| FRER | Frame Replication and Elimination for Reliability |
| FSM | Finite State Machine |
| KPI | Key Performance Indicator |
| SDN | Software-Defined Networking |
| TP | True Positive |
| FP | False Positive |
| FN | False Negative |
| MC | Monte Carlo |
Appendix A. KPI Feature Set Used for Model Input
| Feature | Type | Description |
|---|---|---|
| Latency (ms) | Float | End-to-end delay |
| Jitter (ms) | Float | Inter-arrival variation |
| Packet loss ratio | Float | Loss fraction over window |
| Bit error rate | Float | Raw BER observed on path |
| Retransmissions | Int | Count of retransmitted frames |
| Queue delay (ms) | Float | Time in buffer queue |
| Congestion flag | Bool | Indicates queuing congestion |
| CRC failure | Bool | True if frame failed CRC check |
| Link failure | Bool | True if link reported fault |
| Switch failure | Bool | True if switch experienced outage |
| Sync offset (ns) | Float | Clock offset from reference |
| Sync jitter (ns) | Float | Timebase deviation (stability) |
| Sync state | Categorical | Clock status (e.g., locked, lost) |
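The feature set in the table can be represented as a typed record that flattens into a numeric vector for model input. This is a sketch under stated assumptions: the class name, field names and one-hot encoding are illustrative, and `SYNC_STATES` contains only the two example values named in the table (the full set of clock states is not given).

```python
from dataclasses import dataclass
from typing import List

# Only the two example states from the table; the real set may be larger.
SYNC_STATES = ("locked", "lost")


@dataclass
class KpiSample:
    latency_ms: float
    jitter_ms: float
    packet_loss_ratio: float
    bit_error_rate: float
    retransmissions: int
    queue_delay_ms: float
    congestion_flag: bool
    crc_failure: bool
    link_failure: bool
    switch_failure: bool
    sync_offset_ns: float
    sync_jitter_ns: float
    sync_state: str

    def to_vector(self) -> List[float]:
        """Flatten to floats: 12 numeric/boolean features followed by a
        one-hot encoding of the categorical sync state."""
        if self.sync_state not in SYNC_STATES:
            raise ValueError(f"unknown sync state: {self.sync_state!r}")
        one_hot = [1.0 if self.sync_state == s else 0.0 for s in SYNC_STATES]
        return [
            self.latency_ms,
            self.jitter_ms,
            self.packet_loss_ratio,
            self.bit_error_rate,
            float(self.retransmissions),
            self.queue_delay_ms,
            float(self.congestion_flag),
            float(self.crc_failure),
            float(self.link_failure),
            float(self.switch_failure),
            self.sync_offset_ns,
            self.sync_jitter_ns,
        ] + one_hot


# Toy sample (values are made up).
sample = KpiSample(
    latency_ms=1.8, jitter_ms=0.2, packet_loss_ratio=0.01,
    bit_error_rate=1e-9, retransmissions=3, queue_delay_ms=0.4,
    congestion_flag=True, crc_failure=False, link_failure=False,
    switch_failure=False, sync_offset_ns=35.0, sync_jitter_ns=4.0,
    sync_state="locked",
)
vector = sample.to_vector()
```

In practice the continuous features would likely be normalized before training, which this sketch leaves out.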

| Scenario | Fault Rate | Description |
|---|---|---|
| No Faults | 0% | Baseline case with fully stable network conditions. Used to assess false activations and measure unnecessary redundancy under static FRER. |
| Rare Faults | Low | Infrequent and short-lived degradation events. Predictive FRER is expected to remain mostly inactive unless faults persist. |
| Base Faults | Moderate | Regular transient faults representing common industrial conditions such as periodic congestion or wireless interference. Requires active FRER engagement and correct recovery logic. |
| Complex Faults | High/Bursty | Overlapping or closely spaced faults. Designed to test safe window enforcement, reactivation stability, and fault recovery tracking. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).