Submitted:
03 December 2025
Posted:
22 December 2025
Abstract
Accurately distinguishing true hardware failures from false alarms is a critical requirement in large-scale optical networks, where unnecessary Return Material Authorizations (RMAs) impose significant operational and financial overhead. This paper presents an AI-driven predictive framework that integrates multi-domain telemetry fusion, Transformer-based temporal modeling, and a domain-aware hybrid ensemble to deliver carrier-grade hardware failure detection in optical embedded systems. Unlike prior work that relies on single-sensor or threshold-based diagnostics, the proposed approach jointly analyzes optical power fluctuations, laser bias-current drift, thermoelectric cooler (TEC) thermal instability, voltage dynamics, and digital signal processing (DSP)-layer soft metrics, enabling the model to capture degradation signatures that emerge only through cross-sensor interactions. A customized ensemble combining Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN)-LSTM, and TimeSeriesBERT models fuses complementary pattern-recognition capabilities (long-term drift modeling, high-frequency anomaly detection, and global multi-sensor attention) to improve robustness and generalization. Evaluation on real-time telemetry from optical devices demonstrates the effectiveness of the proposed system, which achieves high accuracy and a high F1-score while significantly reducing unnecessary RMAs. These results highlight the novelty and practical value of the framework, which the authors present as the first comprehensive AI solution tailored for reliable hardware-failure prediction in optical embedded systems.
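As a rough illustration of how the outputs of three such models might be fused into a single RMA decision, the sketch below uses weighted soft voting over per-model failure probabilities. The abstract does not specify the fusion rule, so the weighting scheme, the function names (`fuse_failure_probabilities`, `should_raise_rma`), the model keys, and the 0.5 threshold are all assumptions made for illustration rather than the authors' method.

```python
from typing import Dict


def fuse_failure_probabilities(
    probs: Dict[str, float], weights: Dict[str, float]
) -> float:
    """Weighted soft voting: weighted mean of per-model failure probabilities.

    Hypothetical fusion rule; the paper's actual ensemble may differ.
    """
    if set(probs) != set(weights):
        raise ValueError("probs and weights must cover the same models")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(probs[name] * weights[name] for name in probs) / total


def should_raise_rma(
    probs: Dict[str, float], weights: Dict[str, float], threshold: float = 0.5
) -> bool:
    """Flag a device for RMA only if the fused probability reaches the threshold."""
    return fuse_failure_probabilities(probs, weights) >= threshold


# Illustrative values only (not results from the paper).
example_probs = {"lstm": 0.2, "cnn_lstm": 0.4, "timeseries_bert": 0.9}
example_weights = {"lstm": 1.0, "cnn_lstm": 1.0, "timeseries_bert": 2.0}
fused = fuse_failure_probabilities(example_probs, example_weights)
```

Raising the threshold trades missed failures for fewer unnecessary RMAs; in practice the weights and threshold would be tuned on held-out telemetry.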