Deep Learning-Based Adaptive Sensor Fusion for Real-Time Control and Fault-Tolerant Automation in IoT Systems

Saio Alusine Marrah; Jiahao Wang; John Idriss Lahai; Gibrilla Deen Kamara; Ryvel Timothy Stamber; Koroma Abu Bakarr; Ologun Sodiq Babatunde; Mabel Ernestine Cole

doi:10.20944/preprints202601.2301.v1

Submitted: 27 January 2026

Posted: 29 January 2026


Abstract

This paper presents a deep learning-based adaptive sensor fusion framework for real-time control and fault-tolerant automation in Industrial IoT (IIoT) systems. The core of the framework is an attention-based CNN-Transformer model that dynamically fuses heterogeneous sensor streams; its interpretable weighting signals are used directly for fault detection and to inform a supervisory control policy. This dynamic weighting reduces estimation error under noisy and fault-prone conditions, and the fusion module integrates with a closed-loop controller that responds to detected faults through a stability-aware supervisory policy. Experiments on synthetic IIoT data with injected transient faults show improved fusion accuracy (RMSE: 0.049 ± 0.003 vs. 0.118 ± 0.008 for a Kalman filter, p < 0.001), accurate fault detection (F1-score: 0.89 ± 0.02), fast recovery (1.1 ± 0.2 seconds), and hard real-time performance suitable for edge deployment (99th percentile latency: 58 ms). The proposed approach outperforms classical baselines in terms of RMSE, detection F1-score, recovery time, and latency trade-offs. This work contributes to more reliable, adaptive automation in industrial settings with minimal manual tuning and empirical stability validation.

Keywords: industrial internet of things; adaptive sensor fusion; fault-tolerant control; deep learning; explainable AI; real-time systems

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

Industrial Internet of Things (IIoT) systems integrate large numbers of heterogeneous sensors, actuators, gateways, and controllers to enable automated monitoring and control across manufacturing, energy, and logistics domains [1]. Modern IIoT deployments demand high data quality, low-latency decision-making, and resilience to both hardware and communication failures; achieving these goals simultaneously is challenging because sensor noise, divergent sampling rates, and network disruptions degrade situational awareness and control performance [2]. This combination of strict performance and reliability requirements motivates research into advanced signal-processing and learning approaches that can fuse noisy multi-sensor streams into accurate estimates for downstream control and fault management. Sensor fusion combines observations from multiple heterogeneous sensors to produce estimates that are more accurate, complete, or robust than those of any single sensor alone [3]. In IIoT contexts, fusion reduces uncertainty, mitigates single-sensor failures, and enables redundancy-aware perception that supports predictive maintenance and closed-loop control. Recent surveys emphasize that sensor fusion serves as a core enabling layer for intelligent industrial systems because it converts raw device telemetry into higher-value features used for anomaly detection, state estimation, and control decisions [4].

Real-time fault-tolerant control in IIoT must reconcile competing constraints: stringent latency/jitter budgets, constrained compute (edge devices), and the need to detect and isolate faults quickly to avoid unsafe system states [5]. Communication technologies such as Time-Sensitive Networking (TSN) and software-defined IIoT architectures help meet timing and reliability targets, but they introduce complexity in fault evaluation and require integrated approaches that combine networking guarantees with intelligent fault diagnosis and controller adaptation. Furthermore, simultaneous or cascading faults (sensor + actuator + network) present especially difficult detection and recovery scenarios that conventional model-based controllers struggle to handle without adaptive or learning-based extensions.

Deep learning offers powerful tools for adaptive, data-driven fusion, as neural architectures (CNNs, LSTMs, Transformers, and attention mechanisms) can learn hierarchical spatiotemporal features directly from raw or minimally processed sensor streams [6]. Surveys of multimodal deep-learning fusion show that learned fusion often outperforms classical fusion rules when data are nonlinear, heterogeneous, or exhibit context-dependent noise characteristics; moreover, attention-based and transformer-style models provide interpretable weightings that can be exploited for adaptive sensor weighting and fault sensitivity analysis. These properties make deep models attractive for IIoT fusion, especially when models are designed with resource awareness for edge deployment or when they are hybridized with model-based estimators for safety-critical control.

Despite progress in both IIoT networking and deep learning, a gap remains: how to build an adaptive sensor-fusion pipeline that (i) performs reliably in real time under variable noise and transient sensor faults, (ii) yields explainable fusion weights or attention signals usable by the control layer, (iii) integrates seamlessly with a fault-tolerant control loop that meets industrial latency and safety requirements, and (iv) is supported by empirical stability validation [7]. Existing deep-fusion proposals often address perception accuracy or anomaly detection but do not jointly optimize fusion, fault isolation, and closed-loop control under strict IIoT operational constraints; this work targets that joint problem.

1.1. Contributions and Paper Organization

The primary contribution of this work is a unified, explainable, adaptive sensor fusion framework that directly enables fault-tolerant control in IIoT systems. The core innovation is an attention-based CNN-Transformer fusion model whose interpretable, time-varying sensor weights serve a dual purpose: (1) generating robust state estimates, and (2) providing a direct, real-time signal for fault detection and severity estimation that informs a supervisory control policy. This tight coupling of explainable fusion and control reconfiguration forms a unified architecture for resilient IIoT automation.

This central contribution is substantiated and validated through several key supporting elements:

1.1.1. Joint Optimization Framework

A multi-task learning objective simultaneously trains the fusion model for accurate estimation and effective fault classification, ensuring that the attention weights are discriminative for both tasks.

1.1.2. Empirical Stability Analysis

The supervisory control policy is designed to maintain system stability. It is validated through frequency-domain analysis and bounded recovery-time evaluation under the proposed adaptation scheme.

1.1.3. Comprehensive Performance Validation

We demonstrate the framework's practicality through rigorous evaluation on synthetic IIoT data, including:

Statistically superior fusion accuracy and fault detection compared to classical baselines.
Hard real-time performance profiling on edge hardware, confirming its feasibility for latency-sensitive control loops.
Effective handling of imbalanced fault data via established techniques, ensuring robust classifier training.

The paper is organized as follows: Section 2 reviews related work, focusing on the intersection of learning-based fusion and fault-tolerant control. Section 3 details the proposed methodology, with emphasis on the fusion architecture and its integration with the control loop. Section 4 presents experimental results, structured to validate the core contribution and its supporting elements. Section 5 concludes the paper.

2. Literature Review

Industrial Internet of Things (IIoT) architectures have evolved into layered ecosystems that balance latency, reliability, security, and scalability by distributing functionality across edge, fog, and cloud tiers [8]. Modern IIoT reference designs place time-critical sensing and actuation close to the edge (sensors and gateways), while fog nodes provide intermediate aggregation, local analytics, and orchestration; cloud platforms retain long-term storage, heavy model training, and system-wide coordination [9]. This layered approach is motivated by strict industrial latency constraints, the need for local fail-safe behaviors, and operational concerns such as bandwidth, privacy, and energy consumption. Recent surveys highlight not only the canonical edge–fog–cloud split but also the role of middleware, real-time networking protocols (e.g., MQTT, TSN variants), and containerized edge services in enabling deployable IIoT solutions [10].

Sensor fusion in industrial contexts spans a continuum from classical model-based estimators (e.g., Kalman filters, extended Kalman filters, unscented Kalman filters, and particle filters) to modern data-driven and deep learning approaches [11]. Traditional analytical techniques, valued for their theoretical guarantees and low computational footprint, are widely used for state estimation and sensor redundancy management; they operate at different fusion levels (data-level, feature-level, decision-level) and are often the first choice for linear or near-linear problems [12]. However, in complex or nonlinear environments with multimodal sensors, non-Gaussian noise, or context-dependent failures, learning-based fusion (including neural networks, CNN/LSTM hybrids, and attention/transformer models) can capture cross-sensor dependencies and nonlinearities that classical filters miss. Reviews and recent comparative studies emphasize hybrid designs that combine analytical filters (for stability and interpretability) with learned modules (for adaptability and representational power), particularly when deployed on edge/fog tiers with resource constraints [13]. Despite extensive research on deep sensor fusion, fault diagnosis, and intelligent control, relatively few works present end-to-end frameworks that jointly address adaptive fusion, explainability, fault-tolerant control, and hard real-time industrial constraints. Moreover, most learning-based approaches lack explicit mechanisms for stability-aware control integration suitable for industrial certification. This work directly addresses these gaps by proposing an explainable fusion–control co-design validated under real-time and stability requirements.

Table 1. Sensor Specification and Preprocessing Parameters

Sensor	Modality	Sampling Rate	Preprocessing
S1	Vibration (accelerometer)	1 kHz	High-pass (10 Hz), rolling std (100 samples), cubic interpolation
S2	Temperature	1 Hz	Moving median (5 samples), z-score, zero-order-hold upsampling
S3	Pressure	100 Hz	Low-pass (50 Hz), cubic spline interpolation
S4	Current	500 Hz	Calibration offset, exponential smoothing (α = 0.3)

2.1. Explainable AI and Stability in Learning-Based Control

Recent advances in Explainable AI (XAI) for time-series processing have enabled more transparent deep learning models in safety-critical systems [21]. Attention mechanisms provide interpretable weightings that can be validated against physical failure modes, addressing the "black-box" nature of deep networks that often limits industrial adoption. However, the integration of these learned components with traditional control systems raises important stability concerns that must be addressed for certification in safety-critical applications [2]. Hybrid approaches that combine analytical controllers with adaptive learning modules must preserve the stability regions defined by Lyapunov analysis. Several studies have explored bounded learning approaches in which neural network outputs are constrained to maintain formal guarantees [23]. Our work builds upon these foundations by developing a supervisory control framework that bounds learned adaptations within predefined parameter ranges, so that the nominal controller's stability guarantees are preserved while the adaptivity benefits of deep learning are retained.

2.2. Traditional vs. Intelligent Control Approaches

Traditional control theory, dominated by PID, linear-quadratic regulators, and model predictive controllers, remains the backbone of many industrial control loops due to its simplicity, well-understood stability properties, and certified performance under known dynamics. Yet the increasing complexity of industrial processes, highly nonlinear behaviors, and the availability of large sensor streams have motivated intelligent control approaches that incorporate adaptive control, fuzzy logic, and machine learning. Intelligent controllers (including neural-network-based controllers, reinforcement learning policies, and adaptive PID variants) offer stronger performance in uncertain or time-varying environments but introduce challenges: higher computational cost, training data needs, and the difficulty of providing formal stability or safety guarantees [14] [17]. The literature reports hybrid strategies in which intelligent modules augment or tune traditional controllers, capitalizing on the reliability of model-based techniques while using learning for adaptation and for tasks (like fault-aware reconfiguration) that are hard to handle analytically.

2.3. Fault Detection and Tolerance Mechanisms

Fault detection, isolation, and recovery (FDIR/FDI/FTC) is a mature research area in control engineering and safety-critical systems, and it remains central for IIoT deployments that require high availability [18]. Classical fault-detection approaches rely on analytical redundancy (residual generation and thresholding), model-based observers, and statistical hypothesis tests; these are complemented by hardware redundancy and voting schemes in safety-critical installations. Active fault-tolerant control strategies extend detection with online controller reconfiguration or supervisory switching to preserve safe operation. More recent work integrates data-driven fault diagnosis (machine-learning classifiers, anomaly detection on learned features) to detect subtle or composite faults that model-based residuals might miss. Reviews emphasize that the most robust systems use combinations of analytical redundancy, hardware redundancy, and learning-based monitors, while also stressing the importance of fast isolation and bounded recovery times to maintain safety.

2.4. Deep Learning Applications in Control Systems

Deep learning has been applied across perception, predictive maintenance, and control in industrial systems. Architectures such as convolutional networks and recurrent networks (LSTM/GRU) are commonly used for time-series feature extraction and anomaly detection, while transformer-style attention models have recently been adapted for multivariate sensor fusion due to their flexible handling of temporal dependencies and modality-specific attention. In control-related tasks, deep models are used for system identification, soft sensor generation, predictive fault estimation, and as function approximators within adaptive control or reinforcement learning frameworks [19]. The literature shows promising performance gains, particularly in scenarios with complex nonlinear dynamics or multimodal inputs, but also warns about overfitting, the need for interpretability (e.g., attention or saliency as explanations), and the expense of on-device inference that motivates model compression and edge-aware architectures. Although many individual building blocks (edge-fog-cloud IIoT architectures, analytical estimators, learning-based fusion models, and FDI/FTC methods) are well-studied, open gaps remain at their intersection [20]. First, few studies present end-to-end solutions that jointly optimize adaptive fusion, fault diagnosis, and closed-loop control under hard real-time constraints typical of industrial settings. Second, there is a scarcity of approaches that provide both high adaptivity (to varying noise profiles and context-dependent sensor reliability) and formal assurances about control stability and bounded recovery, a critical requirement for industrial certification. Third, resource-constrained deployments (edge nodes) require efficient model architectures and co-design between communication scheduling and inference placement, yet experimental evidence comparing trade-offs (latency vs. accuracy vs. energy) across realistic IIoT testbeds is limited.

Finally, explainability and operator trust remain challenges: black-box deep models often lack actionable diagnostics that control engineers can use for safe reconfiguration. Addressing these gaps calls for hybrid architectures that combine interpretable analytical components with adaptive deep modules, and for careful evaluation on latency-, reliability-, and safety-oriented metrics.

3. Methodology

3.1. Research Framework Overview

This study follows an end-to-end research framework that integrates sensor data acquisition, preprocessing, an adaptive deep learning–based fusion module, a fault detection and severity estimation block, and a closed-loop adaptive controller for real-time evaluation. The framework is implemented and profiled across edge and cloud tiers to analyze latency and deployment trade-offs, and it is validated using synthetic and logged IIoT traces that include injected faults and noise excursions to stress-test robustness. Figure 1 illustrates the sensor correlation matrix and noise characteristics: (a) cross-correlation between sensor modalities, showing weak correlation (ρ < 0.3), which motivates the fusion approach; (b) time-varying noise profiles demonstrating non-Gaussian characteristics.

3.2. Data Acquisition and Synchronization

The experimental setup employs four heterogeneous sensor modalities with dramatically different sampling rates: Vibration (1 kHz), Temperature (1 Hz), Pressure (100 Hz), and Current (500 Hz). This 1000:1 rate disparity presents significant synchronization challenges that we address through a multi-rate fusion framework. Data are resampled to the highest frequency (1 kHz) using cubic spline interpolation for continuous-valued sensors and zero-order hold for discrete events. Temporal alignment is achieved through hardware-timestamped data acquisition with microsecond precision, ensuring that all sensor readings within a 1-ms window are treated as synchronous. The temporal evolution of the adaptive fusion weights under nominal and fault conditions is illustrated in Figure 2.

The preprocessing pipeline includes:

Vibration (S1): High-pass filter (10 Hz cutoff), rolling standard deviation (100-sample window);
Temperature (S2): Moving median filter (5-sample window), z-score normalization;
Pressure (S3): Low-pass filter (50 Hz cutoff), cubic spline interpolation for missing samples;
Current (S4): Calibration offset correction, exponential smoothing (α = 0.3).
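Three of the preprocessing steps listed above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the function names, the trailing (causal) form of the moving median, the seeding of the smoother with the first sample, and the sample values are all assumptions. Cubic spline interpolation and the high-pass/low-pass filters are omitted.

```python
from statistics import median

def zero_order_hold(samples, factor):
    """Upsample by repeating each sample `factor` times (zero-order hold)."""
    return [s for s in samples for _ in range(factor)]

def moving_median(samples, window=5):
    """Trailing moving median; early samples use the shorter history available."""
    return [median(samples[max(0, i - window + 1): i + 1])
            for i in range(len(samples))]

def exponential_smoothing(samples, alpha=0.3):
    """y[t] = alpha * x[t] + (1 - alpha) * y[t-1], seeded with the first sample."""
    out = []
    for x in samples:
        out.append(x if not out else alpha * x + (1 - alpha) * out[-1])
    return out

# Hypothetical 1 Hz temperature readings with one outlier (illustrative only).
temperature_1hz = [20.0, 20.1, 35.0, 20.2, 20.3]
filtered = moving_median(temperature_1hz, window=5)
upsampled = zero_order_hold(filtered, factor=1000)  # 1 Hz -> 1 kHz grid

# Hypothetical current readings smoothed with alpha = 0.3 as in the S4 entry.
current_smoothed = exponential_smoothing([1.0, 2.0, 2.0], alpha=0.3)
```

The median filter suppresses the 35.0 outlier before the hold step replicates each reading onto the 1 kHz grid, so a single bad temperature sample is not repeated across 1000 fused time steps.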

3.3. Adaptive Sensor Fusion Architecture

The core fusion module is a hybrid CNN-Transformer architecture with attention-based fusion. The network consists of three main components: (1) Convolutional blocks for local spatiotemporal feature extraction, (2) Transformer encoder for capturing long-term dependencies, and (3) Attention-based fusion layer that computes interpretable, time-varying weights.

3.3.1. Joint Optimization Framework

Let

X = \{ x_{1:T}^{(1)}, x_{1:T}^{(2)}, \dots, x_{1:T}^{(N)} \}

represent the multivariate time series input from N sensors over window length T. The fusion architecture is defined as:

Z^{(i)} = \mathrm{CNN}_{θ_c} (x_{1:T}^{(i)}), \quad \forall i \in \{1, \dots, N\},

(1)

H = \mathrm{Transformer}_{θ_t} (Z^{(1)}, Z^{(2)}, \dots, Z^{(N)}),

(2)

e_{i}(t) = v^{⊤} \tanh (W_{h} h_{t} + W_{x} x_{t}^{(i)} + b),

(3)

w_{i}(t) = \frac{\exp (e_{i}(t))}{\sum_{j=1}^{N} \exp (e_{j}(t))},

(4)

\hat{y}_{t} = \sum_{i=1}^{N} w_{i}(t) \cdot W_{o} h_{t}^{(i)},

(5)

where:

$\mathrm{CNN}_{θ_c}$ : 1D convolutional layers with kernel size = 5, stride = 1, channels = [32, 64, 128];
$\mathrm{Transformer}_{θ_t}$ : 4 encoder layers, 8 attention heads, hidden dimension = 256, feedforward dimension = 512;
$W_{h}, W_{x}, W_{o}$ : Learnable projection matrices;
$w_{i}(t)$ : Interpretable fusion weights used for both estimation and fault analysis.
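The normalization in Eq. (4) and the weighted sum in Eq. (5) can be illustrated with a small standard-library sketch. It assumes scalar per-sensor features in place of the projected vectors $W_{o} h_{t}^{(i)}$, and the scores and feature values are hypothetical. The CNN and Transformer stages that would produce the scores are not modeled.

```python
import math

def softmax(scores):
    """Eq. (4): turn raw attention scores e_i(t) into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(weights, projected_features):
    """Eq. (5) with scalar features: weighted sum over sensors."""
    return sum(w * h for w, h in zip(weights, projected_features))

# Hypothetical scores for four sensors; the third is scored low, as a faulty
# stream might be. None of these values come from the paper.
scores = [2.0, 2.0, -3.0, 2.0]
features = [1.00, 1.02, 9.00, 0.98]  # scalar stand-ins for W_o h_t^{(i)}

weights = softmax(scores)
fused = fuse(weights, features)
```

Because the low-scored sensor receives a weight close to zero, its outlying feature value contributes little to the fused estimate. This is the mechanism by which the attention weights also act as a fault indicator.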

The dynamic evolution of these weights during normal and fault conditions provides explainable insights into sensor reliability, as shown in Figure 3.

3.4. Joint Optimization Objective Function

The end-to-end training employs a multi-task objective function that jointly optimizes fusion accuracy, fault detection, and control performance:

L_{total} = λ_{1} L_{fusion} + λ_{2} L_{fault} + λ_{3} L_{control},

(6)

where the individual loss components are:

L_{fusion} = \frac{1}{T} \sum_{t = 1}^{T} {(y_{t} - \hat{y_{t}})}^{2},

(7)

L_{fault} = - \sum_{c = 1}^{C} α_{c} \log p (c \mid X),

(8)

L_{control} = \frac{1}{T} \sum_{t = 1}^{T} \left( β_{1} {\| u_{t} \|}^{2} + β_{2} {\| e_{t} \|}^{2} \right),

(9)

The weighting coefficients are determined via cross-validation: $λ_{1} = 1.0$, $λ_{2} = 0.8$, $λ_{3} = 0.5$, $β_{1} = 0.1$, $β_{2} = 1.0$. The class weights $α_{c}$ address data imbalance using inverse frequency weighting:

α_{c} = \frac{N}{C \cdot N_{c}}

where $N$ is the total number of samples, $C$ is the number of classes, and $N_{c}$ is the number of samples in class $c$.
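As a worked illustration of the inverse frequency formula and of the weighted sum in Eq. (6), the following sketch uses the raw class counts reported for the dataset (7000 normal, 500 per fault class). The function names and the example loss values are hypothetical. The formula yields unnormalized weights, and since the text does not specify any rescaling of them, none is applied here.

```python
def inverse_frequency_weights(class_counts):
    """alpha_c = N / (C * N_c) for each class c."""
    n_total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {c: n_total / (n_classes * n) for c, n in class_counts.items()}

def total_loss(l_fusion, l_fault, l_control, lambdas=(1.0, 0.8, 0.5)):
    """Eq. (6): weighted sum of the three task losses."""
    lam1, lam2, lam3 = lambdas
    return lam1 * l_fusion + lam2 * l_fault + lam3 * l_control

# Raw sample counts per class as reported for the dataset.
counts = {"F0": 7000, "F1": 500, "F2": 500, "F3": 500}
alpha = inverse_frequency_weights(counts)

# Hypothetical per-batch loss values, chosen only to show the weighting.
example_total = total_loss(l_fusion=0.01, l_fault=0.20, l_control=0.10)
```

With these counts each fault class is weighted 14 times more heavily than the normal class, which is exactly the ratio of their sample counts.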

Figure 4. Fault classification confusion matrix (macro F1-score: 0.89 ± 0.02).

3.5. Control Integration and Stability Analysis

The adaptive controller uses the fusion outputs through a stability-aware supervisory policy:

u_{t} = \begin{cases} K_{nominal} \, \hat{y}_{t}, & \mathrm{severity}(t) < τ_{1} \\ K_{degraded} \, \hat{y}_{t}, & τ_{1} \leq \mathrm{severity}(t) < τ_{2} \\ K_{safe}, & \mathrm{severity}(t) \geq τ_{2} \end{cases}

(10)

where the fault severity index is computed as:

\mathrm{severity}(t) = \max_{i} \left( \frac{| w_{i}(t) - \hat{w}_{i} |}{σ_{w_{i}}} \right) \cdot \mathrm{confidence}(t),

(11)

Here, confidence(t) represents the classifier's posterior probability for the detected fault class, providing a measure of detection certainty that modulates the severity index. The controller gains $K_{nominal}$, $K_{degraded}$, and $K_{safe}$ are designed using robust control principles for the nominal plant model. The stability of the overall adaptive system is evaluated empirically and supported by frequency-domain analysis under bounded adaptation assumptions. The supervisory policy (Eq. 10) ensures that only pre-designed, stable gain sets are activated. The adaptive element is bounded by the fault severity index (Eq. 11), which triggers discrete switches between these stable controllers rather than applying continuous, unbounded gain adjustments. Robustness to the bounded perturbation introduced by sensor re-weighting is analyzed with the small-gain theorem in Section 3.5.2.

A comprehensive comparison against classical and deep-learning baselines is summarized in Table 2.

The proposed method (both FP32 and INT8 quantized versions) achieves the lowest RMSE, highest F1-score for fault detection, fastest recovery, and competitive real-time latency and energy consumption compared to classical (Kalman, Extended KF, Particle Filter, Fixed Average) and deep learning (LSTM Fusion) baselines. Notably, INT8 quantization reduces latency by ~47% and energy by ~39% with no measurable loss in accuracy or detection performance, confirming suitability for edge deployment.

3.5.1. Stability-Aware Design Principles

The supervisory policy ensures stability through:

1. Pre-validated Gain Sets: All controller gains (K_nominal, K_degraded, K_safe) are designed using robust control synthesis. Each gain set satisfies stability criteria independently:

K_nominal: Optimized for performance (settling time < 2 s, overshoot < 10%)
K_degraded: Reduced bandwidth (40% gain reduction) for robustness
K_safe: Conservative failsafe with guaranteed stability margins > 6 dB

2. Bounded Adaptation: The fault severity index triggers discrete switches between stable controllers rather than continuous gain modulation.
3. Hysteresis and Dwell Time: To prevent chattering:

Hysteresis bands: τ₁ = 0.3, τ₁^upper = 0.35 and τ₂ = 0.7, τ₂^upper = 0.75
Minimum dwell time: 0.5 seconds between switches
Rate limiting: $|Δ \mathrm{severity} / Δ t| \leq 2.0$ per second
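The hysteresis and dwell-time rules above can be sketched as a small state machine. This is one plausible reading rather than the authors' implementation: it assumes that the upper thresholds (0.35, 0.75) gate escalation and the lower thresholds (0.3, 0.7) gate de-escalation, and that the 0.5 s dwell time applies uniformly to every switch. The class name and the example trace are hypothetical, and rate limiting of the severity signal is not modeled.

```python
class SupervisorySwitch:
    """Discrete mode selection with hysteresis bands and a minimum dwell time."""

    def __init__(self, tau1=(0.30, 0.35), tau2=(0.70, 0.75), dwell=0.5):
        self.tau1, self.tau2, self.dwell = tau1, tau2, dwell
        self.mode = "nominal"
        self.last_switch = float("-inf")

    def _target(self, severity):
        lo1, hi1 = self.tau1
        lo2, hi2 = self.tau2
        if self.mode == "nominal":      # escalate only above the upper bands
            if severity >= hi2:
                return "safe"
            return "degraded" if severity >= hi1 else "nominal"
        if self.mode == "degraded":
            if severity >= hi2:
                return "safe"
            return "nominal" if severity < lo1 else "degraded"
        if severity < lo1:              # safe mode: relax only below lower bands
            return "nominal"
        return "degraded" if severity < lo2 else "safe"

    def update(self, t, severity):
        """Return the active mode at time t (seconds) for the given severity."""
        target = self._target(severity)
        if target != self.mode and t - self.last_switch >= self.dwell:
            self.mode, self.last_switch = target, t
        return self.mode

# Hypothetical (time, severity) trace.
sw = SupervisorySwitch()
trace = [(0.0, 0.10), (0.1, 0.32), (0.2, 0.40), (0.3, 0.28), (0.8, 0.28), (1.4, 0.80)]
modes = [sw.update(t, s) for t, s in trace]
```

In this trace the reading of 0.32 falls inside the first hysteresis band and causes no switch, and the drop to 0.28 at t = 0.3 s is held off until the dwell time has elapsed. In a safety-critical deployment one might exempt escalation to the safe mode from the dwell constraint; the text does not say whether the authors do so.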

3.5.2. Small-Gain Analysis Framework

We analyze the system's robustness using the small-gain theorem as a guiding framework. The learned variations in effective process dynamics, introduced when the fusion model re-weights sensors during faults, are treated as a bounded perturbation ΔG. For the closed-loop system to remain stable, the condition

\| ΔK \| < \frac{1}{\| G(s) \|_{\infty}},

(12)

must hold, where ΔK represents the induced variation from the adaptive fusion output.

Our design ensures this bound is respected by:

1. Bounding weight deviations: Attention weights are normalized (sum to 1) and smoothed over time with an exponential filter (α = 0.7), constraining |w_i(t) - ŵ_i| ≤ 0.4 in practice.

2. Limiting severity response: The severity index saturates at 1.0, preventing extreme control transitions.

3. Frequency-domain validation: Bode analysis (Section 4.4) confirms that there are no resonance peaks and that a gain margin > 6 dB is maintained across all operating modes.
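The first two bounding mechanisms, together with the severity index of Eq. (11), can be sketched as follows. The sketch assumes that α = 0.7 weights the newest attention vector (the text does not state which convention is used), and the baseline weights, standard deviations, and confidence values are hypothetical.

```python
def smooth_weights(previous, current, alpha=0.7):
    """Exponential filter on attention weights (assumed: alpha weights the new value)."""
    return [alpha * c + (1 - alpha) * p for p, c in zip(previous, current)]

def severity_index(weights, baseline, sigma, confidence):
    """Eq. (11): largest normalized weight deviation times confidence, capped at 1.0."""
    deviation = max(abs(w - b) / s for w, b, s in zip(weights, baseline, sigma))
    return min(1.0, deviation * confidence)

# Hypothetical nominal statistics for four sensors.
baseline = [0.25, 0.25, 0.25, 0.25]   # nominal weights w-hat_i
sigma = [0.05, 0.05, 0.05, 0.05]      # nominal standard deviations of w_i

# A small fluctuation with an uncertain classifier output.
severity_nominal = severity_index([0.26, 0.24, 0.25, 0.25], baseline, sigma, 0.5)

# A strong down-weighting of sensor 3 with a confident fault classification.
severity_fault = severity_index([0.30, 0.30, 0.10, 0.30], baseline, sigma, 0.9)

smoothed = smooth_weights(baseline, [0.30, 0.30, 0.10, 0.30])
```

Because both inputs to the filter sum to 1, their convex combination does too, so smoothing preserves the normalization that keeps the weight deviations bounded.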

3.5.3. Empirical Stability Validation

While a formal Lyapunov-based proof remains future work, stability is supported empirically by:

Pole placement verification across 1000+ switching events
Energy-based metrics (V = x^T P x) decreasing monotonically post-fault
Bounded trajectories over 500 fault injection trials

This empirical evidence, combined with the stability-aware design principles, provides strong confidence in the framework's safe operation for the evaluated fault scenarios.

Table 3. Model Training Parameters and Computing Environment.

Parameter	Value
Optimizer	Adam (learning rate = $1 \times 10^{-3}$, weight decay = $1 \times 10^{-5}$)
Batch size	64 (stratified sampling)
Epochs	100 (early stopping, patience = 10)
Window length	5 seconds (5000 samples at 1 kHz)
Loss weights	$λ_{1} = 1.0$, $λ_{2} = 0.8$, $λ_{3} = 0.5$
Class weights	Inverse frequency weighting
Edge device	NVIDIA Jetson AGX Xavier
Training hardware	NVIDIA RTX 3080 (32 GB RAM)

Models are trained on labeled windows containing normal operation and varied fault classes. To address the significant class imbalance (7000 normal vs. 500 faulty samples per class), we employ multiple strategies:

Weighted Loss: Inverse frequency weighting in $L_{fault}$ ;
Data Augmentation: Synthetic minority oversampling (SMOTE) for fault classes;
Batch Sampling: Stratified sampling ensuring equal representation per epoch.

Training uses the Adam optimizer with learning rate $1 \times 10^{-3}$, weight decay $1 \times 10^{-5}$, batch size 64, and early stopping with patience 10. Table 3 summarizes the complete training configuration.

3.6. Learning and Explainable Attention Mechanisms

To ensure operator trust and actionable diagnostics, we extract attention maps and activation visualizations from the fusion encoder. The temporal attention mechanism highlights which time steps and sensor channels the model relies upon for fusion and anomaly detection. The explainability pipeline includes:

Attention Visualization: Time-varying sensor importance scores.
Activation Clustering: Feature space analysis for fault discrimination.
Gradient-based Saliency: Input-space importance mapping.

Figure 3 shows how attention patterns shift during fault conditions, providing interpretable diagnostics for control engineers.

3.7. Fault Detection and Classification with Imbalance Handling

Faults are classified into four categories with explicit handling of data imbalance. Detection thresholds are derived using validation ROC analysis with macro-averaging to ensure balanced performance across all fault classes. The multi-class classifier achieves robust discrimination through the combined use of weighted loss and data augmentation. Figure 5 presents the ROC curves for each fault class, showing the detection trade-offs and threshold-independent performance achieved after imbalance mitigation.

3.8. Fault Severity Estimation and Recovery Dynamics

Alongside discrete detection, a continuous fault-severity index quantifies anomaly magnitude and informs the adaptive control module. Recovery time is formally defined as the duration from fault isolation to when the system's RMSE returns below 1.1× the nominal RMSE for 10 consecutive time steps. Figure 7 shows the transient behavior during fault onset, detection, and recovery. An example comparison between the adaptive fusion output and the raw sensor streams during a faulty period is shown in Figure 8.
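The recovery-time definition can be made concrete with a short sketch. It assumes that recovery is timed to the first step of the qualifying 10-step run (the definition could also be read as timing to the end of the run), and the RMSE trace and step size are hypothetical.

```python
def recovery_time(rmse_series, isolation_idx, nominal_rmse, dt, hold=10, factor=1.1):
    """Seconds from fault isolation until RMSE stays below factor * nominal_rmse
    for `hold` consecutive steps; None if that never happens."""
    threshold = factor * nominal_rmse
    run = 0
    for k in range(isolation_idx, len(rmse_series)):
        run = run + 1 if rmse_series[k] < threshold else 0
        if run == hold:
            first_step_of_run = k - hold + 1
            return (first_step_of_run - isolation_idx) * dt
    return None

# Hypothetical RMSE trace at 50 ms steps: nominal, faulty, a brief false
# recovery interrupted by one bad step, then a sustained recovery.
trace = [0.05] * 5 + [0.12] * 20 + [0.05] * 3 + [0.08] + [0.05] * 15
t_rec = recovery_time(trace, isolation_idx=5, nominal_rmse=0.049, dt=0.05)
```

The three good steps before the 0.08 sample do not count toward recovery, because the consecutive-step requirement resets the run. That reset is what keeps a momentary dip below the threshold from being reported as a recovery.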

3.9. Control Performance and Stability Validation

Control performance is assessed in both the time and frequency domains, with empirical stability validation. Frequency-domain validation of the closed-loop system is presented in Figure 6, confirming the absence of resonant peaks under adaptive control: the adaptive scheme maintains stability without injecting destabilizing high-frequency content.

Figure 7. Fault severity index and recovery behavior during an injected sensor fault.

3.10. Hard Real-Time Performance Analysis

To validate the industrial feasibility of the core fusion-control framework, we profile its inference latency, jitter, and throughput under realistic industrial constraints. This performance validation is critical, as the proposed attention-based fusion and fault estimation must execute within the control period to be effective. Beyond median latency, we analyze percentile performance for hard real-time suitability. The 99th percentile latency of 58 ms satisfies the 100 ms control period requirement, confirming suitability for latency-sensitive real-time industrial control loops. The maximum jitter of 28 ms keeps timing variation bounded, which is critical for control loop stability. The proposed framework is benchmarked against classical baselines with statistical significance testing. All improvements are statistically significant (paired test, p < 0.001). The marginal energy increase is justified by the substantial improvements in accuracy and robustness for mission-critical applications. Performance under compound fault scenarios is summarized in Table 5.
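A percentile-based deadline check of the kind described above can be sketched with the standard library. The nearest-rank percentile method and the definition of jitter as the maximum absolute deviation from the median are assumptions, since the text does not define either, and the latency samples are synthetic.

```python
import math
from statistics import median

def percentile(samples, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def meets_deadline(latencies_ms, period_ms=100.0, q=99):
    """True if the q-th percentile latency fits inside the control period."""
    return percentile(latencies_ms, q) <= period_ms

def max_jitter(latencies_ms):
    """Assumed definition: largest absolute deviation from the median latency."""
    mid = median(latencies_ms)
    return max(abs(x - mid) for x in latencies_ms)

# Synthetic latency samples in milliseconds: 99 typical values and one outlier.
latencies_ms = [30.0 + (i % 10) for i in range(99)] + [58.0]

p99 = percentile(latencies_ms, 99)
worst = percentile(latencies_ms, 100)
jitter = max_jitter(latencies_ms)
ok = meets_deadline(latencies_ms, period_ms=100.0)
```

Reporting a high percentile alongside the worst case matters because a single outlier can sit far above the 99th percentile, and it is the tail, not the median, that determines whether a control deadline is missed.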

Table 4. Fault Class Definitions and Dataset Counts with Imbalance Handling

Fault Label	Description	Raw Samples	Augmented Samples	Class Weight
F0 (Normal)	Nominal operation	7000	1000	0.14
F1 (Sensor Bias)	Gradual offset on sensor	500	2000	1.0
F2 (Stuck)	Sensor stuck at constant value	500	2000	1.0
F3 (Impulse)	Short high-amplitude spikes	500	2000	1.0

Figure 8. Adaptive fusion output compared with raw sensor streams (example faulty period highlighted)

3.11. Extended Fault Scenario Evaluation

To rigorously test the framework's fault-tolerance capabilities, we evaluate across four comprehensive fault categories:

3.11.1. Single-Sensor Fault Types

Type 1 - Bias Drift: Gradual offset accumulation: s_i(t) = s_i^true(t) + βt, where β ~ U(0.01, 0.05)
Type 2 - Gain Degradation: Multiplicative scaling: s_i(t) = γ · s_i^true(t), where γ ~ U(0.5, 0.8)
Type 3 - Stuck-at Fault: Sensor freezes at last valid reading for duration Δt ~ U(2, 10) seconds
Type 4 - Impulse Noise: Random spikes: s_i(t) = s_i^true(t) + ξ(t), where ξ(t) is Poisson-distributed impulses with amplitude ~ N(0, 5σ_s)
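The four fault models above can be sketched as simple injection functions. This is an illustrative sketch under stated assumptions, not the authors' generator: the function names are hypothetical, signals are plain lists sampled every `dt` seconds, the impulse arrivals use a per-sample Bernoulli draw as a small-rate approximation of a Poisson process, and `amp_std` is interpreted as a standard deviation (the source does not say whether 5σ_s is a standard deviation or a variance).

```python
import random

def inject_bias_drift(signal, dt, beta):
    """Type 1: add a ramp beta * t to the true signal."""
    return [s + beta * (k * dt) for k, s in enumerate(signal)]

def inject_gain_degradation(signal, gamma):
    """Type 2: scale the true signal by gamma."""
    return [gamma * s for s in signal]

def inject_stuck_at(signal, start, length):
    """Type 3: hold the last valid reading for `length` samples from index `start`."""
    out = list(signal)
    held = out[start - 1] if start > 0 else out[0]
    for k in range(start, min(start + length, len(out))):
        out[k] = held
    return out

def inject_impulses(signal, rate, amp_std, rng):
    """Type 4: add zero-mean Gaussian spikes at randomly chosen samples."""
    return [
        s + (rng.gauss(0.0, amp_std) if rng.random() < rate else 0.0)
        for s in signal
    ]

rng = random.Random(0)
clean = [1.0] * 10
drifted = inject_bias_drift(clean, dt=0.1, beta=rng.uniform(0.01, 0.05))
scaled = inject_gain_degradation(clean, gamma=0.5)
stuck = inject_stuck_at([0.0, 1.0, 2.0, 3.0, 4.0], start=2, length=2)
untouched = inject_impulses(clean, rate=0.0, amp_std=5.0, rng=rng)
```

Keeping each fault as a pure function of the clean signal makes it straightforward to compose faults for the compound scenarios.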

3.11.2. Compound Fault Scenarios

To evaluate resilience beyond isolated faults, we introduce:

1. Scenario A - Sequential Faults:

Sensor 1 develops bias drift at t = 10s
Sensor 3 experiences stuck-at fault at t = 15s
Tests recovery under cascading degradation

2. Scenario B - Simultaneous Multi-Sensor:

Two sensors (randomly selected) fail concurrently
Fault types: one bias, one impulse noise
Tests fusion robustness with reduced redundancy

3. Scenario C - Fault + High Noise:

Single sensor bias fault during 3× elevated ambient noise (σ_noise = 3σ_nominal)
Tests detection sensitivity under challenging conditions

4. Scenario D - Intermittent Faults:

Sensor alternates between normal and stuck-at states every 3-5 seconds
Tests rapid adaptation and detection consistency
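As an illustration of how such a scenario could be scripted, the sketch below builds the alternating normal/stuck schedule of Scenario D and applies it to a signal. The helper names, the sampling interval, and the choice to start in the normal state are assumptions for illustration only.

```python
import random

def intermittent_schedule(duration_s, dt, rng, min_hold=3.0, max_hold=5.0):
    """Return one boolean per sample: True while the sensor is stuck."""
    n = int(round(duration_s / dt))
    mask, faulty, k = [], False, 0
    while k < n:
        # Each state is held for a random 3-5 s interval before toggling.
        hold = max(1, int(round(rng.uniform(min_hold, max_hold) / dt)))
        mask.extend([faulty] * min(hold, n - k))
        k += hold
        faulty = not faulty
    return mask

def apply_stuck_mask(signal, mask):
    """Hold the last healthy reading wherever the mask is True."""
    out, last = [], signal[0]
    for s, stuck in zip(signal, mask):
        if not stuck:
            last = s
        out.append(last)
    return out

mask = intermittent_schedule(30.0, 0.1, random.Random(1))
held = apply_stuck_mask([1.0, 2.0, 3.0, 4.0], [False, True, True, False])
```

The same mask-based approach extends to Scenarios A and B by building one mask per sensor with fixed or overlapping onset times.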

Table 5. Compound Fault Scenario Performance

Scenario	Detection Rate	Mean Recovery Time	RMSE During Fault
Single (Type 1-4)	94.2 ± 2.1%	1.1 ± 0.2s	0.086 ± 0.012
Sequential (A)	89.7 ± 3.4%	1.8 ± 0.4s	0.121 ± 0.018
Simultaneous (B)	82.3 ± 4.7%	2.3 ± 0.6s	0.167 ± 0.031
High Noise (C)	85.1 ± 3.9%	1.4 ± 0.3s	0.143 ± 0.024
Intermittent (D)	91.5 ± 2.8%	0.9 ± 0.2s	0.094 ± 0.015

4. Results and Discussion

4.1. Statistical Validation of Fusion Accuracy

The adaptive sensor fusion model demonstrates statistically significant improvements over traditional methods. Quantitative analysis over 10 independent runs shows that the proposed method achieves RMSE = 0.049 ± 0.003 compared to Kalman filter RMSE = 0.118 ± 0.008 (p < 0.001, paired t-test). Correlation gain between fused and ground-truth signals improved by 0.27 ± 0.02 on average. Table 6 summarizes the comprehensive quantitative comparison. To illustrate the practical improvement, Figure 8 compares the adaptive fusion output with the raw signals from three heterogeneous sensors over a 10-second interval. The individual sensor streams show substantial noise and variability (colored lines), while the fused estimate (thick black line) closely follows the true underlying sinusoidal pattern, achieving significantly lower estimation error as quantified by the RMSE reduction of 58%.
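For readers who want to reproduce this kind of comparison, the following sketch shows how RMSE, the relative reduction, and a paired t statistic can be computed. The function names are hypothetical, and the per-run values in the toy example are placeholders, since the paper reports only means and standard deviations; only the two mean RMSE values (0.118 and 0.049) come from the text.

```python
import math
from statistics import mean, stdev

def rmse(estimates, truth):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(mean((e - t) ** 2 for e, t in zip(estimates, truth)))

def relative_reduction(baseline, proposed):
    """Fractional error reduction of `proposed` relative to `baseline`."""
    return (baseline - proposed) / baseline

def paired_t_statistic(a, b):
    """t statistic for paired samples a and b (degrees of freedom = len(a) - 1)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Reported mean RMSE values: Kalman filter 0.118, proposed 0.049.
reduction = relative_reduction(0.118, 0.049)  # about 0.585, i.e. the quoted 58%

# Placeholder per-run values, for demonstrating the statistic only.
toy_baseline = [2.0, 3.0, 4.5]
toy_proposed = [1.0, 2.0, 3.0]
t_stat = paired_t_statistic(toy_baseline, toy_proposed)
```

A paired test is the natural choice when both methods are evaluated on the same runs, because it removes run-to-run variation that affects both methods equally.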

4.2. Convergence and Training Stability

The model exhibits stable convergence across all training sessions, with the validation loss closely tracking the training loss without signs of overfitting (Figure 9). Accuracy plateaus at 98.2% ± 0.3% after 70 epochs, with minimal oscillation in validation curves. The inclusion of dropout layers (rate = 0.2) and batch normalization contributes to smooth gradient behavior and consistent convergence across both CNN-LSTM and Transformer variants.

4.3. Fault Classification Robustness with Imbalance Handling

The fault detection module demonstrates strong discriminative performance despite significant class imbalance. Using the macro-averaged F1-score to ensure balanced performance across all classes, the proposed method achieves 0.89 ± 0.02 compared to 0.71 ± 0.03 for the SVM baseline (p < 0.001). The confusion matrix (Figure 4) shows high accuracy for bias and impulse faults, with minor confusion between stuck-at and bias faults due to overlapping temporal signatures. ROC curves (Figure 5) confirm robustness with AUC values exceeding 0.95 for all fault categories.
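To make the role of macro averaging concrete, the sketch below computes a macro-averaged F1-score from a confusion matrix. The function name and the toy matrices are illustrative and are not taken from the paper's results.

```python
def macro_f1(confusion):
    """Macro-averaged F1; confusion[i][j] counts true class i predicted as class j."""
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp
        fn = sum(confusion[c]) - tp
        denom = 2 * tp + fp + fn
        # Per-class F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is empty.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n

balanced = macro_f1([[8, 2], [1, 9]])
# A classifier that always predicts the majority class: 90% accuracy, poor macro F1.
majority_only = macro_f1([[90, 0], [10, 0]])
```

Because every class contributes equally to the average, a model that ignores the rare fault classes is penalized even when overall accuracy looks high, which is why the metric suits the imbalanced setting described above.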

4.4. Fault Recovery Dynamics and Stability Guarantees

The time-series analysis of fault recovery (Figure 10) demonstrates the system's resilience with empirical stability validation. The adaptive controller rapidly reallocates sensor weights upon detection, stabilizing the system within 1.1 ± 0.2 seconds post-fault onset. Frequency-domain analysis (Figure 6) confirms no high-frequency amplification, providing empirical evidence of maintained stability. The bounded gain variations (∆K < 0.15) ensure the closed-loop system remains within the stability region defined by the small-gain theorem. Frequency-domain analysis of the closed-loop transfer function under nominal conditions and during fault recovery transients reveals robust stability characteristics. The phase margin exceeds 55° in both scenarios, with gain crossover occurring at frequencies well below any potential resonant peaks. No high-frequency amplification or destabilizing oscillations are observed during the supervisory gain switch or recovery phase. This behavior aligns with the bounded perturbation assumption in the small-gain theorem (as detailed in Section 3.5), confirming that the adaptive policy preserves sufficient stability margins while enabling rapid fault accommodation.
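One simple way to enforce a bounded gain variation of this kind is to clip any requested gain change before it reaches the controller. The sketch below is an assumption-laden illustration rather than the supervisory policy itself: the function name is hypothetical, and ∆K is interpreted as an absolute deviation from a nominal gain, which the text does not specify.

```python
def bounded_gain(k_nominal, k_requested, max_delta=0.15):
    """Clip the requested gain so that |K - K_nominal| stays within max_delta."""
    delta = max(-max_delta, min(max_delta, k_requested - k_nominal))
    return k_nominal + delta

# Requests beyond the bound are saturated; small adjustments pass through.
high = bounded_gain(1.0, 1.5)
low = bounded_gain(1.0, 0.5)
small = bounded_gain(1.0, 1.05)
```

Saturating the gain change keeps the perturbation seen by the closed loop bounded regardless of what the fault estimator requests, which is the property a small-gain argument relies on.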

4.5. Hard Real-Time Performance Validation

The latency analysis confirms hard real-time suitability for industrial control applications. The 99th percentile latency of 58 ± 4ms satisfies the 100ms control period requirement with ample margin. The maximum jitter of 28ms supports deterministic behavior, while a throughput of 150 ± 10 requests/second supports multi-sensor fusion in complex IIoT deployments. These metrics validate the framework’s suitability for time-critical industrial applications. Figure 11 illustrates the latency distribution (violin/box plot) for the proposed module on edge versus cloud nodes, confirming substantially lower and more predictable latency on the edge platform.

Table 7. Control-Loop Performance Indicators with Statistical Validation

Metric	Nominal	During Fault	After Recovery
Rise Time (s)	0.25 ± 0.02	0.38 ± 0.05	0.27 ± 0.03
Overshoot (%)	0.90 ± 0.08	1.60 ± 0.15	1.00 ± 0.09
Settling Time (s)	0.03 ± 0.005	0.12 ± 0.02	0.04 ± 0.006
RMSE	6.2 ± 0.5 dB	2.1 ± 0.3 dB	5.8 ± 0.4 dB
Stability Margin	0.90 ± 0.08	1.60 ± 0.15	1.00 ± 0.09

4.5. Statistical Significance of Comparative Results

The proposed framework demonstrates statistically significant improvements across all key metrics. The 58% RMSE reduction over Kalman filtering is significant (p < 0.001), as is the 27% F1-score improvement (p < 0.001). The Kalman filter baseline was optimally tuned using maximum likelihood estimation for process and measurement noise covariances, ensuring fair comparison. While energy consumption per inference increases from 0.03 ± 0.003 J to 0.07 ± 0.006 J, this represents a favorable trade-off given the substantial improvements in accuracy and robustness for mission-critical applications. Figure 12 presents a normalized radar chart comparing the proposed method against the classical baselines (Kalman filter and SVM) across all five key performance dimensions (fusion accuracy, fault detection F1-score, recovery speed, real-time capability, and energy efficiency), highlighting its superior overall balance.

4.6. Generalization and Industrial Viability

While the current evaluation focuses on isolated faults, the architecture demonstrates strong generalization potential through its attention mechanism and feature learning capabilities. The explainable attention patterns provide actionable diagnostics that control engineers can validate against physical failure modes. The combination of statistical performance improvements, empirical stability validation, and hard real-time operation positions this framework as industrially viable for mission-critical IIoT applications.

4.6.1. Limitations and Future Work

This study primarily evaluates the proposed adaptive sensor fusion and fault-tolerant control framework using synthetic IIoT data with controlled fault injection. While this experimental design enables reproducible analysis, systematic fault modeling, and precise control over noise characteristics, it may not capture the full complexity and variability of real industrial environments, including unmodeled dynamics, long-term sensor drift, and operational uncertainties. Furthermore, the stability assessment is based on bounded supervisory control design and empirical frequency-domain validation rather than a formal Lyapunov-based proof. Future work will focus on validating the proposed framework on real-world industrial datasets and physical testbeds, extending formal stability analysis, and investigating scalability across larger sensor networks and heterogeneous industrial processes.

4.7. Ablation Studies

To evaluate the contribution of individual components, we conducted ablation experiments on the synthetic dataset by removing or modifying key parts of the architecture. The results are summarized in Table 8, averaged over 10 runs.

The ablation results demonstrate that the Transformer encoder and attention-based fusion are critical for optimal performance, with removal of attention leading to the largest RMSE increase (approximately 88%, from 0.049 to 0.092). This supports the architecture's design choices.
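The relative contribution of each component can be checked directly from the RMSE values in Table 8. The short sketch below computes the relative RMSE increase for each ablated variant; the dictionary keys mirror the table's variant names, and the helper name is hypothetical.

```python
def relative_increase(baseline, value):
    """Fractional increase of `value` over `baseline`."""
    return (value - baseline) / baseline

full_rmse = 0.049  # full model, Table 8
ablation_rmse = {
    "CNN only (no Transformer)": 0.078,
    "No attention fusion (simple average)": 0.092,
    "Single-task loss (estimation only)": 0.061,
    "Half attention heads": 0.055,
}

increases = {name: relative_increase(full_rmse, v) for name, v in ablation_rmse.items()}
worst = max(increases, key=increases.get)
```

With these values, removing attention fusion gives an increase of roughly 88%, followed by removing the Transformer at roughly 59%, while halving the attention heads costs about 12%.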

5. Conclusion

This work demonstrates that an explainable, attention-based deep learning fusion model can be effectively co-designed with a fault-tolerant supervisory controller to significantly improve the resilience of IIoT automation systems. The joint optimization framework provides statistically significant improvements over classical methods (RMSE: 0.049 ± 0.003 vs 0.118 ± 0.008, p < 0.001) while maintaining hard real-time performance (99th percentile latency: 58ms). Edge deployment yields the latency benefits necessary for real-time control, while the explainable attention mechanisms provide the transparency required for industrial certification. Future work should examine formal stability guarantees for more complex hybrid control laws, explore federated strategies for privacy-preserving model updates, and evaluate these approaches on larger, domain-specific industrial testbeds with composite fault scenarios.

Author Contributions

Conceptualization, S.A.M. and W.J.; methodology, S.A.M.; software, S.A.M.; validation, S.A.M.; formal analysis, S.A.M.; investigation, S.A.M.; resources, W.J.; data curation, S.A.M.; writing—original draft preparation, S.A.M.; writing—review and editing, W.J. and M.M.; visualization, S.A.M.; supervision, W.J.; project administration, W.J.; funding acquisition, W.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the University–Industry Collaborative Education Program of the Ministry of Education of China (Grant No. 231104472272600) and by the UESTC–ZHIXIAOJING Joint Research Center of Smart Home.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Sensor correlation matrix and noise characteristics. (a) Cross-correlation between sensor modalities showing weak correlation (ρ < 0.3). (b) Time-varying noise profiles demonstrating non-Gaussian characteristics requiring adaptive fusion.

Figure 2. Dynamic evolution of adaptive fusion weights across sensors during normal and fault conditions.

Figure 3. Feature activation / attention map from the fusion encoder highlighting salient temporal features and providing explainable diagnostics.

Figure 5. ROC curves for each fault class showing detection trade-offs.

Figure 6. Frequency-domain spectrum of the control error during nominal operation and fault recovery, illustrating bounded spectral energy and the absence of dominant resonance peaks within the control bandwidth.

Figure 9. Training and Validation Curves Demonstrating Model Convergence

Figure 10. Desired vs. actual control signal illustrating tracking before, during, and after disturbances.

Figure 11. Latency distribution for the fusion and fault inference module on edge vs. cloud platforms, validating hard real-time suitability for control loops with a 100ms period. The bounded jitter on edge hardware (NVIDIA Jetson) ensures deterministic behavior.

Figure 12. Comparative performance radar chart: proposed method vs. baselines.

Table 2. Comprehensive Baseline Comparison

Method	Fusion RMSE	F1 (detection)	Recovery Time (s)	Latency (ms)	Energy/Inf (J)
Kalman Filter	0.118 ± 0.008	0.70 ± 0.04	2.5 ± 0.3	2.1 ± 0.3	0.05 ± 0.005
Extended KF	0.095 ± 0.007	—	—	4.3 ± 0.5	0.005
Particle Filter	0.087 ± 0.009	—	—	18.7 ± 2.1	0.012
SVM (fault only)	—	0.71 ± 0.03	2.8 ± 0.5	3.4 ± 0.4	0.004
LSTM Fusion	0.076 ± 0.005	0.79 ± 0.04	1.9 ± 0.4	34.2 ± 3.8	0.038
Fixed Average	0.152 ± 0.010	0.62 ± 0.05	3.1 ± 0.4	3.0	0.03 ± 0.003
Proposed (FP32)	0.049 ± 0.003	0.89 ± 0.02	1.1 ± 0.2	45.0 ± 3.1	0.071
Proposed (INT8)	0.049 ± 0.03	0.89 ± 0.02	1.1 ± 0.2	23.7 ± 2.4	0.043

Note: All improvements over best baseline significant at p<0.001 (paired t-test). INT8 quantization reduces latency 47% and energy 39% with minimal accuracy loss.
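As background on the INT8 variant, the sketch below shows symmetric per-tensor quantization of a list of weights to 8-bit integer codes and back. This is a generic illustration of the technique, not the toolchain or calibration scheme used in the paper, which is not described; the function names and example weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: returns (integer codes, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric INT8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]

weights = [0.4, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The reconstruction error per weight is bounded by half the scale, which is why quantization can cut latency and energy with only a small effect on accuracy when the weight range is well behaved.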

Table 6. Quantitative Comparison of Fusion Accuracy Metrics with Statistical Validation

Method	RMSE (mean ± std)	Correlation Gain (∆r ± std)
Fixed Average	0.152 ± 0.010	0.00 ± 0.00
Kalman Filter	0.118 ± 0.008	+0.12 ± 0.015
Proposed	0.049 ± 0.003	+0.27 ± 0.020

Table 8. Ablation Study Results (Mean ± STD)

Variant	RMSE	F1-score	99th %ile Latency (ms)	Notes
Full model (Proposed)	0.049 ± 0.003	0.89 ± 0.02	58	Baseline performance
CNN only (no Transformer)	0.078 ± 0.005	0.82 ± 0.03	42	Loses long-range dependencies
No attention fusion (simple average)	0.092 ± 0.006	0.79 ± 0.04	51	Largest drop; confirms adaptive weighting key
Single-task loss (estimation only)	0.061 ± 0.004	0.75 ± 0.03	58	Fault detection suffers without multi-task
Half attention heads (8 → 4)	0.055 ± 0.003	0.87 ± 0.02	55	Minor degradation; 8 heads optimal

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.