An Interpretable Anomaly Detection and Identification Framework for Onboard ADCS Fault Management in Nanosatellites

Karen Wendy Vidaurre Torrez; Franklin Josue Ticona Coaquira; Christian Ricardo Conchari Cabrera; Andres Fernando Aguirre Velez; Litzy Ximena Conde Alvarado; Sol Maria Chamorro Armoa; Jose Rodrigo Cordova Alarcon; Akitoshi Hanazawa

doi:10.20944/preprints202606.0930.v1

Submitted: 10 June 2026

Posted: 11 June 2026


Abstract

Anomalous signals in the Attitude Determination and Control System (ADCS) of nanosatellites can significantly degrade mission performance, especially in the absence of robust fault detection, isolation, and recovery (FDIR) mechanisms. Traditional threshold-based approaches, while portable and compact, may overlook subtle faults: abnormal sensor signals or current spikes that remain within the threshold can still compromise the operation of the entire ADCS. Furthermore, the lack of interpretable detection methods limits the development of reliable machine learning (ML) FDIR solutions. To address these limitations, this work presents a wavelet-based anomaly detection framework built on a two-stage hybrid architecture that combines a lightweight Convolutional Neural Network (CNN) for fault detection with logistic regression for fault classification, both based on Discrete Wavelet Transform (DWT) detail coefficients extracted from sensor and actuator data. The framework was validated using a statistics-based anomaly dataset for a 1U CubeSat ADCS simulated in MATLAB, in which anomalies are introduced at the component level with controlled variations in magnitude, frequency, and waveform, with the number of injections sized for a 99% confidence level. Additionally, to demonstrate operational feasibility, constraints for onboard implementation were considered by executing the proposed framework in a Processor-in-the-Loop (PIL) environment. For benchmarking, lightweight detection and classification algorithms were compared, including Out-Of-Limits (OOL) and compact machine learning approaches. Finally, to identify the framework’s limitations and trace faulty events to physical phenomena, Grad-CAM, SHAP, and impurity analysis were applied to the proposed algorithms as primary interpretability tools. The results demonstrate accurate fault detection and identification that supports both autonomous FDIR actions and ground operator decision-making. The proposed validation framework and dataset provide a reproducible basis for advancing anomaly detection onboard nanosatellites.

Keywords: ADCS; anomaly; FDIR; nanosatellite; explainability; processor-in-the-loop

Subject: Engineering - Aerospace Engineering

1. Introduction

In the current New Space era, nanosatellite projects have gained significant relevance in the industry, promoting the inclusion of swarm technologies and the democratization of space [1]. Nanosatellites are characterized by short development cycles and low-cost designs. This approach, despite being an advantage for countries and companies seeking to enter the space sector, also introduces considerable risk due to the reliance on Commercial-Off-The-Shelf (COTS) components, which, while affordable and easily accessible, are more susceptible to failures and degradation [2]. Consequently, fault detection, isolation, and recovery (FDIR) systems were introduced to maximize mission lifetime and have become essential in nanosatellite missions. As part of the Flight Software (FSW) of space systems ranging from satellites to lunar rovers, FDIR ensures the reliability, fault tolerance, and autonomy of the entire system. Despite their limited computational capabilities, onboard FDIR systems have direct access to real-time health data, allowing faster and more autonomous responses to faults during in-orbit operations. In contrast, ground station systems that monitor the satellite through housekeeping (HK) data [3] are inherently limited in their capability to respond to failures by the lack of real-time troubleshooting. Nevertheless, ground systems benefit from powerful computational resources and the expertise of human operators.

The growing complexity of nanosatellite missions reflects the technological maturity that these platforms have achieved. As a result, current requirements are more challenging for applications such as constellations and swarms, which require the ADCS not only for pointing stability but also for orbital maneuvers that enable precise formations [4]. However, the ADCS is also considered one of the most failure-prone subsystems, accounting for 17.75% of the causes of infant mortality [5].

Despite recent advances in this area, most of the reviewed approaches, including those applied to NASA, JAXA, and KARI missions, focus on the offline analysis domain. These methods, although highly accurate in controlled environments, rely on historical data and do not consider the real computational, energy, or communication constraints characteristic of nanosatellites. Without using flight hardware, their validations are carried out only on the ground or by human operator reviews [6,7,8]. Furthermore, current explainability models offer XAI support for telemetry analysis; however, they have not been tested on platforms with limited resources or targeted at critical subsystems like ADCS, which restricts their applicability to embedded CubeSat systems [9].

Consequently, this research generated a dataset based on a hybrid approach, leveraging physics-based (model-based) and knowledge-based simulation to obtain a statistically grounded dataset of ADCS anomalies. The anomalies are derived from realistic and physically plausible sensor signal failures injected into the ADCS control loop using a Model-In-the-Loop (MIL) simulation in MATLAB. Thus, this methodology aims to establish a meaningful correlation between signal-level anomalies and actual ADCS failure conditions in nanosatellites to improve onboard fault detection and system resilience.

The resulting dataset was evaluated with various lightweight detectors, including traditional OOL and machine learning (ML) algorithms, using two different data representations: one for statistical feature-based algorithms and one for discrete wavelet transform-based algorithms. A two-stage framework is proposed that integrates detail coefficients from the Discrete Wavelet Transform with detection and classification algorithms selected based on an analysis of their effectiveness and the feasibility of their onboard implementation on an embedded system.

To assess the feasibility of its real-time implementation under the typical computational limitations of nanosatellite platforms, a Processor-in-the-Loop (PIL) validation was performed on an STM32F446RE board, testing the complete framework.

Finally, Explainable Artificial Intelligence (XAI) methodologies were integrated to enhance the traceability and interpretability of detected anomalies. In this way, the identified anomalies were associated with physically significant failure patterns in typical ADCS.

Building upon these motivations and methodological foundations, the main contributions of this work are summarized as follows:

This work proposes a complete embedded framework for fault detection and isolation in ADCS. The framework introduces a Wavelet CNN architecture for detection and Wavelet Energy Logistic Regression for isolation. To select the pipeline, a benchmarking analysis of lightweight machine learning algorithms was performed, prioritizing resource efficiency on typical 1U–3U CubeSat microcontrollers.
Additionally, a statistically validated dataset specifically designed for fault detection in nanosatellite attitude control subsystems is constructed. Unlike datasets based solely on telemetry or generic simulations, the proposed set integrates the statistical injection of six physical fault types (spike, drift, bias, stuck, data loss, and erratic) using the Statistical Fault Injection (SFI) methodology [10]. The dataset and complete framework are validated through a Processor-in-the-Loop (PIL) setup on an STM32F446RE microcontroller, demonstrating the real-time execution of detection models under realistic low-power hardware conditions.
Finally, the Gradient-weighted Class Activation Mapping (Grad-CAM) and SHapley Additive exPlanations (SHAP) methods are applied to the highest-performing detection models to link anomaly decisions to physically meaningful sensor patterns. This provides operator-understandable evidence linking detected faults to the specific sensor and actuator channels of the ADCS, thereby supporting both autonomous FDIR actions and ground-based decision-making workflows.

2. Related Work

A major constraint specific to small satellites is the limited resources available to implement advanced anomaly detection approaches, despite their high accuracy. Accordingly, two main settings are considered for testing fault detection and identification algorithms, offline evaluation on telemetry data and onboard evaluation, as summarized in Table 1. However, when evaluated offline, detection is not validated on flight-constrained hardware, and when evaluated onboard, the majority of methods do not provide explainability support for embedded systems, which is the gap this paper targets.

Lightweight methods based on thresholds and statistical context descriptors such as mean and variance continue to be proposed to characterize the temporal structure of samples for anomaly detection, as seen in related works such as [4] and [40]. For instance, [39] explored a classic approach named Variance based detection. This method implements a threshold-based anomaly detector operating on sliding windows. The underlying assumption is that anomalous behavior manifests as an unexpected increase in signal variability within a short time interval. An anomaly is declared if the variance of at least one feature exceeds a predefined threshold. A variant named Adaptive Variance Based Anomaly Detector method extends the fixed-threshold variance-based detector by introducing an adaptive thresholding mechanism derived from the statistical distribution of signal variability within each window. Both methods will be considered as baselines to compare the performance of the proposed detection framework.
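The fixed-threshold variance baseline described above can be sketched as follows. This is a minimal illustration rather than the implementation from [39]: the function names, the use of population variance, and the per-channel threshold values are assumptions made for the example.

```python
from statistics import pvariance


def variance_detector(window, thresholds):
    """Return True if any channel's variance in the window exceeds its threshold.

    window: list of W samples, each a list of F channel values.
    thresholds: list of F variance thresholds (hypothetical tuning values).
    """
    channels = list(zip(*window))  # transpose into F sequences of length W
    return any(pvariance(ch) > thr for ch, thr in zip(channels, thresholds))


def detect_stream(samples, window_size, thresholds):
    """Slide a window over the stream; one flag per complete window."""
    flags = []
    for end in range(window_size, len(samples) + 1):
        window = samples[end - window_size:end]
        flags.append(variance_detector(window, thresholds))
    return flags


# Two channels, six samples, with a jump on channel 0 at index 4.
samples = [[0.0, 1.0] for _ in range(6)]
samples[4][0] = 5.0
print(detect_stream(samples, window_size=3, thresholds=[0.5, 0.5]))
```

The adaptive variant would replace the fixed `thresholds` with values derived from the observed distribution of window variances; its exact rule is not specified here, so it is left out of the sketch.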

3. Anomaly Characterization

The approach adopted for synthetic fault generation in this work is based on reproducing various waveform morphologies that represent the behavior of anomalies over time. These anomalies are represented mathematically using a common set of simple equations that capture the necessary features.

Because the representation of a specific fault in a particular sensor component can be rather inaccurate, the approach also relies on the idea of modulating the signal of any modeled sensor.

3.1. Anomaly Classification

From a physical perspective at the component level in ADCS systems, anomalous signals may be associated with failures in the sensors or actuators integrated into nanosatellites. Thus, [41] classified failures as: erratic faults, drift faults, hard-over faults, spike faults, and stuck faults as in Table 2. Subsequently, [4] complemented this typology by incorporating data loss faults.

In order to relate these anomalies to realistic component-level failures, this study is based on the physical consistency of the system. In particular, inconsistencies introduced in the signals from sensors and actuators that interact coherently in the kinematics and dynamics of the nanosatellite were considered, similarly to [4].

The failure categories adopted for this study aim to reflect physically plausible failure signals consistent with the ADCS sensor dynamics, so that the dataset maintains physical consistency with the overall system and its interaction with the space environment within the simulation framework. Table 3 lists the nature and modeling of anomalies with their possible physical interpretations of failure at the component level, based on existing literature [39,42,43,44]. The mathematical modeling of these failures is summarized in Table A1.

3.2. Magnitude-Based Anomaly Levels

For this approach, three groups are considered based on their severity, which reflects the magnitude of the relative error caused on the output compared to the fault-free output. In order to generate failures significant enough to be detected, three levels of magnitude were introduced, considering the specific characteristics of each sensor in terms of the variance of its noise. These levels were characterized by $5\sigma$, $7\sigma$, and $9\sigma$. These values were then applied as a gain to modify the magnitude of the fault. The approach aligns with prior work based on statistical thresholding for anomaly detection, in which k-sigma criteria are used to differentiate between normal variations and rare events [45].

3.3. Fault Injection

To ensure the statistical validity and representativeness of the generated dataset, the fault injection process follows a four-phase methodology grounded in the principles of Statistical Fault Injection (SFI) [46], achieving 99% confidence with a 1% margin of error.

Phase 1: Error Population Definition. The total population N is defined by the Cartesian product of several key dimensions: fault models, channels corresponding to the 11 ADCS sensor and actuator outputs, 400,000 discrete time steps, severity levels, and duration windows of 1 to 20 samples (0.1 to 2.0 seconds), considered for each applicable fault.
Phase 2: Sample Size Estimation (n). The minimum number of injections (n) is estimated for a large population N, using a confidence level of 99%, a maximum margin of error of 1%, and a conservative proportion estimate of $p = 0.5$, resulting in $n = 16590$ fault injections.
Phase 3: Fault Matrix Generation. The n faults are selected from the unified population N following a uniform random sampling protocol without replacement. This ensures that every single one of the millions of possible unique error configurations has an equal probability of being selected, and that no single configuration can be selected more than once. To achieve the highest quality of randomness and minimize sampling bias, a robust pseudo-random number generator, such as the Mersenne-Twister algorithm, is employed. For each of the injections, the algorithm randomly selects a complete fault vector (channel, type, magnitude, initial time, and duration) from the population N. This fault is then systematically injected into the simulation environment, and the resulting signals are recorded as shown in Figure 1.
Phase 4: Dataset generation. As shown in Figure 3, fault injection is performed on each channel. For the sensors, faults were injected after quantization, before they were encoded and used by the controller. Meanwhile, for the channels corresponding to the magnetorquers, faults were injected after the magnetic dipole ${}^{\mathcal{B}}m$ was computed, prior to quantization and encoding, since the embedded controller does not need to decode this variable. The signals collected from all channels comprise the dataset shown in Table 4. For the columns $f(t)[1:6]$ corresponding to the sun sensors, the injector was modified to prevent negative voltages.
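As a worked check of Phases 2 and 3, the sketch below computes the sample size from the usual proportion-based formula, with an optional finite-population correction, and draws distinct configuration indices uniformly. The z-score of 2.576 for the 99% level is an assumption; it reproduces the reported $n = 16590$. The function names and the population size used in the usage lines are hypothetical.

```python
import math
import random


def sfi_sample_size(z, margin, p=0.5, population=None):
    """Minimum number of injections for z-score `z` and margin of error `margin`."""
    n_infinite = z ** 2 * p * (1.0 - p) / margin ** 2
    if population is None:
        return math.ceil(n_infinite)
    # Finite-population correction; negligible when population >> n_infinite.
    n_finite = population / (1.0 + (population - 1) / n_infinite)
    return math.ceil(n_finite)


def draw_fault_indices(population, n, seed=0):
    """Uniformly sample n distinct configuration indices without replacement."""
    rng = random.Random(seed)  # CPython's generator is a Mersenne Twister
    return rng.sample(range(population), n)


n = sfi_sample_size(z=2.576, margin=0.01)         # large-population limit
indices = draw_fault_indices(10 ** 9, n, seed=0)  # 10**9 is a placeholder size
print(n, len(set(indices)))
```

Each sampled index would then be decoded into one fault vector (channel, type, magnitude, initial time, duration); that decoding depends on the exact dimension sizes and is omitted.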

Figure 2 shows the fault injector in greater detail, where each channel is modified with a precomputed fault, resulting in an anomalous signal $f_{Fault}(t)$. This block retrieves information from the fault matrix to ensure proper synchronization of the faults. In particular, for the Stuck fault, the last read value is retained. To achieve this, a memory control subprocess is integrated that is activated when the fault is injected.
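The hold behavior of the Stuck fault and the non-negative constraint on the sun sensor channels can be illustrated with a short sketch. Both functions are hypothetical. The constant offset of $k\sigma$ is a generic stand-in rather than the fault model of Table A1, and clamping at a floor of zero is only one assumed way of preventing negative voltages.

```python
def inject_stuck(signal, start, duration):
    """Hold the last value read before `start` for `duration` samples."""
    faulty = list(signal)
    held = signal[start - 1] if start > 0 else signal[0]
    for t in range(start, min(start + duration, len(signal))):
        faulty[t] = held
    return faulty


def inject_offset(signal, start, duration, k, sigma, floor=None):
    """Add a constant offset of k*sigma (assumed generic form, not Table A1).

    If `floor` is given, faulty samples are clamped so they never drop below it.
    """
    faulty = list(signal)
    for t in range(start, min(start + duration, len(signal))):
        value = signal[t] + k * sigma
        faulty[t] = value if floor is None else max(floor, value)
    return faulty


print(inject_stuck([1, 2, 3, 4, 5], start=2, duration=2))
print(inject_offset([1.0, 1.0, 1.0], start=1, duration=1, k=-5, sigma=0.5, floor=0.0))
```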

The generated dataset was used to explore multiple anomaly detection and classification approaches in the following sections.

4. Simulation Environment for ADCS Modeling in CubeSats

The simulation framework is focused on a 1U CubeSat in a $400\ \mathrm{km}$ orbit with an inclination of $30^{\circ}$, which performs detumbling maneuvers using magnetic actuators (magnetorquers), since this is a commonly employed architecture for resource-constrained controllers in nanosatellites. The initial conditions and the simulated controller are described in [47]. Digital signals are generated within a Model-in-the-Loop (MIL) environment to replicate the interaction between the space environment, the CubeSat dynamics, its sensors, actuators, and its controller. As shown in Figure 3, the simulation framework includes the World Magnetic Model (WMM) 2020-2025 and the DE405 solar system ephemeris model to provide realistic magnetic and sun vector references for the sensors. Additionally, this framework includes a conic eclipse model to determine the sunlight condition (sunlit, umbra, or penumbra) during orbital transitions as a reference for the sun sensors.

4.1. Component Level Modeling

The sensor and actuator modeling stage was implemented within the MIL simulation scheme. The models were parameterized based on the data sheets of each component, from which their noise variance and operational voltages were computed.

Each sensor takes into account its quantization resolution, sampling time, dynamic range and Gaussian additive noise, which makes it possible to reproduce the deterioration of the real signals due to electronic limitations and environmental disturbances from Low Earth Orbit (LEO).

Modeled components are summarized in Table 5 and correspond to those used in [47]. It must be highlighted that each signal is corrupted by Gaussian noise and processed through quantization, sampling, encoding, and decoding steps in order to replicate real onboard conditions. Each of these signals serves as an injection point for the modeled anomalies.
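A minimal sketch of such a signal chain for a single sample is given below, assuming a uniform quantizer and hard saturation at the limits of the dynamic range. The function name and all parameter values are illustrative and are not taken from Table 5.

```python
import random


def sensor_reading(true_value, sigma, lsb, v_min, v_max, rng):
    """Add Gaussian noise, saturate to the dynamic range, and quantize.

    Returns the integer code (encoded value) and the decoded measurement.
    """
    noisy = true_value + rng.gauss(0.0, sigma)
    clipped = min(max(noisy, v_min), v_max)   # dynamic range limits
    code = round((clipped - v_min) / lsb)     # uniform quantization step `lsb`
    return code, v_min + code * lsb


rng = random.Random(42)
code, value = sensor_reading(1.2, sigma=0.05, lsb=0.01, v_min=0.0, v_max=3.3, rng=rng)
print(code, value)
```

A fault injected after quantization, as done for the sensor channels, would act on `value` before it is encoded for the controller.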

The sensor signals, including any injected faults, are packed into a 32-byte array. The attitude controller in Figure 3 computes a magnetic moment for the magnetorquers, and this three-axis array is configured as a PWM signal. The special function blocks of this system are the encoding/decoding blocks for digital signals. This setup allows the framework to work with communication protocols for accurate, real-time execution.

5. Anomaly Detection and Classification Framework

5.1. Signal Segmentation and Window Formulation

A critical design choice for ADCS fault detection is the temporal granularity of the input. Instant-based methods, which evaluate the system using only a single time step $x_{t}$, lack the context to resolve ambiguities. To address this, a Sliding Window Strategy was employed following the methodology proposed in [39]. The framework’s input is a matrix $X_{t-W:t} \in \mathbb{R}^{W \times F}$ covering a history of $W \in \{1, 5, 10, 15, 20\}$ samples across the $F$ sensor channels described in Table 4. This temporal context allows the model to implicitly learn system derivatives, enabling the distinction between "nominal instability" (commanded maneuvers) and "anomalous instability" (sensor degradation). Each window is defined as:

$X_{i} \in \mathbb{R}^{W \times F}$ (1)

The labeling convention defines the ground truth as a binary indicator based on the manifestation of any anomaly at the terminal timestep of the respective window, ensuring the model is trained for real-time, online detection. Thus, each window is tagged with a binary label $y_{b} = 0$ for a nominal window and $y_{b} = 1$ for an anomalous window. Additionally, considering that multiple anomalies can occur simultaneously, each window is tagged with a vector that indicates the faulty channels. The multilabel target is defined as $y_{ch} \in \{0, 1\}^{F}$, so multiple channels may be active simultaneously. To prevent temporal data leakage, a purge gap of W samples is discarded at each split boundary, ensuring that the first window of each downstream split does not overlap with the last window of the preceding one. Prior to model training, normalization is applied during the dataset preparation stage via z-score standardization.
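The windowing, terminal-timestep labeling, and purge-gap logic can be sketched as follows. The helper names are hypothetical, the 70/15/15 proportions are those used for the splits later in this section, and estimating the z-score statistics on the training rows only is an assumption of this sketch.

```python
from statistics import mean, pstdev


def make_windows(samples, fault_mask, window_size):
    """Build sliding windows with binary and per-channel labels.

    samples: list of T rows with F channel values.
    fault_mask: list of T rows with F flags (1 = fault active on that channel).
    """
    windows, y_b, y_ch = [], [], []
    for end in range(window_size, len(samples) + 1):
        windows.append(samples[end - window_size:end])
        terminal = fault_mask[end - 1]   # label taken at the last timestep
        y_ch.append(list(terminal))
        y_b.append(int(any(terminal)))
    return windows, y_b, y_ch


def chronological_split(n_items, window_size, train_pct=70, val_pct=15):
    """Index ranges for train/val/test with W items purged at each boundary."""
    n_train = n_items * train_pct // 100
    n_val = n_items * val_pct // 100
    train_idx = range(0, n_train)
    val_idx = range(n_train + window_size, n_train + n_val)
    test_idx = range(n_train + n_val + window_size, n_items)
    return train_idx, val_idx, test_idx


def zscore_params(train_rows):
    """Per-channel mean and standard deviation (std of 0 is replaced by 1)."""
    cols = list(zip(*train_rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]
```

Because window `i` spans samples `i` to `i + W - 1`, skipping `W` window indices at each boundary guarantees that the first window of a split shares no sample with the last window of the previous one.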

5.2. Data Representation Strategy

A dual approach is proposed to assess whether domain knowledge, encoded via engineered statistical features, provides a computational advantage over learned representations extracted directly from raw telemetry sequences. To this end, two distinct data representation strategies were formulated: Handcrafted Feature Extraction and Raw Sequence Representation.

5.2.1. Handcrafted Feature Extraction (HFE)

The first strategy transforms the raw multivariate time series into a tabular feature matrix via statistical rolling-window descriptors applied at each timestep. For each sliding window, a set of scalar descriptors is computed independently per sensor channel. When $W > 1$, the extracted features include the raw signal value, the first and second order temporal differences ($\Delta$ and $\Delta^{2}$), nine rolling-window statistics (mean, variance, standard deviation, minimum, maximum, median, skewness, kurtosis, and range, $\max - \min$), and the deviation from the rolling mean ($\text{raw} - \bar{x}$). For $W = 1$, only the instantaneous features (raw, $\Delta$, $\Delta^{2}$) are retained, as rolling statistics are undefined over a single sample. This configuration therefore constitutes a purely instantaneous descriptor baseline.
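For one channel and one window, the descriptor set listed above could be computed as in the sketch below. The dictionary keys, the use of population (biased) moments, and the non-excess definition of kurtosis are assumptions, since these conventions are not stated.

```python
from statistics import mean, median, pvariance


def window_features(x):
    """Descriptors for one channel over one window x (list of W >= 3 floats)."""
    n = len(x)
    mu = mean(x)
    var = pvariance(x, mu)
    std = var ** 0.5
    # Standardized central moments; set to 0 for a constant window.
    m3 = sum((v - mu) ** 3 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    skew = m3 / std ** 3 if std > 0 else 0.0
    kurt = m4 / var ** 2 if std > 0 else 0.0
    return {
        "raw": x[-1],
        "delta": x[-1] - x[-2],                  # first-order difference
        "delta2": x[-1] - 2 * x[-2] + x[-3],     # second-order difference
        "mean": mu, "var": var, "std": std,
        "min": min(x), "max": max(x), "median": median(x),
        "skew": skew, "kurt": kurt,
        "range": max(x) - min(x),
        "dev": x[-1] - mu,                       # deviation from rolling mean
    }


features = window_features([1.0, 2.0, 3.0, 4.0])
print(len(features), features["var"], features["dev"])
```

Applying this per channel and concatenating the results gives one tabular row per timestep, which is the form expected by the tree-based classifiers.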

These descriptors were selected based on their physical relevance to ADCS sensor anomalies. Kurtosis is sensitive to the heavy tails produced by impulsive noise or SEUs in the magnetometer. Skewness captures the asymmetrical onset of gradual drift, such as gyroscope bias instability. Range and min/max descriptors detect saturation and clipping events. Standard deviation flags sudden increases in measurement noise. The first and second order temporal differences detect abrupt signal changes and spike onset/offset edges, respectively.

To ensure that all window configurations are evaluated on a strictly identical temporal span, the initial

W_{max} = 20

rows, where rolling features are undefined for the largest window, are removed from all datasets prior to splitting, regardless of the window size being processed. The resulting feature matrix is then partitioned chronologically into training (70%), validation (15%), and test (15%) splits. The resulting fixed-length tabular representation is designed to accommodate tree-based ensemble classifiers, which lack native sequence-awareness but offer deterministic and computationally efficient inference suitable for edge hardware.

5.2.2. Raw Sequence Representation (RSR)

The second strategy preserves the inherent temporal topology of the signals. The windowed dataset is split into training (70%), validation (15%), and test (15%) splits. In each window, for each channel, a Discrete Wavelet Transform (DWT) is applied using a level 1, db4 configuration, extracting 13 detail coefficients per window:

$x_{w,f} \overset{DWT}{\to} \{A_{f}, D_{f}^{(1)}\},$

where $A_{f}$ are the approximation (low-frequency) coefficients and $D_{f}^{(1)}$ are the detail (high-frequency) coefficients. The detail coefficients $D_{f}^{(1)}$ are stacked to form a single high-frequency tensor $D \in \mathbb{R}^{K \times F}$ with $K = 13$ and $F = 15$ channels per window.

High-frequency wavelet coefficients capture abrupt disturbances, such as spikes, impulse noise, and erratic sensor behavior, while low-frequency approximation coefficients are sensitive to slow dynamics, signal saturation, or lock-up failures. Thus, the RSR representation serves as a common input for two families of deep learning architectures: long short-term memory (LSTM) and convolutional neural networks (CNN).
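To make the coefficient count concrete, the sketch below implements a single-level detail-coefficient extraction in plain Python. It assumes the commonly tabulated 8-tap Daubechies-4 decomposition filter and a symmetric boundary extension, which gives $\lfloor (W + 8 - 1)/2 \rfloor$ coefficients per channel. For $W = 20$ this equals 13, which matches $K = 13$ if the largest window is assumed. It is an illustration only and is not guaranteed to match any particular wavelet library sample for sample.

```python
# Daubechies-4 low-pass decomposition filter (8 taps), as commonly tabulated.
DB4_LO = [
    -0.010597401784997278, 0.032883011666982945, 0.030841381835986965,
    -0.18703481171888114, -0.02798376941698385, 0.6308807679295904,
    0.7148465705525415, 0.23037781330885523,
]
# Quadrature-mirror high-pass filter derived from the low-pass filter.
DB4_HI = [(-1) ** (k + 1) * DB4_LO[len(DB4_LO) - 1 - k] for k in range(len(DB4_LO))]


def detail_coeffs(x, hi=DB4_HI):
    """Level-1 detail coefficients of x with symmetric boundary extension."""
    x = list(x)
    taps = len(hi)
    pad = taps - 1
    if len(x) < pad:
        raise ValueError("window shorter than the boundary extension")
    ext = x[:pad][::-1] + x + x[-pad:][::-1]
    full = [
        sum(hi[j] * ext[n + taps - 1 - j] for j in range(taps))
        for n in range(len(ext) - taps + 1)
    ]
    return full[1::2]  # downsample by two: (len(x) + taps - 1) // 2 values


window = [0.0] * 20
window[10] = 1.0  # a single-sample spike
print(len(detail_coeffs(window)))
```

A constant window yields detail coefficients that are numerically zero, while the spike above produces a few nonzero coefficients around its location, which is the property the detection stage exploits.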

5.3. Supervised Anomaly Detection and Identification Algorithms

While traditional OOL methods rely on fixed thresholds applied to raw data, they dismiss relevant information that can be abstracted into features in both the time and frequency domains, as described in Section 5.2. To exploit these features, a two-stage supervised learning framework is proposed to compare and combine the complementary algorithms described in this subsection, which are capable of modeling non-linear decision boundaries informed by physical context while incorporating the relationships between features. The framework consists of fault detection in the first stage and fault identification/classification in the second.

5.3.1. Tree-Based Classification Models

Three Tree-Based supervised architectures are evaluated to characterize the trade-off between detection performance and onboard computational cost.

Decision Tree (CART). A single Classification and Regression Tree (CART) is implemented as an interpretable and computationally efficient baseline. The tree recursively partitions the feature space $\mathbb{R}^{D}$ using axis-parallel splits that maximize the Gini impurity reduction at each node:

$i(t) = 1 - \sum_{k=0}^{1} p(k \mid t)^{2}$

where $p(k \mid t)$ is the proportion of class k samples at node t. To bound the model size for Flash memory constraints, Cost-Complexity Pruning is applied, minimizing the objective $R_{\alpha}(T) = R(T) + \alpha |T|$, where $|T|$ denotes the number of terminal nodes and $\alpha$ is the complexity parameter. The resulting model achieves $O(\text{depth})$ inference complexity.
Extreme Gradient Boosting (XGBoost). To capture non-linear fault signatures beyond the capacity of a single tree, an additive ensemble of K regression trees is constructed, where each tree $f_{k}$ corrects the residual errors of the preceding ensemble. The prediction is given by:

$\hat{y}_{i} = \sigma\left(\sum_{k=1}^{K} f_{k}(x_{i})\right)$

The model minimizes a regularized log-loss objective $L = \sum_{i} ℓ (y_{i}, {\hat{y}}_{i}) + \sum_{k} Ω (f_{k})$ , where the regularization term $Ω (f) = γ T + \frac{1}{2} λ {∥ w ∥}^{2}$ penalizes the number of leaves T and the leaf weight magnitudes w, promoting sparse and memory-efficient representations suitable for embedded deployment.
Random Forest. A bagged ensemble of M independently trained decision trees is evaluated as a third architecture. Each tree is trained on a bootstrap sample of the training data, and predictions are obtained by majority vote across all M estimators. Random feature subsampling at each split reduces inter-tree correlation, improving generalization relative to a single CART while retaining the interpretability advantages of tree-based representations.
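As a small worked example of the CART splitting criterion, the sketch below evaluates the binary Gini impurity $i(t)$ and the weighted impurity reduction of one candidate axis-parallel split. The function names and the toy data are illustrative.

```python
def gini(labels):
    """Gini impurity i(t) = 1 - sum_k p(k|t)^2 for binary labels in {0, 1}."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - (p1 ** 2 + (1.0 - p1) ** 2)


def impurity_reduction(values, labels, threshold):
    """Weighted decrease in Gini impurity for the split `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    children = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(labels) - children


# A feature that separates the two classes perfectly at 0.5.
values = [0.1, 0.2, 0.8, 0.9]
labels = [0, 0, 1, 1]
print(gini(labels), impurity_reduction(values, labels, 0.5))
```

A balanced node has impurity 0.5, and a split producing two pure children removes all of it, so this split attains the maximum possible reduction for that node.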

5.3.2. Recurrent Neural Network Classifier

A supervised classifier based on Long Short-Term Memory (LSTM) recurrent neural networks was implemented to model sequential dependencies through internal memory mechanisms. The model input corresponds to the RSR, with the input tensor arranged as a sequence along the coefficient dimension so that it can be processed by the recurrent architecture. Additionally, two state-of-the-art architectural variants were evaluated for anomaly classification. The first is a Bidirectional LSTM (BiLSTM) with temporal attention, which processes the sequence in both forward and backward directions and assigns adaptive weights to each position, emphasizing relevant components of the sequence [48]. The second is a hybrid CNN+LSTM architecture, in which 1D convolutional layers extract local patterns over the coefficient representation prior to sequential modeling [49].

5.3.3. Wavelet Energy Logistic Regression (WELR)

The goal of this method is to identify the set of channels responsible for the detected fault ($\hat{y}_{ch} \in \{0, 1\}^{F}$), allowing simultaneous faults in multiple channels. This algorithm is based on the RSR data representation; however, rather than using the full tensor of detail coefficients $D_{f}$, a compact feature vector $E \in \mathbb{R}^{F}$ is computed, with one entry per channel given by the energy of its detail coefficients:

$E_{f} = \sum_{k=1}^{K} D_{f,k}^{2}$ (2)

Using these features, 15 independent logistic regression models were trained to estimate the probability of anomaly per channel $\hat{p}_{ch}$ in a One-vs-Rest strategy through a sigmoid function, as seen in Equation (3). This algorithm was selected due to its low computational cost and the small number of trainable parameters required per channel (only 15 weights and 1 bias, 128 bytes). Finally, a threshold-based decision is applied such that $\hat{y}_{ch} = 1$ if $\hat{p}_{ch} > \tau_{ch}$, where $\tau_{ch}$ is a tunable threshold.

$\hat{p}_{ch} = \sigma(w_{ch}^{\top} E + b_{ch})$ (3)
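Equations (2) and (3) translate almost directly into code. The sketch below is a plain-Python inference routine under the assumption that the weights, biases, and thresholds have already been trained; the values in the usage lines are made up for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def detail_energy(detail):
    """Equation (2): E_f = sum_k D_{f,k}^2 for one channel."""
    return sum(d * d for d in detail)


def welr_predict(details, weights, biases, thresholds):
    """Per-channel fault flags and probabilities, following Equation (3).

    details: list of F coefficient lists (one per channel).
    weights: F rows of F weights (row ch holds w_ch); biases, thresholds: length F.
    """
    energy = [detail_energy(d) for d in details]
    flags, probs = [], []
    for w, b, tau in zip(weights, biases, thresholds):
        p = sigmoid(sum(wi * ei for wi, ei in zip(w, energy)) + b)
        probs.append(p)
        flags.append(int(p > tau))
    return flags, probs


# Two channels; only the second one carries high-frequency energy.
details = [[0.0, 0.0], [3.0, 4.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # illustrative, not trained values
flags, probs = welr_predict(details, weights, biases=[-5.0, -5.0], thresholds=[0.5, 0.5])
print(flags)
```

Because each model sees the energies of all channels, a fault on one channel can inform the decision for another, while the per-channel thresholds keep the outputs independent enough to flag concurrent faults.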

5.3.4. Wavelet Convolutional Neural Network (WCNN)

This proposed lightweight CNN was implemented using the RSR input tensor described in Section 5.2.2. The coefficients were first normalized using the mean and standard deviation calculated over the training set to ensure numerical stability and improve convergence. These normalized tensors were used as input to the proposed lightweight 1D CNN, whose output is the anomaly probability $\hat{p}_{b} \in [0, 1]$. The model was trained by minimizing binary cross-entropy using the Adam optimizer over 30 epochs with a batch size of 64. The complete architecture is shown in Figure 4. Finally, a threshold decision yields $\hat{y}_{b} = 1$ if $\hat{p}_{b} > \tau$. The resulting model comprises 665 trainable parameters and is designed to learn local patterns between adjacent DWT coefficients, considering all 15 sensors/channels.

Figure 4. CNN 1D architecture for binary classification.

5.3.5. Wavelet Multilabel CNN (WMCNN)

The proposed method is a classifier based on the RSR representation, similar to the previous method. The input to the proposed WMCNN is the full tensor of detail coefficients $D_{f}$, followed by three convolutional layers used to learn local patterns, abstract combinations of features, and dependencies between sensors. The output is a vector of estimated fault probabilities $\hat{p}_{ch} = [\hat{p}_{1}, \hat{p}_{2}, \dots, \hat{p}_{F}] \in [0, 1]^{F}$, naturally supporting concurrent faults. Finally, independent thresholds were established for each channel, such that $\hat{y}_{ch,f} = 1$ if $\hat{p}_{ch,f} > \tau_{f}$.

Figure 5. CNN 1D architecture for multilabel classification.

Among the evaluated algorithms, only WCNN for detection (stage 1) and WELR for identification (stage 2) were selected for deployment, as seen in the complete framework in Figure 6. The selection criterion is the trade-off between performance and the feasibility of onboard implementation.
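The gating between the two stages can be summarized in a few lines. In this sketch, `detector` and `isolator` are hypothetical callables standing in for the trained WCNN and WELR models, and the default threshold of 0.5 is a placeholder rather than a tuned value.

```python
def fdi_step(detail_tensor, detector, isolator, tau_b=0.5):
    """Run stage 1 on every window and stage 2 only when stage 1 fires."""
    p_b = detector(detail_tensor)
    if p_b <= tau_b:
        return {"anomaly": False, "p_b": p_b, "channels": []}
    flags = isolator(detail_tensor)
    faulty = [ch for ch, flag in enumerate(flags) if flag]
    return {"anomaly": True, "p_b": p_b, "channels": faulty}


# Stand-in models for illustration only.
result = fdi_step(
    detail_tensor=None,
    detector=lambda d: 0.9,
    isolator=lambda d: [0, 1, 0],
)
print(result)
```

Running the isolation stage only on flagged windows keeps the average per-step cost close to that of the detector alone.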

6. Embedded Implementation and Validation

Following the flowchart shown in Figure 7, the MIL-simulated framework has undergone several modifications to validate fault detection and identification within the simulation environment. However, to validate the proposed framework, Software-In-the-Loop and Processor-In-the-Loop test benches were configured to ensure a smoother transition from simulation to implementation on a real embedded system.

6.1. Software-in-the-Loop

Software-in-the-loop (SIL) simulation verifies that the model, together with the anomaly tests, functions correctly in a realistic deployment environment. This stage assumes that the target platform is an STM32-based embedded system. Therefore, the embedded C code generator already available in Simulink is used.

To use this tool, the language was set to C99 (ISO) due to its widespread use in embedded boards. The parameters are treated as atomic units. The solver was set to fixed step with a Real-Time ODE1b. The entire detection block is generated within the model’s control block itself. Therefore, the corresponding file is generated, which will then be used by the C-caller to execute it in each time step. This setup is particularly useful to validate data packetization and to compare the performance onboard and offline of both the controller and the anomaly detector.

6.2. Processor-in-the-Loop

Following SIL validation, the PIL environment replaces the software-based C execution with actual embedded hardware. The Simulink simulation, including orbital dynamics, fault injection, and sensor modeling, continues running on the host computer, transmitting 35-byte data packets at a 0.1 s sample time via UDP over a local network. A Raspberry Pi 4 acts as a communication bridge, receiving the UDP packets and forwarding them via I2C (400 kHz) to an STM32F446RET6 microcontroller. Figure 8 illustrates the complete communication pipeline employed for this testing. This MCU was selected based on its heritage in the BIRDS Bus nanosatellite series developed at the Kyushu Institute of Technology for ADCS boards. In fact, the framework was tested on the Enhanced On Board Computer (eOBC) board, an STM32-based payload for the BIRDS RPM mission. The PIL testbed including the eOBC can be seen in Figure 9.

Onboard the MCU, both the attitude controller and the FDI system, comprising the binary detector and the fault isolation classifier, execute in real time. The controller computes the magnetic dipole, the fault probability is evaluated, and if an anomaly is detected, the isolation stage identifies the faulty channel. Figure 7 describes this interaction and summarizes the model deployment process. Results are packaged into 35-byte response packets and returned through the Raspberry Pi interface to Simulink, where they are read via Simulink Desktop Real-Time with the Packet Input block configured at 0.002 s to minimize data loss.
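The timing margins implied by these figures can be checked with a short calculation. The sketch below is only an idealized estimate: the packet size, I2C clock, sample time, and polling period come from the text, while the per-byte cost, the single address byte, and the assumption that the response also travels over I2C are illustrative assumptions, not measured values.

```python
# Idealized timing sketch for the PIL link (assumptions flagged in comments).
PACKET_BYTES = 35          # from the text: request and response packet size
I2C_CLOCK_HZ = 400_000     # from the text: I2C at 400 kHz
STEP_MS = 100              # from the text: 0.1 s sample time
POLL_MS = 2                # from the text: Packet Input block at 0.002 s

# Assumption: each byte costs 9 SCL cycles (8 data bits + ACK) and each
# transfer carries one extra address byte. Start/stop conditions, clock
# stretching, and the UDP hop are ignored.
CYCLES_PER_BYTE = 9
ADDRESS_BYTES = 1


def i2c_transfer_ms(payload_bytes: int) -> float:
    """Idealized wire time of one I2C transfer, in milliseconds."""
    cycles = (payload_bytes + ADDRESS_BYTES) * CYCLES_PER_BYTE
    return 1000.0 * cycles / I2C_CLOCK_HZ


one_way_ms = i2c_transfer_ms(PACKET_BYTES)
round_trip_ms = 2 * one_way_ms          # request plus response (assumed I2C both ways)
polls_per_step = STEP_MS // POLL_MS     # host polls per simulation step

print(f"one-way transfer : {one_way_ms:.2f} ms")
print(f"round trip       : {round_trip_ms:.2f} ms of a {STEP_MS} ms step")
print(f"host polls/step  : {polls_per_step}")
```

Under these assumptions the bus transfers occupy only a small fraction of each 0.1 s step, which leaves most of the period for the controller and the FDI inference on the MCU.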

7. Results

7.1. Dataset Characterization

Figure 10 shows the binary distribution of the generated dataset, displaying the number of windows classified as nominal (class 0) and anomalous (class 1) for a window size of W = 20. The vertical axis indicates the number of windows, and the horizontal axis distinguishes between the two classes. The number of nominal and anomalous windows is approximately equal, with an anomaly ratio of 0.493.

The injected faults are distributed approximately uniformly across the 15 channels of the ADCS, with counts ranging from approximately 15,000 to 17,500 windows per channel.

Figure 11 presents the t-SNE two-dimensional projection of all sliding windows, colored by anomaly label. Nominal windows (blue) concentrate along a structured, curvilinear one-dimensional manifold with repeating branched loops, consistent with the quasi-periodic orbital dynamics of the ADCS under fault-free operation. Anomalous windows (red) are predominantly co-located along the same manifold, with only a small number of isolated points projecting to off-manifold regions. This near-complete overlap in the reduced embedding space indicates that the injected faults produce perturbations of insufficient magnitude to displace the system trajectory from its nominal attractor at the representational level. It suggests that the fault signatures are not linearly separable from nominal behavior in the raw feature space and motivates the use of learned feature representations for detection.

Figure 12 shows the correlation matrix between the system’s 16 output channels. The color scale ranges from 0.0 (dark blue) to 1.0 (yellow), where the diagonal values correspond to the correlation of each channel with itself. High internal correlation is observed between the magnetometer axes, as well as between the gyroscope channels, while the solar sensors exhibit moderate correlations with one another.

7.2. Binary Detection Results

Table 6 compares ten anomaly detection methods across RAW, RSR, and HFE representations at the window size W maximizing Fault-class F1 per model. Precision quantifies the rate of true detections among flagged windows, recall quantifies the fraction of actual faults detected, F1 balances both, accuracy reflects the overall fraction of correctly classified windows, and model size in MB serves as a proxy for embedded deployment feasibility.
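The metric definitions above can be written out as a minimal sketch. The confusion counts below are hypothetical and serve only to exercise the formulas; they are not taken from Table 6.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Window-level metrics for the Fault class from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # true detections among flagged windows
    recall = tp / (tp + fn) if tp + fn else 0.0      # fraction of actual faults detected
    accuracy = (tp + tn) / (tp + fp + fn + tn)       # overall fraction of correct windows
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only.
example = binary_metrics(tp=180, fp=20, fn=10, tn=190)
for name, value in example.items():
    print(f"{name:>9s}: {value:.4f}")
```

Because F1 is a harmonic mean, it is pulled toward the weaker of the two terms, so a detector cannot reach a high F1 through recall alone.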

In the onboard FDIR context, recall is the domain-critical metric for Stage 1, as undetected faults risk propagating through the attitude control loop. A primary detector must therefore combine high recall with acceptable precision, which rules out both thresholding baselines: Wavelet Thresholding reaches 94.95% recall but only 50.54% precision, meaning that roughly half of its flagged windows are false positives, while Variance-based Thresholding recalls just 66.70% of faults despite reasonable precision.

The WCNN model is selected for Stage 1, achieving 94.75% recall and 94.79% F1. Although the LSTM yields a marginally higher F1 of 95.19%, its recall of 91.51% is inferior and its footprint of 0.432 MB is 28 times larger than the WCNN’s 0.015 MB, making the latter the optimal choice for the memory-constrained target hardware.

Performance also varied with window size. As seen in Figure 13, which shows binary accuracy across window sizes $W$, sequential RSR models maintain peak stability at $W = 20$, whereas RAW thresholding suffers degradation below 55% due to transient dilution. HFE ensembles display a flat, intermediate baseline.

7.3. Multilabel Classification Results

Table 7 compares twelve fault isolation methods across RSR and HFE representations at the window size W maximizing Macro-F1 per model. As Stage 2 operates on windows pre-filtered by the binary detector, its design criterion shifts from recall toward precision and Hamming Score. Macro-Precision and Macro-Recall weight each of the 15 fault channels equally; Macro-F1 balances both into a single per-channel scalar; Hamming Score measures the fraction of correctly predicted labels across all channels and windows. Model size in KB is treated as a hard constraint given the memory budget of the target hardware.
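The multilabel metrics can be sketched as follows. This is a minimal illustration that assumes Macro-F1 is the unweighted mean of per-channel F1 scores and that the Hamming Score is one minus the Hamming loss; the toy labels use three channels instead of 15 and are hypothetical.

```python
def multilabel_metrics(y_true, y_pred):
    """Macro precision/recall/F1 and Hamming score.

    y_true, y_pred: lists of windows, each a list of 0/1 labels per channel.
    """
    n_windows = len(y_true)
    n_channels = len(y_true[0])
    precisions, recalls, f1s = [], [], []
    correct_labels = 0
    for c in range(n_channels):
        pairs = [(t[c], p[c]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        correct_labels += sum(1 for t, p in pairs if t == p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return {
        "macro_precision": sum(precisions) / n_channels,  # each channel weighted equally
        "macro_recall": sum(recalls) / n_channels,
        "macro_f1": sum(f1s) / n_channels,
        "hamming_score": correct_labels / (n_windows * n_channels),
    }


# Hypothetical toy labels: 4 windows, 3 channels.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 0]]
scores = multilabel_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name:>15s}: {value:.4f}")
```

In this toy case the Hamming score stays above 0.83 even though one channel is never isolated, because the many correctly predicted zero labels dominate the count. The same effect explains why a high Hamming Score can coexist with a noticeably lower Macro-F1.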

Several methods exhibit inadequate recall despite high precision. Wavelet Thresholds achieve 95.78% precision but only 42.26% recall, and Random Forest reaches 98.00% precision at 57.00% recall, failing to isolate a substantial fraction of fault channels. Tree-based methods present an additional concern: XGBoost and Random Forest occupy 5361.65 KB and 14668 KB respectively, exceeding the feasible memory budget and disqualifying them from deployment regardless of performance.

RSR-based sequential models peak at $W \geq 15$, while ensemble tree-based methods and threshold-based approaches peak at $W = 5$. Wavelet decompositions retain the full time-frequency structure of each channel across sub-bands, and LSTM and CNN architectures exploit inter-timestep dependencies that only become resolvable when the window spans a sufficient portion of the fault transient. HFE methods, by contrast, aggregate each window into a fixed-size statistical feature vector before classification, so larger windows dilute localized fault signatures within the aggregate, reducing discriminative power for tree-based models that operate on feature vectors without temporal ordering.
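The dilution effect described for HFE features can be illustrated with a small sketch. The descriptor names follow those mentioned in this paper (mean, variance, standard deviation, skewness, kurtosis, deviation from the mean), but the exact definitions below, in particular population moments and the deviation taken as the last sample minus the window mean, are assumptions made for illustration.

```python
import math


def window_descriptors(window):
    """Aggregate one channel window into a fixed-size descriptor dict."""
    n = len(window)
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n   # population variance (assumed)
    std = math.sqrt(var)
    if std > 0:
        skew = sum(((x - mean) / std) ** 3 for x in window) / n
        kurt = sum(((x - mean) / std) ** 4 for x in window) / n
    else:
        skew, kurt = 0.0, 0.0
    return {
        "mean": mean,
        "var": var,
        "std": std,
        "skew": skew,
        "kurt": kurt,
        "dev": window[-1] - mean,   # assumed definition of the deviation descriptor
    }


# Hypothetical channel: flat signal with a single spike of amplitude 5.
short_window = [0.0] * 4 + [5.0]     # W = 5
long_window = [0.0] * 19 + [5.0]     # W = 20

d5 = window_descriptors(short_window)
d20 = window_descriptors(long_window)
print(f"W = 5 : std = {d5['std']:.3f}, mean = {d5['mean']:.3f}")
print(f"W = 20: std = {d20['std']:.3f}, mean = {d20['mean']:.3f}")
```

The same spike yields a smaller standard deviation and mean shift in the longer window, which is the mechanism by which larger windows weaken the descriptors seen by tree-based models.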

The WELR model is selected for Stage 2, achieving 85.09% Macro-Precision, 74.69% Macro-F1, and 97.95% Hamming Score at only 2.11 KB. Although the WMCNN yields superior Macro-F1 (87.69%) and Macro-Precision (98.50%), its 516.69 KB footprint exceeds the available hardware budget when combined with the Stage 1 allocation. Logistic Regression thus represents the optimal balance between isolation performance and deployment feasibility on the STM32F446RE.

Additionally, Figure 14 confirms these behavioral groups across all window sizes $W \in \{5, 10, 15, 20\}$, showing the monotonic degradation of ensemble tree-based and threshold-based methods beyond $W = 5$ and the ascending trend of RSR-based sequential models up to $W \geq 15$, similar to what was observed in the detection stage.

7.4. Feasibility

The two-stage framework is assembled from the models selected in Section 7.2 and Section 7.3 according to a formal trade-off criterion: for each stage, the selected model must maximize the stage-specific performance metric subject to the memory constraints of the STM32F446RE target hardware. Stage 1 selects the model maximizing Recall and F1, with CNN Wavelet satisfying both while occupying 15 KB on FLASH and requiring 7.87 KB of RAM during inference. Stage 2 selects the model maximizing Macro-Precision, Macro-F1, and Hamming Score within the remaining memory budget, with WELR meeting this criterion at 2.11 KB on FLASH and 2.37 KB of RAM. Binary detections above $\tau = 0.5$ are forwarded to Stage 2, which applies an isolation threshold of $\tau = 0.7$. The PIL implementation of this framework on the STM32F446RE is analyzed below.
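The gating logic between the two stages can be sketched as below. The thresholds are those quoted in the text; the function name, the stub models, the strict comparisons, and the zero-based channel indices are illustrative assumptions and do not reproduce the deployed C implementation.

```python
TAU_DETECT = 0.5    # Stage 1 forwarding threshold quoted in the text
TAU_ISOLATE = 0.7   # Stage 2 isolation threshold quoted in the text


def two_stage_decision(window, detector, isolator,
                       tau_detect=TAU_DETECT, tau_isolate=TAU_ISOLATE):
    """Run Stage 1 on every window and Stage 2 only on flagged windows.

    `detector(window)` returns a fault probability and `isolator(window)`
    returns one probability per channel. Both are hypothetical callables.
    """
    p_fault = detector(window)
    if p_fault <= tau_detect:            # assumed strict ">" for forwarding
        return {"fault": False, "p_fault": p_fault, "channels": []}
    channel_probs = isolator(window)     # evaluated only for flagged windows
    channels = [i for i, p in enumerate(channel_probs) if p > tau_isolate]
    return {"fault": True, "p_fault": p_fault, "channels": channels}


# Stub models for illustration only; they stand in for WCNN and WELR.
def stub_detector(window):
    return 0.9 if max(abs(x) for x in window) > 3.0 else 0.1


def stub_isolator(window):
    return [0.05, 0.85, 0.40]


nominal_result = two_stage_decision([0.1, -0.2, 0.3], stub_detector, stub_isolator)
faulty_result = two_stage_decision([0.1, 4.2, 0.3], stub_detector, stub_isolator)
print(nominal_result)
print(faulty_result)
```

Calling the isolator only after a positive detection mirrors the sequential structure described above: nominal windows incur the cost of Stage 1 alone.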

Regarding resource usage, the results show that 85.05 KB of RAM (33.55%) and 356.15 KB of FLASH memory (30.44%) were used in total across the full framework and controller. The latency of the framework’s execution alongside the controller was measured during runtime, averaging 88 ms over all valid samples of a representative PIL run. This average stays within the 0.1 s sample time and, together with the memory figures, indicates a lightweight footprint suitable for onboard deployment.

Figure 15 illustrates two representative fault scenarios of onboard detection. The first is an erratic anomaly of magnitude $5\sigma$ injected at 129.3 s for 1.3 s in the gyroscope x-axis (channel 10), and the second is a $7\sigma$ spike anomaly injected at 224.1 s in the magnetometer x-axis (channel 7). Both cases confirm successful detection and isolation shortly after fault onset.

7.5. Comparative Summary and Framework Justification

The binary detection results expose a clear separation between method families. Threshold-based OOL methods achieve low recall relative to learned approaches, with Variance-based Thresholding limited to a 75.67% F1 score, constrained primarily by missed fault windows. WCNN and LSTM outperform all remaining methods, reaching F1 scores of 94.79% and 95.19% respectively, while the other RSR-based methods perform comparatively poorly, reflecting the limited linear separability of the detection task established in the dataset characterization.

The multilabel results reflect the interaction between data representation and window size. HFE-based tree models capture discriminative content at short intervals, with XGBoost reaching the highest Macro-F1 of 77.10% at $W = 5$, while RSR-based sequential models require broader temporal context and peak at $W \geq 15$, where BiLSTM with temporal attention attains 75.26%. WELR reaches a competitive 74.69% Macro-F1 at substantially lower complexity, indicating that the dominant limitation is the separability of wavelet-based channel signatures rather than model capacity.

The two-stage framework is preferable to any single-stage alternative because no individual model satisfies both the recall-driven criterion of Stage 1 and the precision and Hamming Score criteria of Stage 2 within the memory budget of the STM32F446RE. The sequential structure further restricts Stage 2 inference to windows already flagged as anomalous, improving isolation precision without adding load to the control loop. The selected pipeline, WCNN followed by WELR, occupies 17.37 KB on FLASH and executes at an average latency of 88 ms, confirming feasibility under the target hardware constraints.

8. Discussion

One of the main contributions of this research is the proposed fault injection methodology, which provides insight into fault modeling and its relationship to anomalies characteristic of an ADCS. The resulting dataset summarized in Table 4 was designed to provide statistically structured coverage of the evaluated fault space under controlled simulation conditions. The distribution of fault types, locations, and magnitudes was defined to approximate operationally plausible anomaly patterns within the simulated scenario, providing a suitable basis for developing and validating FDIR systems in a controlled environment. However, because the dataset was generated through MIL simulation, its representativeness is bounded by the modeled scenario; comparison with in-orbit telemetry or hardware-in-the-loop datasets remains a necessary step before extending these conclusions to operational systems.

Regarding its internal structure, the balanced class ratio in the dataset, with an anomaly rate of 0.493, reduces the training bias typical of imbalanced datasets, while the uniform distribution of faults across the 15 ADCS channels ensures homogeneous coverage without sampling bias. In addition, Figure 11 shows how anomalies tend to blend in with normal patterns. In particular, low-magnitude faults ($5\sigma$) remain within the nominal dynamic range, limiting separability at the window level. Furthermore, strong intra-system correlations, especially between the magnetometer axes and magnetorquers, introduce dependencies between channels, causing localized perturbations to produce distributed signatures across multiple telemetry streams. Although wavelet preprocessing partially mitigates this effect by isolating transient components independently per channel, it does not eliminate anomalies that occur simultaneously across correlated subsystems. Together, these characteristics define a challenging detection scenario due to limited class separability and inter-channel correlation.
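How a wavelet detail band isolates a transient on a single channel can be shown with a one-level Haar transform. The wavelet family is an assumption made for brevity, since this section does not state which wavelet the framework uses, and the signal is a hypothetical flat channel with one spike.

```python
import math


def haar_dwt_level(signal):
    """One Haar DWT level: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale
              for i in range(0, len(signal), 2)]
    return approx, detail


# Hypothetical channel window: constant level with a single spike.
window = [1.0, 1.0, 1.0, 1.0, 1.0, 6.0, 1.0, 1.0]
approx, detail = haar_dwt_level(window)

print("approximation:", [round(a, 3) for a in approx])
print("detail       :", [round(d, 3) for d in detail])
```

The detail coefficients are zero wherever the signal is locally constant and non-zero only at the pair containing the spike, while the approximation band carries the slowly varying level. Since the transform is applied per channel, a disturbance that appears on several correlated channels at once still produces detail energy in each of them.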

Grad-CAM Attribution Analysis

To provide an importance attribution for each failure channel, the Gradient-weighted Class Activation Mapping (Grad-CAM) method was applied, which uses the gradients of the classification score with respect to the final convolutional feature map [50]. This method was applied to the WCNN Detection algorithm using the RSR data representation described in Section 5.3.4. Attribution maps were calculated for three representative cases: a true positive ($\hat{p}_{b} = 0.938$), a false positive ($\hat{p}_{b} = 0.607$, true label = Normal), and a nominal window ($\hat{p}_{b} = 0.072$), as shown in Figure 16.
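The core Grad-CAM computation can be sketched in a few lines. In practice the feature maps and gradients come from the trained network through automatic differentiation; here they are passed in as small hypothetical arrays, so the sketch shows only the weighting and rectification steps, not the WCNN itself.

```python
def grad_cam(feature_maps, gradients):
    """Grad-CAM map from K feature maps and their score gradients.

    feature_maps, gradients: lists of K maps, each a list of H rows of W values.
    """
    n_maps = len(feature_maps)
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    # Channel weights: gradients averaged over all spatial positions.
    alphas = [sum(sum(row) for row in grad) / (height * width)
              for grad in gradients]
    # Weighted sum of the feature maps followed by ReLU.
    return [
        [max(0.0, sum(alphas[k] * feature_maps[k][i][j] for k in range(n_maps)))
         for j in range(width)]
        for i in range(height)
    ]


# Hypothetical 1 x 3 feature maps from two filters.
feature_maps = [[[1.0, 2.0, 0.0]],
                [[0.0, 1.0, 3.0]]]
gradients = [[[0.5, 0.5, 0.5]],
             [[-1.0, -1.0, -1.0]]]

cam = grad_cam(feature_maps, gradients)
print(cam)
```

Positions where a positively weighted filter is active keep a non-zero attribution, while positions dominated by a negatively weighted filter are clipped to zero by the rectification. This is why the resulting maps highlight only evidence in favor of the predicted class.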

In the confirmed fault case, activation was concentrated in the magnetometer channels (magn1, magn2, magn3) at DWT coefficients 8, 9, 10, and 12, which correspond to mid-to-high frequencies in the DWT spectrum commonly associated with transient spikes and erratic signal behavior. On the other hand, the false positive case exhibited distributed activation across both magnetometer and gyroscope channels, with magnetometers remaining the dominant contributors. This pattern reflects an ambiguous state with inconsistencies among multiple sensors, which could imply simultaneous failures across multiple sensors rather than a single dominant failure signature. Finally, the nominal window produced low-magnitude activation concentrated primarily in the magnetometer channels, but extending toward lower DWT indices (index 2). Thus, while the model monitors the magnetometer signals, it does not extract discriminatory patterns in the absence of anomalies.

Additionally, considering all true positive detections shown in Figure 17, the average Grad-CAM importance identified magn3, magn2, and magn1 as the dominant channels, jointly accounting for approximately 55–60% of the total attribution mass, followed by secondary contributions from the gyroscope channels.

8.1. Model Interpretability and Feature Attribution

Two complementary attribution methods were applied to the three proposed HFE-based classifiers to provide fault-detection traceability. Tree-based impurity importance offers an initial feature ranking but is known to carry an upward bias toward numerically abundant and correlated features [51]. SHAP (SHapley Additive exPlanations) [52] corrects this bias by computing marginal feature attributions for individual predictions, revealing both the magnitude and direction of each feature’s contribution to fault classification. The DT was evaluated at $W = 15$ and both XGB and RF at $W = 5$, consistent with the best-performing configurations reported in Section 7.

Global Feature Importance Analysis

Under the impurity criterion, sun sensors and magnetorquers jointly account for 65.4%–83.4% of decision weight across all three models, while magnetometers rank third, as shown in Table 8. SHAP substantially revises this ordering: magnetometers become the dominant subsystem for DT (59.37%) and RF (69.68%), while sun sensor contributions fall from 39.91%–46.89% to 9.51%–29.21%. This reversal reflects the impurity bias toward the six sun sensor channels, which collectively accumulate split-reduction scores through correlated descriptors rather than independent predictive signal. This interpretation is reinforced by the Grad-CAM analysis in Figure 17, where sun sensors contribute negligibly to activation attribution. At the descriptor level, SHAP reveals that rolling standard deviation carries greater marginal contribution than variance for DT (82.03%) and RF (70.48%), correcting the impurity overestimation of variance attributable to its wider numerical range. SHAP also exposes non-zero contributions from skewness, kurtosis, and deviation from rolling mean, all registering zero under impurity, confirming that these physically motivated descriptors are statistically suppressed rather than genuinely absent.

In all SHAP beeswarm plots, each point represents one telemetry window. The horizontal position corresponds to the SHAP contribution in log-odds space, where positive values push the prediction toward the Fault class and negative values favour the Normal class. The color of the points represents the normalized value of the feature after preprocessing, ranging from low values (blue) to high values (red). Wide horizontal dispersion means that the variable interacts with other variables in different ways in different contexts. Compact clusters stand for groups of samples which receive similar attribution under the same decision rules.

The SHAP beeswarm plots in Figure 18, Figure 19 and Figure 20 confirm the dominant role of magnetometer-derived descriptors across all models. The magnetometer standard deviation features (Bout_magn_1/2/3_std) consistently exhibit the largest attribution magnitudes, although their directional behaviour varies across architectures. For XGB, Bout_magn_2_std_5 reaches contributions exceeding $+3.3$ log-odds units, sufficient for a single feature to move a prediction from a clearly Normal score to a clearly Fault score (for instance, from a fault probability of about 0.16 to about 0.84). The RF beeswarm additionally highlights magnetorquer mean features (mout_mtq_1/2/3_mean) at ranks 4–9, where high normalized values are displaced rightward, indicating that elevated actuator current levels act as complementary fault indicators. Cross-model agreement under SHAP narrows to three consistently dominant descriptors: Bout_magn_1_std, Bout_magn_2_std, and Bout_magn_3_std, consistent with the intra-subsystem coupling among magnetometer axes identified during dataset characterization. Furthermore, the three strongest XGB pairwise SHAP interactions (mean absolute values of 0.882, 0.769, and 0.351) occur entirely within the magnetometer subsystem, confirming that XGB exploits multi-axis geomagnetic covariance as a joint discriminative signal beyond individual channel contributions.

In the XGBoost plot shown in Figure 18, the dominant feature Bout_magn_2_std_5 exhibits a non-linear and context-dependent distribution. Low normalized values (blue) concentrate near $+0.4$, whereas high values (red) disperse across both negative and strongly positive SHAP regions, indicating that the fault-driving effect of this descriptor depends on the concurrent state of other telemetry variables. The magnetorquer variance features (mout_mtq_1_var_5, mout_mtq_2_var_5) show a more monotonic behaviour: low values remain concentrated near zero contribution, while high values extend rightward up to approximately $+2.3$ log-odds, confirming elevated actuator variability as a secondary fault signature. Conversely, mout_mtq_3_var_5 presents an asymmetric inversion, where low normalized values extend toward strongly positive SHAP regions, indicating that unusually suppressed actuator variability can also contribute to fault predictions under specific operating contexts. Gyroscope-related descriptors remain tightly concentrated around zero regardless of colour, confirming their comparatively weak discriminative contribution.

In contrast, the Random Forest classifier produces a narrower attribution profile, as shown in Figure 19. The overall SHAP range spans approximately $-0.05$ to $+0.30$, reflecting the averaging effect of the ensemble architecture, where extreme individual tree contributions are attenuated across the forest. The dominant magnetometer standard deviation features (Bout_magn_3/1/2_std_5) exhibit an inverted directional pattern relative to XGB: low-to-moderate normalized values extend rightward toward positive SHAP regions, while high values cluster slightly left of zero. The magnetorquer mean features at ranks 4–9 behave as threshold-triggered indicators, where low values remain concentrated near zero contribution and high values form segmented rightward bands between approximately $+0.07$ and $+0.25$. Deviation and skewness descriptors (mout_mtq_1/3_dev_5, mout_mtq_1_skew_5) remain highly concentrated around zero with limited horizontal spread, contributing only minor corrective adjustments to the ensemble prediction.

The Decision Tree beeswarm plot in Figure 20 is structurally distinguished by discrete vertical blocks rather than continuous clouds, reflecting the rigid threshold-based partitions of a single decision tree. These blocks indicate groups of telemetry windows assigned identical SHAP contributions after traversing the same leaf-node rules. The magnetometer descriptors (Bout_magn_3_std_15, Bout_magn_2_std_15) dominate by a wide margin, where low-to-moderate normalized values cluster within fixed positive attribution bands near $+0.15$ and $+0.19$, with isolated jumps toward $+0.40$. In contrast, high normalized values remain concentrated slightly left of zero with mild negative contribution. Sun sensor features (Vout_ss_*_std_15) occupy positions 3–9 but remain concentrated almost entirely at zero contribution, except for small isolated red clusters associated with mild negative attribution. This near-zero concentration is consistent with the impurity bias correction shown in Table 8: SHAP reveals that the six correlated sun sensor channels accumulate split-reduction importance without carrying proportionally independent predictive information.

False Alarm Characterization and Cross-Method Comparison

To analyze the false alarm mechanism at the sample level, SHAP waterfall decomposition was computed for the nominal test sample with the highest predicted fault probability in each classifier. In the waterfall representation, the prediction begins at the baseline $E[f(X)] = 0.051$, which corresponds to the model’s expected output over the training distribution in log-odds space. This baseline represents the unconditioned prior prediction before incorporating information from any individual feature. Each horizontal bar then represents the marginal contribution of one feature, sequentially accumulated to construct the final prediction. Positive contributions increase the fault evidence, whereas negative contributions shift the prediction toward the Normal class. The final value $f(x)$ therefore represents the total accumulated evidence for the analyzed telemetry window before conversion into probability through the sigmoid function.

The XGB decomposition shown in Figure 21 reveals that the dominant fault-driving contributions originate from magnetorquer features: mout_mtq_1_var_5 ($+2.1$, normalized value 10.502) and mout_mtq_1_std_5 ($+0.9$, normalized value 15.998), indicating unusually elevated actuator current variability under otherwise nominal operating conditions. Critically, Bout_magn_2_std_5 contributes $+2.25$ despite a below-average normalized value of $-0.129$, indicating that the model has learned that anomalously low magnetometer dispersion co-occurring with high actuator variability amplifies rather than attenuates the fault prediction. This attribution pattern is consistent with the Grad-CAM analysis in Figure 16, where activation was distributed across magnetometer and gyroscope channels in a pattern interpreted as a multi-sensor inconsistency state. SHAP provides the corresponding feature-level explanation for this ambiguity: it is the simultaneous presence of suppressed magnetometer output and elevated actuator current variability that triggers the false detection, constituting a nominal high-activity operating regime that the models cannot reliably distinguish from a fault condition. This failure mode therefore represents an important target for future refinement of the fault taxonomy and decision thresholds.
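The waterfall accumulation can be reproduced as a short sketch. It uses only the baseline and the three contributions quoted in the text, and it assumes that this baseline applies to the XGB model. The remaining features of the actual waterfall are not listed in the text and are omitted, so the result is a partial sum rather than the model's true output.

```python
import math


def sigmoid(z):
    """Map a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def accumulate_waterfall(baseline, contributions):
    """Add SHAP contributions to the baseline one feature at a time.

    Returns the running totals (log-odds) and the final accumulated value.
    """
    running = []
    total = baseline
    for name, phi in contributions:
        total += phi
        running.append((name, phi, total))
    return running, total


BASELINE = 0.051  # E[f(X)] quoted in the text (log-odds)

# Only the three contributions quoted in the text; all other features
# of the real decomposition are unknown here and therefore left out.
quoted = [
    ("mout_mtq_1_var_5", 2.1),
    ("mout_mtq_1_std_5", 0.9),
    ("Bout_magn_2_std_5", 2.25),
]

steps, partial_fx = accumulate_waterfall(BASELINE, quoted)
for name, phi, total in steps:
    print(f"{name:>20s}  {phi:+.2f}  ->  {total:.3f}")
print(f"partial f(x) = {partial_fx:.3f}, sigmoid = {sigmoid(partial_fx):.3f}")
```

Even this partial sum lies far on the Fault side of the sigmoid, which illustrates how three strongly positive contributions can outweigh a near-neutral baseline for a window whose true label is Normal.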

The SHAP subsystem rankings converge with the Grad-CAM channel importance reported in Figure 17, where magnetometer channels account for approximately 55–60% of the total attribution mass across true positive detections. Both methods reach this conclusion through methodologically independent mechanisms operating on entirely different signal representations, tree-based marginal attribution on rolling statistical features versus gradient backpropagation on DWT coefficient tensors, which constitutes strong evidence that elevated magnetometer signal dispersion is a physically genuine fault signature rather than a model-specific artifact. The divergence on magnetorquers, which are significant under SHAP (9.49%–28.86%) but absent in Grad-CAM, reflects the representational differences between the methods: rolling statistics capture slow MTQ covariance patterns that are not localized into high-gradient DWT activations.

Regarding detection limitations, the low SHAP importance assigned to gyroscope features by the DT (1.93%) and RF (1.13%), against 20.16% for XGB, is consistent with their secondary role in the Grad-CAM analysis and directly explains the reduced gyroscope fault recall observed in the multilabel results. Improved simulation fidelity for gyroscope fault modes and the addition of cross-axis correlation descriptors are identified as the principal directions for future work. It should also be noted that this analysis is based on simulated data; therefore, validation using in-orbit data remains an important step for the future.

9. Conclusions

This work presented three main contributions for onboard Fault Detection and Isolation in the ADCS of nanosatellites. First, a two-stage onboard FDIR framework was developed for a simulated 1U CubeSat ADCS, integrating a binary WCNN detector with a logistic regression classifier (WELR) for fault isolation and achieving a 94.79% binary F1 score and 85.09% macro-precision for fault isolation. The proposed framework was benchmarked against ten lightweight detection methods spanning threshold-based baselines, tree-based ensembles operating on handcrafted statistical descriptors, and recurrent and convolutional architectures operating on wavelet detail coefficients.

Second, the framework was evaluated on a statistically validated dataset of 16,590 fault injections drawn from a unified error population at 99% confidence with a 1% margin of error. The dataset was generated through Statistical Fault Injection on a 1U CubeSat MIL simulation, covering six fault types across the 15 signal channels of 11 ADCS sensor and actuator components, with balanced binary distribution and uniform per-channel coverage. Based on this simulated scenario, the selected pipeline was deployed on an STM32F446RE microcontroller in a Processor-in-the-Loop (PIL) configuration running in parallel with the ADCS firmware, employing a total memory footprint of 17.37 KB and an end-to-end latency of 88 ms, demonstrating real-time feasibility under the specific resource constraints of the evaluated hardware platform.
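The reported number of injections is consistent with the usual sample-size expression for estimating a proportion. The sketch below assumes a worst-case proportion of 0.5, a two-sided 99% normal quantile rounded to 2.576, and an effectively unlimited fault population; whether the dataset was sized with exactly this expression is an assumption.

```python
import math


def sfi_sample_size(z, margin, p=0.5, population=None):
    """Number of fault injections for a given confidence and margin of error.

    z: normal quantile for the confidence level (2.576 for 99%, assumed).
    margin: tolerated margin of error (0.01 for 1%).
    p: assumed proportion; 0.5 is the worst case.
    population: total fault population, or None for the large-population limit.
    """
    n_inf = z ** 2 * p * (1.0 - p) / margin ** 2
    if population is None:
        return math.ceil(n_inf)
    # Finite-population correction.
    return math.ceil(n_inf / (1.0 + (n_inf - 1.0) / population))


n_large = sfi_sample_size(z=2.576, margin=0.01)
n_finite = sfi_sample_size(z=2.576, margin=0.01, population=1_000_000)  # hypothetical population size
print("large-population limit:", n_large)
print("finite population     :", n_finite)
```

With these inputs the large-population limit evaluates to 16,590, matching the figure quoted above. A finite fault population would reduce the required number of injections.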

Third, besides computational tractability, safety-critical FDIR systems must ensure that detection decisions can be linked to identifiable physical phenomena. This traceability allows ground operators to validate system responses and maintain mission confidence. The interpretability analysis addressed this requirement directly. SHAP attribution applied to the tree-based classifiers and Grad-CAM attribution applied to the WCNN detector independently identified magnetometer rolling standard deviation as the dominant fault-discriminative descriptor. Both methods reached this conclusion despite operating on fundamentally different signal representations and attribution mechanisms. This result provides meaningful, coherent, and convergent evidence that the high signal dispersion from the magnetometer behaves as a significant fault signal within the evaluated simulation environment, rather than as a model-specific artifact, supporting operator-traceable justification for autonomous detection decisions under the injected conditions.

The analysis also exposed a systematic limitation of standard impurity-based feature importance, which incorrectly elevated sun sensor relevance due to feature abundance across six correlated channels. SHAP corrected this attribution and reassigned dominance to magnetometer descriptors, a result with direct implications for sensor prioritization in lightweight FDIR monitoring architectures. False alarm characterization further revealed a specific failure mode: suppressed magnetometer dispersion co-occurring with elevated actuator current variability constitutes a nominal high-activity regime that the current classifiers cannot reliably distinguish from a fault condition.

Three limitations should be acknowledged. First, gyroscope channels contributed only as secondary indicators in binary detection and showed the lowest per-channel recall in the multilabel task, consistent with their low SHAP attribution (1.93% for DT, 1.13% for RF) reported in Section 8. Improved simulation fidelity for gyroscope fault modes and the addition of cross-axis correlation descriptors are identified as the primary directions for future work. Second, the detection pipeline was validated only using simulated telemetry under MIL conditions with PIL deployment validation; therefore, the conclusions remain limited to the evaluated simulated 1U CubeSat ADCS scenario. For instance, the injected drift faults were modeled as discrete-onset bias ramps and therefore do not yet represent progressive long-timescale degradation effects such as radiation-induced sensitivity decay. Third, the results regarding feature importance obtained using SHAP and impurity-based methods are not necessarily applicable to failure profiles generated with different injector configurations. This constitutes an inherent limitation of interpretability analysis performed on synthetic datasets. A sensitivity analysis of fault interpretability as a function of injector parameters is left for future work.

In this scenario, DWT detail coefficients were the data representation that performed best in distinguishing anomaly patterns; this is attributed to their ability to capture transient high-frequency components. However, for future work, we propose integrating DWT approximation coefficients into this analysis, as these represent more stationary behavior, which could indicate sensor degradation or bias, information that is valuable for ground-based analysis. Additionally, future work will focus on improving gyroscope fault modeling fidelity, incorporating cross-axis correlation descriptors, and extending drift simulation toward continuous degradation trajectories for prognostic applications such as remaining useful life estimation. Additional validation using hardware-in-the-loop and in-orbit telemetry is also required to assess operational robustness. Although this work focused on the ADCS, the proposed methodology based on wavelet feature extraction and lightweight interpretable classifiers can also be adapted to other nanosatellite subsystems such as EPS and OBC through equivalent telemetry descriptor pipelines.

Author Contributions

Conceptualization, K.W.V.T. and A.H.; methodology, K.W.V.T., A.F.A.V. and C.R.C.C.; software, K.W.V.T., C.R.C.C. and L.X.C.A.; validation, K.W.V.T., A.F.A.V. and F.J.T.C.; formal analysis, K.W.V.T. and C.R.C.C.; investigation, K.W.V.T., C.R.C.C., A.F.A.V., L.X.C.A. and S.M.C.A.; resources, A.H.; data curation, K.W.V.T. and A.F.A.V.; writing (original draft preparation), K.W.V.T., C.R.C.C., A.F.A.V., L.X.C.A. and S.M.C.A.; writing (review and editing), A.H., F.J.T.C. and J.R.C.A.; visualization, C.R.C.C., K.W.V.T., A.F.A.V., L.X.C.A. and S.M.C.A.; supervision, A.H. and J.R.C.A.; project administration and funding acquisition, A.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research has not received external funding.

Data Availability Statement

The source code, models and dataset are available in the following GitHub repository: https://github.com/karen-vidaurre/Anomaly-Detection-for-Nanosatellites.git.

Acknowledgments

The authors would like to express their gratitude to Kyushu Institute of Technology for providing access to its facilities and tools at the Laboratory of Lean Satellite Enterprises and In-Orbit Experiments (LaSeine). In addition, the authors would like to acknowledge the BIRDS RPM mission team for providing access to the eOBC payload board for the deployment of the proposed framework.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ADCS	Attitude Determination and Control System
BiLSTM	Bidirectional Long Short-Term Memory
CNN	Convolutional Neural Network
DWT	Discrete Wavelet Transform
FDIR	Fault Detection, Isolation and Recovery
GRU	Gated Recurrent Unit
HFE	Handcrafted Feature Engineering
IMU	Inertial Measurement Unit
LSTM	Long Short-Term Memory
ML	Machine Learning
PCA	Principal Component Analysis
RSR	Reduced Signal Representation
MIL	Model-in-the-Loop
SIL	Software-in-the-Loop
PIL	Processor-in-the-Loop
SVM	Support Vector Machine
t-SNE	t-distributed Stochastic Neighbor Embedding
XGBoost	Extreme Gradient Boosting

Appendix A

Table A1. Fault mathematical modeling.

Fault Type	Mathematical model	Parameters
Spike fault	$u_{f} [k] = u [k] + A \, δ [k - k_{0}]$ (A1)	$A$ is the spike amplitude, $δ [\cdot]$ is the unit impulse (Kronecker delta), and $k_{0}$ is the sample at which the fault occurs.
Erratic fault	$u_{f} [k] = \begin{cases} u [k], & \mathrm{Bern} (1 - p_{e}) = 1 \\ g [k], & \mathrm{Bern} (1 - p_{e}) = 0 \end{cases}$ (A2)	$p_{e}$ is the per-sample fault probability and $g [k]$ is the noise-corrupted signal.
Drift fault	$u_{f} [k] = u [k] + b_{0} + A (k - k_{0}) + n [k]$ (A3)	$b_{0}$ is an offset, $A$ is the drift gain (slope), and $n [k]$ is a small additive noise term.
Hardover/Bias fault	$u_{f} [k] = \begin{cases} u [k], & k < k_{0} \\ u [k] + A, & k \geq k_{0} \end{cases}$ (A4)	The bias $A$ can be positive or negative.
Data loss fault	$u_{f} [k] = \begin{cases} u [k], & k \notin [k_{0}, k_{0} + T - 1] \\ 0, & k \in [k_{0}, k_{0} + T - 1] \end{cases}$ (A5)	$T$ is the total failure duration (in samples).
Stuck fault	$u_{f} [k] = \begin{cases} u [k], & k < k_{0} \\ u [k_{0} - 1], & k \in [k_{0}, k_{0} + T - 1] \end{cases}$ (A6)	The output holds the last value read before the fault.
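The models in Table A1 can be sketched as simple list transformations. This is a minimal illustration, not the authors' fault injector: the function names are hypothetical, the noise in (A2) and (A3) is assumed to be zero-mean Gaussian with standard deviation `sigma`, the drift in (A3) is assumed to start at $k_{0}$, and the signal is assumed to return to nominal after a stuck interval, none of which is stated in the table.

```python
import random

def inject_spike(u, k0, A):
    """(A1) Add amplitude A to the single sample k0."""
    out = list(u)
    out[k0] += A
    return out

def inject_erratic(u, p_e, sigma, rng):
    """(A2) With probability p_e per sample, replace u[k] by a noisy value g[k].

    Assumption: g[k] = u[k] + Gaussian noise with standard deviation sigma.
    """
    return [x + rng.gauss(0.0, sigma) if rng.random() < p_e else x for x in u]

def inject_drift(u, k0, b0, A, sigma, rng):
    """(A3) Offset b0 plus a ramp of slope A, with small additive noise.

    Assumption: the drift is applied only for k >= k0.
    """
    return [x if k < k0 else x + b0 + A * (k - k0) + rng.gauss(0.0, sigma)
            for k, x in enumerate(u)]

def inject_bias(u, k0, A):
    """(A4) Persistent step of size A from sample k0 onward."""
    return [x if k < k0 else x + A for k, x in enumerate(u)]

def inject_data_loss(u, k0, T):
    """(A5) Output forced to zero for k in [k0, k0 + T - 1]."""
    return [0.0 if k0 <= k < k0 + T else x for k, x in enumerate(u)]

def inject_stuck(u, k0, T):
    """(A6) Hold u[k0 - 1] for k in [k0, k0 + T - 1]; requires k0 >= 1.

    Assumption: the signal returns to nominal after the interval.
    """
    held = u[k0 - 1]
    return [held if k0 <= k < k0 + T else x for k, x in enumerate(u)]

# Hypothetical nominal signal: a ramp of 10 samples.
rng = random.Random(0)
u = [float(k) for k in range(10)]
print(inject_spike(u, 3, 5.0))
print(inject_bias(u, 4, -2.0))
print(inject_data_loss(u, 2, 3))
print(inject_stuck(u, 2, 3))
print(inject_drift(u, 5, 1.0, 0.5, 0.0, rng))
```

Passing a seeded `random.Random` instance keeps the stochastic faults reproducible, which is convenient when the same fault campaign has to be replayed across simulation runs.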

References

Garcia, B.E.; Oswaldo R Banda-Sayco, G.M.; Ramírez-Revilla, S.A. Technological Readiness and System-Level Maturity of Aerospace Development in Peru: An Engineer Based Systematic Review. Technologies 2026, 14, 118. [CrossRef]
Langer, M.; Bouwmeester, J. Reliability of Cubesats - Statistical Data, Developers Beliefs and the Way Forward. In Proceedings of the 30th Annual Conference AIAA/USU Conference on Small Satellites. Utah State University Digital Commons, 2016.
Horne, R.; Mauw, S.; Mizera, A.; Stemper, A.; Thoemel, J. Anomaly Detection Using Deep Learning Respecting the Resources on Board a CubeSat. Journal of Aerospace Information Systems 2023, 20, 859–872. [CrossRef]
Colagrossi, A.; Lavagna, M. Fault Tolerant Attitude and Orbit Determination System for Small Satellite Platforms. Aerospace 2022, 9. [CrossRef]
Perumal, R.P.; Voos, H.; Vedova, F.D.; Moser, H. Small Satellite Reliability: A decade in review. Journal Name 2021.
Yairi, T.; Inui, M.; Yoshiki, A.; Kawahara, Y.; Takata, N. Spacecraft telemetry data monitoring by dimensionality reduction techniques. In Proceedings of the SICE Annual Conference 2010, 2010, pp. 1230–1234.
Tagawa, T.; Yairi, T.; Takata, N.; Yamaguchi, Y. Data monitoring of spacecraft using mixture probabilistic principal component analysis and hidden Semi-Markov models. In Proceedings of the 3rd International Conference on Data Mining and Intelligent Information Technology Applications, 2011, pp. 141–144.
Tariq, S.; Lee, S.; Shin, Y.; Lee, M.S.; Jung, O.; Chung, D.; Woo, S.S. Detecting Anomalies in Space using Multivariate Convolutional LSTM with Mixtures of Probabilistic PCA. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 2019; KDD ’19, pp. 2123–2133. [CrossRef]
Cuéllar, S.; Santos, M.; Alonso, F.; Fabregas, E.; Farias, G. Explainable anomaly detection in spacecraft telemetry. Engineering Applications of Artificial Intelligence 2024, 133, 108083. [CrossRef]
Leveugle, R.; Calvez, A.; Maistri, P.; Vanhauwaert, P. Statistical fault injection: Quantified error and confidence. In Proceedings of the 2009 Design, Automation & Test in Europe Conference & Exhibition, 2009, pp. 502–506. [CrossRef]
Martínez, J.; Donati, A. Enhanced Telemetry Monitoring with Novelty Detection. AI Magazine 2014, 35, 37–46. [CrossRef]
Yairi, T.; Takeishi, N.; Oda, T.; Nakajima, Y.; Nishimura, N.; Takata, N. A Data-Driven Health Monitoring Method for Satellite Housekeeping Data Based on Probabilistic Clustering and Dimensionality Reduction. IEEE Transactions on Aerospace and Electronic Systems 2017, 53, 1384–1401. [CrossRef]
Bingqing, F.; Shaolin, H.; Chuan, L.; Yangfan, M. Anomaly detection of spacecraft attitude control system based on principal component analysis. In Proceedings of the 2017 29th Chinese Control And Decision Conference (CCDC), 2017, pp. 1220–1225. [CrossRef]
Cheng, Y.; Gong, Y.; Wang, J.; Xiong, X. Research on Spacecraft Fault Diagnosis and Recovery Architecture. Journal of Physics: Conference Series 2024, 2762, 012064. [CrossRef]
Liu, L.; Tian, L.; Kang, Z.; Wan, T. Spacecraft anomaly detection with attention temporal convolution networks. Neural Computing and Applications 2023, 35, 9753–9761. [CrossRef]
Lakey, D.; Schlippe, T. A Comparison of Deep Learning Architectures for Spacecraft Anomaly Detection. In Proceedings of the 2024 IEEE Aerospace Conference, 2024, pp. 1–11. [CrossRef]
Hundman, K.; Constantinou, V.; Laporte, C.; Colwell, I.; Soderstrom, T. Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’18), 2018, pp. 387–395. [CrossRef]
Baireddy, S.; Desai, S.R.; Mathieson, J.L.; Foster, R.H.; Chan, M.W.; Comer, M.L.; Delp, E.J. Spacecraft Time-Series Anomaly Detection Using Transfer Learning. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021, pp. 1951–1960. [CrossRef]
Meng, H.; Zhang, Y.; Li, Y.; Zhao, H. Spacecraft Anomaly Detection via Transformer Reconstruction Error. In Proceedings of the International Conference on Aerospace System Science and Engineering; Jing, Z., Ed.; Springer: Singapore, 2020; pp. 351–362. [CrossRef]
Song, Y.; Yu, J.; Tang, D.; Yang, J.; Kong, L.; Li, X. Anomaly Detection in Spacecraft Telemetry Data using Graph Convolution Networks. IEEE Transactions on Aerospace and Electronic Systems 2022, pp. 1–6. [CrossRef]
Tuli, S.; Casale, G.; Jennings, N.R. TranAD: deep transformer networks for anomaly detection in multivariate time series data. Proc. VLDB Endow. 2022, 15, 1201–1214. [CrossRef]
Meng, H.; Li, Y.; Zhang, Y.; Zhao, H. Spacecraft Anomaly Detection and Relation Visualization via Masked Time Series Modeling. IEEE Access 2019, 8, 1–7. [CrossRef]
Yu, B.; Yu, Y.; Xu, J.; Xiang, G.; Yang, Z. MAG: A Novel Approach for Effective Anomaly Detection in Spacecraft Telemetry Data. IEEE Transactions on Industrial Informatics 2023, 19, 10164–10173. [CrossRef]
Chen, S.; Jin, G.; Ma, X. Detection and analysis of real-time anomalies in large-scale complex system. Measurement 2021, 184, 109929. [CrossRef]
Chensiya. Public_Data. https://github.com/chensiya/public_data, 2024. Accessed: 31 October 2024.
Jin, X.; Wang, H.Q.; Jin, Z.H. Anomaly detection of satellite telemetry data based on extended dominant sets clustering. Journal of Physics: Conference Series 2023, 2489, 012036. [CrossRef]
Ruszczak, B.; Kotowski, K.; Andrzejewski, J.; Musiał, A.; Evans, D.; Zelenevskiy, V.; Bammens, S.; Laurinovics, R.; Nalepa, J. Machine Learning Detects Anomalies in OPS-SAT Telemetry. In Computational Science – ICCS 2023; Lecce, V.; Stankov, S.; Poryadnya, V.; Taniar, D., Eds.; Springer, Cham, 2023; Vol. 14073, Lecture Notes in Computer Science, pp. 257–270. [CrossRef]
Ruszczak, B.; Kotowski, K.; Evans, D.; Nalepa, J. The OPS-SAT benchmark for detecting anomalies in satellite telemetry. Scientific Data 2025, 12, 710. [CrossRef]
Ruszczak, B.; Kotowski, K.; Evans, D.; Nalepa, J. OPSSAT-AD - anomaly detection dataset for satellite telemetry, 2024. [CrossRef]
Herrmann, L.; Bieber, M.; Verhagen, W.J.C.; Cosson, F.; Santos, B.F. Unmasking overestimation: A re-evaluation of deep anomaly detection in spacecraft telemetry. CEAS Space Journal 2024, 16, 225–237. [CrossRef]
Fan, S.; Cui, Z.; Chen, X.; Liu, X.; Xing, F.; You, Z. Magnetic Fault-Tolerant Attitude Control with Dynamic Sensing for Remote Sensing CubeSats. Remote Sensing 2023, 15, 4858. [CrossRef]
Colagrossi, A.; Lavagna, M. Fault Tolerant Attitude and Orbit Determination System for Small Satellite Platforms. Aerospace 2022, 9, 46. [CrossRef]
Horne, R.; Mauw, S.; Mizera, A.; Stemper, A.; Thoemel, J. Anomaly Detection Using Deep Learning Respecting the Resources on Board a CubeSat. AIAA Journal 2023, pp. 119–126. [CrossRef]
Koch, A.; Krstova, A.; Hegwein, F.; Castro De Lera, M.; Ales, F.; Petry, M.; Ali, R.; Mallah, M.; Hili, L.; Ghiglione, M.; et al. On-Board Anomaly Detection on a Flight-Ready System. In Proceedings of the 2023 European Data Handling and Data Processing Conference for Space (EDHPC), 2023, pp. 568–577. [CrossRef]
Szibbo, D. Advances in Applied Onboard Machine Learning for Autonomous Space Systems. In Proceedings of the 20th Australian International Aerospace Congress, Melbourne, Victoria, Australia, Feb. 2023. ISBN: 978-1-925627-66-4.
Abdel Aziz, T.S.; Salama, G.I.; Mohamed, M.S.; Hussein, S. Spacecraft fault detection and identification techniques using artificial intelligence. Journal of Physics: Conference Series 2023, 2616, 012025. [CrossRef]
Murphy, J.; Buckley, M.; Buckley, L.; Taylor, A.; O’Brien, J.; Mac Namee, B. Deploying Machine Learning Anomaly Detection Models to Flight Ready AI Boards. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 6828–6836.
Colagrossi, A.; Brandonisio, A.; Lavagna, M. Autonomous Fault Management in Attitude Determination and Control Subsystems: Hardware and Processor In the Loop Testing. In Proceedings of the 75th International Astronautical Congress (IAC), Milan, Italy, October 2024. IAC-24,C1,IPB,36,x91025.
Crotti, E.; Colagrossi, A. Machine Learning Approaches for Data-Driven Self-Diagnosis and Fault Detection in Spacecraft Systems. Applied Sciences 2025, 15, 7761. [CrossRef]
Katsube, S.; Sahara, H. Toward an Onboard Anomaly Detection and Identification Method for Satellites. IEEE Access 2025, 13, 134655–134668. [CrossRef]
Jan, S.U.; Lee, Y.D.; Shin, J.; Koo, I. Sensor Fault Classification Based on Support Vector Machine and Statistical Time-Domain Features. IEEE Access 2017, 5, 8682–8690. [CrossRef]
Kanavouras, K. Design of Fault Detection, Isolation and Recovery in the AcubeSAT nanosatellite. PhD thesis, Aristotle University of Thessaloniki, 2021. [CrossRef]
Bergner, P.; Posch, A.; Reggio, D. GAFE Methodology: Generic AOCS/GNC Techniques & Design Framework for FDIR. Technical Report GAFE-UM-D7.5a, European Space Agency (ESA), Noordwijk, The Netherlands, 2018. Copyright © European Space Agency 2018. Authored by Airbus Defence & Space and Universität Stuttgart (iFR).
Gundecha, D.; Gavhane, N.; Dubey, V.; Joshi, S.; Karve, P.; Avadhanam, A.; Singh, A.K.; Marathey, C.; Goyal, A. Complete Failure Analysis of Attitude Determination and Control System. In Proceedings of the 2021 IEEE Aerospace Conference (50100), 2021, pp. 1–16. [CrossRef]
Komadina, A.; Martinić, M.; Grovs, S.; Mihajlović, Ž. Comparing Threshold Selection Methods for Network Anomaly Detection. IEEE Access 2024, 12, 124943–124973. [CrossRef]
Leveugle, R.; Calvez, A.; Maistri, P.; Vanhauwaert, P. Statistical fault injection: Quantified error and confidence. In Proceedings of the 2009 Design, Automation & Test in Europe Conference & Exhibition, 2009, pp. 502–506. [CrossRef]
Ticona Coaquira, F.J.; Wang, X.; Vidaurre Torrez, K.W.; Mamani Quiroga, M.J.; Silva Plata, M.A.; Luna Verdueta, G.A.; Murillo Quispe, S.E.; Auza Banegas, G.J.; Antezana Lopez, F.P.; Rojas, A. Model-Based Design and Testbed for CubeSat Attitude Determination and Control System with Magnetic Actuation. Applied Sciences 2024, 14. [CrossRef]
Zou, C.; Yuan, A.; Hu, J. BiLSTM-Based Anomaly Detection in Multivariate Time Series with Attention Mechanism and Dual Analysis. In Proceedings of the 2024 IEEE 7th International Conference on Information Systems and Computer Aided Education (ICISCAE), 2024, pp. 379–384. [CrossRef]
Nivaashini, M.; Aarthi, S.; Ramya, R.S. MalNet: Detection of Malwares Using Ensemble Learning Techniques. In Proceedings of the 7th International Conference on Electronics, Communication and Aerospace Technology (ICECA). IEEE, 2023, pp. 1469–1474. [CrossRef]
Selvaraju, R.R.; et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 618–626.
Strobl, C.; Boulesteix, A.L.; Zeileis, A.; Hothorn, T. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 2007, 8, 25. [CrossRef]
Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the Advances in Neural Information Processing Systems, 2017, Vol. 30, pp. 4765–4774.

Figure 1. Fault matrix interpretation.

Figure 2. Fault injector model.

Figure 3. Model-in-the-Loop environment simulation with fault injections.

Figure 6. Proposed two-stage framework for anomaly detection and identification.

Figure 7. Software and hardware validation framework.

Figure 8. Hardware implementation block diagram.

Figure 9. Testbed for Processor-in-the-Loop.

Figure 10. Binary dataset balance.

Figure 11. t-SNE 2D projection.

Figure 12. Output signal correlation heatmap.

Figure 13. Accuracy as a function of sliding window size W for all evaluated binary detection models.

Figure 14. Macro-F1 score as a function of sliding window size W for all evaluated multilabel models, illustrating the divergent sensitivity patterns between HFE-based and RSR-based representations.

Figure 15. Time-domain telemetry and anomaly detection results from the PIL implementation. The top and middle subplots show the gyroscope and magnetometer measurements, respectively, with shaded regions indicating injected fault intervals. The bottom subplot presents the anomaly detection output, including the binary fault probability ${\hat{p}}_{b}$ and per-channel isolation probabilities, along with the decision thresholds ($τ = 0.5$ for binary detection, $τ = 0.7$ for channel isolation). Channel-level responses are emphasized over the binary output to highlight fault localization behavior. In the bottom subplot, the red curve corresponds to the gyroscope x-axis channel (gyro1), while the blue curve represents the magnetometer x-axis channel (magn1).
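The thresholding described in the Figure 15 caption can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `decide` is hypothetical, the comparison is assumed to be inclusive, the channel isolation stage is assumed to run only when the binary stage reports a fault, and the probabilities in the example are made up rather than taken from the PIL results.

```python
TAU_BINARY = 0.5    # binary detection threshold (Figure 15 caption)
TAU_CHANNEL = 0.7   # channel isolation threshold (Figure 15 caption)

def decide(p_binary, channel_probs):
    """Return (fault_detected, flagged_channels) for one telemetry window.

    Assumption: stage 2 (channel isolation) is gated by stage 1 (detection).
    """
    if p_binary < TAU_BINARY:
        return False, []
    flagged = [name for name, p in channel_probs.items() if p >= TAU_CHANNEL]
    return True, flagged

# Hypothetical outputs for two windows; channel names follow the caption.
print(decide(0.92, {"gyro1": 0.85, "magn1": 0.40}))
print(decide(0.20, {"gyro1": 0.90, "magn1": 0.10}))
```

Using a stricter threshold for isolation than for detection trades some per-channel recall for fewer wrongly flagged channels, which matters when the flagged channel drives an autonomous recovery action.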

Figure 16. Grad-CAM analysis for the WCNN detector.

Figure 17. Grad-CAM channel importance per sensor channel.

Figure 18. Global SHAP beeswarm plot for the XGBoost classifier ($W = 5$, top 10 features). Horizontal position indicates SHAP contribution magnitude and direction toward Fault (positive) or Normal (negative). Point color represents the normalized feature value, ranging from the lowest (blue) to the highest (red).

Figure 19. Global SHAP beeswarm plot for the Random Forest classifier ($W = 5$, top 10 features). Axes and color convention follow Figure 18.

Figure 20. Global SHAP beeswarm plot for the Decision Tree classifier ($W = 15$, top 10 features). Axes and color convention follow Figure 18.

Figure 21. SHAP waterfall decomposition for the highest-confidence false positive case in the XGBoost classifier ($W = 5$, $\hat{y} = 0.999$, true label = Normal, $E [f (X)] = 0.051$). Each bar shows one feature’s contribution to the deviation from the model baseline. Normalized feature values are shown on the left axis. Positive contributions (red) push toward Fault; negative contributions (blue) push toward Normal.

Table 1. Summary of ground-based (GB) and on-board (OB) anomaly detection approaches applied to spacecraft missions.

Mission / Spacecraft Context	Dataset Source	Deployment	Detection Method	Year of Publication	Validation Strategy	Explainability Support
ESA XMM-Newton [11]	Telemetry (multi-variate)	GB	OOL	2014	Offline	✗
JAXA SDS-4 / Attitude Control Sys. [12,13]	In-orbit telemetry	GB	PCA, Clustering, OCSVM	2017	Offline	✗
NASA SMAP / MSL [14,15,16,17,18,19,20,21,22,23]	Telemetry (multi-variate)	GB	LSTM, ResNet, GRU, Transformers, etc.	2018-2024	Offline	✗
Real telemetry system (in-orbit) [24,25]	On-orbit telemetry	GB	CF-LSTM	2021, 2024	Offline	✗
China CASC Tianping-2B [26]	Magnetometer real in-orbit telemetry	GB	Clustering	2023	Offline	✗
ESA OPS-SAT [27,28,29]	On-orbit telemetry	GB	SVM, KNN, PCA, OCSVM	2023-2025	Offline	✗
ESA Sentinel-1 [30]	Telemetry (multi-variate)	GB	PCA, KNN, OCSVM, LSTM	2024	Offline	✓ Supported (LIME Algorithm)
TY-Space Corporation, OPS-SAT [31,32]	Simulated / MIL / lab data	OB	GVCA + Kalman filter + QUEST + statistical FDIR	2022–2023	MIL + HIL	✗
EduSat / NanoAvionics flight-like satellite [33]	Lab-generated	OB	ANN, CNN, LSTM, OOL	2023	PIL / HIL	✗
ESA and SmallSat missions [34,35]	Real / in-orbit telemetry	OB	AE, LSTM, LDA, OC-SVM, PCA, CNN	2023	MIL, HIL, FPGA, in-orbit validation	✗
EgyptSat-2, PROBA-V, LightSail-2, SMAP [36,37]	Simulated + archived / in-orbit telemetry	OB	AE, LSTM, HybridAE, KPCA-SVM, HVM-SVM	2023–2024	MIL+ On-orbit/HIL + hardware validation	✗
CubeSat ADCS simulators [38,39]	Simulated + HIL/PIL	OB	Model-based + hybrid ML (SVM/ANN)	2024–2025	MIL, PIL, HIL	✗
ADCS nanosatellite in LEO [this work]	Model-based simulation	OB	OOL, SVM, PCA, Light CNN	2025	MIL + PIL	✓ Supported (XAI)

Table 2. Fault classification.

Fault Type	Description
Spike fault	Sudden isolated deviation (outlier) in sensor output.
Erratic fault	Significant increase in noise level, irregular variations above nominal.
Drift fault	Gradual increase or decrease over time from the nominal value.
Hardover/Bias fault	Sudden offset (step change) from nominal state, persistent afterwards.
Data loss fault	Missing data intervals, creating gaps in time series.
Stuck fault	Output remains frozen at a fixed value (loss of dynamics).

Table 3. Realistic failures for ADCS sensors and actuators.

Component	Failure mode	Realistic fault
Gyroscope	Erratic	Intermittent short-circuit or increased broadband noise due to analog front-end damage.
	Drift	Thermal-induced bias drift slowly accumulating over time (bias ramp).
Magnetometer	Hardover	Permanent bias due to local magnetic contamination or sensor damage.
Sun sensor	Spike	Single-sample bright-glint or SEU-induced spike in sun-angle output.
	Data Loss	Output pinned to zero despite being sunlit due to communication or readout failure.
Magnetorquer	Stuck	Actuator stuck ON/OFF due to MOSFET failure, or control signal remaining constant because of DAC or PWM driver malfunction.

Table 4. Dataset generation.

Variable	Description	Categories	Range
Time	Simulation time; a complete run lasts 40000 s	$f (t) [0]$	0 to 40000
Sun sensors	Outputs of the 6 sun sensors	$f (t) [1 : 6]$	0 to 4095
Magnetometer	Magnetometer data for 3 axes	$f (t) [7 : 9]$	32520 to 33033
Gyroscope	Gyroscope data for 3 axes	$f (t) [10 : 12]$	32742 to 32791
Magnetorquer	Data for the 3 magnetorquers	$f (t) [13 : 15]$	−0.094 to 0.0091

Table 5. Summary of sensors and actuators modeled in the MIL simulation.

Component	Model / Type	Resolution	Math Symbol	Sampling Time (s)
Magnetometer	MMC5603NJ	16-bit per axis	${}^{\mathcal{B}} B_{Earth}$	0.1
Coarse Sun Sensor (x6)	SLCD-61N8	12-bit (limited by STM32F411RE ADC)	$V_{C S S}$	0.1
Gyroscope	L3GD20	16-bit per axis	${}^{\mathcal{B}} ω_{B / N}$	0.01
Magnetorquers (x3)	Air core magnetorquers	12-bit command	${}^{\mathcal{B}} m$	0.01

Table 6. Binary fault detection: performance and model size of the evaluated methods, each reported at the window size $W$ that maximizes Fault-class F1. Bold entries indicate the Stage 1 selected model.

Algorithm	Data Rep.	W	Precision (%)	Recall (%)	F1 (%)	Accuracy (%)	Size (MB)
Variance-based Thresholding	RAW	5	86.50	54.04	49.95	74.38	<0.01
Adaptive Variance Thresholding	RAW	5	80.32	71.44	80.32	81.78	<0.01
Decision Tree	HFE	15	80.00	69.00	72.00	84.86	0.016
XGBoost	HFE	5	84.00	82.00	83.00	89.03	0.425
Random Forest	HFE	5	80.00	74.00	76.00	85.96	0.674
Wavelet Thresholding	RSR	20	50.54	94.95	65.97	51.48	<0.01
LSTM	RSR	20	95.75	95.39	95.43	95.44	0.432
WCNN	RSR	20	95.21	94.75	94.79	94.81	0.015

Table 7. Multilabel fault channel isolation: performance and model size comparison across evaluated methods, at the window size W maximizing Macro-F1 per model. Bold entry indicates the method selected for Stage 2 of the proposed framework.

Algorithm	Data Rep.	W	Macro-Precision (%)	Macro-Recall (%)	Macro-F1 (%)	Hamming Score (%)	Size (KB)
Wavelet Thresholds	RSR	20	95.78	42.26	54.05	97.27	0.15
LSTM	RSR	15	76.58	70.67	72.99	98.12	1494.69
GRU	RSR	15	79.79	70.12	74.16	98.24	1372.39
BiLSTM + Temporal Attention	RSR	15	79.79	72.24	75.26	98.30	1886.83
CNN+LSTM	RSR	20	74.16	65.06	68.62	98.19	884.14
Variance-based Thresholding	RSR	5	85.22	51.33	61.09	95.26	<0.1
Adaptive Variance Thresholding	RSR	10	69.05	56.16	57.89	93.85	<0.1
Decision Tree	HFE	15	85.00	66.00	73.73	99.29	365.55
XGBoost	HFE	5	96.00	66.00	77.10	99.42	5361.65
Random Forest	HFE	5	98.00	57.00	69.71	99.29	14668.00
WELR	RSR	20	85.09	67.65	74.69	97.95	2.11
WMCNN	RSR	20	98.50	80.19	87.69	98.92	516.69

Table 8. Aggregated feature importance by sensor subsystem under impurity and SHAP criteria (%). DT at $W = 15$; XGBoost and RF at $W = 5$.

	Impurity Importance (%)			SHAP Importance (%)
Sensor Subsystem	DT	XGB	RF	DT	XGB	RF
Magnetometers	9.78	13.40	22.30	59.37	35.33	69.68
Sun Sensors	46.89	39.91	45.88	29.21	15.65	9.51
Magnetorquers	36.50	35.56	29.49	9.49	28.86	19.69
Gyroscopes	6.83	11.12	2.33	1.93	20.16	1.13

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
