3.3.1. Diagnostic Accuracy Results
First, the experimental results on the CWRU benchmark dataset are presented and analyzed. Following the 5-fold cross-validation protocol defined for the CWRU dataset in Section 3.2.2, we report the diagnostic performance of the proposed method in Table 5.
To put these results in context, we further compare the diagnostic accuracy on the CWRU benchmark with several representative state-of-the-art (SOTA) fault diagnosis methods. The results of our method and the competing approaches are summarized in Table 6. Specifically, the methods evaluated on the CWRU bearing dataset include the Diagnosisformer model based on multi-feature parallel fusion and attention mechanisms proposed in Ref. [38] (average accuracy = 99.85%), the Probabilistic Shock Response Model (PSRM) proposed in Ref. [27] (average accuracy = 99.38%), and the Variational Kernel CNN (VKCNN) proposed in Ref. [39], which achieved 100% accuracy.
In comparison, our proposed SNN-based method (results computed from Table 5) achieved a mean validation accuracy of 99.75%, calculated as the average of the testing accuracy across all 5 folds, with a low standard deviation of ±0.16%. Although this does not reach the perfect accuracy reported in Ref. [39], it remains highly competitive, performing on par with other SOTA methods such as PSRM (99.38%) and Diagnosisformer (99.85%). This indicates that the proposed method ranks among the leading diagnostic approaches and generalizes well in classical bearing fault diagnosis tasks.
To further verify the feature separability, Figure 7 visualizes one cross-validation fold using t-SNE. Distinct fault categories form highly compact and well-separated clusters in the latent space, visually confirming that the model has learned highly discriminative feature representations, which explains the achieved high classification accuracy.
Secondly, for the PU dataset, to rigorously evaluate the model’s generalization capability on more complex, unseen bearings, we conducted a 5-fold cross-validation grouped by specific bearing instances. As detailed in Table 3, the dataset was divided into five distinct groups. Each group contained one healthy (H), one inner-ring damage (IR), and one outer-ring damage (OR) bearing, with the damaged bearings drawn from the set of real damaged bearings. The validation was conducted using a "Leave-One-Group-Out" strategy, where the model was trained on four groups and tested on the one held-out group.
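The Leave-One-Group-Out protocol described above can be sketched as follows. This is a minimal illustration only: the function name and the bearing IDs are hypothetical placeholders, since the actual group assignment is given in Table 3 and is not reproduced here.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_group_out(
    groups: Dict[str, List[str]]
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_group, train_bearings, test_bearings) once per group."""
    for held_out, test_members in groups.items():
        # Training set: every bearing from every group except the held-out one.
        train = [
            bearing
            for name, members in groups.items()
            if name != held_out
            for bearing in members
        ]
        yield held_out, train, list(test_members)


# Placeholder bearing IDs (hypothetical; the real assignment is in Table 3).
# Each group holds one healthy (H), one IR, and one OR bearing.
groups = {
    f"group_{g}": [f"H_{g}", f"IR_{g}", f"OR_{g}"] for g in range(1, 6)
}

splits = list(leave_one_group_out(groups))
for held_out, train, test in splits:
    # No physical bearing may appear on both sides of a split.
    assert not set(train) & set(test)
    print(held_out, "train bearings:", len(train), "test bearings:", len(test))
```

Because whole bearings are held out, the test fold never contains signals from a bearing seen during training, which makes this a stricter test than a random sample-level split.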
The overall quantitative result from this 5-fold cross-validation was an average validation accuracy of only 57.47%, with a high standard deviation of 12.55%, as detailed in Table 7. This large fold-to-fold variance highlights how challenging this dataset is, with accuracy ranging from 73.92% (Fold 3) down to 36.02% (Fold 5).
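For reference, a fold-level summary of this kind (mean, standard deviation, best and worst fold) can be computed as in the sketch below. The per-fold values are hypothetical placeholders, not the entries of Table 7, and the text does not state whether the reported standard deviation is the population or the sample form, so both are shown.

```python
import statistics
from typing import Dict, Sequence


def summarize_folds(accuracies: Sequence[float]) -> Dict[str, float]:
    """Summarize per-fold accuracies given in percent."""
    return {
        "mean": statistics.fmean(accuracies),
        # Population form divides by n, sample form by n - 1.
        "std_population": statistics.pstdev(accuracies),
        "std_sample": statistics.stdev(accuracies),
        "min": min(accuracies),
        "max": max(accuracies),
    }


# Hypothetical per-fold accuracies (percent), for illustration only.
fold_acc = [70.0, 60.0, 75.0, 55.0, 40.0]
summary = summarize_folds(fold_acc)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

With only five folds the two standard deviation forms differ noticeably, so stating which one is used makes the reported spread easier to compare across papers.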
The t-SNE visualizations in Figure 8 are consistent with this difficulty, showing significant overlap and poor separation of the feature clusters across all folds. The aggregate confusion matrix in Figure 9 quantifies this, showing high confusion, particularly between fault states and the ‘Normal’ condition.
The low accuracy observed on the PU dataset could potentially be attributed to three factors, though further investigation is required to confirm these hypotheses. The first factor is related to the Leave-One-Group-Out strategy used in the 5-fold cross-validation. This approach, while effective in many scenarios, might not be ideal for evaluating generalization on more complex and unseen bearings, particularly when the dataset contains a wide range of fault types. It is possible that the specific way in which the dataset was split could have hindered the model’s ability to generalize effectively. The second factor is the potential mismatch between the selected features and the characteristics of the PU dataset. The features used (e.g., Kurtosis, Crest Factor, Spectral Energy) were chosen to be robust to amplitude attenuation in deep-sea environments. These features, however, may not be well-suited for the more subtle and complex fault modes present in the PU dataset, which was collected under standard in-air conditions. Lastly, the subtle differences between fault types in the real bearings of the PU dataset, such as pitting versus indentation, could be difficult to capture using the pulse-based features employed in this study.
To further investigate these potential factors, an additional experiment was conducted focusing solely on the data from Fold 5 (bearings K005, KI21, and KA30), which produced the poorest results in the 5-fold cross-validation. Training and testing the model exclusively on this data removes the influence of the cross-validation split. The model still struggled to converge, with a training accuracy of only 42.02% and a test accuracy of 43.27%. This suggests that the low accuracy is not solely attributable to the validation strategy but might instead be related to the feature mismatch and the complexity of fault types in the PU dataset.
These findings suggest that the features optimized for deep-sea environments, which focus on detecting amplitude variations due to fluid damping, may not be the most suitable for distinguishing fault types in a standard in-air dataset like the PU dataset. The features used in the current model, such as Kurtosis, Crest Factor, and Spectral Energy, are robust to amplitude attenuation and are effective at detecting impulsive events, such as bearing faults, in fluid-damped conditions. However, the PU dataset contains real-world faults, such as pitting and indentation, which involve more subtle variations in the signal that may not be captured well by these pulse-centric features. These fault types are likely to exhibit more nuanced changes in the signal that require different types of features for accurate classification.
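To make the notion of pulse-centric features concrete, the sketch below computes two of the features named above, kurtosis and crest factor, on a smooth signal and on the same signal with a few injected impulses. It uses the standard textbook definitions; the exact feature definitions and preprocessing used in this study may differ, spectral energy is omitted, and the test signals are synthetic assumptions.

```python
import math
from typing import List, Sequence


def kurtosis(x: Sequence[float]) -> float:
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 ** 2)


def crest_factor(x: Sequence[float]) -> float:
    """Peak absolute amplitude divided by the RMS value."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms


# Synthetic signals (assumptions for illustration): ten periods of a sine,
# and the same sine with five sharp impulses added.
sine: List[float] = [math.sin(2 * math.pi * k / 64) for k in range(640)]
impulsive = list(sine)
for i in range(0, 640, 128):
    impulsive[i] += 5.0

print("sine:      kurtosis=%.3f crest=%.3f" % (kurtosis(sine), crest_factor(sine)))
print("impulsive: kurtosis=%.3f crest=%.3f" % (kurtosis(impulsive), crest_factor(impulsive)))
```

Both features rise sharply when isolated impulses are present, which is why they suit impulsive faults. A damage type that changes the signal gradually, without pronounced impulses, would move them far less.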
This analysis points to the importance of tailoring feature extraction methods to the specific characteristics of the dataset, particularly when the fault types are complex and the operating conditions differ significantly from those for which the features were originally optimized.
3.3.2. Noise Robustness Experimental Results
This section presents the results of the experiment described in Section 3.2.2, which aims to evaluate the robustness of the proposed method under low SNR conditions, a key challenge in subsea operating environments. Following the experimental setup in Section 3.2.2, the CWRU test samples were contaminated with Gaussian white noise of varying intensities (SNR ranging from 10 dB to -10 dB), while the model was consistently trained on clean data. For the PU dataset, additional noise injection was not conducted, as its baseline diagnostic performance was already limited, making further noise degradation uninformative for comparative analysis.
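The noise contamination step can be sketched as below, assuming the usual definition SNR(dB) = 10·log10(P_signal / P_noise) with the signal power measured per sample. The function names, the random seed, and the sine test signal are illustrative assumptions, not the exact procedure of Section 3.2.2.

```python
import math
import random
from typing import List, Sequence


def add_awgn(signal: Sequence[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add zero-mean white Gaussian noise scaled to a target SNR in dB."""
    p_signal = sum(v * v for v in signal) / len(signal)
    # Solve SNR_dB = 10 * log10(p_signal / p_noise) for the noise power.
    p_noise = p_signal / (10 ** (snr_db / 10))
    sigma = math.sqrt(p_noise)
    return [v + rng.gauss(0.0, sigma) for v in signal]


def measured_snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """Empirical SNR of a noisy signal against its clean reference."""
    p_signal = sum(c * c for c in clean) / len(clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [math.sin(2 * math.pi * k / 50) for k in range(20000)]
achieved = {}
for target in (10, 0, -10):
    noisy = add_awgn(clean, target, rng)
    achieved[target] = measured_snr_db(clean, noisy)
    print(f"target {target:+d} dB -> measured {achieved[target]:+.2f} dB")
```

At -10 dB the noise power is ten times the signal power, which gives a sense of how heavily the fault signature is masked at the lower end of the tested range.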
Figure 10 illustrates the accuracy–SNR curves of the proposed method and several baseline approaches on the CWRU dataset [40]. As shown, all methods maintained high diagnostic accuracy under high-SNR conditions (SNR > 4 dB). However, as noise intensity increased and SNR decreased, the performance of all models declined, albeit at different rates. Among them, SVM exhibited the highest sensitivity to noise and almost completely failed when SNR fell below 0 dB, with its accuracy dropping to approximately 10%. WDCNN and GRU performed slightly better, but their accuracies at 0 dB fell to around 82% and below 96%, respectively.
In contrast, the proposed method achieved the best performance across all noise levels and remained stable under severe noise. At 0 dB, it still achieved 98.5% accuracy; at -4 dB, the accuracy remained at 94%, whereas the standard SNN and WDCNN dropped to approximately 82% and 70%, respectively. Even under the extremely harsh condition of -10 dB, the proposed method maintained an accuracy of about 65%. These results demonstrate the strong robustness of the proposed approach under severe noise interference and indicate that it can extract and discriminate fault features that are heavily masked by noise.
Following the preceding noise robustness experiments, we further validated the proposed model using our self-collected subsea drilling rig dataset. This dataset simulates a deep-sea, high-pressure environment and features a significantly lower signal-to-noise ratio (SNR) than the terrestrial benchmark datasets. This enables a more realistic assessment of the model’s diagnostic robustness under harsh operating conditions. The experimental parameter design is similar to that of the previous two datasets. At the encoding layer, we still employ the Gaussian Receptive Field with a width factor
and a competition parameter
, combined with the adaptive threshold
k-WTA mechanism. The AdEx neuron parameters remain unchanged. The SNN training parameters are also largely unchanged; however, the SNN architecture (layer design) was changed to 600-200-120-4. For details, please refer to Section 3.2.1.
To ensure stable and comparable performance assessment despite the limited sample size (640 samples), this experiment employed the same 5-fold cross-validation protocol used for the CWRU dataset. The experimental results are summarized in Table 8.
The proposed model achieved a high average training accuracy of 99.48% and an average validation (test) accuracy of 94.94% (±2.85%), indicating that it can isolate and identify bearing fault features even in a high-noise, deep-sea environment. To further examine class-wise diagnostic behavior, the aggregated confusion matrix across the 5 folds is presented in Figure 11. The diagonal dominance of the matrix indicates strong overall discriminative performance, with recall rates of 96%, 91%, 94%, and 98% across the four fault categories. Minor misclassifications are observed between class 1 and class 3 (9%) and between class 2 and class 0 (6%), and these account for most of the gap between the observed 94.94% average accuracy and perfect classification. These small deviations are mainly attributed to the intense background noise and signal complexity inherent to real subsea conditions.
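The per-class recall figures quoted above are read off the rows of the confusion matrix. The sketch below shows the computation on a hypothetical 4-class matrix (rows are true classes, columns are predictions); the counts are invented for illustration and are not the values of Figure 11.

```python
from typing import List, Sequence


def per_class_recall(cm: Sequence[Sequence[int]]) -> List[float]:
    """Recall of class i = correct predictions for class i / all true samples of class i."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


def overall_accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Sum of the diagonal divided by the total number of samples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Hypothetical confusion matrix with 50 samples per class (illustrative only).
cm = [
    [48, 1, 1, 0],
    [0, 45, 0, 5],
    [3, 0, 47, 0],
    [1, 0, 0, 49],
]

recalls = per_class_recall(cm)
accuracy = overall_accuracy(cm)
print("recall per class:", [f"{r:.2%}" for r in recalls])
print(f"overall accuracy: {accuracy:.2%}")
```

When the classes are balanced, as in this toy matrix, the overall accuracy equals the mean of the per-class recalls; with unbalanced classes the two can differ.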
Figure 12 further visualizes the t-SNE projection of test samples from the self-collected dataset. Compared to the clearly separated fault clusters observed in the CWRU results, the subsea dataset exhibits slightly overlapping cluster boundaries, particularly between class 1 and class 3, and between class 0 and class 2. Nevertheless, most samples still form distinct and compact clusters, demonstrating that the proposed model maintains strong discriminative power and practical anti-interference capability even under severe low-SNR conditions.
3.3.3. Power Consumption Results
This section presents the results of the power consumption experiment, which aims to quantitatively assess the core advantage of the proposed method in terms of low power consumption. Following the estimation method described in Section 3.2.3, we performed theoretical energy consumption estimates for the proposed SNN, a conventional ANN with the same topology, and a standard SNN (without the k-WTA mechanism), based on the 45 nm CMOS technology energy benchmark (where each MAC operation in the ANN consumes
and each AC operation in the SNN consumes
). The key to the estimation is measuring the average spike firing rate of the SNN model during inference tasks.
Figure 13 summarizes the energy consumption results of all three models on three datasets during a single inference task.
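As a rough illustration of how a firing-rate-based estimate of this kind is typically formed, the sketch below counts one MAC per synapse for a fully connected ANN and, for the SNN, one AC per synapse per presynaptic spike. This is a common operation-counting scheme in the SNN literature and is an assumption here, not necessarily the exact method of Section 3.2.3. The per-operation energies (4.6 pJ per MAC, 0.9 pJ per AC) are commonly cited 45 nm figures used as placeholders, and the firing rates and number of time steps are hypothetical. Only the 600-200-120-4 layer sizes come from the text. The output is not intended to reproduce the values in Figure 13.

```python
from typing import List, Sequence


def synapse_counts(layers: Sequence[int]) -> List[int]:
    """Number of weights between each pair of consecutive fully connected layers."""
    return [a * b for a, b in zip(layers[:-1], layers[1:])]


def ann_energy_joules(layers: Sequence[int], e_mac: float) -> float:
    """One MAC per synapse for a single forward pass."""
    return sum(synapse_counts(layers)) * e_mac


def snn_energy_joules(
    layers: Sequence[int],
    input_rates: Sequence[float],
    timesteps: int,
    e_ac: float,
) -> float:
    """One AC per synapse per presynaptic spike.

    input_rates[i] is the average spikes per neuron per time step of the
    layer feeding connection i.
    """
    counts = synapse_counts(layers)
    if len(input_rates) != len(counts):
        raise ValueError("need one firing rate per layer-to-layer connection")
    ops = timesteps * sum(r * c for r, c in zip(input_rates, counts))
    return ops * e_ac


layers = [600, 200, 120, 4]   # architecture stated for the self-collected dataset
E_MAC = 4.6e-12               # assumed placeholder, joules per MAC
E_AC = 0.9e-12                # assumed placeholder, joules per AC
rates = [0.05, 0.10, 0.10]    # hypothetical average firing rates
T = 10                        # hypothetical number of simulation time steps

ann_e = ann_energy_joules(layers, E_MAC)
snn_e = snn_energy_joules(layers, rates, T, E_AC)
print(f"ANN: {ann_e:.3e} J, SNN: {snn_e:.3e} J, ratio: {snn_e / ann_e:.3f}")
```

Under this scheme the SNN estimate scales linearly with the average firing rate, which is why a mechanism that lowers the firing rate directly lowers the estimated energy.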
The proposed method achieved the lowest energy consumption across all three datasets, demonstrating significant energy efficiency advantages. On the CWRU dataset, the estimated energy consumption of the proposed method is only 0.173 nJ, a reduction of 68.72% compared to the equivalent ANN model (0.553 nJ), and a reduction of 26.07% compared to the standard SNN (0.234 nJ). On the more complex PU dataset, although the overall energy consumption increased due to the larger network structure, the proposed method (0.705 nJ) still outperforms the baseline models, reducing energy consumption by 73.6% compared to the ANN (2.671 nJ) and by 28.6% compared to the standard SNN (0.988 nJ). Finally, on the self-collected dataset simulating real deep-sea conditions, the proposed method again achieved the lowest energy consumption (0.487 nJ), reducing energy by 53.1% compared to the ANN (1.038 nJ) and by 26.1% compared to the standard SNN (0.659 nJ).
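The percentage reductions quoted above follow from (baseline − proposed) / baseline applied to the reported energies. The short check below recomputes them from the values in the text; only the dictionary layout and function name are illustrative choices.

```python
from typing import Dict


def reduction_pct(baseline: float, proposed: float) -> float:
    """Relative energy reduction of the proposed model versus a baseline, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Energies in nJ per inference, as reported in the text.
reported: Dict[str, Dict[str, float]] = {
    "CWRU": {"proposed": 0.173, "ann": 0.553, "snn": 0.234},
    "PU": {"proposed": 0.705, "ann": 2.671, "snn": 0.988},
    "Self-collected": {"proposed": 0.487, "ann": 1.038, "snn": 0.659},
}

reductions = {
    name: {
        "vs_ann": reduction_pct(e["ann"], e["proposed"]),
        "vs_snn": reduction_pct(e["snn"], e["proposed"]),
    }
    for name, e in reported.items()
}

for name, r in reductions.items():
    print(f"{name}: {r['vs_ann']:.2f}% vs ANN, {r['vs_snn']:.2f}% vs standard SNN")
```

The recomputed values agree with the percentages stated in the text to the reported precision.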
In conclusion, the proposed method is the most energy-efficient of the three models across benchmark conditions (CWRU), complex real-world conditions (PU), and the target operational conditions (the self-collected subsea dataset). This advantage is primarily attributed to the low-amplitude firing and high sparsity enabled by the adaptive-threshold k-WTA mechanism. Combined with the diagnostic accuracy and noise robustness reported in Section 3.3.1 and Section 3.3.2, the proposed method offers a favorable balance of accuracy, robustness, and power consumption, showing potential for low-power monitoring applications in deep-sea environments.