A Dual-Attention CNN–GCN–BiLSTM Framework for Intelligent Intrusion Detection in Wireless Sensor Networks

Laith H. Baniata; Ashraf ALDabbas; Jaffar M. Atwan; Hussein Alahmer; Basil Elmasri; Chayut Bunterngchit

doi:10.20944/preprints202511.1423.v1

Submitted: 18 November 2025

Posted: 19 November 2025


Abstract

Wireless sensor networks (WSNs) are increasingly used in mission-critical infrastructures, where cyber intrusions can exploit their already constrained resources. Traditional intrusion detection systems (IDS) for WSNs are based on classical machine learning techniques. Such models fail to capture the nonlinear, temporal, and topological dependencies across network nodes, which degrades detection accuracy and limits adaptability to evolving threats. To overcome these limitations, this study introduces a hybrid deep learning-based IDS that integrates multi-scale convolutional feature extraction, dual-stage attention fusion, graph convolutional reasoning, and bidirectional long short-term memory components in a unified framework. The proposed architecture captures the hierarchical spatial-temporal correlations in traffic patterns, allowing precise discrimination between normal and attack behaviors across several intrusion classes. The model was evaluated on a publicly available benchmark dataset and attained higher multiclass classification capability than conventional IDS models. In addition, the design aims to retain computational efficiency suitable for edge and distributed deployments, making it an effective solution for next-generation WSN cybersecurity. Overall, the findings indicate that combining topology-aware learning with multi-branch attention mechanisms offers a balanced trade-off between interpretability, accuracy, and deployment efficiency for resource-constrained WSNs.

Keywords: wireless sensor networks (WSN); intrusion detection system (IDS); deep learning; multiscale convolution; graph convolutional networks; attention mechanism; bidirectional LSTM; WSN-DS dataset; cybersecurity; edge computing

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

Wireless sensor networks (WSNs) are an essential part of the Internet of Things and are increasingly common in critical infrastructures, environmental monitoring, healthcare, and industrial control systems [1,2,3,4,5]. These networks have revolutionized sensing and communication frameworks. Nevertheless, their distributed and energy-constrained nature makes them susceptible to cyber intrusions [6,7,8]. Common attacks include denial of service (DoS), jamming, sinkhole, blackhole, and selective forwarding. Such attacks compromise data integrity, network reliability, and real-time decision-making. Thus, the development of efficient and intelligent intrusion detection systems (IDS) is essential for protecting WSNs while ensuring their energy efficiency and scalability [6,7,8].

Traditional WSN IDS frameworks predominantly rely on statistical analysis and rule-based systems. Classical machine learning (ML) algorithms such as random forest (RF), support vector machine (SVM), k-nearest neighbors (kNN), and decision trees (DT) offer interpretability and low computational overhead [9,10,11], yet their performance degrades drastically under the dynamic topologies and non-linear attack patterns commonly found in real-world deployments. Recent deep learning-based IDS have explored architectures such as convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and hybrid CNN-LSTM combinations [9,10,11]. In addition, autoencoders and graph neural networks have been explored for capturing hierarchical and temporal dependencies in traffic behavior. While offering higher accuracies, these methods exhibit certain limitations: (i) high energy and memory footprints unsuited for low-power sensor nodes, (ii) poor generalization to unseen attack types and evolving network conditions, and (iii) lack of interpretability and privacy preservation in distributed scenarios. Several recent studies have targeted these gaps.

Houda et al. [12] proposed a collaborative federated learning framework that uses a secure aggregation protocol for the detection of jamming attacks (including constant, random, reactive, and deceptive jamming). The study attained an accuracy of ≈99%, yet its focus was limited to jamming behaviors and it assumed federated learning connectivity. Broader multi-attack generalization and on-device energy and communication costs were not incorporated in the design. Jeyakumar et al. [13] proposed a hybrid stacked CNN-bidirectional LSTM (BiLSTM) model tuned by an African vulture optimization algorithm and trained using federated learning. The communication and aggregation overheads, along with the robustness of the federated learning setup, were not fully quantified in the study. In addition, interpretability and node-level resource constraints were only partially addressed. Zhou et al. [14] incorporated a tabular-to-image transform with transfer learning, using MobileNet and Xception with the black kite algorithm for hyperparameter search and ensembling. The model has a heavy computational footprint, along with a lack of explainability and uncertainty for real-world deployments.

Halbouni et al. [15] adopted CNN and LSTM to extract spatial and temporal features for classification on three datasets. The models offered limited treatment of class imbalance and of WSN energy and latency constraints. Vinayakumar et al. [16] conducted a systematic benchmarking of deep neural networks (DNNs) against classical ML using multiple intrusion datasets. These early deep benchmarks lacked WSN-specific topology and context modeling for interpretability, and false positive control under multi-attack scenarios was not established. Birahim et al. [17] adopted particle swarm optimization (PSO) for feature and hyperparameter search with an ensemble (RF/DT/kNN), handling class imbalance using SMOTE-Tomek. The study did not model network-structure awareness or temporal dependencies, and the scalability of the models was not fully addressed. Hakami et al. [18] presented a pipeline using SMOTE for class balancing and Pearson correlation for feature selection. On the WSN-DS dataset, the study offered fair performance yet did not address privacy-preserving learning or computational resource requirements. Alzahrani et al. [19] presented a ConvLSTM model offering benchmark performance on several datasets. However, it was tailored for UAV networks and transfers poorly to resource-constrained environments. Similarly, a Red Kite Optimization framework [20] was proposed that focused on average ensemble, LCWOA, and hyperparameter tuning. The model relies on heuristic feature selection and moving ensembles without temporal or graph modeling. Atitallah et al. [21], Jiang et al. [21], and Saleh et al. [22] collectively enhanced detection accuracy using fuzzy-graph attention, meta-heuristic optimization, and SGD-based learning. Yet their real-time adaptability and scalability remain limited for lightweight deployments.

A summary of these articles is presented in Table 1.

The studies reviewed above highlight substantial progress in intrusion detection approaches; however, several challenges remain. The federated and optimization-based models, including Federated SCNN-BiLSTM and AVOA, along with Privacy-Preserving FL for jamming, have improved distributed learning, yet they face communication and synchronization bottlenecks. The optimization-driven frameworks involving CBCTL-IDS, RKOA, AEID, and ST-IAOA-XGBoost have shown high detection accuracies, yet they remain computationally heavy for on-node deployment. The explainable artificial intelligence (XAI) techniques, including PSO-Ensemble and LIME/SHAP, have enhanced interpretability, yet they lack real-time adaptability and spatial-temporal reasoning. The fuzzy graph attention networks capture topological relations, yet suffer from high complexity and limited scalability in constrained WSN environments. Overall, the following research gaps have been identified:

Deployment Realism: Existing models overlook computational and energy limitations of distributed sensor nodes.
Temporal–Spatial Dependency: Most IDS fail to jointly model both the temporal evolution of attacks and spatial correlations among nodes.
Dynamic Adaptation: Static training prevents adaptation to changing traffic distributions and novel intrusions.
Interpretability and Fusion: Few works integrate multi-level feature fusion or interpretable decision mechanisms within hybrid deep architectures.

To overcome these limitations, this study introduces a multi-branch hybrid deep learning framework optimized for IDS in WSNs. The model integrates multi-scale CNN blocks, attention-based fusion layers, graph convolutional network (GCN) operations, and BiLSTM to capture the multi-resolution, spatial, and temporal dependencies in the WSN-DS traffic. The multi-scale CNN layers extract hierarchical frequency-temporal patterns. The attention fusion dynamically re-weights the salient spatial-temporal features. The GCN components encode the inter-node topological relationships, and the BiLSTM layers model the bidirectional temporal correlations. Together, these enhance the detection of subtle and evolving attack behaviors. Finally, the dense layers and softmax classifier produce probabilistic intrusion classifications. Compared to previous models, the proposed framework is lightweight, modular, and optimized for both centralized and distributed detection schemes. It offers improved generalization towards unseen intrusions and maintains computational efficiency in real-time edge deployments. Overall, the study contributes as follows:

It integrates multi-scale CNN, attention fusion, GCN, and BiLSTM to capture comprehensive spatio-temporal dynamics of WSN traffic.
It learns hierarchical and context-aware embeddings that improve separability between normal and anomalous traffic, through multi-branch feature extraction and adaptive attention weighting.
It introduces advanced preprocessing and normalization steps to ensure stability.

The rest of the article is structured as follows. Section 2 details the methodology, covering preprocessing, model design, and evaluation. Section 3 presents the results along with a critical analysis of the findings and validation against benchmark models. Section 4 concludes the study and outlines a potential future roadmap.

2. Materials and Methods

2.1. Dataset Description

The study employed the WSN-DS dataset for carrying out the experimentation and evaluations. The WSN-DS dataset, developed by Almomani et al. [24], is specifically designed for the detection of Denial-of-Service (DoS) attacks and consists of 374,661 records, with approximately 9% labeled as DoS incidents. This dataset was constructed using the LEACH protocol, a widely adopted hierarchical routing protocol in Wireless Sensor Networks (WSNs), and encompasses both normal network behavior and four distinct types of DoS attacks: Grayhole, Blackhole, TDMA, and Flooding. Data collection was performed using Network Simulator 2 (NS-2), and the resulting traces were processed to extract 18 relevant features. Due to its comprehensive structure and labeled attack scenarios, the WSN-DS dataset serves as a valuable benchmark for researchers developing intrusion detection strategies and enhancing the security of WSNs [25].

The dataset has been treated as a benchmark corpus for IDS in WSNs. Each record comprises 18 continuous-valued attributes that represent traffic, energy, and protocol-level indicators, followed by a categorical class label $y \in \{C_{1}, C_{2}, \dots, C_{5}\}$ corresponding to the different attack or normal states. The dataset is represented as follows:

$$D = \{(x_{i}, y_{i}) \mid x_{i} \in \mathbb{R}^{18},\ y_{i} \in \{1, \dots, 5\},\ i = 1, \dots, N\} \tag{1}$$

where $N$ denotes the total number of samples. The dataset was partitioned into training and testing subsets with an 80:20 ratio:

$$D_{\mathrm{train}} \cup D_{\mathrm{test}} = D, \quad D_{\mathrm{train}} \cap D_{\mathrm{test}} = \emptyset \tag{2}$$
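The partition in Eq. (2) can be illustrated with a minimal sketch. It assumes a plain seeded random shuffle over record indices; the function name `split_indices` and the seed are hypothetical, and whether the authors stratified the split by class is not stated in the text.

```python
import random

def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Return disjoint (train, test) index lists covering all samples."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # seeded shuffle for repeatability
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

# 374,661 is the WSN-DS record count quoted above.
train_idx, test_idx = split_indices(374661)
print(len(train_idx), len(test_idx))
```

Because both lists are slices of one shuffled permutation, the union covers every record and the intersection is empty, which is exactly the condition in Eq. (2).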

Algorithm 1 Proposed Intrusion Detection Framework

Require: Dataset $D$, learning rate $\eta$, epochs $E$, batch size $B$
Ensure: $\Theta^{\star}$
1: Split $D \to (D_{\mathrm{train}}, D_{\mathrm{val}}, D_{\mathrm{test}})$
2: Apply min-max normalization; rank features by $\chi^{2}$
3: Reshape inputs to $X \in \mathbb{R}^{k \times 1}$
4: for $e = 1$ to $E$ do
5:   Forward pass:
6:     Apply 1D convolutions with kernel sizes $\{3, 5\}$ to $X$ to obtain $C_{i}$
7:     Concatenate $[C_{i}]_{i}$, apply batch normalization and dropout $\to \tilde{C}$
8:     Compute spatial attention $A_{s}$ on $\tilde{C}$, temporal attention $A_{t}$ on $A_{s}$
9:     Build graph features $H$ via GCN($A_{s}, A_{t}$); obtain $h$ via BiLSTM($H$)
10:    Compute class probabilities $\hat{y} = \mathrm{softmax}(W h + b)$
11:  Backward pass:
12:    Compute gradients $\nabla_{\Theta} L$ and update $\Theta \leftarrow \Theta - \eta \nabla_{\Theta} L$ (Adam)
13:  Validate on $D_{\mathrm{val}}$ and save best checkpoint
14: end for
15: Test on $D_{\mathrm{test}}$; report accuracy, precision, recall, and F1

The features of the dataset are described in Table 2.

2.2. Design Framework

The proposed framework entails multi-scale convolutional filters, attention-based fusions, and BiLSTM for the extraction of spatio-temporal dependencies in the WSN traffic. The overall structure of the proposed IDS framework has been presented in Algorithm 1.

2.3. Data Preprocessing

To ensure numerical stability and stable convergence, the features were normalized using min-max scaling:

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)} \in [0, 1] \tag{3}$$
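As a minimal sketch of Eq. (3), the function below rescales one feature column to $[0, 1]$. The name `min_max_scale` and the handling of constant columns (mapped to zero to avoid division by zero) are assumptions made for illustration and are not taken from the paper.

```python
def min_max_scale(column):
    """Rescale a list of numbers to [0, 1] as in Eq. (3)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # Constant feature: Eq. (3) is undefined, so map every value to 0.0.
        return [0.0 for _ in column]
    return [(value - lo) / span for value in column]

scaled = min_max_scale([2.0, 4.0, 6.0])
print(scaled)
```

In practice the minimum and maximum would typically be estimated on the training split only and then reused for validation and test data, so that no information leaks from held-out records.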

Feature selection was performed using the $\chi^{2}$ statistical test. For each feature $f_{j}$, the relevance score was computed as:

$$\chi^{2}(f_{j}) = \sum_{i = 1}^{m} \frac{(O_{ij} - E_{ij})^{2}}{E_{ij}} \tag{4}$$

where $O_{ij}$ and $E_{ij}$ denote the observed and expected frequencies across the class distributions. The top 16 features with the highest $\chi^{2}$ scores were retained:

$$X' = \mathrm{SelectKBest}_{\chi^{2},\, k = 16}(X) \tag{5}$$
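The scoring and selection in Eqs. (4) and (5) can be sketched in plain Python. The sketch assumes non-negative features (as after min-max scaling) and takes the observed frequency $O_{ij}$ to be the sum of feature $j$ over class $i$, with $E_{ij}$ the class share of the feature total; this interpretation and the helper names `chi2_scores` and `select_k_best` are assumptions, not details confirmed by the paper.

```python
def chi2_scores(X, y):
    """Chi-square relevance score per feature column (Eq. 4)."""
    classes = sorted(set(y))
    n = len(X)
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = total * sum(1 for label in y if label == c) / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores

def select_k_best(X, y, k):
    """Keep the k highest-scoring columns (Eq. 5), preserving column order."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]

# Toy data: column 0 separates the classes, column 1 carries no class signal.
X_toy = [[1.0, 0.5], [1.0, 0.5], [0.0, 0.5], [0.0, 0.5]]
y_toy = [1, 1, 0, 0]
kept, X_sel = select_k_best(X_toy, y_toy, k=1)
print(chi2_scores(X_toy, y_toy), kept)
```

For the setting described above, the call would use `k=16` on the 18 normalized WSN-DS features.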

Additionally, label encoding was applied to the target variable, which was categorical in nature and therefore unsuitable for direct use in ML algorithms. Label encoding is a common preprocessing technique that transforms categorical labels into numerical representations, making them more suitable for algorithmic processing—particularly when the target classes are limited and discrete. The WSN-DS dataset includes five target categories: Flooding, TDMA, Grayhole, Blackhole, and Normal. Each class was assigned a unique numerical identifier, as summarized in Table 3, to ensure compatibility with supervised learning models.

2.4. Feature Engineering

Feature transformation for temporal learning was performed via 3D reshaping:

$$X'' \in \mathbb{R}^{n \times k \times 1}, \quad \text{where } k = 16 \tag{6}$$

This embedding enables convolutional and recurrent layers to explore the localized dependencies.
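A minimal sketch of the reshaping in Eq. (6) follows, using nested lists so that it stays dependency-free. The helper names are hypothetical; in an array library the same step would be a single reshape call.

```python
def reshape_for_conv(X):
    """Turn an n x k matrix into an n x k x 1 nested list (one channel per step)."""
    return [[[value] for value in row] for row in X]

def shape3d(tensor):
    """Report (n, k, channels) for a regular nested list."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))

# Two toy records with k = 16 selected features each.
X_selected = [[0.1] * 16, [0.2] * 16]
X_3d = reshape_for_conv(X_selected)
print(shape3d(X_3d))
```

Each of the 16 selected features becomes one position along the sequence axis with a single channel, which is the layout expected by 1-D convolutional and recurrent layers.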

2.5. Model Design

The proposed hybrid framework $M(\Theta)$ is a multi-branch, hierarchically coupled architecture designed to capture spatio-temporal-spectral correlations. $\Theta$ denotes the complete set of learnable parameters of the model $M$, including convolutional kernels, recurrent weights, normalization matrices, and bias terms. The model comprises four principal computational entities: a Multi-Scale Convolutional Block for local feature extraction, an Attention Fusion Layer for dynamic context weighting, a Graph Convolutional Module for structural regularization, and a Bidirectional LSTM with Contextual Attention for temporal propagation modeling. The overall model design is depicted in Figure 1.

2.5.1. Multi-Scale Convolutional Block

Given a normalized sequence tensor $X \in \mathbb{R}^{k \times 1}$ that represents the compacted feature manifold, the multi-scale convolutional extractor performs convolutions at multiple receptive field scales to capture heterogeneous spatial dependencies:

$$F_{1} = \sigma(N_{1}(W_{1} *_{3} X + b_{1})), \tag{7}$$

$$F_{2} = \sigma(N_{2}(W_{2} *_{5} X + b_{2})), \tag{8}$$

$$F_{3} = \sigma(N_{3}(W_{3} *_{7} X + b_{3})), \tag{9}$$

where $*_{n}$ denotes 1-D convolution with kernel size $n$, $N_{i}(\cdot)$ indicates batch normalization, and $\sigma(\cdot)$ is the ReLU activation. The multi-scale responses are concatenated into a composite feature tensor:

$$F_{ms} = [F_{1} \parallel F_{2} \parallel F_{3}] \in \mathbb{R}^{k \times d_{ms}}, \tag{10}$$

where $d_{ms}$ denotes the total concatenated dimensionality. A dropout mapping $D_{p}$ with stochastic rate $p = 0.3$ is subsequently applied to prevent co-adaptation:

$$\tilde{F}_{ms} = D_{p}(F_{ms}). \tag{11}$$

This operation enforces robustness to local perturbations. In addition, it preserves gradient stability across convolutional depths.
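The branch structure of Eqs. (7) to (10) can be sketched as follows. This is a simplified illustration under stated assumptions: one filter per branch, zero ("same") padding so every branch keeps length $k$, fixed averaging kernels in place of learned weights, and batch normalization and dropout omitted. The function names are hypothetical.

```python
def conv1d_same(x, kernel, bias=0.0):
    """1-D cross-correlation with zero padding so the output keeps len(x)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel)) + bias
        for i in range(len(x))
    ]

def relu(values):
    return [max(0.0, v) for v in values]

def multi_scale_block(x, kernels):
    """Run one branch per kernel and concatenate along the channel axis."""
    branches = [relu(conv1d_same(x, kernel)) for kernel in kernels]
    # zip(*branches) groups the branch outputs position by position.
    return [list(channels) for channels in zip(*branches)]

x = [0.0, 1.0, 0.0, 0.0]
kernels = [[1 / 3] * 3, [1 / 5] * 5, [1 / 7] * 7]   # kernel sizes 3, 5, 7
F_ms = multi_scale_block(x, kernels)
print(len(F_ms), len(F_ms[0]))   # sequence length, concatenated channels
```

With several filters per branch, the channel count $d_{ms}$ would be the sum of the filter counts of the three branches.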

2.5.2. Dual-Stage Attention Fusion

The fused features $\tilde{F}_{ms}$ are passed into a dual-stage attention mechanism designed to disentangle spatial and temporal significance within the feature domain. Let $Q_{s}, K_{s}, V_{s} \in \mathbb{R}^{T \times d_{k}}$ represent the query, key, and value embeddings for the spatial attention subspace:

$$A_{s} = \mathrm{softmax}\left(\frac{Q_{s} K_{s}^{\top}}{\sqrt{d_{k}}} + M_{s}\right) V_{s}, \tag{12}$$

where $M_{s}$ is a learned bias mask regulating sparsity across nodes. The temporal refinement stage analogously computes:

$$A_{t} = \mathrm{softmax}\left(\frac{(Q_{t} K_{t}^{\top}) W_{\tau} + B_{\tau}}{\sqrt{d_{k}}}\right) V_{t}, \tag{13}$$

where $W_{\tau}$ introduces a learnable transformation capturing cross-time contextual drift. The joint fused representation is then expressed as:

$$A_{\mathrm{fused}} = \alpha A_{s} + (1 - \alpha) A_{t} + \lambda (A_{s} \odot A_{t}), \tag{14}$$

where $\alpha$ and $\lambda$ are trainable coupling coefficients, and $\odot$ denotes element-wise interaction. This composite fusion reinforces both spatially localized and temporally evolving intrusion cues.
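The following sketch illustrates the scaled dot-product core of Eqs. (12) and (13) and the fusion rule of Eq. (14). It is deliberately simplified: the learned mask $M_{s}$ and the terms $W_{\tau}$ and $B_{\tau}$ are left out, and $\alpha$ and $\lambda$ are fixed constants rather than trained values. All function names and the toy inputs are assumptions for illustration.

```python
import math

def softmax(row):
    peak = max(row)                      # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V, without mask or bias terms."""
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

def fuse(a_s, a_t, alpha=0.5, lam=0.1):
    """alpha*A_s + (1 - alpha)*A_t + lam*(A_s * A_t), element-wise (Eq. 14)."""
    return [
        [alpha * s + (1 - alpha) * t + lam * s * t for s, t in zip(row_s, row_t)]
        for row_s, row_t in zip(a_s, a_t)
    ]

q = k = [[0.0, 0.0], [0.0, 0.0]]         # zero scores give uniform attention weights
v = [[1.0, 2.0], [3.0, 4.0]]
a_s = attention(q, k, v)
a_fused = fuse(a_s, a_s)
print(a_s, a_fused)
```

With all-zero queries and keys every position attends equally, so each output row is the average of the value rows; this makes the toy result easy to verify by hand.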

2.5.3. Graph Convolutional Regularization

To embed topological priors of the WSN, an adjacency matrix $A \in \mathbb{R}^{n \times n}$ is constructed that encodes communication reachability among sensor nodes. The spectral graph convolution for layer $l$ is formulated as:

$$H^{(l+1)} = \xi\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} + \gamma H^{(l)}\right), \tag{15}$$

where $\tilde{A} = A + I$ ensures self-loops, $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$ is the degree matrix, $\xi(\cdot)$ is a nonlinear mapping (ReLU), and $\gamma$ is a residual stability factor.

This operation integrates both direct communication correlations and latent dependencies inferred from higher-order neighborhoods.
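A single propagation step of Eq. (15) can be sketched on a toy two-node graph. The adjacency matrix, node features, and weights below are made-up values; how the adjacency matrix is derived from WSN-DS records is not specified in the text. The residual term requires $W^{(l)}$ to preserve the feature width, which the sketch assumes.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, h, w, gamma=0.0):
    """ReLU(D^-1/2 (A + I) D^-1/2 H W + gamma * H) for small dense matrices."""
    n = len(adj)
    # Add self-loops: A_tilde = A + I.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    # Symmetric normalization of each entry by sqrt(d_i * d_j).
    a_hat = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    agg = matmul(a_hat, matmul(h, w))
    return [
        [max(0.0, agg[i][c] + gamma * h[i][c]) for c in range(len(agg[0]))]
        for i in range(n)
    ]

adj = [[0.0, 1.0], [1.0, 0.0]]   # two sensor nodes that can reach each other
h = [[1.0], [3.0]]               # one feature per node
w = [[1.0]]                      # identity weight keeps the feature width
print(gcn_layer(adj, h, w), gcn_layer(adj, h, w, gamma=0.5))
```

Without the residual term both nodes end up with the neighborhood average; a positive $\gamma$ lets each node retain part of its own previous representation.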

2.5.4. Bidirectional LSTM with Contextual Attention

To capture temporal recurrence and bidirectional dependencies, the model employs forward and backward LSTMs defined as:

$$\overrightarrow{h}_{t} = f_{\mathrm{LSTM}}(x_{t}, \overrightarrow{h}_{t-1}; \Theta_{f}), \tag{16}$$

$$\overleftarrow{h}_{t} = f_{\mathrm{LSTM}}(x_{t}, \overleftarrow{h}_{t+1}; \Theta_{b}), \tag{17}$$

where $\Theta_{f}$ and $\Theta_{b}$ represent the sets of learnable parameters (weights and biases) for the forward and backward LSTM networks, respectively, yielding the contextual embedding:

$$H_{t} = [\overrightarrow{h}_{t} \parallel \overleftarrow{h}_{t}]. \tag{18}$$

An adaptive attention mechanism refines $H_{t}$ into a context-weighted summary vector:

$$h_{att} = \sum_{t=1}^{T} \alpha_{t} H_{t}, \quad \alpha_{t} = \frac{\exp(u_{t}^{\top} w_{a})}{\sum_{i=1}^{T} \exp(u_{i}^{\top} w_{a})}, \quad u_{t} = \tanh(W_{u} H_{t} + b_{u}), \tag{19}$$

where $w_{a}$ serves as the attention query vector, optimizing temporal salience through soft alignment.
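The attention pooling of Eq. (19) can be sketched independently of the LSTM itself. The sketch takes a list of per-step vectors $H_{t}$ as given (the toy values stand in for BiLSTM outputs) and uses hand-picked parameters; the function name `attention_pool` and all numeric values are assumptions for illustration.

```python
import math

def attention_pool(H, W_u, b_u, w_a):
    """Return (h_att, alphas) following Eq. (19)."""
    scores = []
    for h_t in H:
        # u_t = tanh(W_u H_t + b_u)
        u_t = [
            math.tanh(sum(w * h for w, h in zip(row, h_t)) + b)
            for row, b in zip(W_u, b_u)
        ]
        scores.append(sum(u * w for u, w in zip(u_t, w_a)))
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(H[0])
    h_att = [sum(a * h_t[d] for a, h_t in zip(alphas, H)) for d in range(dim)]
    return h_att, alphas

H = [[1.0, 0.0], [3.0, 2.0]]          # two time steps, two hidden units
W_u = [[1.0, 0.0], [0.0, 1.0]]
b_u = [0.0, 0.0]
h_uniform, alphas_uniform = attention_pool(H, W_u, b_u, w_a=[0.0, 0.0])
h_peaked, alphas_peaked = attention_pool(H, W_u, b_u, w_a=[5.0, 0.0])
print(h_uniform, alphas_peaked)
```

A zero query vector yields uniform weights and therefore a plain average over time, while a non-zero $w_{a}$ shifts weight toward the steps whose projected representation aligns with it.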

2.5.5. Hierarchical Aggregation and Output Projection

The contextual embedding $h_{att}$ is aggregated through a hierarchical fusion of global average and maximum pooling:

$$z = \beta_{1} \cdot \frac{1}{T} \sum_{t=1}^{T} h_{att,t} + \beta_{2} \cdot \max_{t}(h_{att,t}), \tag{20}$$

where $\beta_{1}$ and $\beta_{2}$ are learned weighting scalars enforcing balanced statistical and extremal emphasis. The resultant descriptor $z$ passes through two nonlinear dense transformations under $L_{2}$ regularization:

$$z' = \phi(W_{1} z + b_{1}) + \rho\, \phi(W_{2} z + b_{2}), \tag{21}$$

where $\phi$ denotes ReLU activation and $\rho$ acts as a dense fusion coefficient. Finally, the class posterior distribution over intrusion categories is modeled as:

$$\hat{y} = \mathrm{softmax}(W_{o} z' + b_{o}), \tag{22}$$

with optimization governed by the categorical cross-entropy loss:

$$L_{CE} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log \hat{y}_{i,c}. \tag{23}$$

This hierarchical deep representation ensures that $M(\Theta)$ can capture localized transient anomalies. It further encodes persistent cross-node intrusion dynamics characteristic of WSN attack behavior.
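The output projection and loss in Eqs. (22) and (23) reduce to a softmax followed by an averaged negative log-likelihood. The sketch below uses integer class labels, which is equivalent to the one-hot sum over $c$ in Eq. (23); the small `eps` guard and the toy logits are assumptions added for illustration.

```python
import math

def softmax(logits):
    peak = max(logits)                   # stabilizes the exponentials
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, labels, eps=1e-12):
    """Mean categorical cross-entropy; labels are integer class indices."""
    losses = [-math.log(p[y] + eps) for p, y in zip(probabilities, labels)]
    return sum(losses) / len(losses)

# Two toy samples over the five WSN-DS classes.
logits = [[0.0, 0.0, 0.0, 0.0, 0.0], [4.0, 0.0, 0.0, 0.0, 0.0]]
probs = [softmax(row) for row in logits]
loss_uniform = cross_entropy(probs[:1], [2])
loss_confident = cross_entropy(probs[1:], [0])
print(loss_uniform, loss_confident)
```

A uniform prediction over five classes gives a loss of $\ln 5 \approx 1.609$, which is a useful reference point when reading early-epoch loss values.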

2.6. Evaluation and Simulation

The model was implemented in TensorFlow 2.15 and trained on a single NVIDIA GPU with CUDA acceleration. Hyperparameters and simulation settings are summarized in Table 4.

2.6.1. Evaluation Metrics

Performance evaluation was carried out using accuracy ($Acc$), precision ($P$), recall ($R$), and F1-score ($F_{1}$):

$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \tag{24}$$

$$P = \frac{TP}{TP + FP} \tag{25}$$

$$R = \frac{TP}{TP + FN} \tag{26}$$

$$F_{1} = 2 \times \frac{P \times R}{P + R} \tag{27}$$

where $TP$, $TN$, $FP$, and $FN$ denote true positives, true negatives, false positives, and false negatives, respectively. Confusion matrices and learning curves were used to further assess classification stability and convergence.
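For a multi-class problem, the counts in Eqs. (25) to (27) are usually taken per class from the confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes, and it reports overall accuracy as the fraction of correct predictions; the matrix values are made up and the function name is hypothetical.

```python
def per_class_metrics(cm):
    """Return overall accuracy and a (precision, recall, f1) tuple per class."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    report = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        fn = sum(cm[c]) - tp                        # true c, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report.append((precision, recall, f1))
    return accuracy, report

cm_toy = [[8, 2],
          [1, 9]]
accuracy, report = per_class_metrics(cm_toy)
print(accuracy, report[0])
```

Averaging the per-class tuples (unweighted or weighted by class support) then yields the single summary values typically reported for imbalanced datasets such as WSN-DS.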

3. Results

The performance analysis of the proposed IDS in WSN has been carried out under multi-tier analysis. The system was implemented in Python using TensorFlow with the Keras API. The dataset was divided into three subsets (training, validation, and testing). The model was trained for 30 epochs using the Adam optimizer with a learning rate of $\eta = 10^{-3}$ and a categorical cross-entropy loss function.

3.1. Training and Validation Performance

The model convergence behavior is depicted in Figure 2, which shows the evolution of both accuracy and loss across epochs. The training accuracy increases consistently with the number of epochs and stabilizes at approximately $98\%$ after epoch 10, indicating fast convergence along with strong generalization. The validation accuracy follows the same trajectory, which suggests that overfitting was mitigated through the inclusion of dropout and batch normalization layers.

The model's stability can be characterized by the loss differential:

$$\Delta L(t) = |L_{\mathrm{train}}(t) - L_{\mathrm{val}}(t)|, \tag{28}$$

which asymptotically approaches zero as $t \to T_{\mathrm{final}}$, confirming convergence without oscillation or divergence. This results from the adaptive gradient dynamics inherent in the Adam optimizer, represented as follows:

$$\Theta_{t+1} = \Theta_{t} - \frac{\eta}{\sqrt{\hat{v}_{t}} + \epsilon} \hat{m}_{t}, \tag{29}$$

where $\hat{m}_{t}$ and $\hat{v}_{t}$ denote the bias-corrected first and second moment estimates, respectively.
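One update of Eq. (29) for a single scalar parameter can be sketched as follows. The decay rates `beta1=0.9` and `beta2=0.999` and `eps=1e-8` are the commonly used Adam defaults and are assumptions here, since only the learning rate $\eta = 10^{-3}$ is given in the text.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new (theta, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (running mean of gradients)
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment (running mean of squares)
    m_hat = m / (1 - beta1 ** t)                # bias correction for zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = adam_step(theta=1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(theta)
```

On the first step the bias-corrected ratio $\hat{m}_{t} / \sqrt{\hat{v}_{t}}$ is close to the sign of the gradient, so the parameter moves by roughly one learning rate regardless of the gradient magnitude.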

3.2. Confusion Matrix Analysis

To further analyze the classification performance across the multiple attack classes, the outcomes are presented as a confusion matrix. The proposed CNN-Attention-BiLSTM hybrid model exhibits strong diagonal dominance, as presented in Figure 3, indicating accurate multi-class discrimination. The normal and DoS categories in particular attained near-perfect classification, with minimal confusion between normal network behavior and the four distinct types of DoS attacks: Grayhole, Blackhole, TDMA, and Flooding.

The overall precision ($P$), recall ($R$), and $F_{1}$-score were computed as:

$$P = \frac{\sum_{i} TP_{i}}{\sum_{i} (TP_{i} + FP_{i})} = 0.9842, \tag{30}$$

$$R = \frac{\sum_{i} TP_{i}}{\sum_{i} (TP_{i} + FN_{i})} = 0.9791, \tag{31}$$

$$F_{1} = \frac{2 P R}{P + R} = 0.9791, \tag{32}$$

which collectively demonstrate the superior discriminative capacity of the model.
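As a quick arithmetic check on Eq. (32), the sketch below evaluates the harmonic mean of the reported precision and recall. It gives roughly 0.9816 rather than 0.9791, so the reported $F_{1}$ presumably comes from averaging per-class F1-scores (for example, support-weighted averaging); that explanation is an assumption and is not stated in the text.

```python
def harmonic_f1(precision, recall):
    """F1 as the harmonic mean of a single precision/recall pair."""
    return 2 * precision * recall / (precision + recall)

reported_p, reported_r = 0.9842, 0.9791   # values reported in Eqs. (30) and (31)
f1_from_pair = harmonic_f1(reported_p, reported_r)
print(round(f1_from_pair, 4))
```

The harmonic mean of two different numbers always lies strictly between them, so an F1 equal to the recall cannot be obtained from this pair by Eq. (32) alone.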

3.3. Model Interpretability and Structural Visualization

The network architecture is depicted in Figure 4. It comprises multiscale convolutional feature extraction with multi-head attention fusion and BiLSTM encoding for temporal dependency modeling. The total number of trainable parameters is approximately 96,735, with only 256 non-trainable parameters, which supports lightweight deployment within constrained WSN environments.

3.4. Learning Dynamics and Validation Logs

Figure 5 presents the epoch-wise training logs, which show a consistent improvement in accuracy and reduction in loss. The validation loss ($L_{\mathrm{val}}$) was evaluated at each epoch, and the best-performing model was checkpointed according to:

$$L_{\mathrm{val}}^{\mathrm{best}} = \min_{t} \{L_{\mathrm{val}}(t)\}. \tag{33}$$
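The selection rule in Eq. (33) amounts to tracking the lowest validation loss seen so far. The sketch below shows that bookkeeping on a made-up loss history; the values and the function name are hypothetical and do not reproduce the logs in Figure 5.

```python
def best_checkpoint(val_losses):
    """Return (epoch, loss) of the lowest validation loss; epochs start at 1."""
    best_epoch, best_loss = None, float("inf")
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # In a real training loop the model weights would be saved here.
            best_epoch, best_loss = epoch, loss
    return best_epoch, best_loss

hypothetical_losses = [0.30, 0.12, 0.15, 0.09, 0.11]
best_epoch, best_loss = best_checkpoint(hypothetical_losses)
print(best_epoch, best_loss)
```

In Keras this behavior is commonly obtained with the `ModelCheckpoint` callback using `monitor="val_loss"` and `save_best_only=True`; whether the authors used that callback is not stated.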

3.5. Comparative Evaluation

The proposed model was validated against benchmark models and classifiers, namely CNN, CNN + recurrent neural network (RNN), and Naïve Bayes. A summary of the comparison is presented in Table 5. The model attained an overall accuracy of $98.0\%$, which is higher than the conventional approaches by up to $2.2\%$. In addition, the inclusion of the attention mechanism improved interpretability, offering insights into the neuron activation relevance during detection.

The CNN model recorded 97.0% accuracy, demonstrating the ability of convolutional layers to extract important spatial features. However, it showed lower precision and recall of 83.60% and 82.60%, respectively, indicating difficulty in identifying several attack types and a relatively high false positive rate. The CNN + RNN model achieved a better recall of 96.48% and an F1-score of 96.86% owing to its ability to learn patterns over time, while its overall accuracy was similar to that of the CNN model. This means that simple temporal modeling alone is not enough to express the complex and nonlinear behaviors observed in WSNs.

On the other hand, the Naïve Bayes classifier demonstrated the lowest accuracy among the baseline models at 95.82%. This finding shows its limited ability to cope with the non-linear, high-dimensional feature interactions that appear in intrusion data.

The proposed model outperforms baseline models across all metrics, reaching 98.42% precision, 97.91% recall, and a 97.91% F1-score. This indicates the model is more effective at detecting attacks while minimizing false positives, which matters in real-time WSN uses. The dual-attention mechanism assists the model in targeting key features, boosting both robustness and stability.

Unlike the baseline models, the proposed model offers interpretability as an XAI feature. Its attention layers reveal the spatial or temporal regions that most strongly influence classification decisions. Interpreting decisions helps network administrators understand the model's reasoning and identify vulnerable parts of the network, unlike black-box models that only provide prediction results. The proposed approach provides both interpretable reasoning and high-performance detection, which is beneficial for real-world WSN security monitoring.

3.6. Discussion

Overall, the proposed framework has been found to outperform the baseline models in terms of accuracy and interpretability. In addition, the contextual attention mechanisms have allowed improved discrimination between overlapping attack signatures. The mathematical basis for this improvement is the non-linear fusion of multi-scale convolutional and temporal embeddings:

$$z_{\mathrm{final}} = \phi(W_{a} [F_{ms} \oplus h_{bi}] + b_{a}), \tag{34}$$

where $\oplus$ denotes concatenation and $\phi$ represents a non-linear mapping. The end-to-end architecture thus achieves robust generalization, scalability, and real-time inference capability. This makes it suitable for deployment in resource-constrained WSN environments.

The integration of multi-scale convolutional features and bidirectional temporal embeddings enables the framework to create a coherent latent space that simultaneously captures both local spatial patterns and temporal dynamics of WSN traffic. This combination of features from both domains enhances the model's ability to detect subtle differences in behavior between normal and malicious traffic. The linear transformation, characterized by $W_{a}$ and the bias term $b_{a}$, maps these features into a more differentiated subspace, while the non-linear activation function $\phi$ facilitates higher-order interactions among features, allowing the model to capture complex and non-linear attack behaviors common in WSN environments. The proposed approach not only improves the representational power of the learned embeddings but also enhances the differentiation between classes within the latent space, as demonstrated by the improved clustering of attack categories during the evaluation process.

Compared with conventional CNN or LSTM-based models, the proposed hybrid framework offers a significant advantage in handling the complex characteristics of WSNs. Traditional CNNs can extract local spatial features but often struggle with irregular topologies of WSN. LSTM models excel at capturing temporal patterns but overlook the spatial relationships among network nodes. Introducing the GCN addresses these limitations by learning the connections between nodes and how anomalies propagate across the network, which is crucial for identifying distributed or coordinated attacks. Furthermore, the BiLSTM component of the model improves temporal learning by analyzing data in both forward and backward directions. Additionally, the attention fusion mechanism emphasizes the most significant spatial, structural, and temporal features. As a result, the proposed framework provides reliable and precise intrusion detection, decreases false alarms, and adapts effectively to changing network conditions.

4. Conclusion

This research presents an advanced hybrid deep learning framework for intrusion detection in WSNs. The model integrates multi-scale convolutional blocks, attention fusion layers, graph convolutional reasoning, and BiLSTM components, and it effectively captures both spatial and temporal dependencies in sensor network traffic. The model was evaluated on the WSN-DS dataset, where it attained superior detection capability and was able to distinguish among diverse attacks, including Grayhole, Blackhole, TDMA, and Flooding. The study not only attained high detection accuracy but also maintained lightweight computational complexity suitable for real-time WSN environments. Compared to the existing baseline models, the proposed model attained enhanced generalization, reduced false alarms, and improved feature interpretability through its attention-driven design. Overall, the model bridges the gap between a high-performance IDS framework and practical WSN deployment, offering a scalable, energy-efficient, and topology-aware detection solution. In the future, the following can be explored: (i) adaptive federated implementations for decentralized WSN nodes, (ii) self-evolving detection modules leveraging online learning to handle emerging attack patterns, and (iii) explainable visual analytics to strengthen trust and interpretability in mission-critical applications.

References

1. Puccinelli, D.; Haenggi, M. Wireless sensor networks: applications and challenges of ubiquitous sensing. IEEE Circuits and Systems Magazine 2005, 5, 19–31.
2. Borges, L.M.; Velez, F.J.; Lebres, A.S. Survey on the Characterization and Classification of Wireless Sensor Network Applications. IEEE Communications Surveys & Tutorials 2014, 16, 1860–1890.
3. Bunterngchit, C.; Pornchaivivat, S.; Bunterngchit, Y. Productivity Improvement by Retrofit Concept in Auto Parts Factories. In Proceedings of the 2019 8th International Conference on Industrial Technology and Management (ICITM), 2019, pp. 122–126.
4. Othman, M.F.; Shazali, K. Wireless Sensor Network Applications: A Study in Environment Monitoring System. Procedia Engineering 2012, 41, 1204–1210. International Symposium on Robotics and Intelligent Sensors 2012 (IRIS 2012).
5. Bunterngchit, C.; Baniata, L.H.; Baniata, M.H.; ALDabbas, A.; Khair, M.A.; Chearanai, T.; Kang, S. GACL-Net: Hybrid Deep Learning Framework for Accurate Motor Imagery Classification in Stroke Rehabilitation. Computers, Materials & Continua 2025, 83, 517–536.
6. Chhaya, L.; Sharma, P.; Bhagwatikar, G.; Kumar, A. Wireless Sensor Network Based Smart Grid Communications: Cyber Attacks, Intrusion Detection System and Topology Control. Electronics 2017, 6.
7. Prodanović, R.; Rančić, D.; Vulić, I.; Zorić, N.; Bogićević, D.; Ostojić, G.; Sarang, S.; Stankovski, S. Wireless Sensor Network in Agriculture: Model of Cyber Security. Sensors 2020, 20.
8. Dritsas, E.; Trigka, M. A Survey on Cybersecurity in IoT. Future Internet 2025, 17.
9. Thapa, N.; Liu, Z.; KC, D.B.; Gokaraju, B.; Roy, K. Comparison of Machine Learning and Deep Learning Models for Network Intrusion Detection Systems. Future Internet 2020, 12.
10. Biermann, E.; Cloete, E.; Venter, L. A comparison of Intrusion Detection systems. Computers & Security 2001, 20, 676–683.
11. Abdulganiyu, O.H.; Ait Tchakoucht, T.; Saheed, Y.K. A systematic literature review for network intrusion detection system (IDS). International Journal of Information Security 2023, 22, 1125–1162.
12. Houda, Z.A.E.; Naboulsi, D.; Kaddoum, G. A Privacy-Preserving Collaborative Jamming Attacks Detection Framework Using Federated Learning. IEEE Internet of Things Journal 2024, 11, 12153–12164.
13. Jeyakumar, S.R.; Rahman, M.Z.U.; Sinha, D.K.; Kumar, P.R.; Vimal, V.; Singh, K.U.; Syamsundararao, T.; Kumar, J.N.V.R.S.; Balajee, J. An Innovative Secure and Privacy-Preserving Federated Learning-Based Hybrid Deep Learning Model for Intrusion Detection in Internet-Enabled Wireless Sensor Networks. IEEE Transactions on Consumer Electronics 2025, 71, 273–280.
14. Zhou, H.; Zou, H.; Zhou, P.; Shen, Y.; Li, D.; Li, W. CBCTL-IDS: A Transfer Learning-Based Intrusion Detection System Optimized With the Black Kite Algorithm for IoT-Enabled Smart Agriculture. IEEE Access 2025, 13, 46601–46615.
15. Halbouni, A.; Gunawan, T.S.; Habaebi, M.H.; Halbouni, M.; Kartiwi, M.; Ahmad, R. CNN-LSTM: Hybrid Deep Neural Network for Network Intrusion Detection System. IEEE Access 2022, 10, 99837–99849.
16. Vinayakumar, R.; Alazab, M.; Soman, K.P.; Poornachandran, P.; Al-Nemrat, A.; Venkatraman, S. Deep Learning Approach for Intelligent Intrusion Detection System. IEEE Access 2019, 7, 41525–41550.
17. Birahim, S.A.; Paul, A.; Rahman, F.; Islam, Y.; Roy, T.; Asif Hasan, M.; Haque, F.; Chowdhury, M.E.H. Intrusion Detection for Wireless Sensor Network Using Particle Swarm Optimization Based Explainable Ensemble Machine Learning Approach. IEEE Access 2025, 13, 13711–13730.
18. Hakami, H.; Faheem, M.; Bashir Ahmad, M. Machine Learning Techniques for Enhanced Intrusion Detection in IoT Security. IEEE Access 2025, 13, 31140–31158.
19. Alzahrani, A. Novel Approach for Intrusion Detection Attacks on Small Drones Using ConvLSTM Model. IEEE Access 2024, 12, 149238–149253.
20. Alruwaili, F.F.; Asiri, M.M.; Alrayes, F.S.; Aljameel, S.S.; Salama, A.S.; Hilal, A.M. Red Kite Optimization Algorithm With Average Ensemble Model for Intrusion Detection for Secure IoT. IEEE Access 2023, 11, 131749–131758.
21. Atitallah, S.B.; Driss, M.; Boulila, W.; Koubaa, A. Securing Industrial IoT Environments: A Fuzzy Graph Attention Network for Robust Intrusion Detection. IEEE Open Journal of the Computer Society 2025, 6, 1065–1076.
22. Saleh, H.M.; Marouane, H.; Fakhfakh, A. Stochastic Gradient Descent Intrusions Detection for Wireless Sensor Network Attack Detection System Using Machine Learning. IEEE Access 2024, 12, 3825–3836.
23. Jiang, L.; Gu, H.; Xie, L.; Yang, H.; Na, Z. ST-IAOA-XGBoost: An Efficient Data-Balanced Intrusion Detection Method for WSN. IEEE Sensors Journal 2025, 25, 1768–1783.
24. Almomani, I.; Al-Kasasbeh, B.; AL-Akhras, M. WSN-DS: A Dataset for Intrusion Detection Systems in Wireless Sensor Networks. Journal of Sensors 2016, 2016, 4731953.
25. Marriwala, N.; Rathee, P. An approach to increase the wireless sensor network lifetime. In Proceedings of the 2012 World Congress on Information and Communication Technologies, 2012, pp. 495–499.

Figure 1. Model architecture of the proposed framework.

Figure 2. The model convergence behavior: (a) Training and validation accuracy curves; (b) Training and validation loss convergence.

Figure 3. Confusion matrix of the proposed intrusion detection model.

Figure 4. Architectural summary of the proposed model.

Figure 5. Epoch-wise model training logs showing validation checkpoints.

Table 1. Summary of reviewed intrusion detection models on WSN-DS and related datasets.

Article	Methodology / Model	Dataset	Key Limitation / Gap
[12]	Federated learning with secure aggregation for jamming attack detection	WSN-DS (Jamming classes)	Limited to jamming attacks; lacks multi-attack scalability
[13]	Hybrid SCNN–BiLSTM optimized via African vulture optimization under a federated learning setup	WSN-DS, CIC-IDS2017	Low communication efficiency in FL; lacks interpretability
[14]	Transfer learning with MobileNet/VGG19 ensemble optimized by Black Kite Algorithm	ToN-IoT, Edge-IIoTset, WSN-DS	High computational load; poor real-time adaptability
[15]	CNN–LSTM hybrid model integrating spatial-temporal dependencies	WSN-DS (Binary and Multi-class)	Class imbalance and explainability not addressed
[16]	DNN benchmarked against classical ML baselines	KDDCup’99, NSL-KDD, WSN-DS	No WSN-specific topology modeling; high false positives
[17]	PSO-based feature selection with RF, DT, and kNN ensemble plus LIME/SHAP explanations	WSN-DS (Binary)	No temporal or spatial dependency modeling
[18]	SMOTE-based balancing and PCC feature selection for ML/DL comparison	WSN-DS, UNSW-NB15, CIC-IDS2017	No topology-aware or energy-efficient design
[19]	ConvLSTM for spatial-temporal intrusion detection in IoD networks	WSN-DS, NSL-KDD, Drone dataset	Limited to UAV context; weak transferability to WSNs
[20]	Red Kite Optimization with average ensemble fusion and LCWOA tuning	WSN-DS (Binary)	No adaptive temporal modeling; lacks robustness to evolving threats
[21]	Fuzzy graph attention network for relational uncertainty learning	Edge-IIoTSet, CIC-Malmem, WSN-DS	Computationally expensive; unsuitable for constrained WSNs
[22]	SGD-based optimization for lightweight ML classifiers in WSN intrusion detection	WSN-DS (Binary)	Simplistic linear models; limited scalability for dense WSNs
[23]	Improved arithmetic optimization algorithm integrated with XGBoost	WSN-DS (Binary)	Static learning; lacks adaptive or online retraining

Table 2. Feature description of the WSN-DS dataset.

Feature symbol	Description	Feature symbol	Description
`id`	A unique identifier assigned to each sensor node; distinguishes nodes across rounds and stages.	`Time`	Current simulation time of the node representing its temporal position in the network.
`Is_CH`	Binary flag indicating whether a node is a cluster head (1) or a normal node (0).	`who_CH`	Identifier of the cluster head associated with the node in the current round.
`Dist_To_CH`	Distance between the node and its respective cluster head, calculated per round.	`ADV_S`	Number of advertise messages broadcast by cluster heads to surrounding nodes.
`ADV_R`	Number of advertise messages received by a node from nearby cluster heads.	`JOIN_S`	Number of join request messages sent by nodes to cluster heads for cluster formation.
`JOIN_R`	Number of join request messages received by cluster heads from their member nodes.	`SCH_S`	Number of TDMA schedule broadcast messages sent by cluster heads to nodes.
`SCH_R`	Number of TDMA schedule messages received from cluster heads by the nodes.	`Rank`	The order or rank of a node within the TDMA schedule during communication.
`DATA_S`	Number of data packets sent from a sensor node to its cluster head.	`DATA_R`	Number of data packets received by the cluster head from its sensor nodes.
`Data_Sent_To_BS`	Number of data packets transmitted from the cluster head to the base station.	`dist_CH_To_BS`	Distance between the cluster head and the base station used for energy computation.
`send_code`	Cluster sending code identifying the transmitting node within its cluster.	`Expanded_Energy`	Amount of energy consumed by the node during the previous communication round.
`Attack_type`	Target variable representing the attack category with five classes: Blackhole, Grayhole, Flooding, TDMA, and Normal.	–	–

Table 3. Label encoding used in the proposed method.

Class	Label
Blackhole	0
Flooding	1
Grayhole	2
Normal	3
TDMA	4
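
The integer codes in Table 3 are consistent with assigning labels in alphabetical order of the class names, which is also what common label encoders such as scikit-learn's `LabelEncoder` produce. The following minimal sketch reproduces that mapping in plain Python. It assumes alphabetical ordering is the rule behind the table; the names `build_label_map`, `label_map`, and `inverse_map` are hypothetical and introduced only for illustration.

```python
# The five categories of the Attack_type target, listed in arbitrary order.
CLASSES = ["Normal", "Blackhole", "Grayhole", "Flooding", "TDMA"]


def build_label_map(class_names):
    """Map each class name to an integer by alphabetical rank (assumed rule)."""
    return {name: idx for idx, name in enumerate(sorted(class_names))}


label_map = build_label_map(CLASSES)
# Inverse mapping, useful for turning predicted indices back into class names.
inverse_map = {idx: name for name, idx in label_map.items()}

encoded = [label_map[c] for c in ["Normal", "TDMA", "Blackhole"]]
```

Keeping the inverse mapping alongside the encoder makes it straightforward to label the rows and columns of a confusion matrix with class names rather than integer codes.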

Table 4. Simulation parameters used in the proposed intrusion detection framework.

Parameter	Value
Learning rate	$1 \times 10^{-4}$
Batch size	128
Epochs	30
Optimizer	Adam
Regularization	$L_{2}$ ($\lambda = 0.001$)
Dropout rate	$0.25$ – $0.30$
Feature dimension	16
Hidden units (BiLSTM)	64 per direction
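
For reproducibility, the settings in Table 4 can be gathered into a single configuration object. The sketch below is one possible way to do so and is not taken from the authors' code: the class name `TrainConfig` and all field names are hypothetical, and the derived BiLSTM output width assumes that the forward and backward hidden states are concatenated.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from Table 4 (field names are illustrative)."""
    learning_rate: float = 1e-4
    batch_size: int = 128
    epochs: int = 30
    optimizer: str = "adam"
    l2_lambda: float = 1e-3
    dropout_min: float = 0.25
    dropout_max: float = 0.30
    feature_dim: int = 16
    bilstm_units_per_direction: int = 64

    @property
    def bilstm_output_dim(self) -> int:
        # Assumes forward and backward states are concatenated.
        return 2 * self.bilstm_units_per_direction

    def steps_per_epoch(self, n_train: int) -> int:
        # Number of mini-batches per epoch, counting a final partial batch.
        return math.ceil(n_train / self.batch_size)


cfg = TrainConfig()
```

Under the concatenation assumption, 64 hidden units per direction give a 128-dimensional BiLSTM output per time step, which is the width any subsequent attention or dense layer would receive.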

Table 5. Summary of comparative results for intrusion detection performance.

Model	XAI	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
CNN	No	97.00	83.60	82.60	82.00
CNN + RNN	No	97.04	98.79	96.48	96.86
Naive Bayes	No	95.82	96.80	95.40	–
Proposed Model	Yes	98.00	98.42	97.91	97.91
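
The four metrics in Table 5 can all be derived from a multiclass confusion matrix. The following minimal sketch computes accuracy together with macro-averaged precision, recall, and F1-score. Macro averaging is an assumption made here for illustration, the function name `macro_metrics` is hypothetical, and the two-class matrix is a toy example rather than data from this study.

```python
def macro_metrics(cm):
    """Accuracy and macro-averaged precision/recall/F1.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy two-class confusion matrix (hypothetical counts).
toy_cm = [[8, 2],
          [1, 9]]
metrics = macro_metrics(toy_cm)
```

Because the macro F1-score averages the per-class F1 values, it generally differs from the harmonic mean of the macro precision and macro recall, so the F1 column of a results table need not be recoverable from the precision and recall columns alone.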

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
