Submitted: 13 June 2025
Posted: 17 June 2025
Abstract
Keywords:
1. Introduction
1.1. IoT Security Challenges
1.2. Machine Learning in Intrusion Detection
1.3. The Need for Complete IoT Datasets
1.4. Objective and Contributions
- A detailed preprocessing pipeline including missing value imputation, feature scaling, and class balancing using SMOTE.
- Design of a hybrid deep learning model integrating CNN and Transformer components.
- Comprehensive evaluation across multiple performance metrics (accuracy, precision, recall, F1-score).
- Comparative analysis with baseline models including standalone CNN, LSTM, and Transformer architectures.
- Practical discussion on deployment feasibility and future improvements in edge-compatible IDS for IoT.
2. Literature Review
2.1. Machine Learning Models for Intrusion Detection in IoT
2.2. Issues of Current Datasets
- Outdated attack patterns: Older datasets such as KDD99 or NSL-KDD do not reflect current threat landscapes.
- Class imbalance: Real-world attack distributions are often skewed, with benign traffic vastly outnumbering malicious instances.
- Lack of heterogeneity: Many datasets are captured in controlled environments and fail to represent the diverse and noisy nature of real IoT networks.
- Missing data and noise: Incomplete features and inconsistent labeling can compromise model training and evaluation.
2.3. Research Gap and Proposed Approach
- Designing a hybrid CNN-Transformer model tailored for multi-class intrusion detection in IoT networks.
- Implementing a robust data preprocessing pipeline to handle quality issues in CIC-IDS2017.
- Conducting detailed performance evaluation and benchmarking against baseline models to assess the efficacy of the proposed hybrid approach.
3. Dataset
3.1. Summary Features and Organization of the Dataset
3.2. Dataset Utility
- Diversity of Attacks: Unlike older datasets, it captures a variety of contemporary threats relevant to modern networks.
- Realistic Traffic Simulation: Data was collected from a live lab environment with emulated IoT-like traffic patterns, including web browsing, email, VoIP, and video streaming.
- Feature Richness: It includes flow-based features, payload-based statistics, and header-based information, which are critical for both CNN and Transformer components of our model.
- Labeled for Supervised Learning: The dataset includes ground-truth labels, making it suitable for classification tasks.
- Dropping columns with excessive missing values
- Replacing NaNs with statistical imputation
- Normalizing numerical features using MinMaxScaler
- Balancing the dataset using the Synthetic Minority Oversampling Technique (SMOTE)
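The first three preprocessing steps listed above can be illustrated with a minimal plain-Python sketch, using the 50% drop threshold and column-mean imputation described in Section 4.1. It assumes each feature column is held as a list in which `None` stands in for NaN; the column names and values in the example are hypothetical and are not taken from CIC-IDS2017.

```python
from statistics import mean


def preprocess(columns, missing_threshold=0.5):
    """Drop sparse columns, mean-impute the rest, then min-max scale to [0, 1].

    `columns` maps a feature name to a list of floats; None marks a missing
    value (an assumption of this sketch, standing in for NaN).
    """
    cleaned = {}
    for name, values in columns.items():
        missing_ratio = sum(v is None for v in values) / len(values)
        if missing_ratio > missing_threshold:
            continue  # drop columns with more than 50% missing values
        # Column-wise mean imputation over the observed values only.
        col_mean = mean(v for v in values if v is not None)
        filled = [col_mean if v is None else v for v in values]
        # Min-max scaling; a constant column maps to 0.0, as MinMaxScaler does.
        lo, hi = min(filled), max(filled)
        span = hi - lo
        cleaned[name] = [(v - lo) / span if span else 0.0 for v in filled]
    return cleaned


# Hypothetical toy columns (not real CIC-IDS2017 data).
toy = {
    "flow_duration": [10.0, None, 30.0, 20.0],
    "fwd_packets": [None, None, None, 4.0],  # 75% missing, so it is dropped
    "bwd_bytes": [5.0, 5.0, 5.0, 5.0],       # constant column
}
result = preprocess(toy)
print(result)
# {'flow_duration': [0.0, 0.5, 1.0, 0.5], 'bwd_bytes': [0.0, 0.0, 0.0, 0.0]}
```

In a full pipeline, the column means, minima, and maxima would be computed on the training split only and then reused on the test split, so that test statistics do not leak into training.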
4. Methodology
4.1. Data Cleaning and Preparation
- Handling Missing Values: Several features contained NaN values due to incomplete flow statistics (e.g., missing packet counts in short-lived sessions). Columns with over 50% missing data were dropped. For the remaining features, missing values were imputed using column-wise mean values.
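- Feature Selection: Identifier columns (e.g., Timestamp, Flow ID, Source IP, Destination IP) were excluded. They carry little generalizable information about attack behavior and could let the model memorize specific hosts or sessions, so removing them reduced noise and prevented data leakage.
- Normalization: All numerical features were scaled to the range [0, 1] using MinMaxScaler. This keeps features with large raw magnitudes (e.g., byte counts) from dominating those with small ones, which helps deep learning models converge faster and keeps gradient updates balanced across features.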
- Label Encoding: Categorical labels were converted to numeric class indices using LabelEncoder.
- Class Balancing: The dataset exhibited significant class imbalance: benign traffic constituted over 80% of all records. To mitigate this, the Synthetic Minority Oversampling Technique (SMOTE) was applied. SMOTE generates synthetic examples for minority classes to balance the training set, improving the model's ability to generalize to those classes.
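The core step of SMOTE is interpolation: a synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The plain-Python sketch below illustrates only that step. It assumes Euclidean distance and picks the base sample at random, and the toy data are hypothetical; it is not the exact implementation used in this study (a library such as imbalanced-learn would normally be used, where `k_neighbors` defaults to 5).

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by SMOTE-style interpolation.

    `minority` is a list of feature vectors (lists of floats) from one class.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x itself), Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # New point on the segment from x towards its chosen neighbour.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Hypothetical minority-class samples with two scaled features each.
minority = [[0.10, 0.20], [0.15, 0.25], [0.20, 0.10],
            [0.30, 0.30], [0.25, 0.15], [0.12, 0.28]]
new_points = smote(minority, n_synthetic=4, k=3)
print(len(new_points))  # 4
```

Because synthetic points are built from neighbours, oversampling should run after the train/test split; applying it to the full dataset would place near-copies of test samples into the training data.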

4.2. Model Selection and Design
- CNN Layer: The input feature vector (of size 78) is first reshaped and passed through 1D convolutional layers to capture local spatial dependencies. The CNN layers serve as a feature extractor by learning low-level interactions among network flow attributes.
- Positional Encoding: To prepare the features for the Transformer, positional encodings are added to incorporate sequential information, even though the original data is not inherently sequential.
- Transformer Block: The Transformer module includes multi-head self-attention and feed-forward networks. It captures long-range relationships and contextual dependencies between features, enhancing the model’s ability to detect sophisticated attack patterns.
- Dense Layers: Output from the Transformer is flattened and passed through a series of fully connected layers with dropout regularization.
- Output Layer: A softmax-activated dense layer with 15 units (corresponding to the 15 traffic classes) is used for final classification.
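The data flow through the layers listed above can be traced with a minimal, framework-free sketch for a single 78-feature record. This is an illustration under assumptions chosen here, not the authors' model. The filter count (8) and kernel size (3) are hypothetical. The positional encoding is assumed to be the sinusoidal form of Vaswani et al. The attention is single-head with Q = K = V and no learned projections. Weights are random and untrained, and the Transformer feed-forward sublayer, hidden dense layers, and dropout are omitted.

```python
import math
import random

rng = random.Random(0)


def conv1d(x, kernels):
    """'Valid' 1D convolution (cross-correlation, as in DL frameworks) plus ReLU.

    x: list of L floats; kernels: F lists of length K.
    Returns L - K + 1 tokens, each a vector of F channel values.
    """
    k = len(kernels[0])
    return [[max(0.0, sum(w[j] * x[t + j] for j in range(k))) for w in kernels]
            for t in range(len(x) - k + 1)]


def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i plays the role of 2i in the formula
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def self_attention(X):
    """Single-head scaled dot-product self-attention with Q = K = V = X."""
    d = len(X[0])
    out = []
    for q in X:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X])
        out.append([sum(w[j] * X[j][c] for j in range(len(X))) for c in range(d)])
    return out


N_FEATURES, N_FILTERS, KERNEL, N_CLASSES = 78, 8, 3, 15
x = [rng.random() for _ in range(N_FEATURES)]        # one min-max scaled flow record
kernels = [[rng.gauss(0, 0.5) for _ in range(KERNEL)] for _ in range(N_FILTERS)]
tokens = conv1d(x, kernels)                           # 76 tokens x 8 channels
pe = positional_encoding(len(tokens), N_FILTERS)
tokens = [[a + b for a, b in zip(t, p)] for t, p in zip(tokens, pe)]
attended = self_attention(tokens)                     # still 76 x 8
flat = [v for t in attended for v in t]               # flatten to 608 values
W = [[rng.gauss(0, 0.1) for _ in range(len(flat))] for _ in range(N_CLASSES)]
probs = softmax([sum(w * f for w, f in zip(row, flat)) for row in W])
print(len(tokens), len(tokens[0]), len(probs))  # 76 8 15
```

In practice the same structure would be built from a framework's Conv1D and multi-head attention layers and trained end to end; the sketch only shows how a flat feature vector becomes a sequence of tokens that attention can relate to one another before the 15-way softmax.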
4.3. Model Training and Testing Environment
- Optimizer: Adam with an initial learning rate of 0.001
- Loss Function: Categorical Cross-Entropy
- Batch Size: 64
- Epochs: 30
- Early Stopping: Patience of 5 epochs to avoid overfitting
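Two of the settings above can be made concrete with a short sketch: the categorical cross-entropy loss and patience-based early stopping. The monitored quantity is assumed here to be validation loss (the text does not say), and the stopping rule follows the common convention used by, for example, Keras's EarlyStopping callback: stop once `patience` consecutive epochs pass without improvement. The loss curve is hypothetical.

```python
import math


def categorical_cross_entropy(y_true_onehot, y_prob, eps=1e-12):
    """Mean over the batch of -sum_c y_c * log(p_c)."""
    total = 0.0
    for y, p in zip(y_true_onehot, y_prob):
        total -= sum(yc * math.log(max(pc, eps)) for yc, pc in zip(y, p))
    return total / len(y_true_onehot)


def early_stopping(val_losses, patience=5, max_epochs=30, min_delta=0.0):
    """Return (epochs_run, best_epoch), both 1-indexed, under patience-based stopping."""
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0  # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return min(len(val_losses), max_epochs), best_epoch


# One sample whose true class (index 0) gets probability 0.7: loss = -ln(0.7).
print(round(categorical_cross_entropy([[1, 0, 0]], [[0.7, 0.2, 0.1]]), 4))  # 0.3567

# Hypothetical validation-loss curve: best at epoch 4, then 5 epochs without improvement.
curve = [0.90, 0.60, 0.45, 0.40, 0.41, 0.42, 0.40, 0.43, 0.44, 0.50]
print(early_stopping(curve))  # (9, 4)
```

With one-hot labels only the true-class term survives, so the loss reduces to the negative log-probability assigned to the correct class.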
4.4. Experimental Results and Analysis
- Overall Accuracy: 99.1%
- Average Precision: 98.8%
- Average Recall: 97.6%
- Macro F1 Score: 98.1%
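Because benign traffic dominates the dataset, accuracy alone can look strong even when minority attacks are missed, which is why macro-averaged metrics matter. The sketch below computes accuracy and macro-averaged precision, recall, and F1 from label lists. It assumes the "average" precision and recall above are macro averages (the text does not specify macro versus weighted) and treats undefined ratios as 0. The labels are hypothetical.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (0 when undefined)."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    precisions, recalls, f1s = [], [], []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical imbalanced labels: 8 benign (0), 1 DoS (1), 1 PortScan (2),
# scored against a degenerate model that always predicts "benign".
y_true = [0] * 8 + [1, 2]
y_pred = [0] * 10
acc, prec, rec, f1 = macro_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_precision={prec:.3f} macro_recall={rec:.3f} macro_f1={f1:.3f}")
# accuracy=0.800 macro_precision=0.267 macro_recall=0.333 macro_f1=0.296
```

The degenerate model reaches 80% accuracy while its macro F1 stays below 0.3, so a high macro F1 is the stronger evidence that minority attack classes are actually being detected.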
5. Results and Analysis
5.1. Multi-Class Intrusion Detection


5.2. Attack-Specific Detection
5.3. Training Dynamics

5.4. Model Comparison
6. Discussion
6.1. Feature Importance and Model Understanding
6.2. Effectiveness of the Hybrid CNN-Transformer Model
6.3. Practical Implications and Deployment Readiness
6.4. Constraints and Future Directions
- Imbalanced Class Distribution: Despite applying SMOTE for balancing the dataset, minority classes such as XSS, Infiltration, and SQL Injection still show lower recall. Oversampling may not fully capture their complexity. Future work could explore adversarial or generative oversampling techniques (e.g., GAN-based synthetic data) to better enrich low-frequency classes.
- Limited Feature Engineering: Although the model performed well using automatically learned features, it could benefit from richer contextual features such as protocol metadata, payload entropy, and behavior over time. Incorporating domain knowledge or protocol-specific heuristics could further improve classification fidelity.
- Computational Complexity: The hybrid architecture, though efficient for training on GPUs, may not yet be optimized for edge deployment. Model compression techniques like pruning, quantization, or distillation should be explored to reduce inference time and model size.
- Static Dataset: The CIC-IDS2017 dataset, while rich, is still static and may not fully represent evolving attack tactics, techniques, and procedures (TTPs). Real-time or streaming intrusion detection datasets could help validate performance under dynamic conditions.
- Lack of Explainability: While the feature importance plot offers some insight, dedicated explainability tools such as SHAP or LIME could clarify individual decision paths, which matters in high-stakes environments like industrial control systems or healthcare IoT.
- Develop online-learning variants of the model for live traffic
- Incorporate real-world feedback loops to update the model
- Compare edge inference performance under resource-constrained conditions
- Explore federated learning for decentralized security in large-scale IoT ecosystems
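To make the quantization option from the Computational Complexity item concrete, the sketch below shows symmetric per-tensor int8 quantization, one common post-training scheme: each float weight is stored as an 8-bit integer plus a single shared scale. This is an assumed illustration of the general technique, not a method evaluated in this study, and the weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [qi * scale for qi in q]


# Hypothetical float32 weights from one dense layer.
weights = [0.42, -1.27, 0.003, 0.9, -0.55]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale)
print(max_err, scale / 2)  # rounding error per weight is at most about half a step
```

Storing 8-bit codes instead of 32-bit floats cuts weight storage roughly fourfold; whether the accuracy loss is acceptable for the minority attack classes would need to be measured on the target edge hardware.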
7. Conclusion
- A robust preprocessing pipeline addressing dataset imbalance and missing values
- A hybrid architecture that leverages both spatial and contextual feature learning
- A detailed empirical analysis supported by metrics, training dynamics, and feature interpretability
References
- I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP) 2018, pp. 108–116.
- A. Vaswani et al. Attention Is All You Need. Advances in Neural Information Processing Systems 2017, 30, 5998–6008.
- N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi. A Deep Learning Approach to Network Intrusion Detection. IEEE Transactions on Emerging Topics in Computational Intelligence 2018, 2, 41–50.
- C. Yin, Y. Zhu, J. Fei, and X. He. A Deep Learning Approach for Intrusion Detection Using Recurrent Neural Networks. IEEE Access 2017, 5, 21954–21961.
- H. Zhang, L. Huang, C. Q. Wu, and Z. Li. An Intrusion Detection System Based on Deep Learning for IoT Networks. IEEE Internet of Things Journal 2022, 9, 3456–3468.
- N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 2002, 16, 321–357.
- M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani. A Detailed Analysis of the KDD CUP 99 Data Set. Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp. 1–6.
- P. Lin, K. Ye, and C.-Z. Xu. An Enhanced Intrusion Detection System Using SMOTE and Recurrent Neural Networks on CIC-IDS2017. Journal of Network and Computer Applications 2021, 178, 102974.
- P. Chandekar, M. Mehta, and S. Chandan. Enhanced Anomaly Detection in IoMT Networks Using Ensemble AI Models on the CICIoMT2024 Dataset. arXiv 2025, arXiv:2502.11854.


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).