Submitted: 05 April 2024
Posted: 08 April 2024
Abstract
Keywords:
1. Introduction
- Motivations and contributions
- We developed an ensemble ML-based detection method against IoT attacks using the IoTID20 dataset. The proposed model makes three-faceted predictions: binary classification for anomalous or normal traffic, multi-class prediction for broad categorization of the attack, and classifying attacks within the sub-category. This layered approach provides a more nuanced understanding of the threat landscape, which can significantly aid in effective threat mitigation and response. To the best of our knowledge, this is the first work that applies an ensemble machine learning approach specifically to the IoTID20 dataset. This dataset represents a specific and complex IoT environment with varied network attacks, which poses unique challenges that our model effectively addresses.
- We provided a proof of concept implementation of the proposed ensemble ML-based solution, which consists of a diverse mix of individual classifiers (Decision Trees, Extra Trees Classifier, K-Nearest Neighbors, and Random Forests), creating a more robust and reliable system for anomaly detection. This particular combination is a novel contribution that enhances the performance and stability of the detection system.
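A minimal sketch of how the four base classifiers named above could be combined by stacking is shown here. It is an illustration under stated assumptions, not the authors' implementation: the logistic-regression meta-learner, the hyperparameters, and the synthetic data from `make_classification` are all assumptions, and the helper name `build_stacking_ids` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def build_stacking_ids(random_state: int = 0) -> StackingClassifier:
    """Stack the four base learners named in the text under one meta-learner."""
    base_learners = [
        ("dt", DecisionTreeClassifier(random_state=random_state)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=random_state)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
    ]
    # Assumption: logistic regression as meta-learner; the text does not name one.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,  # out-of-fold base predictions are used to train the meta-learner
    )


# Synthetic stand-in for a preprocessed feature matrix (not the IoTID20 data).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = build_stacking_ids().fit(X_train, y_train)
predictions = model.predict(X_test)
test_accuracy = model.score(X_test, y_test)
```

For the three-faceted prediction described above, one such model could be fitted per target (binary, category, subcategory) on the same feature matrix; that arrangement is likewise an assumption of this sketch.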
2. Related Work
- Mitigate IoT Abuse
- Recent Related Work Utilizing IoTID20 Dataset
3. Methodology
- IoTID20 Testbed Environment
- Denial of Service Attack: an attack that tries to compromise the availability of its target by making it unresponsive.
- Mirai Attack: an evolved form of DoS attack known as distributed DoS (DDoS). DDoS attacks use the same method as DoS, but a group of attackers targets the victim simultaneously.
- Man in the Middle Attack: an attack that tries to eavesdrop on communications taking place between two nodes, where the attacker starts acting as a proxy.
- Probing Attack: an attack that scans the network looking for vulnerabilities that could be exploited to help perform an attack on the device.
- Converted all data types to integer values to ensure data is understood by the machine learning model.
- Replaced any infinity values with NaN values.
- Replaced NaN values using the mean strategy of the SimpleImputer imputation transformer from sklearn, which replaces each NaN value with the mean of its column.
- Performed label encoding on three columns (Binary, Multi-Class Category, and Multi-Class Subcategory), which transforms categorical columns into numerical values.
- Dropped the remaining categorical features (Flow_ID, Dst_IP, Src_IP, Timestamp) to reduce the margin of error.
- Dropped constant features as per the correlation matrix of the IoTID20 dataset; they contain only one value for all the outputs in the dataset and provide no information.
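The preprocessing steps above can be sketched as follows. This is an illustrative sketch under assumptions, not the authors' code: the label column names (`Label`, `Cat`, `Sub_Cat`), the two feature columns, and the toy rows are assumed for demonstration, and `preprocess` is a hypothetical helper. The sketch coerces features to numeric values but does not cast them to integers, and it detects constant features by counting unique values rather than by inspecting a correlation matrix.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder

# Assumed names for the Binary, Category, and Subcategory label columns.
LABEL_COLUMNS = ["Label", "Cat", "Sub_Cat"]
# Categorical identifier features that the text says were dropped.
IDENTIFIER_COLUMNS = ["Flow_ID", "Src_IP", "Dst_IP", "Timestamp"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed cleaning steps and return features plus encoded labels."""
    df = df.drop(columns=[c for c in IDENTIFIER_COLUMNS if c in df.columns])

    # Label-encode the three target columns.
    labels = pd.DataFrame(
        {col: LabelEncoder().fit_transform(df[col]) for col in LABEL_COLUMNS},
        index=df.index,
    )

    # Coerce features to numeric and turn +/- infinity into NaN.
    features = df.drop(columns=LABEL_COLUMNS).apply(pd.to_numeric, errors="coerce")
    features = features.replace([np.inf, -np.inf], np.nan)

    # Replace each NaN with the mean of its column.
    imputed = SimpleImputer(strategy="mean").fit_transform(features)
    features = pd.DataFrame(imputed, columns=features.columns, index=features.index)

    # Drop constant features: a single unique value carries no information.
    constant = [c for c in features.columns if features[c].nunique() == 1]
    return pd.concat([features.drop(columns=constant), labels], axis=1)


# Toy rows for illustration only (not taken from IoTID20).
raw = pd.DataFrame(
    {
        "Flow_ID": ["a", "b", "c", "d"],
        "Src_IP": ["10.0.0.1"] * 4,
        "Dst_IP": ["10.0.0.2"] * 4,
        "Timestamp": ["t1", "t2", "t3", "t4"],
        "Flow_Duration": [10.0, 20.0, np.inf, 30.0],
        "Fwd_PSH_Flags": [0, 0, 0, 0],  # constant column
        "Label": ["Normal", "Anomaly", "Anomaly", "Anomaly"],
        "Cat": ["Normal", "DoS", "Mirai", "Scan"],
        "Sub_Cat": ["Normal", "Syn-flooding", "UDP Flooding", "Port OS"],
    }
)
clean = preprocess(raw)
```

In the toy frame, the infinite `Flow_Duration` entry is replaced by the mean of the remaining values, and the constant `Fwd_PSH_Flags` column is removed.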
- Overall IDS Model Design Phase
- Ensemble Machine Learning Model Selection Phase
4. Experiments and Results
- Binary Classification
- Multi-Class Classification (Categories)
- Multi-Class Classification (Sub-Categories)
- Ensemble Techniques
- Comparison with Related Previous Work
- Binary Classification: Of the previously outlined literature, nine papers evaluated their models on binary classification. Figure 10 summarizes their results in comparison with the proposed IDS model. The proposed stacking IDS model outperformed some of these works and matched those that report 100% accuracy.
- Multi-Class Category Classification: Out of the previously outlined literature, only three papers evaluated their models against multi-class category classification. Figure 11 outlines the results of the literature in comparison with the proposed IDS model. The proposed stacking IDS model performed better than the literature in the category classification task. The proposed model achieved an accuracy of 100% and recall, precision, and F1-score of 99%.
- Multi-Class Subcategory Classification: Out of the previously outlined literature, four papers evaluated their models against multi-class subcategory classification. Figure 12 presents the results of the literature in comparison with the proposed IDS model.
- Researchers in [17] evaluated the IoTID20 dataset on six ML classifiers. Out of the six classifiers, the decision tree achieved the best result with 88%, followed by an ensemble method with 87%.
- Researchers in [22] evaluated the IoTID20 dataset on four classifiers. Of the four, DT achieved the highest detection rate for six of the nine classes. Figure 13 displays the ROC-AUC curve for the multi-class subcategory classification of this study. The proposed ensemble model achieved AUC values similar to those of the DT classifier in [22] for every class except Scan - Host Port, where [22] reports an AUC of 96 compared with 81 in this study.
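Per-class AUC values of the kind compared above are commonly computed one-vs-rest: each class in turn is treated as the positive class and all others as negative. The sketch below illustrates this with scikit-learn. The synthetic three-class data and the single decision tree are assumptions made for the example and do not reproduce the results of this study or of [22].

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the subcategory task.
X, y = make_classification(
    n_samples=900, n_features=10, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)
scores = clf.predict_proba(X_test)  # one probability column per class

# One-vs-rest: binarize the true labels, then score each class column separately.
y_onehot = label_binarize(y_test, classes=clf.classes_)
per_class_auc = {
    int(c): roc_auc_score(y_onehot[:, i], scores[:, i])
    for i, c in enumerate(clf.classes_)
}

# The macro-averaged one-vs-rest AUC is the unweighted mean of the per-class values.
macro_auc = roc_auc_score(y_test, scores, multi_class="ovr", average="macro")
```

Reporting the per-class values alongside the macro average makes a weak class, such as the Scan - Host Port case discussed above, visible instead of hiding it in the mean.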
5. Conclusions
References
- Chin, J.; Callaghan, V.; Allouch, S.B. The Internet-of-Things: Reflections on the Past, Present and Future from a User-Centered and Smart Environment Perspective. Journal of Ambient Intelligence and Smart Environments 2019, 11, 45–69. [CrossRef]
- Winchcomb, T.; Massey, S.; Beastall, P. Review of Latest Developments in the Internet of Things; 2017.
- Cyrus, C. IoT Cyberattacks Escalate in 2021, According to Kaspersky. Available online: https://www.iotworldtoday.com/security/iot-cyberattacks-escalate-in-2021-according-to-kaspersky (accessed on 26 April 2023).
- Abomhara, M.; Køien, G.M. Cyber Security and the Internet of Things: Vulnerabilities, Threats, Intruders and Attacks. Journal of Cyber Security and Mobility 2015, 65–88. [CrossRef]
- Sethi, P.; Sarangi, S.R. Internet of Things: Architectures, Protocols, and Applications. Journal of Electrical and Computer Engineering 2017, 2017, e9324035. [CrossRef]
- Al-Attar, R.; Alkasassbeh, M.; Al-Dala’Ien, M. A Survey: Soft Computing for Anomaly Detection to Mitigate IoT Abuse. In Proceedings of the 2022 International Conference on Engineering & MIS (ICEMIS); July 2022; pp. 1–6.
- Ullah, S.; Ahmad, J.; Khan, M.A.; Alkhammash, E.H.; Hadjouni, M.; Ghadi, Y.Y.; Saeed, F.; Pitropakis, N. A New Intrusion Detection System for the Internet of Things via Deep Convolutional Neural Network and Feature Engineering. Sensors 2022, 22, 3607. [CrossRef]
- Omar, M.; George, L. Toward a Lightweight Machine Learning Based Solution against Cyber-Intrusions for IoT. In Proceedings of the 2021 IEEE 46th Conference on Local Computer Networks (LCN); October 2021; pp. 519–524.
- Yu, X.; Yang, X.; Tan, Q.; Shan, C.; Lv, Z. An Edge Computing Based Anomaly Detection Method in IoT Industrial Sustainability. Applied Soft Computing 2022, 128, 109486. [CrossRef]
- Shafiq, M.; Tian, Z.; Bashir, A.K.; Du, X.; Guizani, M. IoT Malicious Traffic Identification Using Wrapper-Based Feature Selection Mechanisms. Computers & Security 2020, 94, 101863. [CrossRef]
- Lu, H.; Jin, C.; Helu, X.; Du, X.; Guizani, M.; Tian, Z. DeepAutoD: Research on Distributed Machine Learning Oriented Scalable Mobile Communication Security Unpacking System. IEEE Transactions on Network Science and Engineering 2022, 9, 2052–2065. [CrossRef]
- Shafiq, M.; Tian, Z.; Bashir, A.K.; Du, X.; Guizani, M. CorrAUC: A Malicious Bot-IoT Traffic Detection Method in IoT Network Using Machine-Learning Techniques. IEEE Internet Things J. 2021, 8, 3242–3254. [CrossRef]
- Dat-Thinh, N.; Xuan-Ninh, H.; Kim-Hung, L. MidSiot: A Multistage Intrusion Detection System for Internet of Things. Wireless Communications and Mobile Computing 2022, 2022, e9173291. [CrossRef]
- Islam, N.; Farhin, F.; Sultana, I.; Shamim Kaiser, M.; Sazzadur Rahman, Md.; Mahmud, M.; S. M. Sanwar Hosen, A.; Hwan Cho, G. Towards Machine Learning Based Intrusion Detection in IoT Networks. Computers, Materials & Continua 2021, 69, 1801–1821. [CrossRef]
- Indrasiri, P.L.; Lee, E.; Rupapara, V.; Rustam, F.; Ashraf, I. Malicious Traffic Detection in Iot and Local Networks Using Stacked Ensemble Classifier. Computers, Materials and Continua 2022, 71, 489–515. [CrossRef]
- Song, Y.; Hyun, S.; Cheong, Y.-G. Analysis of Autoencoders for Network Intrusion Detection. Sensors 2021, 21, 4294. [CrossRef]
- Ullah, I.; Mahmoud, Q.H. A Scheme for Generating a Dataset for Anomalous Activity Detection in IoT Networks. In Proceedings of the Advances in Artificial Intelligence; Goutte, C., Zhu, X., Eds.; Springer International Publishing: Cham, 2020; pp. 508–520.
- Maniriho, P.; Niyigaba, E.; Bizimana, Z.; Twiringiyimana, V.; Mahoro, L.J.; Ahmad, T. Anomaly-Based Intrusion Detection Approach for IoT Networks Using Machine Learning. In Proceedings of the 2020 International Conference on Computer Engineering, Network, and Intelligent Multimedia (CENIM); November 2020; pp. 303–308.
- Alkahtani, H.; Aldhyani, T.H.H. Intrusion Detection System to Advance Internet of Things Infrastructure-Based Deep Learning Algorithms. Complexity 2021, 2021, e5579851. [CrossRef]
- Qaddoura, R.; M. Al-Zoubi, A.; Faris, H.; Almomani, I. A Multi-Layer Classification Approach for Intrusion Detection in IoT Networks Based on Deep Learning. Sensors 2021, 21, 2987. [CrossRef]
- Qaddoura, R.; Al-Zoubi, A.M.; Almomani, I.; Faris, H. Predicting Different Types of Imbalanced Intrusion Activities Based on a Multi-Stage Deep Learning Approach. In Proceedings of the 2021 International Conference on Information Technology (ICIT); July 2021; pp. 858–863.
- Albulayhi, K.; Smadi, A.A.; Sheldon, F.T.; Abercrombie, R.K. IoT Intrusion Detection Taxonomy, Reference Architecture, and Analyses. Sensors 2021, 21, 6432. [CrossRef]
- Krishnan, S.; Neyaz, A.; Liu, Q. IoT Network Attack Detection Using Supervised Machine Learning. 2021.
- Mohammed, R.; Rawashdeh, J.; Abdullah, M. Machine Learning with Oversampling and Undersampling Techniques: Overview Study and Experimental Results. In Proceedings of the 2020 11th International Conference on Information and Communication Systems (ICICS); April 2020; pp. 243–248.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-Sampling Technique. Journal of Artificial Intelligence Research 2002, 16, 321–357. [CrossRef]
- Pearson, K. LIII. On Lines and Planes of Closest Fit to Systems of Points in Space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 1901, 2. [CrossRef]
- Kohavi, R. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. 2001, 14.
- Rincy, T.N.; Gupta, R. Ensemble Learning Techniques and Its Efficiency in Machine Learning: A Survey. In Proceedings of the 2nd International Conference on Data, Engineering and Applications (IDEA); February 2020; pp. 1–6.
- Yin, X.; Liu, Q.; Pan, Y.; Huang, X.; Wu, J.; Wang, X. Strength of Stacking Technique of Ensemble Learning in Rockburst Prediction with Imbalanced Data: Comparison of Eight Single and Ensemble Models. Nat Resour Res 2021, 30, 1795–1815. [CrossRef]

| Paper | Binary | Multi-Class (Category) | Multi-Class (Subcategory) |
|---|---|---|---|
| [14] | Training and testing accuracy, IoTID20 = 100% | X | X |
| [15] | Accuracy, IoTID20 = 100% | X | X |
| [16] | Accuracy, IoTID20 = 94.7% | X | X |
| [13] | X | Accuracy: DT = 99.15% | X |
| [17] | X | X | Accuracy: DT = 88%, Ensemble = 87%, Random Forest = 84%, Gaussian NB = 73%, LDA = 70%, Logistic Regression = 40%, SVM = 40% |
| [18] | Accuracy: DoS = 99.95%, MITM = 99.9761%, Scan = 99.96%; Mirai not covered in the paper | X | X |
| [19] | Accuracy: CNN = 96%, CNN-LSTM = 98%, LSTM = 98.2% | X | X |
| [20] | X | Accuracy: proposed model = 86.48% | X |
| [21] | X | X | Accuracy: proposed model = 55.79% |
| [8] | Detection rate: DT = 0.99, KNN = 0.98, RF = 0.98 | X | X |
| [22] | X | X | Detection rate (average): ANN = 96%, DT = 94%, SVM = 91.5%, LR = 91.3% |
| [23] | Sequential backward processing, highest accuracy: XGBoost = 99.31%; sequential forward processing, highest accuracy: XGBoost = 99.3%; recursive feature elimination, highest accuracy: XGBoost = 98.79% | X | X |
| [7] | Accuracy: proposed model = 99.84% | Accuracy: proposed model = 98.12% | Accuracy: proposed model = 77.5% |

| Binary | Multi-Class (Category) | Multi-Class (Subcategory) |
|---|---|---|
| Normal (40,073) | Normal (40,073) | Normal (40,073) |
| Anomaly (585,710) | DoS (59,391) | Syn-flooding (59,391) |
| | MITM (35,377) | ARP Spoofing (35,377) |
| | Mirai (415,677) | Brute Force (121,181) |
| | | HTTP Flooding (55,818) |
| | | UDP Flooding (183,554) |
| | | ACK Flooding (55,124) |
| | Scan (75,265) | Host Port (22,192) |
| | | Port OS (53,073) |

| Class | Recall | Precision | F1-score |
|---|---|---|---|
| Normal | 100% | 99% | 99.2% |
| Attack | 100% | 100% | 100% |
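
Recall, precision, and F1-score of the kind reported per class in these tables follow directly from the confusion matrix. The sketch below computes them twice, once by hand and once with scikit-learn, on ten made-up predictions; the labels and predictions are invented for illustration and are unrelated to the figures in the tables.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Made-up ground truth and predictions: 0 = Normal, 1 = Attack.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 0])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tp = np.diag(cm)            # correctly predicted samples per class
fp = cm.sum(axis=0) - tp    # predicted as the class but belonging to another
fn = cm.sum(axis=1) - tp    # belonging to the class but predicted as another

manual_precision = tp / (tp + fp)
manual_recall = tp / (tp + fn)
manual_f1 = 2 * manual_precision * manual_recall / (manual_precision + manual_recall)

# The same quantities from scikit-learn, one value per class.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1]
)
```

Because F1 is the harmonic mean of precision and recall, it can never exceed the larger of the two for a given class.
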

| Target Class Type | Sampling Method |
|---|---|
| MITM | All classes were undersampled to 35,377 instances |
| Normal | Classes with fewer than 40,073 instances were oversampled using SMOTE; classes with more than 40,073 instances were undersampled |
| DoS | Classes with fewer than 59,391 instances were oversampled using SMOTE; classes with more than 59,391 instances were undersampled |
| Scan | Classes with fewer than 75,265 instances were oversampled using SMOTE; classes with more than 75,265 instances were undersampled |
| Mirai | All classes were oversampled to 415,677 instances using SMOTE |
| No Sampling | All instances are kept as their original distribution within the IoTID20 dataset |

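
The sampling strategies tabulated above pick one class size as the target, undersample every larger class to it, and oversample every smaller class to it with SMOTE. The sketch below shows one way this could be done. It is an assumption-laden illustration: `smote_oversample` is a compact hand-written version of the SMOTE interpolation step built on scikit-learn's `NearestNeighbors` (a dedicated library such as imbalanced-learn would normally be used), undersampling is plain random selection, and both helper names and the toy data are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote_oversample(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two samples in the class")
    rng = np.random.default_rng(rng)
    k = min(k, len(X_min) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    # Column 0 is normally the query point itself, so it is dropped.
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


def resample_to_target(X, y, target, rng=0):
    """Bring every class to exactly `target` samples."""
    rng = np.random.default_rng(rng)
    X_parts, y_parts = [], []
    for label in np.unique(y):
        X_c = X[y == label]
        if len(X_c) > target:
            # Random undersampling without replacement.
            X_c = X_c[rng.choice(len(X_c), size=target, replace=False)]
        elif len(X_c) < target:
            synthetic = smote_oversample(X_c, target - len(X_c), rng=rng)
            X_c = np.vstack([X_c, synthetic])
        X_parts.append(X_c)
        y_parts.append(np.full(len(X_c), label))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Toy imbalanced data: class sizes 10, 20, and 30; the middle class sets the target.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(60, 4))
y = np.array([0] * 10 + [1] * 20 + [2] * 30)
X_bal, y_bal = resample_to_target(X, y, target=20)
```

Resampling is usually applied to the training split only, so that the test split keeps the original class distribution; the source does not state which split was resampled.
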
| Class | Recall | Precision | F1-score |
|---|---|---|---|
| Normal | 99% | 98% | 99% |
| DoS | 100% | 100% | 100% |
| MITM | 99% | 90% | 93% |
| Mirai | 99% | 100% | 99% |
| Scan | 99% | 98% | 99% |

| Target Class Type | Sampling Method |
|---|---|
| Scan - Host Port | All classes were undersampled to 22,192 instances |
| MITM | Classes with fewer than 35,377 instances were oversampled using SMOTE; classes with more than 35,377 instances were undersampled |
| Normal | Classes with fewer than 40,073 instances were oversampled using SMOTE; classes with more than 40,073 instances were undersampled |
| Scan - Port OS | Classes with fewer than 53,073 instances were oversampled using SMOTE; classes with more than 53,073 instances were undersampled |
| Mirai - ACK Flooding | Classes with fewer than 55,124 instances were oversampled using SMOTE; classes with more than 55,124 instances were undersampled |
| Mirai - HTTP Flooding | Classes with fewer than 55,818 instances were oversampled using SMOTE; classes with more than 55,818 instances were undersampled |
| DoS | Classes with fewer than 59,391 instances were oversampled using SMOTE; classes with more than 59,391 instances were undersampled |
| Mirai - Brute Force | Classes with fewer than 121,181 instances were oversampled using SMOTE; classes with more than 121,181 instances were undersampled |
| Mirai - UDP Flooding | All classes were oversampled to 183,554 instances using SMOTE |
| No Sampling | All instances are kept as their original distribution within the IoTID20 dataset |

| Class | Recall | Precision | F1-score |
|---|---|---|---|
| Normal | 99% | 100% | 99% |
| DoS | 100% | 100% | 100% |
| MITM | 97% | 97% | 97% |
| Mirai - ACK Flooding | 65% | 67% | 66% |
| Mirai - Brute Force | 98% | 93% | 96% |
| Mirai - HTTP Flooding | 65% | 67% | 66% |
| Mirai - UDP Flooding | 89% | 91% | 90% |
| Scan - Host Port | 62% | 80% | 70% |
| Scan - Port OS | 93% | 86% | 89% |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).