Submitted:
23 July 2023
Posted:
24 July 2023
Abstract
Keywords:
1. Introduction
2. Literature Review
3. Methodology
Preprocessing
Structure of AWID3 Dataset
AWID3 feature description
Attacks in AWID3
4. Results and Discussion
Phase I: Multi Attack (Nominal class)
- First, the data were cleaned by removing empty features and features with a constant value; 49 features remained.
- Randomize: the order of the instances was shuffled at random. The random number generator is reset with the seed value whenever a new set of instances is passed in; we set the seed to 33.
- Remove Percentage: the data were subsampled to fit the memory allocated to WEKA. The resulting sample consists of 7499 instances.
- Attributes with more than 50% missing values were removed, leaving 44 attributes.
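
The cleaning steps listed above can be sketched in plain Python. This is only an illustrative analogue of the WEKA filters, not the authors' actual configuration: the function names, the `MISSING` sentinel, and the toy records are assumptions introduced here, and only the seed value 33 and the 50% threshold come from the text.

```python
import random

MISSING = None  # assumption: missing values are stored as None


def drop_empty_and_constant(rows, columns):
    """Keep columns that take at least two distinct non-missing values."""
    kept = []
    for col in columns:
        distinct = {row[col] for row in rows if row[col] is not MISSING}
        if len(distinct) > 1:
            kept.append(col)
    return kept


def shuffle_and_sample(rows, sample_size, seed=33):
    """Shuffle a copy of the rows with a fixed seed, keep the first sample_size."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:sample_size]


def drop_mostly_missing(rows, columns, max_missing=0.5):
    """Keep columns whose share of missing values does not exceed max_missing."""
    kept = []
    for col in columns:
        missing = sum(1 for row in rows if row[col] is MISSING)
        if missing / len(rows) <= max_missing:
            kept.append(col)
    return kept


# Toy records (hypothetical values, not taken from AWID3).
rows = [
    {"frame.len": 100 + i, "wlan.fc.type": i % 3, "constant": 1,
     "empty": None, "sparse": i if i < 2 else None,
     "Label": "Normal" if i % 2 == 0 else "Krack"}
    for i in range(8)
]

columns = drop_empty_and_constant(rows, list(rows[0]))   # step 1: clean
sample = shuffle_and_sample(rows, sample_size=6, seed=33)  # steps 2-3: shuffle, subsample
columns = drop_mostly_missing(sample, columns)             # step 4: >50% missing
print(columns)
```

Fixing the seed makes the shuffle, and therefore the subsample, reproducible across runs, which is the practical reason for reporting the seed value.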

Phase II: Multi Attack (Numeric class)
- First, the data were cleaned by removing empty features and features with a constant value; 49 features remained.
- Randomize: the order of the instances was shuffled at random. The random number generator is reset with the seed value whenever a new set of instances is passed in; we set the seed to 33.
- Remove Percentage: the data were subsampled to fit the memory allocated to WEKA. The resulting sample consists of 7500 instances and 49 features.
- Attributes with more than 50% missing values were removed, leaving 44 attributes.
- The 36 features retained by the Gain Ratio attribute evaluation were (wlan.fc.protected, wlan.fc.subtype, radiotap.present.tsft, wlan.bssid, wlan.fc.type, wlan.fc.ds, wlan.duration, frame.number, wlan.sa, frame.len, wlan-radio.data-rate, wlan.ta, frame.time-relative, wlan.fc.retry, wlan.ra, wlan-radio.duration, wlan.da, ip.proto, radiotap.channel.flags.cck, wlan-radio.signal-dbm, radiotap.dbm-antsignal, wlan.fc.moredata, radiotap.datarate, ip.ttl, ip.src, frame.time-delta, frame.time-delta-displayed, ip.dst, wlan.fc.pwrmgt, wlan.fc.frag, frame.time, wlan.seq, ip.version, llc, wlan.fc.order).
- According to the Information Gain attribute evaluator, the features (radiotap.timestamp.ts, frame.time-epoch, frame.time, frame.number, frame.time-relative, frame.len, wlan-radio.duration) effectively copy the label, so we removed them to obtain a reliable estimate of the ML model's performance.
- The remaining 37 features were (frame.time-delta, frame.time-delta-displayed, radiotap.channel.flags.cck, radiotap.channel.flags.ofdm, radiotap.channel.freq, radiotap.datarate, radiotap.dbm-antsignal, radiotap.length, radiotap.present.tsft, wlan.duration, wlan.bssid, wlan.da, wlan.fc.ds, wlan.fc.frag, wlan.fc.order, wlan.fc.moredata, wlan.fc.protected, wlan.fc.pwrmgt, wlan.fc.type, wlan.fc.retry, wlan.fc.subtype, wlan.ra, wlan.sa, wlan.seq, wlan.ta, wlan-radio.channel, wlan-radio.data-rate, wlan-radio.frequency, wlan-radio.signal-dbm, wlan-radio.phy, llc, ip.dst, ip.proto, ip.src, ip.ttl, ip.version, Label).
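
To make the two ranking criteria concrete, the following minimal sketch computes information gain and gain ratio for nominal features. It assumes already-discretized values and uses hypothetical toy columns; it is not the WEKA evaluators' implementation. It also shows why a feature that is unique per row, such as a frame counter, reaches the maximum information gain and can therefore act as a copy of the label.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(feature, labels):
    """H(labels) minus the entropy of labels conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalized by the entropy of the feature itself."""
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0
    return information_gain(feature, labels) / split_info


# Hypothetical toy data: four frames, two classes.
labels = ["Normal", "Normal", "Krack", "Krack"]
row_id = [1, 2, 3, 4]              # unique per row, like a frame counter
informative = ["a", "a", "b", "b"]  # splits the classes perfectly
noise = ["a", "b", "a", "b"]        # carries no class information

for name, column in [("row_id", row_id), ("informative", informative), ("noise", noise)]:
    print(name, information_gain(column, labels), gain_ratio(column, labels))
```

In this toy case the row identifier has the same information gain as the genuinely informative feature, while its gain ratio is halved because the split itself is so fragmented. Neither score alone proves leakage, so inspecting top-ranked time and counter fields by hand remains a reasonable precaution.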
Phase III: Binary Classification
Analysis Results for the Three Phases
Analysis Binary Classification; Phase III
Comparison of our findings with other studies
5. Conclusion
References
1. Zhou, Y.; Gao, Y.; Chen, J.; Li, D.; Liu, Z.; Wei, Y.; Ma, Z. Blockchain for 5G advanced wireless networks. In Proceedings of the 2022 International Wireless Communications and Mobile Computing (IWCMC); IEEE, 2022; pp. 1306–1310.
2. Chettri, L.; Bera, R. A Comprehensive Survey on Internet of Things (IoT) Toward 5G Wireless Systems. IEEE Internet Things J. 2019, 7, 16–32.
3. Tsiknas, K.; Taketzis, D.; Demertzis, K.; Skianis, C. Cyber Threats to Industrial IoT: A Survey on Attacks and Countermeasures. IoT 2021, 2, 163–186.
4. Gondal, K.I.; Vamplew, P.; Kamruzzaman, J. Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity 2019, 2, 1–22.
5. Tahsien, S.M.; Karimipour, H.; Spachos, P. Machine learning based solutions for security of Internet of Things (IoT): A survey. J. Netw. Comput. Appl. 2020, 161, 102630.
6. Brownlee, J. Introduction to dimensionality reduction for machine learning; Machine Learning Mastery: Vermont, Australia, 2020.
7. Chatzoglou, E.; Kambourakis, G.; Smiliotopoulos, C.; Kolias, C. Best of both worlds: Detecting application layer attacks through 802.11 and non-802.11 features. Sensors 2022, 22, 5633.
8. Chatzoglou, E.; Kambourakis, G.; Kolias, C.; Smiliotopoulos, C. Pick quality over quantity: Expert feature selection and data preprocessing for 802.11 intrusion detection systems. IEEE Access 2022, 10, 64761–64784.
9. Saini, R.; Halder, D.; Baswade, A.M. RIDS: Real-time Intrusion Detection System for WPA3 enabled Enterprise Networks. arXiv 2022, arXiv:2207.02489.
10. Zheng, C.; Zang, M.; Hong, X.; Bensoussane, R.; Vargaftik, S.; Ben-Itzhak, Y.; Zilberman, N. Automating in-network machine learning. arXiv 2022, arXiv:2205.08824.
11. Sethuraman, S.C.; Dhamodaran, S.; Vijayakumar, V. Intrusion detection system for detecting wireless attacks in IEEE 802.11 networks. IET Networks 2019, 8, 219–232.
12. Anthi, E.; Williams, L.; Slowinska, M.; Theodorakopoulos, G.; Burnap, P. A Supervised Intrusion Detection System for Smart Home IoT Devices. IEEE Internet Things J. 2019, 6, 9042–9053.
13. Alzubi, Q.M.; Anbar, M.; Alqattan, Z.N.M.; Al-Betar, M.A.; Abdullah, R. Intrusion detection system based on a modified binary grey wolf optimisation. Neural Comput. Appl. 2019, 32, 6125–6137.
14. Alam, S.; Yao, N. The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis. Comput. Math. Organ. Theory 2018, 25, 319–335.
15. Go, A.; Bhayani, R.; Huang, L. Twitter sentiment classification using distant supervision. CS224N project report, Stanford, 2009; Vol. 1, No. 12.
16. Kubik, C.; Knauer, S.M.; Groche, P. Smart sheet metal forming: importance of data acquisition, preprocessing and transformation on the performance of a multiclass support vector machine for predicting wear states during blanking. Journal of Intelligent Manufacturing 2022, 33, 259–282.
17. Chatzoglou, E.; Kambourakis, G.; Kolias, C. Empirical evaluation of attacks against IEEE 802.11 enterprise networks: The AWID3 dataset. IEEE Access 2021, 9, 34188–34205.
18. Kolias, C.; Kambourakis, G.; Stavrou, A.; Gritzalis, S. Intrusion Detection in 802.11 Networks: Empirical Evaluation of Threats and a Public Dataset. IEEE Commun. Surv. Tutorials 2015, 18, 184–208.
19. Arora. Preventing wireless deauthentication attacks over 802. arXiv 2018, arXiv:1901.07301.
20. Aung, M.A.C.; Thant, K.P. IEEE 802.11 attacks and defenses. Ph.D. dissertation, MERAL Portal, 2019.
21. Cheema, R.; Bansal, D.; Sofat, S. Deauthentication/Disassociation Attack: Implementation and Security in Wireless Mesh Networks. Int. J. Comput. Appl. 2011, 23, 7–15.
22. Lee, B. Stateless Re-Association in WPA3 Using Paired Token. Electronics 2021, 10, 215.
23. Zhou, T.; Cai, Z.; Xiao, B.; Chen, Y.; Xu, M. Detecting rogue AP with the crowd wisdom. In Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS); IEEE, 2017; pp. 2327–2332.
24. Kohlios, C.P.; Hayajneh, T. A Comprehensive Attack Flow Model and Security Analysis for Wi-Fi and WPA3. Electronics 2018, 7, 284.
25. Čermák, M.; Svorenčík, S.; Lipovský, R. Kr00k - CVE-2019-15126: serious vulnerability deep inside your Wi-Fi encryption. ESET Research White Paper, 2020.
26. Wanjau, S.K.; Wambugu, G.M.; Kamau, G.N. SSH-Brute Force Attack Detection Model based on Deep Learning. Int. J. Comput. Appl. Technol. Res. 2021, 10, 42–50.
27. Najafabadi, M.M.; Khoshgoftaar, T.M.; Kemp, C.; Seliya, N.; Zuech, R. Machine learning for detecting brute force attacks at the network level. In Proceedings of the 2014 IEEE International Conference on Bioinformatics and Bioengineering; IEEE, 2014; pp. 379–385.
28. Ahmed, A.A.; Jabbar, W.A.; Sadiq, A.S.; Patel, H. Deep learning-based classification model for botnet attack detection. J. Ambient. Intell. Humaniz. Comput. 2020, 13, 3457–3466.
29. Aslan, Ö.A.; Samet, R. A comprehensive review on malware detection approaches. IEEE Access 2020, 8, 6249–6271.
30. Liu, X.; Zheng, L.; Cao, S.; Helal, S.; Zhou, J.; Jia, H.; Zhang, W. A Multi-location Defence Scheme Against SSDP Reflection Attacks in the Internet of Things. 2019, 187–198.
31. Halfond, W.G.; Viegas, J.; Orso, A. A classification of SQL-injection attacks and countermeasures. In Proceedings of the IEEE International Symposium on Secure Software Engineering; IEEE, 2006; pp. 13–15.
32. Shrivastava, P.; Jamal, M.S.; Kataoka, K. EvilScout: Detection and Mitigation of Evil Twin Attack in SDN Enabled WiFi. IEEE Trans. Netw. Serv. Manag. 2020, 17, 89–102.
33. Ahmed, A.; Abdullah, N.A. Real time detection of phishing websites. In Proceedings of the 2016 IEEE 7th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON); 2016; pp. 1–6.

The AWID3 feature description

| Feature | Feature | Feature |
|---|---|---|
| radiotap.dbm-antsignal | wlan-radio.signal-dbm | tcp.checksum |
| tcp.payload | wlan.duration | frame.time-delta-displayed |
| frame.time-delta | frame.time | tcp.time-relative |
| radiotap.channel.freq | wlan.fc.moredata | wlan-radio.frequency |
| wlan-radio.channel | wlan.fc.ds | wlan.fc.type |
| wlan.fc.protected | radiotap.channel.flags.cck | wlan.fc.subtype |
| wlan.fc.pwrmgt | wlan-radio.phy | radiotap.channel.flags.ofdm |
| radiotap.present.tsft | wlan.ra | radiotap.length |
| wlan.fc.retry | wlan.ta | wlan.bssid |
| wlan.sa | llc | ip.version |
| ip.proto | tcp.checksum.status | ip.ttl |
| ip.src | tcp.flags.reset | tcp.flags.syn |
| tcp.flags.fin | tcp.flags.ack | tcp.flags.push |
| frame.number | frame.len | frame.time-relative |
| wlan.sa | tcp.ack | tcp.analysis |
| tcp.seq | tcp.seq-raw | tcp.time-delta |

| Attack | Normal traffic | Malicious traffic |
|---|---|---|
| Deauth | 1,587,527 | 38,942 |
| Disas | 1,938,585 | 75,131 |
| (Re)Assoc | 1,838,430 | 5,502 |
| Rogue AP | 1,971,875 | 1,310 |
| Krack | 1,388,498 | 49,990 |
| Kr00k | 2,708,637 | 186,173 |
| SSH | 2,428,688 | 11,882 |
| Botnet | 3,169,167 | 56,891 |
| Malware | 2,181,148 | 131,611 |
| SQL Injection | 2,595,727 | 2,629 |
| SSDP | 2,641,517 | 5,456,395 |
| Evil Twin | 3,673,854 | 104,827 |
| Website spoofing | 2,263,446 | 405,121 |
| Total | 30,387,099 | 6,526,404 |

| Attack type | Traffic in the sample |
|---|---|
| Krack | 20,000 |
| Kr00k | 20,000 |
| Disas | 20,000 |
| Malware | 20,000 |
| SSDP | 20,000 |
| Normal | 20,000 |
| Total | 120,000 |

Gain Ratio - Nominal

| Algorithm | Accuracy (70/30 split) | Precision (70/30 split) | Recall (70/30 split) | F-Measure (70/30 split) | Accuracy (10-fold CV) | Precision (10-fold CV) | Recall (10-fold CV) | F-Measure (10-fold CV) |
|---|---|---|---|---|---|---|---|---|
| J48 | 99.82% | 0.997 | 0.997 | 0.977 | 99.84% | 0.998 | 0.998 | 0.998 |
| NaiveBayes | 98.76% | 0.997 | 1 | 0.999 | 99.21% | 0.998 | 0.996 | 0.997 |
| Logistic | 99.82% | 1 | 1 | 1 | 99.73% | 0.998 | 0.989 | 0.993 |

Info Gain - Nominal

| Algorithm | Accuracy (70/30 split) | Precision (70/30 split) | Recall (70/30 split) | F-Measure (70/30 split) | Accuracy (10-fold CV) | Precision (10-fold CV) | Recall (10-fold CV) | F-Measure (10-fold CV) |
|---|---|---|---|---|---|---|---|---|
| J48 | 99.67% | 1 | 1 | 1 | 99.69% | 1 | 1 | 1 |
| NaiveBayes | 92.38% | 0.995 | 0.998 | 0.996 | 92.39% | 0.995 | 0.998 | 0.996 |
| Random Tree | 99.44% | 1 | 0.997 | 0.998 | 99.49% | 0.99 | 1 | 0.99 |
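
The two evaluation protocols reported in these tables, a 70%/30% train/test split and 10-fold cross-validation, can be sketched as index partitions. This is a simplified illustration without stratification; the function names are hypothetical, and reusing the seed 33 and the sample size 7500 here is an assumption made only for the example.

```python
import random


def train_test_split_indices(n, train_fraction=0.7, seed=33):
    """Shuffle the indices 0..n-1 and cut them into a train part and a test part."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(round(n * train_fraction))
    return indices[:cut], indices[cut:]


def k_fold_indices(n, k=10, seed=33):
    """Yield (train, test) index lists; each index is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


train_idx, test_idx = train_test_split_indices(7500)
folds = list(k_fold_indices(7500, k=10))
print(len(train_idx), len(test_idx), len(folds))
```

With a single split, every metric depends on one particular 30% hold-out set, whereas cross-validation averages over ten disjoint test folds. This is one plausible reason the two column groups differ slightly.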

Gain Ratio - Nominal

| Algorithm | Overall Accuracy | Average Accuracy | Precision | Recall |
|---|---|---|---|---|
| Multiclass Decision Forest | 0.91372 | 0.97124 | 0.9587 | 0.9709 |
| Multiclass Decision Jungle | 0.89103 | 0.96368 | 0.9155 | 0.8911 |
| Multiclass Logistic Regression | 0.99989 | 0.99996 | 0.9999 | 0.9999 |

Info Gain - Nominal

| Algorithm | Overall Accuracy | Average Accuracy | Precision | Recall |
|---|---|---|---|---|
| Multiclass Decision Forest | 0.99133 | 0.99711 | 0.9916 | 0.9913 |
| Multiclass Decision Jungle | 0.92969 | 0.97657 | 0.9393 | 0.9296 |
| Multiclass Logistic Regression | 0.94375 | 0.98125 | 0.9569 | 0.9436 |

| Attack type | Class Value | Traffic in the sample |
|---|---|---|
| Normal | 0 | 20,000 |
| Krack | 1 | 20,000 |
| Disas | 2 | 20,000 |
| Kr00k | 3 | 20,000 |
| SSDP | 4 | 20,000 |
| Malware | 5 | 20,000 |
| Total | | 120,000 |

Gain Ratio - Numerical

| Evaluation | Algorithm | Correlation coefficient | Mean absolute error | Relative absolute error | Root relative squared error |
|---|---|---|---|---|---|
| 70% train / 30% test split | DecisionStump | 0.8297 | 0.7723 | 51.39% | 55.83% |
| 70% train / 30% test split | Random Tree | 0.8939 | 0.4455 | 29.65% | 45.31% |
| 10-fold cross-validation | DecisionStump | 0.8282 | 0.7732 | 51.84% | 56.03% |
| 10-fold cross-validation | Random Tree | 0.7005 | 0.795 | 53.30% | 71.36% |

Info Gain - Numerical

| Evaluation | Algorithm | Correlation coefficient | Mean absolute error | Relative absolute error | Root relative squared error |
|---|---|---|---|---|---|
| 70% train / 30% test split | DecisionStump | 0.6079 | 1.0661 | 71.19% | 79.40% |
| 70% train / 30% test split | Random Tree | 0.9965 | 0.0268 | 1.79% | 8.48% |
| 10-fold cross-validation | DecisionStump | 0.6079 | 1.0661 | 71.19% | 79.40% |
| 10-fold cross-validation | Random Tree | 0.9958 | 0.0169 | 1.13% | 9.14% |

Gain Ratio - Numerical

| Algorithm | Overall Accuracy | Average Accuracy | Precision | Recall |
|---|---|---|---|---|
| Multiclass Decision Forest | 0.94972 | 0.98324 | 0.94972 | 0.94972 |
| Multiclass Decision Jungle | 0.89397 | 0.96466 | 0.9031 | 0.894 |
| Multiclass Logistic Regression | 0.99994 | 0.99998 | 0.9999 | 0.9999 |

Info Gain - Numerical

| Algorithm | Overall Accuracy | Average Accuracy | Precision | Recall |
|---|---|---|---|---|
| Multiclass Decision Forest | 0.99133 | 0.99711 | 0.99133 | 0.99133 |
| Multiclass Decision Jungle | 0.92969 | 0.97657 | 0.9393 | 0.9296 |
| Multiclass Logistic Regression | 0.94381 | 0.98127 | 0.9569 | 0.9437 |

| Algorithm | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Two-Class Logistic Regression | 0.994 | 0.998 | 0.927 | 0.961 |
| Two-Class Decision Jungle | 0.888 | 0.993 | 0.783 | 0.876 |
| Two-Class Decision Forest | 0.947 | 0.977 | 0.916 | 0.945 |
| Two-Class Boosted Decision Tree | 0.968 | 1 | 0.614 | 0.76 |
| Two-Class Support Vector Machine | 0.993 | 0.994 | 0.927 | 0.959 |
| Two-Class Locally Deep Support Vector Machine | 0.995 | 1 | 0.938 | 0.968 |
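
As a quick consistency check on the binary results, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the rounded precision and recall values in the table; the tolerance of 0.002 is an assumption chosen here to absorb the rounding of the published figures.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the binary classification table.
reported = {
    "Two-Class Logistic Regression": (0.998, 0.927, 0.961),
    "Two-Class Decision Jungle": (0.993, 0.783, 0.876),
    "Two-Class Decision Forest": (0.977, 0.916, 0.945),
    "Two-Class Boosted Decision Tree": (1.0, 0.614, 0.76),
    "Two-Class Support Vector Machine": (0.994, 0.927, 0.959),
    "Two-Class Locally Deep Support Vector Machine": (1.0, 0.938, 0.968),
}

deviations = {
    name: abs(f1_score(p, r) - f1) for name, (p, r, f1) in reported.items()
}
for name, deviation in deviations.items():
    print(f"{name}: |recomputed - reported| = {deviation:.4f}")
```

The boosted decision tree row illustrates why F1 is worth reporting next to accuracy: perfect precision combined with a recall of 0.614 gives an F1 of roughly 0.76, even though the accuracy is 0.968.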

| Algorithm | Correlation coefficient | Mean absolute error | Relative absolute error | Root relative squared error |
|---|---|---|---|---|
| DecisionStump | 0.9273 | 0.0762 | 15.2441% | 37.4972% |
| Random Tree | 0.9038 | 0.0784 | 15.6863% | 42.9573% |
| Decision Table | 0.9192 | 0.0771 | 15.4243% | 39.4434% |
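
For readers unfamiliar with the four regression metrics in this table, the sketch below computes them from lists of actual and predicted values. It uses common textbook definitions in which the baseline is the mean of the actual values being evaluated; this is an assumption, and the exact baseline used by the evaluation tool may differ. The toy vectors are hypothetical.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Return correlation, MAE, relative absolute error, root relative squared error."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n

    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the trivial baseline that always predicts the mean.
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)

    covariance = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    spread_p = sum((p - mean_p) ** 2 for p in predicted)

    return {
        "correlation": covariance / sqrt(base_sq * spread_p),
        "mae": abs_err / n,
        "rae": abs_err / base_abs,
        "rrse": sqrt(sq_err / base_sq),
    }


# Hypothetical binary target encoded as 0/1, with one wrong prediction.
metrics = regression_metrics(actual=[0, 0, 1, 1], predicted=[0, 0, 1, 0])
print(metrics)
```

Under these definitions, a balanced 0/1 target has a mean absolute deviation of 0.5, so the relative absolute error is about twice the mean absolute error. The ratios in the table (for example 0.0762 against 15.2441%) are consistent with that reading, although this is an inference rather than something the table itself states.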

| Algorithm | Accuracy | True Positive Rate (TPR) for class 1 | False Negative Rate (FNR) for class 1 | True Positive Rate (TPR) for class 0 | False Negative Rate (FNR) for class 0 |
|---|---|---|---|---|---|
| Decision tree - fine tree | 95.2% | 92.2% | 7.8% | 98.2% | 1.8% |
| Decision tree - medium tree | 95.2% | 92.2% | 7.8% | 98.2% | 1.8% |
| Decision tree - coarse tree | 94.6% | 91.5% | 8.5% | 97.8% | 2.2% |
| Ensemble classification - Boosted tree | 99.0% | 99.9% | 0.1% | 98% | 2% |
| Ensemble classification - Bagged tree | 91.3% | 84.2% | 15.7% | 98.3% | 1.7% |
| Ensemble classification - Subspace discriminant | 86.7% | 89.3% | 10.7% | 84.2% | 15.8% |
| Naïve Bayes | 95.3% | 98.4% | 1.6% | 92.2% | 7.8% |
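
The per-class rates in this table follow directly from the confusion counts: for a given class, TPR = TP / (TP + FN) and FNR = FN / (TP + FN), so the two always sum to 100% up to rounding. A minimal sketch with hypothetical labels (not the paper's data) is:

```python
def class_rates(true_labels, predicted_labels, target):
    """Return (TPR, FNR) for one class, treating that class as the positive one."""
    tp = sum(1 for t, p in zip(true_labels, predicted_labels)
             if t == target and p == target)
    fn = sum(1 for t, p in zip(true_labels, predicted_labels)
             if t == target and p != target)
    return tp / (tp + fn), fn / (tp + fn)


def accuracy(true_labels, predicted_labels):
    """Share of instances whose predicted label equals the true label."""
    hits = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return hits / len(true_labels)


# Hypothetical example: five attack frames (1) and five normal frames (0),
# with one attack frame misclassified as normal.
true_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

tpr_1, fnr_1 = class_rates(true_labels, predicted_labels, target=1)
tpr_0, fnr_0 = class_rates(true_labels, predicted_labels, target=0)
print(accuracy(true_labels, predicted_labels), tpr_1, fnr_1, tpr_0, fnr_0)
```

For an intrusion detection system, the FNR of the attack class is usually the most important figure, since it is the share of malicious frames that pass undetected.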

| Reference | Attack | Feature Selection | Approach and Accuracy |
|---|---|---|---|
| [7] | Application-layer attacks (Botnet, Malware, SSH, SQL Injection, SSDP amplification, and Website spoofing) | Yes | ML: 98.7%; DL: 97.86%; F.S: 99% |
| [8] | Flooding category (Deauth, Disas, Assoc, and Kr00k attacks) and impersonation category (Rogue AP, Evil Twin, and Krack) | Yes | ML and DNN: 99.96% |
| [9] | De-authentication, Rogue AP, Evil Twin, Krack, and SSID | No | ML: 99.7% |
| [10] | All attacks | No | SVM: 79%; DT: 99.8% |
| Our work | Krack, Kr00k, Disas, Malware, and SSDP | Yes | Multi-class: 99.9%; Binary: 99% |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).