Submitted: 12 August 2024
Posted: 14 August 2024
Abstract
Keywords:
1. Introduction
1.1. Background and Motivation
1.2. Related Work
1.2.1. Autoencoder and Its Application to Intrusion Detection
1.2.2. On-Device Learning Anomaly Detector
1.2.3. Multiple Autoencoders Joint Decision-Making
1.3. Challenges in This Study
1.4. Our Contributions
- The concept of paired autoencoders is introduced, where an autoencoder trained on attack data is paired with another trained on normal data.
- The data is partitioned into multiple regions so that the data distribution within each region is simpler, which in turn improves detection performance. Multiple autoencoders make an initial prediction for each sample, and the samples are partitioned according to those predictions.
- The threshold characteristics of the autoencoders are leveraged to precisely detect data types within each region.
- The proposal performs well even on traffic that mixes different types of attacks, as validated on the public NSL-KDD dataset.
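The pairing and threshold ideas listed above can be illustrated with a minimal sketch. It assumes that each autoencoder scores a sample by its mean squared reconstruction error and that the threshold is the mean plus `k` standard deviations of the training errors. Neither choice is stated in this section, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def reconstruction_error(x, x_hat):
    """Mean squared error between a sample and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(train_errors, k=2.0):
    """Assumed rule: mean + k * std of the errors seen on the training data."""
    return mean(train_errors) + k * pstdev(train_errors)


def normal_ae_flags_attack(error, threshold):
    # An autoencoder trained on normal data reconstructs attacks poorly,
    # so an error above its threshold is read as "attack".
    return error > threshold


def attack_ae_flags_attack(error, threshold):
    # An autoencoder trained on one attack type reconstructs that attack well,
    # so an error at or below its threshold is read as "attack".
    return error <= threshold
```

Each pair therefore yields two boolean votes per sample, one from the autoencoder trained on normal data and one from the autoencoder trained on a given attack type.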
2. Literature Review
3. Our Proposal
3.1. Overview
3.2. Training Multiple Autoencoders Using Attack Data and Normal Data, Respectively
3.3. First Layer — Data Partitioning
- If all paired autoencoders classify the data as normal, then the data is classified into the quasi-normal region.
- If any paired autoencoder determines the data to be abnormal (i.e., both the autoencoder trained with normal data and the autoencoder trained with attack data classify the data as abnormal), then the data is classified into the quasi-attack region.
- Otherwise, if the data falls in the intersection for any paired autoencoder, then the data is classified into the divergence region.
- In all other cases, the data is classified into the undetermined region. An example is a pair of autoencoders in which the one trained on normal data classifies the data as an attack, while the one trained on attack data classifies it as non-attack.
- Data falls within the intersection of only one pair of autoencoders, i.e., for one kind of attack;
- Data falls within the intersection of multiple pairs of autoencoders.
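The partitioning rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not the authors' implementation: "intersection" is read here as the case where the normal-trained autoencoder accepts the sample as normal while the attack-trained autoencoder also accepts it as an attack, and the names `PairVerdict`, `pair_verdict`, `partition`, and `divergence_case` are hypothetical.

```python
from enum import Enum


class PairVerdict(Enum):
    NORMAL = "normal"              # both autoencoders of the pair say "not an attack"
    ATTACK = "attack"              # both autoencoders of the pair say "attack"
    INTERSECTION = "intersection"  # assumed: normal AE says normal, attack AE says attack
    CONFLICT = "conflict"          # normal AE says attack, attack AE says non-attack


def pair_verdict(normal_ae_says_attack, attack_ae_says_attack):
    """Combine the two votes of one autoencoder pair into a single verdict."""
    if normal_ae_says_attack and attack_ae_says_attack:
        return PairVerdict.ATTACK
    if not normal_ae_says_attack and not attack_ae_says_attack:
        return PairVerdict.NORMAL
    if not normal_ae_says_attack and attack_ae_says_attack:
        return PairVerdict.INTERSECTION
    return PairVerdict.CONFLICT


def partition(verdicts):
    """Apply the four rules in order; earlier rules take priority."""
    if all(v is PairVerdict.NORMAL for v in verdicts):
        return "quasi-normal"
    if any(v is PairVerdict.ATTACK for v in verdicts):
        return "quasi-attack"
    if any(v is PairVerdict.INTERSECTION for v in verdicts):
        return "divergence"
    return "undetermined"


def divergence_case(verdicts):
    """Distinguish the two divergence sub-cases: one pair or several pairs."""
    hits = sum(v is PairVerdict.INTERSECTION for v in verdicts)
    return "single" if hits == 1 else "multiple"
```

With four attack types (DoS, Probe, R2L, U2R), `verdicts` would hold four entries, one per pair.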
3.4. Second Layer — Precise Detection
4. Experiments
4.1. Dataset
- Basic features: Describe the basic attributes of a single TCP connection, such as duration, protocol type, service type, etc. These features are derived from basic information at the IP and TCP layers.
- Content features: Describe features related to the contents of the data packets, such as the number of failed login attempts, number of access control files, etc. These features are mainly used to detect U2R and R2L attacks, as these attacks often involve spoofing or abnormal login behaviors.
- Time-based traffic features: Traffic features calculated based on a time window, such as the time interval between connections, the number of connections to the same service within a time window, etc. These features help detect DoS and Probe attacks, as these attacks often manifest as a large number of connection requests in a short period.
- Host-based traffic features: Statistical features based on the host, such as the number of connections to the same host, the number of connections to the same host within a specific time window, etc. These features are used to identify attack behaviors targeting a single host.
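The four feature groups above can be written out as a lookup table. The sketch below uses the 41 standard NSL-KDD column names in their usual order (9 basic, 13 content, 9 time-based, 10 host-based). The dictionary name `FEATURE_GROUPS` and the helper `group_of` are hypothetical.

```python
FEATURE_GROUPS = {
    "basic": [
        "duration", "protocol_type", "service", "flag", "src_bytes",
        "dst_bytes", "land", "wrong_fragment", "urgent",
    ],
    "content": [
        "hot", "num_failed_logins", "logged_in", "num_compromised",
        "root_shell", "su_attempted", "num_root", "num_file_creations",
        "num_shells", "num_access_files", "num_outbound_cmds",
        "is_host_login", "is_guest_login",
    ],
    "time_based_traffic": [
        "count", "srv_count", "serror_rate", "srv_serror_rate",
        "rerror_rate", "srv_rerror_rate", "same_srv_rate",
        "diff_srv_rate", "srv_diff_host_rate",
    ],
    "host_based_traffic": [
        "dst_host_count", "dst_host_srv_count", "dst_host_same_srv_rate",
        "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
        "dst_host_srv_diff_host_rate", "dst_host_serror_rate",
        "dst_host_srv_serror_rate", "dst_host_rerror_rate",
        "dst_host_srv_rerror_rate",
    ],
}


def group_of(feature):
    """Return the name of the group a feature belongs to, or None if unknown."""
    for group, names in FEATURE_GROUPS.items():
        if feature in names:
            return group
    return None
```

For example, `group_of("num_failed_logins")` returns `"content"`, matching the description of content features as login-related.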
4.2. Data Preprocessing
4.3. Model Training and Selecting
4.4. Evaluation Metrics
4.5. Experiment Result: Overall Accuracy and Overall F1-Score
4.6. Experiment Result: Within-Region Accuracy
4.7. Training Time and Model Size Analysis
5. Conclusion and Future Work
References
- Abdul-Qawy A S, Pramod P J, Magesh E, et al. The internet of things (iot): An overview[J]. International Journal of Engineering Research and Applications 2015, 5, 71–82.
- Gartner Says Worldwide IoT Security Spending Will Reach $1.5 Billion in 2018: https://www.gartner.com/en/newsroom/press-releases/2018-03-21-gartner-says-worldwide-iot-security-spending-will-reach-1-point-5-billion-in-2018.
- Antonakakis M, April T, Bailey M, et al. Understanding the mirai botnet[C]//26th USENIX security symposium (USENIX Security 17). 2017, 1093-1110.
- Khan R, Maynard P, McLaughlin K, et al. Threat analysis of blackenergy malware for synchrophasor based real-time control and monitoring in smart grid[C]//4th International Symposium for ICS & SCADA Cyber Security Research 2016. BCS, 2016: 53-63.
- Farwell J P, Rohozinski R. Stuxnet and the future of cyber war[J]. Survival 2011, 53, 23–40.
- Stanislav M, Beardsley T. Hacking iot: A case study on baby monitor exposures and vulnerabilities[J]. Rapid7 Report, 2015.
- Eskandari M, Janjua Z H, Vecchio M, et al. Passban IDS: An intelligent anomaly-based intrusion detection system for IoT edge devices[J]. IEEE Internet of Things Journal 2020, 7, 6882–6897.
- Nguyen T D, Marchal S, Miettinen M, et al. DÏoT: A federated self-learning anomaly detection system for IoT[C]//2019 IEEE 39th International conference on distributed computing systems (ICDCS). IEEE. 2019; 756–767.
- Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors[J]. Nature 1986, 323, 533–536.
- Le Cun Y, Fogelman-Soulié F. Modèles connexionnistes de l'apprentissage[J]. Intellectica 1987, 2, 114–143.
- Choi H, Kim M, Lee G, et al. Unsupervised learning approach for network intrusion detection system using autoencoders[J]. The Journal of Supercomputing 2019, 75, 5597–5621.
- Ieracitano C, Adeel A, Morabito F C, et al. A novel statistical analysis and autoencoder driven intelligent intrusion detection approach[J]. Neurocomputing 2020, 387, 51–62.
- Tsukada M, Kondo M, Matsutani H. A neural network-based on-device learning anomaly detector for edge devices[J]. IEEE Transactions on Computers 2020, 69, 1027–1044.
- Huang G B, Zhu Q Y, Siew C K. Extreme learning machine: a new learning scheme of feedforward neural networks[C]//2004 IEEE international joint conference on neural networks (IEEE Cat. No. 04CH37541). IEEE. 2004, 2, 985–990.
- Yang Qin, Masaaki Kondo, "Federated Learning-Based Network Intrusion Detection with a Feature Selection Approach", IPSJ SIG Technical Report (in Japanese), Vol.2021-EMB-56 No.24, Pages 1-7, 2021.
- NSL_KDD dataset: http://nsl.cs.unb.ca/NSL-KDD/.
- Ioulianou P, Vasilakis V, Moscholios I, et al. A signature-based intrusion detection system for the internet of things[J]. Information and Communication Technology Form, 2018.
- Li W, Tug S, Meng W, et al. Designing collaborative blockchained signature-based intrusion detection in IoT environments[J]. Future Generation Computer Systems 2019, 96, 481–489.
- Sheikh N U, Rahman H, Vikram S, et al. A lightweight signature-based IDS for IoT environment[J]. arXiv, 2018; arXiv:1811.04582.
- Lo W W, Layeghy S, Sarhan M, et al. E-graphsage: A graph neural network based intrusion detection system for iot[C]//NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium. IEEE, 2022: 1-9.
- Zhou X, Liang W, Li W, et al. Hierarchical adversarial attacks against graph-neural-network-based IoT network intrusion detection system[J]. IEEE Internet of Things Journal 2021, 9, 9310–9319.
- Almiani M, AbuGhazleh A, Al-Rahayfeh A, et al. Deep recurrent neural network for IoT intrusion detection system[J]. Simulation Modelling Practice and Theory 2020, 101, 102031.
- Wang M, Yang N, Weng N. Securing a smart home with a transformer-based iot intrusion detection system[J]. Electronics 2023, 12, 2100.
- Fraihat S, Makhadmeh S, Awad M, et al. Intrusion detection system for large-scale IoT NetFlow networks using machine learning with modified Arithmetic Optimization Algorithm[J]. Internet of Things 2023, 22, 100819.
- KDD Cup dataset: https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
- Bala R, Nagpal R. A review on kdd cup99 and nsl-kdd dataset[J]. International Journal of Advanced Research in Computer Science 2019, 10.






| Type of Attack Data | Selected Features |
|---|---|
| DoS | protocol_type, flag, wrong_fragment, num_compromised, root_shell, num_shells, same_srv_rate |
| Probe | logged_in, num_root, num_shells, is_host_login, srv_rerror_rate, srv_serror_rate, srv_diff_host_rate |
| R2L | service, urgent, num_root, num_shells, num_access_files, is_guest_login, srv_serror_rate |
| U2R | flag, land, hot, num_failed_logins, logged_in, num_compromised, su_attempted, num_root, num_file_creations, num_outbound_cmds, is_guest_login, same_srv_rate, dst_host_srv_count |

| Region | Sub-region | Selected autoencoder |
|---|---|---|
| Quasi-normal region | | DoS_pos |
| Quasi-attack region | | DoS_pos |
| Undetermined region | | U2R_neg |
| Divergence region | Region1 | R2L_neg |
| | Region2 | U2R_neg |
| | Region3 | DoS_pos |
| | Region4 | R2L_neg |
| | Region5 | R2L_neg |

a) Confusion matrix of previous research

| | Attack (Predicted) | Normal (Predicted) |
|---|---|---|
| Attack (Actual) | 12136 | 696 |
| Normal (Actual) | 1446 | 8265 |

b) Confusion matrix of precise detection models

| | Attack (Predicted) | Normal (Predicted) |
|---|---|---|
| Attack (Actual) | 12143 | 689 |
| Normal (Actual) | 1442 | 8269 |

c) Confusion matrix of this study

| | Attack (Predicted) | Normal (Predicted) |
|---|---|---|
| Attack (Actual) | 12350 | 482 |
| Normal (Actual) | 878 | 8833 |
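
The three confusion matrices above can be turned into summary metrics with a few lines of arithmetic. The sketch below assumes that "attack" is the positive class and uses the standard binary definitions of accuracy, precision, recall, and F1. The paper's own averaging convention for the overall F1-score is not stated in this section, so the F1 values computed here may differ from the reported ones. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and F1, with 'attack' as the positive class."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Cells are read row by row from the tables: (TP, FN, FP, TN).
matrices = {
    "previous research": (12136, 696, 1446, 8265),
    "precise detection models": (12143, 689, 1442, 8269),
    "this study": (12350, 482, 878, 8833),
}

results = {name: binary_metrics(*cells) for name, cells in matrices.items()}

for name, m in results.items():
    print(f"{name}: accuracy={m['accuracy']:.4f}, f1={m['f1']:.4f}")
```

Under these definitions, the accuracy rises from roughly 90.5% for the previous research to roughly 94.0% for this study.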

| Region | Without Precise Detection | With Precise Detection | Increase (percentage points) |
|---|---|---|---|
| Quasi-Normal Region | 93.18% | 96.31% | 3.13 |
| Quasi-Attack Region | 89.84% | 94.66% | 4.82 |
| Divergence Region | 68.37% | 89.05% | 20.68 |
| Undetermined Region | 96.48% | 96.48% | 0.00 |

| Model | Training Time | Model Size |
|---|---|---|
| RF | 12.6 s | 2.29 MB |
| SVM (kernel = linear) | 20 min+ | |
| This Study | 42.3 s | 484 KB |
| Previous Research | 29.5 s | 242 KB |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).