Submitted:
04 March 2024
Posted:
05 March 2024
Abstract
Keywords:
1. Introduction

- Innovative Approach to Feature Selection: Introduced novel feature selection techniques specifically tailored for Intrusion Detection Systems (IDS) in IoT environments. This approach not only significantly reduces the dimensionality of the dataset but also ensures the selection of the most relevant features, enhancing the overall efficiency and effectiveness of intrusion detection.
- Enhanced Intrusion Detection Performance: Demonstrated through rigorous experiments that our methodology outperforms existing methods in extracting optimal feature subsets. It achieves superior classification accuracy with fewer features, significantly reducing computational time and power consumption, thereby addressing the critical constraints of IoT-based IDS.
- Benchmarking Against Outdated Practices: Highlighted the limitations of using non-compatible and outdated datasets, like the KDD’99, in current IDS research. Our results underscore the importance of updated and relevant datasets for developing more accurate and efficient IDS solutions tailored to modern IoT ecosystems.
2. Related Work
3. Proposed Methodology
- Initial Stage (Data Preprocessing).
- Second Stage (Dimension Reduction).
- Feature Selection Stage.
- Final Stage (Model Evaluation).
- Caching mechanism.
- A. Initial Stage (Data Preprocessing).
- B. Second Stage (Dimension Reduction).
- C. Feature Selection Stage.
- Filter Methods: Filter approaches rely on statistical measures and do not require training a machine learning model; each feature's importance is evaluated independently, without reference to the learning algorithm. Common filter methods include correlation-based feature selection, the chi-squared test, and information gain / mutual information.
- Wrapper Methods: Wrapper approaches assess feature subsets by training and evaluating machine learning models on different feature combinations. They demand more computation than filter approaches but can produce better subsets. Common wrapper methods include forward selection, backward elimination, and recursive feature elimination (RFE).
- Embedded Methods: Embedded approaches perform feature selection as part of the model training process itself, typically with algorithms that support feature selection natively. Common embedded methods include L1 regularization (Lasso), tree-based methods, and feature importance derived from Support Vector Machines (SVMs).
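As a concrete illustration of the filter style, the sketch below removes near-constant and highly correlated features using only standard-library statistics (a minimal sketch; the variance and correlation cutoffs are illustrative values, not ones taken from the paper):

```python
import statistics

def variance_filter(X, threshold=0.01):
    """Return indices of features whose variance exceeds the threshold.
    X is a list of samples, each a list of feature values."""
    cols = list(zip(*X))
    return [j for j, col in enumerate(cols)
            if statistics.pvariance(col) > threshold]

def pearson(a, b):
    """Pearson correlation between two equal-length value lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb) if sa and sb else 0.0

def correlation_filter(X, keep, cutoff=0.95):
    """Among the kept features, drop the later one of any pair
    whose absolute correlation exceeds the cutoff."""
    cols = list(zip(*X))
    selected = []
    for j in keep:
        if all(abs(pearson(cols[j], cols[k])) < cutoff for k in selected):
            selected.append(j)
    return selected
```

Because no model is trained, both passes run in a single sweep over the data, which is what makes filter methods attractive on resource-constrained IoT nodes.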

Algorithm 1: Rank features using a Random Forest classifier (pseudocode).
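The pseudocode of Algorithm 1 is not reproduced in this extract. As a rough stand-in, the sketch below scores each feature by the Gini-impurity reduction of its best single split, a one-level proxy for the importances a Random Forest averages over many randomized trees (the binary-label assumption and the helper `best_split_gain` are ours, not the paper's):

```python
def gini(labels):
    """Gini impurity of a list of binary class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split_gain(values, labels):
    """Largest Gini reduction achievable by thresholding one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    for t in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        if not left or not right:
            continue
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best

def rank_features(X, y):
    """Return feature indices ordered from most to least informative."""
    cols = list(zip(*X))
    gains = [(best_split_gain(col, y), j) for j, col in enumerate(cols)]
    return [j for _, j in sorted(gains, reverse=True)]
```

A full Random Forest additionally bootstraps the samples and restricts each split to a random feature subset, then averages the impurity reductions; the ordering produced here captures the same intuition on a single pass.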
- Correlation: Calculate the correlation between features and remove highly correlated or redundant ones.
- Variance Threshold: Remove features with low variance as they often contain little information.
- Statistical Tests: Utilize statistical tests (e.g., chi-squared, ANOVA) to select features that significantly contribute to the classification.
- Feature Ranking: Rank features by estimated importance and concentrate on the most promising ones first, so that not every feature in the dataset needs to be assessed.
- Pruning: Early in the feature subset evaluation process, prune or remove any subsets that don’t seem promising. This expedites the feature selection process and shrinks the search space. In the proposed methodology, feature ranking is performed in each round, while pruning is conducted every n number of iterations.
- Caching: A caching mechanism can significantly improve the efficiency of the feature selection methodology. As the method runs, the most significant features identified in each round are cached for subsequent iterations. After each iteration, the system checks the dataset’s feature count: if it is unchanged, the previously selected features are retrieved from the cache and the process skips directly to the final evaluation stage; if the feature count has changed, all four stages are re-executed, from Data Preprocessing through Model Evaluation.
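The caching behaviour described above can be sketched as follows (a minimal illustration that keys the cache on the feature count, as the text describes; `select_features` stands in for the full four-stage pipeline):

```python
cache = {}

def select_with_cache(features, select_features):
    """Reuse a cached selection when the feature count is unchanged;
    otherwise rerun the full selection pipeline and cache the result.
    Returns (selected_features, cache_hit)."""
    key = len(features)
    if key in cache:
        # Unchanged feature count: skip straight to final evaluation.
        return cache[key], True
    selected = select_features(features)  # full four-stage run
    cache[key] = selected
    return selected, False
```

Note that keying on the feature count alone, as described, means any dataset with the same number of features reuses the cached subset; a stricter key (e.g., a hash of the feature names) would avoid that at a small extra cost.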
- D. Final Stage (Model Evaluation).
4. Experimental Results and Discussion
- A. Dataset
- B. Experimental Results and Analysis
- C. Evaluation Metrics
- True Positive (TP): The number of correctly detected intrusions or attacks by the IDS.
- True Negative (TN): The number of correctly identified normal or non-malicious network activities by the IDS.
- False Positive (FP): The number of normal activities incorrectly classified as intrusions or attacks by the IDS. This is also known as a “false alarm.”
- False Negative (FN): The number of intrusions or attacks that the IDS failed to detect, instead classifying them as normal. This is also known as a “missed detection.”
- Accuracy: The ratio of correctly identified instances (TP + TN) to the total number of instances. It provides an overall measure of the IDS’s correctness.
- Precision: The ratio of true positives (TP) to the total number of instances classified as positive by the IDS (TP + FP). It measures the ability of the IDS to avoid false positives.
- Recall (Sensitivity or True Positive Rate): The ratio of true positives (TP) to the total number of actual positive instances (TP + FN). It measures the IDS’s ability to identify all positive instances.
- F1 Score: The harmonic mean of precision and recall. It balances the trade-off between false positives and false negatives [1].
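The four metrics follow directly from the confusion-matrix counts defined above; a minimal sketch (the division-by-zero guards are our own addition):

```python
def ids_metrics(tp, tn, fp, fn):
    """Compute standard IDS evaluation metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, an IDS that detects 90 of 100 attacks while raising 5 false alarms over 100 normal flows (TP = 90, TN = 95, FP = 5, FN = 10) scores 92.5% accuracy but a lower F1, since F1 penalizes both the missed detections and the false alarms.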
5. Future Work and Challenges
6. Conclusions
References
- N. T. Cam and N. G. Trung, “An Intelligent Approach to Improving the Performance of Threat Detection in IoT,” in IEEE Access, vol. 11, pp. 44319-44334, 2023. [CrossRef]
- H. Chen, X. Ma and S. Huang, “A Feature Selection Method for Intrusion Detection Based on Parallel Sparrow Search Algorithm,” 2021 16th International Conference on Computer Science & Education (ICCSE), Lancaster, United Kingdom, 2021, pp. 685-690. [CrossRef]
- R. Zhao, Y. Mu, L. Zou, and X. Wen, “A Hybrid Intrusion Detection System Based on Feature Selection and Weighted Stacking Classifier,” in IEEE Access, vol. 10, pp. 71414-71426, 2022. [CrossRef]
- K. M. Harahsheh and C. H. Chen, “A Survey of Using Machine Learning in IoT Security and the Challenges Faced by Researchers,” Informatica, vol. 47, no. 6, 2023. [CrossRef]
- B. Natarajan, S. Bose, N. Maheswaran, G. Logeswari and T. Anitha, “A New High-Performance Feature Selection Method for Machine Learning-Based IOT Intrusion Detection,” 2023 12th International Conference on Advanced Computing (ICoAC), Chennai, India, 2023, pp. 1-8. [CrossRef]
- M. Yesaswini and K. Annapurna, “A Hybrid Approach for Intrusion Detection System to Enhance Feature Selection,” 2023 Second International Conference on Augmented Intelligence and Sustainable Systems (ICAISS), Trichy, India, 2023, pp. 1301-1307. [CrossRef]
- N. Abbas, Y. Nasser, M. Shehab and S. Sharafeddine, “Attack-Specific Feature Selection for Anomaly Detection in Software-Defined Networks,” 2021 3rd IEEE Middle East and North Africa COMMunications Conference (MENACOMM), Agadir, Morocco, 2021, pp. 142-146. [CrossRef]
- M. S. Elsayed, N. -A. Le-Khac and A. D. Jurcut, “InSDN: A Novel SDN Intrusion Dataset,” in IEEE Access, vol. 8, pp. 165263-165284, 2020. [CrossRef]
- Y. Li, K. Shi, F. Qiao, and H. Luo, “A Feature Subset Selection Method Based on the Combination of PCA and Improved GA,” 2020 2nd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Taiyuan, China, 2020, pp. 191-194. [CrossRef]
- D. P. Hostiadi, Y. P. Atmojo, R. R. Huizen, I. M. D. Susila, G. A. Pradipta and I. M. Liandana, “A New Approach Feature Selection for Intrusion Detection System Using Correlation Analysis,” 2022 4th International Conference on Cybernetics and Intelligent System (ICORIS), Prapat, Indonesia, 2022, pp. 1-6. [CrossRef]
- Z. Zhang, J. Wen, J. Zhang, X. Cai, and L. Xie, “A Many Objective-Based Feature Selection Model for Anomaly Detection in Cloud Environment,” in IEEE Access, vol. 8, pp. 60218-60231, 2020. [CrossRef]
- T. Yu, Z. Liu, Y. Liu, H. Wang and N. Adilov, “A New Feature Selection Method for Intrusion Detection System Dataset–TSDR Method,” 2020 16th International Conference on Computational Intelligence and Security (CIS), 2020, pp. 362-365. [CrossRef]
- G. Parimala and R. Kayalvizhi, “An Effective Intrusion Detection System for Securing IoT Using Feature Selection and Deep Learning,” 2021 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 2021, pp. 1-4. [CrossRef]
- S. Sarvari, N. F. Mohd Sani, Z. Mohd Hanapi and M. T. Abdullah, “An Efficient Anomaly Intrusion Detection Method With Feature Selection and Evolutionary Neural Network,” in IEEE Access, vol. 8, pp. 70651-70663, 2020. [CrossRef]
- Zainudin, R. Akter, D. -S. Kim and J. -M. Lee, “Towards Lightweight Intrusion Identification in SDN-based Industrial Cyber-Physical Systems,” 2022 27th Asia Pacific Conference on Communications (APCC), Jeju Island, Korea, Republic of, 2022, pp. 610-614. [CrossRef]
- M. N. Yilmaz and B. Bardak, “An Explainable Anomaly Detection Benchmark of Gradient Boosting Algorithms for Network Intrusion Detection Systems,” 2022 Innovations in Intelligent Systems and Applications Conference (ASYU), Antalya, Turkey, 2022, pp. 1-6. [CrossRef]
- D. Firdaus, R. Munadi and Y. Purwanto, “DDoS Attack Detection in Software Defined Network using Ensemble K-means++ and Random Forest,” 2020 3rd International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), Yogyakarta, Indonesia, 2020, pp. 164-169. [CrossRef]
- Zainudin, R. Akter, D. -S. Kim and J. -M. Lee, “Federated Learning Inspired Low-Complexity Intrusion Detection and Classification Technique for SDN-Based Industrial CPS,” in IEEE Transactions on Network and Service Management, vol. 20, no. 3, pp. 2442-2459, Sept. 2023. [CrossRef]
- Maulana Ibrahimy, F. Dewanta, and M. Erza Aminanto, “Lightweight Machine Learning Prediction Algorithm for Network Attack on Software Defined Network,” 2022 IEEE Asia Pacific Conference on Wireless and Mobile (APWiMob), Bandung, Indonesia, 2022, pp. 1-6. [CrossRef]
- V. Hnamte and J. Hussain, “Network Intrusion Detection using Deep Convolution Neural Network,” 2023 4th International Conference for Emerging Technology (INCET), Belgaum, India, 2023, pp. 1-6. [CrossRef]
- G. Fu, B. Li, Y. Yang, and Q. Wei, “A Multi-Distance Ensemble and Feature Clustering Based Feature Selection Approach for Network Intrusion Detection,” 2022 International Symposium on Sensing and Instrumentation in 5G and IoT Era (ISSI), Shanghai, China, 2022, pp. 160-164. [CrossRef]
- T. -C. Vuong, H. Tran, M. X. Trang, V. -D. Ngo and T. V. Luong, “A Comparison of Feature Selection and Feature Extraction in Network Intrusion Detection Systems,” 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Chiang Mai, Thailand, 2022, pp. 1798-1804. [CrossRef]
- Raman, S. K. Jha and A. Arora, “An Enhanced Intrusion Detection System Using Combinational Feature Ranking and Machine Learning Algorithms,” 2022 2nd International Conference on Intelligent Technologies (CONIT), Hubli, India, 2022, pp. 1-8. [CrossRef]
- P. H. Prastyo, I. Ardiyanto and R. Hidayat, “A Review of Feature Selection Techniques in Sentiment Analysis Using Filter, Wrapper, or Hybrid Methods,” 2020 6th International Conference on Science and Technology (ICST), Yogyakarta, Indonesia, 2020, pp. 1-6. [CrossRef]
- T. Bubolz, M. Grellert, B. Zatt, and G. Correa, “Coding Tree Early Termination for Fast HEVC Transrating Based on Random Forests,” ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019, pp. 1802-1806. [CrossRef]
- M. Bakro et al., “An Improved Design for a Cloud Intrusion Detection System Using Hybrid Features Selection Approach With ML Classifier,” in IEEE Access, vol. 11, pp. 64228-64247, 2023. [CrossRef]

| Ref # | ML Technique | Accuracy | # of Features | Time (s) |
|---|---|---|---|---|
| [8] | RF | 99.42% | 48 | 226.151 |
| [15] | CNN | 98.98% | 30 | 0.164 |
| [16] | RF | 98% | 20 | 20.50 |
| [17] | KNN and RF | 100% | 15 | 1.5 |
| [18] | CNN | 90.83% | 19 | 21.0506 |
| [19] | KNN | 99.9961% | 56 | 18.69 |
| [20] | DCNN | 99.94% | - | - |
| Our paper | Hybrid Feature Selection | 99.99% | 11 | 0.8599 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).