3. Related Work
The rising danger of cyberattacks on Internet of Things (IoT) systems has highlighted the limitations of intrusion detection systems, especially when challenged with advanced threats in complex environments (Jullian et al. 2023). To address these threats, the authors propose a distributed attack detection framework based on deep learning. The framework relies primarily on feedforward neural networks and LSTMs to examine network traffic behavior and detect illegitimate activity. The proposed solution distributes detection workloads across various network nodes, resulting in improved scalability, more efficient operation, and greater network resilience. Through performance evaluation with real-world testing, the research demonstrates that the framework accurately detects different types of attacks. Overall, the research enhances IoT system security by presenting a strategy to detect cyber threats more effectively.
The limitations of intrusion detection systems in identifying various cyberattack types have prompted exploration into more advanced techniques. Deep learning shows significant promise for enhancing these systems, which is why a new model for multi-attack classification is proposed (Silivery et al. 2023). This model uses convolutional neural networks (CNNs) and recurrent neural networks (RNNs) with long short-term memory (LSTM) units to analyze network traffic data and identify multiple attack types. The model is trained on an extensive dataset, with evaluation based on performance metrics such as accuracy, precision, recall, and F1-score. The approach aims to improve both the operational effectiveness and predictive accuracy of intrusion detection systems, thereby strengthening cyber defense mechanisms for digital networks.
Traditional techniques struggle to detect and prevent advanced DDoS attacks, prompting the exploration of deep-learning methods (Aktar and Yasin Nur 2023). The authors propose utilizing deep learning approaches to enhance DDoS attack detection, given their ability to better handle complex attack patterns. This research develops a deep learning model through the implementation of neural networks, including recurrent neural networks (RNNs) with long short-term memory (LSTM) units and convolutional neural networks (CNNs), to effectively analyze network traffic data and distinguish between normal and DDoS attack traffic patterns. The model is trained on relevant data and evaluated using performance metrics such as accuracy, precision, recall, and F1-score. By applying deep learning techniques, this research aims to strengthen protection mechanisms against DDoS attacks, ultimately improving the resilience of network infrastructures.
GPU-accelerated machine learning technology is explored in the context of improving botnet attack detection capabilities (Motylinski et al. 2022). Detection methods often struggle with speed and efficiency, leading to the development of a new system that utilizes the parallel processing power of GPUs. The research investigates the performance of various machine learning algorithms, such as support vector machines (SVMs) and deep learning models, when implemented on GPU hardware. Effective feature engineering and thorough evaluations using real-world datasets are central to the methodology. By incorporating GPU acceleration, the proposed system enhances detection accuracy and speed, offering significant improvements in botnet activity identification.
Recognizing and addressing Distributed Denial of Service (DDoS) attacks continues to be a significant challenge, especially as conventional methods fall short against increasingly complex and evolving threats (Ismail et al. 2022). The study establishes a framework that creates models for detecting DDoS attacks by experimenting with various algorithms, including support vector machines (SVMs), XGBoost, Random Forests, and deep learning networks, utilizing network traffic data. These models are trained on suitable datasets and assessed using metrics like precision, recall, accuracy, and F1-score. Similarly, the goal of this research (Ahmad, Wan, and Ahmad 2023) is to harness machine learning to enhance DDoS protection systems, thereby increasing the operational resilience of networks against such attacks.
DDoS attack detection in rapidly growing Internet of Things (IoT) networks presents a significant challenge due to the limitations of existing methods in resource-constrained environments (Hariprasad 2022). The study proposes a hybrid detection solution that combines recurrent neural networks (RNNs) and extreme learning machines (ELMs). RNNs are utilized for their ability to model temporal dependencies, while ELMs contribute fast training and strong generalization capabilities. The model is further optimized through a data point selection mechanism, which identifies the most informative data points for training. The resulting hybrid system offers a more accurate and efficient approach to detecting DDoS attacks, thus improving the security of IoT networks.
A novel framework that integrates explainable AI (XAI) techniques has been proposed for DDoS attack detection (Almadhor et al. 2024). Centralized methods have proven limited because of privacy concerns and resource constraints, leading the authors to combine federated learning with XAI. Federated learning enables collaborative model development across IoT devices while maintaining data privacy. Incorporating XAI improves the transparency of the model’s decision-making processes, fostering both trust and understanding. This combined approach addresses DDoS attack detection in diverse IoT environments, preserving privacy while providing explainable results for enhanced decision-making.
Fine-tuning Multi-Layer Perceptron (MLP) neural networks was explored in this paper (Sanmorino, Marnisah, and Kesuma 2024) as a method to enhance their detection capabilities for Distributed Denial-of-Service (DDoS) attacks. The authors focus on adapting pre-trained MLP models for specific use in DDoS attack detection, addressing the shortcomings of traditional methods that often fail to identify such attacks accurately. Network traffic data is preprocessed before evaluating the MLP model’s performance using accuracy, precision, and recall metrics. This research aims to improve DDoS detection systems by leveraging pre-trained models tailored to strengthen security technologies against these cyber vulnerabilities. A critical aspect of training these models is the loss function, and for binary classification problems like DDoS detection, the binary cross-entropy loss is commonly used. It is defined as:

$$\mathcal{L}_{\text{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \,\right]$$

where $N$ is the number of samples, $y_i \in \{0, 1\}$ is the true label of sample $i$, and $\hat{y}_i$ is the predicted probability that sample $i$ belongs to the positive class.
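As a small numerical illustration of binary cross-entropy, the following is a minimal sketch in plain Python. The function name, the clipping constant `eps`, and the example labels and probabilities are assumptions made for illustration and are not taken from the cited work.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over paired labels (0/1) and predicted probabilities."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability so log() never receives 0 (eps is an assumed constant).
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# Hypothetical labels (1 = positive class) and predicted probabilities.
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # mostly correct, confident predictions
uncertain = [0.5, 0.5, 0.5, 0.5]   # uninformative predictions

print(round(binary_cross_entropy(labels, confident), 4))
print(round(binary_cross_entropy(labels, uncertain), 4))
```

Confident, correct predictions give a lower loss than uninformative ones, which is the behavior a classifier is trained to achieve.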
“ML-DDoSnet” is an intrusion detection system developed to combat Distributed Denial-of-Service (DDoS) attacks in Internet of Things (IoT) environments (Revathi, Ramalingam, and Amutha 2021). IoT networks are vulnerable to various attacks, and the researchers apply machine learning technology to analyze network traffic and detect malicious patterns indicative of DDoS attacks. The system explores several machine learning algorithms, including support vector machines (SVMs) and decision trees, to classify network traffic as normal or malicious. The NSL-KDD dataset is used to train, test, and validate the algorithms within the ML-DDoSnet framework. The development and evaluation of ML-DDoSnet aim to enhance security mechanisms across IoT networks, improving their ability to defend against DDoS attacks.
Table 1.
Summary of related work.
| Reference | Datasets | Objective | Methodology | Limitations |
| --- | --- | --- | --- | --- |
| (Jullian et al. 2023) | NSL-KDD | To develop a robust and efficient deep learning-based distributed attack detection framework for IoT networks. | Feedforward neural networks and recurrent neural networks (RNNs) | Deploying and managing a distributed system across an extensive, diverse IoT network can be complex. |
| (Silivery et al. 2023) | NSL-KDD | To develop a deep learning-based model for classifying multiple cyber-attacks, aiming to improve the accuracy and effectiveness of intrusion detection systems. | Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) | Complex model; high false alarm rate. |
| (Aktar and Yasin Nur 2023) | NSL-KDD | To investigate the use of a deep learning approach for detecting DDoS attacks, potentially improving the accuracy and effectiveness of DDoS detection methods. | Deep Contractive Autoencoder (DCAE) | The availability and quality of the training data may limit the model’s performance, potentially leading to inaccurate or biased detection results. |
| (Motylinski et al. 2022) | NSL-KDD | To enhance the speed of detection while upholding a commendable level of accuracy. | SVM, logistic regression, KNN | The use of GPU technology results in decreased training and prediction time. |
| (Ismail et al. 2022) | NSL-KDD | To categorize and predict various types of DDoS attacks through the application of machine learning. | Random forest, XGBoost | Improved accuracy may be achieved using an enhanced suggested model. |
| (Hariprasad 2022) | NSL-KDD | To develop a precise and effective DDoS attack detection system for IoT networks with a hybrid Sample Selected RNN-ELM model. | Recurrent Neural Networks (RNNs) and Extreme Learning Machines (ELMs) | The model’s efficiency may hinge on the quality and variety of the training data and the particular attributes of the IoT network environment. |
| (Almadhor et al. 2024) | NSL-KDD | To provide a resilient and privacy-conscious DDoS attack detection solution for diverse IoT contexts by integrating federated learning with explainable artificial intelligence approaches. | Explainable Artificial Intelligence (XAI) with Federated Deep Neural Networks (FDNNs) | The efficiency of this methodology may be affected by issues including communication latency, variability in device capabilities, and the intricacy of incorporating XAI algorithms into the federated learning framework. |
| (Ahmad, Wan, and Ahmad 2023) | NSL-KDD | To develop an optimized ensemble framework using big data analytics to effectively detect DDoS attacks targeting the Internet of Things (IoT). | Convolutional Neural Network (CNN) embedded with a Gated Recurrent Unit (GRU) | The model’s complexity results in high computational time. |
| (Sanmorino, Marnisah, and Kesuma 2024) | NSL-KDD | To develop a DDoS attack detection system using fine-tuned Multi-Layer Perceptrons. | Fine-tuned Multi-Layer Perceptron models | They are computationally intensive and slow. |
| (Revathi, Ramalingam, and Amutha 2021) | NSL-KDD | To develop a system by integrating machine learning techniques with an SDN controller framework. | Support Vector Machines, Decision Trees | The system may need to be continuously updated and adapted to address new and evolving DDoS attack techniques effectively. |
12. Evaluation Metrics
A set of evaluation metrics measures both the performance and effectiveness of the proposed method throughout this study. These metrics, commonly used in machine learning research, enable a quantitative assessment of classification model success. The formulas for these performance assessment metrics are derived from standard methodologies. Our experimental results demonstrate an improvement over the baseline, highlighting the effectiveness of our proposed technique for detecting DDoS attacks on IoT devices.
In the equations presented below, various parameters are defined: TP, TN, FP, FN, L, and M. Specifically, TP stands for true positives (correctly predicted normal class), TN stands for true negatives (correctly predicted attack class), FP signifies false positives (incorrectly predicted normal class), and FN indicates false negatives (incorrectly predicted attack class). L and M are the actual and predicted class labels, respectively. These metrics are expressed by the following formulas:
Detection Accuracy: Accuracy measures the percentage of correctly identified instances, encompassing both normal and attack instances, within a dataset. It is defined as the ratio of correctly classified instances to the total number of instances, as shown in Eq. (1). High accuracy indicates that the model’s predictions closely match the actual observations. Accuracy should be interpreted with care when the dataset is imbalanced, because a high value can conceal a large number of false positives (FP) or false negatives (FN).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
Precision: Precision measures the model’s ability to make correct positive predictions, particularly in differentiating attacks from other traffic. As shown in Eq. (2), precision is the ratio of true positives (TP) to the sum of true positives (TP) and false positives (FP). A higher precision indicates that the model is accurate in its positive predictions and produces fewer false positives. False positives significantly affect the diagnostic accuracy of DDoS attack detection, potentially resulting in unnecessary alerts or actions.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
Recall: Recall evaluates how completely the model identifies all genuine attacks, with a priority on minimizing the number of missed positives. According to Eq. (3), recall is defined as the ratio of true positives (TP) to the sum of true positives (TP) and false negatives (FN). Recall therefore emphasizes the reduction of false negatives. While precision quantifies the share of positive predictions that are correct, recall gives the fraction of actual positive instances that have been successfully identified. A low recall indicates undetected attacks, which undermines the overall effectiveness of the attack detection system.

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
F1-Score: The F1-Score is a metric that combines precision and recall into a single value, weighting them symmetrically, as shown in Eq. (4). It is the harmonic mean of precision and recall, providing a balanced measure of a model’s performance. The F1-Score takes into account both false positives and false negatives, offering a more comprehensive evaluation than relying solely on precision or recall. It is computed using the formula below:

$$\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
Computational Time: Computational time refers to the total time required for a machine learning model to process data and produce results. It is an important metric for evaluating the efficiency and speed of the model, especially in real-time applications. The average computational time is calculated as the total processing time, representing the overall time taken to complete all necessary computations for a given task.
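The metric definitions above can be sketched directly from the four confusion-matrix counts. The following is a minimal illustration in plain Python; the class name, function names, and example counts are hypothetical, and the code applies to whichever class is treated as positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts: TP, TN, FP, FN."""
    tp: int
    tn: int
    fp: int
    fn: int


def accuracy(c: ConfusionCounts) -> float:
    """Correctly classified instances divided by all instances."""
    total = c.tp + c.tn + c.fp + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    """TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: ConfusionCounts) -> float:
    """TP / (TP + FN); returns 0.0 when there are no actual positives."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f1_from(p: float, r: float) -> float:
    """Harmonic mean of a precision value and a recall value."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def f1_score(c: ConfusionCounts) -> float:
    return f1_from(precision(c), recall(c))


# Hypothetical counts chosen only to make the arithmetic easy to follow.
counts = ConfusionCounts(tp=90, tn=85, fp=15, fn=10)
print(accuracy(counts), precision(counts), recall(counts), f1_score(counts))
```

As a consistency check on the harmonic-mean relation, `f1_from(0.9993, 0.9981)` rounds to 0.9987, which agrees with the Random Forest precision, recall, and F1-score reported in this study.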
Several classifiers support the proposed technique. Specifically, this method employs three classifiers, Random Forest, Naive Bayes, and Logistic Regression, which have been meticulously selected for their efficacy in distinguishing normal traffic patterns from those indicative of Distributed Denial of Service (DDoS) attack data. The verification of each algorithm was conducted through a comprehensive performance analysis encompassing a range of metrics, including accuracy, true positive rate (TPR), false positive rate (FPR), true negative rate (TNR), false negative rate (FNR), as well as computational time.
The comparative performance analysis evaluates these algorithms across seven distinct evaluation metrics. The objective of this research is to ascertain which classifier yields the highest accuracy in the context of DDoS attack detection. Table 4 delineates the performance attributes, thereby providing readers with a succinct overview of the assessment results detailed in Table 3. This analytical framework serves to assist in identifying the most appropriate classifier for safeguarding Internet of Things (IoT) networks against DDoS attacks.
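A comparison of this kind can be sketched with scikit-learn as follows. This is an illustrative sketch, not the authors' implementation: the synthetic data from `make_classification` stands in for the preprocessed traffic features, and the split ratio, hyperparameters, and random seeds are assumptions.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the traffic dataset (assumption, not NSL-KDD itself).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# The three lightweight classifiers named in the text; settings are assumed.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)          # training
    y_pred = model.predict(X_test)       # prediction on held-out data
    elapsed = time.perf_counter() - start
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "seconds": elapsed,              # combined training and prediction time
    }

for name, scores in results.items():
    print(name, {metric: round(value, 4) for metric, value in scores.items()})
```

Timing training and prediction together gives one simple notion of computational time; the values obtained on synthetic data are not comparable to those reported in this study.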
Table 2.
Performance measures of selected lightweight Machine learning models.
| Evaluation Metrics | Random Forest (%) | Logistic Regression (%) | Naive Bayes (%) |
| --- | --- | --- | --- |
| Detection Accuracy | 99.88 | 91.61 | 87.62 |
| Precision | 99.93 | 92.53 | 83.57 |
| Recall | 99.81 | 91.61 | 89.30 |
| F1 Score | 99.87 | 90.89 | 87.40 |
The findings indicate that the Random Forest algorithm surpasses both Logistic Regression and Naive Bayes in all four key evaluation metrics (Accuracy, Precision, Recall, and F1-score), as shown in Figure 8. Notably, it achieves the highest classification accuracy of 99.88%, establishing Random Forest as the most effective model for distinguishing normal traffic from attack traffic. Random Forest also attains the highest precision, 99.93%, which corresponds to a low false alarm rate. Its Recall, the share of actual attacks it detects, is 99.81%. Furthermore, Random Forest exhibits the leading F1-score of 99.87%, which underscores its balance between precision and recall. In contrast, Logistic Regression yields satisfactory results, with an Accuracy of 91.61% and an F1-score of 90.89%; however, its Precision of 92.53% and Recall of 89.3% fall short of those of the Random Forest model. The F1-score of 87.4% for Naive Bayes highlights its weaker performance in the detection of DDoS attacks relative to the other classifiers analyzed. Therefore, Random Forest emerges as the most viable model for detecting DDoS attacks within IoT networks.
Figure 8.
Performance measure chart of selected Machine learning models.
Figure 9 compares Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), and the CNN-GRU_SMA architecture. The analysis indicates that the Random Forest model achieves the highest performance across all evaluation metrics, with an accuracy rate of 99.88%, followed by CNN-GRU_SMA, Logistic Regression, and Naive Bayes. The corresponding precision, recall, and F1 score values show the same ordering: Random Forest ranks highest, followed by CNN-GRU_SMA, Logistic Regression, and Naive Bayes.
Random Forest outperforms the CNN-GRU_SMA architecture in precision, achieving 99.93%. CNN-GRU_SMA performs strongly but still falls slightly behind Random Forest. Logistic Regression and Naive Bayes show satisfactory results, but they produce significantly lower values than the other two models, with Naive Bayes reaching a precision of 83.57% and Logistic Regression exhibiting relatively lower recall values. The F1-score attained by Random Forest, 99.87%, surpasses CNN-GRU_SMA’s score of 98.45% as well as the scores of the other models assessed, indicating a high standard of precision and recall in the context of DDoS attack detection. These results indicate that while CNN-GRU_SMA provides a promising methodology for the detection of DDoS attacks, Random Forest remains the highest-performing model across all evaluated metrics.
Figure 9.
Performance measure with base paper.
Computational time tests show that CNN-GRU_SMA requires a long runtime despite its strong classification accuracy. As shown in Figure 10, the Random Forest model achieves the best classification accuracy with a computational time of 32.0 seconds, less than half of CNN-GRU_SMA’s 70.3 seconds. The longer processing time for CNN-GRU_SMA is due to its complex model structure and advanced architectural design. The fastest model in the comparison is Naive Bayes, which completes execution in just 8.0 seconds, making it suitable for less complex applications or lower-data-volume scenarios. Logistic Regression, with a runtime of 92.0 seconds, is the most time-consuming approach among the four models. The analysis of processing times highlights the need for a balance between computational performance and model accuracy, particularly in time-sensitive DDoS detection systems. CNN-GRU_SMA offers strong accuracy at the cost of longer processing, whereas Random Forest combines the highest accuracy with a substantially shorter runtime.
Figure 10.
Computational time comparison.
Each classifier was trained using the designated training set and subsequently evaluated on the corresponding testing set to determine its accuracy, precision, recall, and F1 score in detecting Distributed Denial of Service (DDoS) attacks. A confusion matrix serves as a valuable tool in assessing the effectiveness of a machine learning-based detection system for identifying DDoS attacks, particularly in Internet of Things (IoT) devices. Figures 11, 12, and 13 present the confusion matrices for the selected lightweight machine learning models.
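To make the construction of a confusion matrix concrete, the following minimal sketch tallies the four counts from paired true and predicted labels. The function name, the `positive` parameter, and the example labels are hypothetical; which class is encoded as positive is a labeling choice left to the caller.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Tally TP, TN, FP, FN for binary labels, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Each pair becomes (actual is positive?, predicted is positive?).
    tally = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    return {
        "tp": tally[(True, True)],
        "tn": tally[(False, False)],
        "fp": tally[(False, True)],
        "fn": tally[(True, False)],
    }


# Hypothetical labels for eight test samples.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

print(confusion_counts(y_true, y_pred))
```

The four counts sum to the number of test samples, and every metric in the evaluation section can be derived from them.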
Figure 11.
Confusion matrix of Random Forest.
Figure 12.
Confusion matrix of Logistic Regression.
Figure 13.
Confusion matrix of Naïve Bayes.
The implementation of Random Forest, complemented by feature selection methodologies such as the Extra Trees Classifier (ETC) and Principal Component Analysis (PCA), yielded superior outcomes compared to the initial models in the foundational paper. Specifically, the Random Forest model leveraging ETC achieved an Accuracy of 99.88%, Precision of 99.93%, and Recall of 99.81%, culminating in an F1-score of 99.87%. When PCA was employed for feature extraction, the Random Forest model sustained comparable performance, achieving an Accuracy of 99.87%, Precision of 99.79%, Recall of 99.94%, and an F1-score of 99.86%. These findings show that the Random Forest methodology, in conjunction with ETC-based feature selection, offers enhanced accuracy in the detection of DDoS attacks when compared with the detection systems detailed in the foundational study.
Figure 14.
Performance measure of Random Forest.
The comparative analysis of Logistic Regression employing two feature selection methodologies, namely the Extra Trees Classifier (ETC) and Principal Component Analysis (PCA), shows the significant influence of these techniques on the model’s overall performance. The findings indicate that ETC consistently outperforms PCA across all evaluation metrics: Accuracy, Precision, Recall, and F1-score. Specifically, with ETC, Logistic Regression achieves an Accuracy of 91.61%, accompanied by 92.53% Precision, 89.3% Recall, and an F1-score of 90.89%. With PCA, performance on the same dataset is lower: Accuracy of 90.09%, Precision of 90.41%, Recall of 88.61%, and an F1-score of 89.51%.
In Figure 15, the results substantiate that ETC outperforms PCA, improving both Precision and Accuracy for Logistic Regression models, although PCA remains acceptably effective. The selected feature selection strategy, ETC, enables Logistic Regression to achieve better detection of DDoS attacks on Internet of Things (IoT) devices. Overall, both ETC and PCA contribute significantly to performance enhancements for Logistic Regression models, underlining the importance of feature selection methodologies in model optimization.
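The two reduction strategies compared above can be sketched with scikit-learn as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the data are synthetic, `N_KEEP` is a hypothetical number of retained features or components, and Extra Trees importances are turned into a selector through `SelectFromModel`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the traffic features (assumption).
X, y = make_classification(
    n_samples=2000, n_features=40, n_informative=8, n_redundant=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

N_KEEP = 10  # hypothetical number of features / components to retain

# Extra Trees ranks the original features by importance; threshold=-inf makes
# SelectFromModel keep exactly the N_KEEP highest-ranked ones.
etc_selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=100, random_state=0),
    max_features=N_KEEP,
    threshold=-np.inf,
)
etc_pipeline = make_pipeline(
    StandardScaler(), etc_selector, LogisticRegression(max_iter=1000)
)

# PCA instead projects all features onto N_KEEP new components.
pca_pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=N_KEEP, random_state=0),
    LogisticRegression(max_iter=1000),
)

scores = {}
for name, pipeline in {"ETC": etc_pipeline, "PCA": pca_pipeline}.items():
    pipeline.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, pipeline.predict(X_test))

print({name: round(score, 4) for name, score in scores.items()})
```

One design difference is worth noting: ETC-based selection keeps a subset of the original features, which stay interpretable, whereas PCA produces linear combinations of all features. Which approach scores higher depends on the data, so results on the synthetic set above say nothing about the figures reported in this study.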
Figure 15.
Performance measure of Logistic Regression.
The comparative analysis of the Naive Bayes classifier reveals several significant performance discrepancies, as demonstrated in Figure 16. The Extra Trees Classifier (ETC) consistently yields superior results across all evaluated metrics: Accuracy, Precision, Recall, and F1-score. Specifically, when utilizing ETC, Naive Bayes attains an Accuracy of 87.62%, Precision of 83.57%, Recall of 91.61%, and an F1-score of 87.4%. Conversely, the application of Principal Component Analysis (PCA) for feature selection results in marginally lower performance metrics, with Accuracy recorded at 87.14%, Precision at 80.11%, Recall at 91.29%, and an F1-score of 85.34%.
The ETC methodology consistently outperforms PCA in terms of Precision and F1-score, while PCA exhibits a slight advantage in Recall. However, the reduction in Precision associated with PCA underscores the inherent trade-offs involved in employing various feature selection techniques. When juxtaposed with the findings documented in the foundational paper, Naive Bayes utilizing the ETC validates a pronounced enhancement in Precision and F1-score, thus establishing it as a more dependable model for the detection of Distributed Denial-of-Service (DDoS) attacks, although it is still less proficient than more sophisticated models such as Random Forest. These results indicate that the ETC constitutes a more effective feature selection strategy for Naive Bayes, significantly augmenting its capacity to identify attacks without detracting from its accuracy.
Figure 16.
Performance measure of Naïve Bayes.
Figure 17 shows the performance of the Random Forest model with and without the Extra Trees Classifier (ETC) for feature selection. The findings reveal a substantial improvement in the model’s performance when feature selection through ETC is applied. Specifically, accuracy rises from 91.39% without feature selection to 99.88% with it. Precision increases from 87.03% to 99.93%. Similarly, recall rises from 91.34% to 99.81%, and the F1 score improves from 87.65% to 99.87%. These results indicate that using ETC for feature selection markedly improves the Random Forest model across all assessed metrics.
Figure 17.
Comparison of RF performance with and without feature selection.
Figure 18 compares the performance of the Logistic Regression (LR) model with and without the Extra Trees Classifier (ETC) for feature selection across four key metrics: Accuracy, Precision, Recall, and F1 Score. With feature selection, Accuracy rises significantly, from 74.53% to 91.61%. Precision shows a notable increase as well, climbing from 62.65% to 92.53%. Recall improves from 74.43% to 91.61%, and the F1 Score rises considerably, from 66.5% to 90.89%. These findings indicate that utilizing ETC for feature selection greatly enhances the performance of the Logistic Regression model on all four metrics.
Figure 18.
Comparison of LR performance with and without feature selection.
Figure 19 shows the Naive Bayes (NB) model’s performance with and without feature selection via the Extra Trees Classifier (ETC), evaluated across four metrics: Accuracy, Precision, Recall, and F1 Score. Implementing feature selection leads to a marked improvement in Accuracy, which increases from 35.89% (without feature selection) to 87.62% (with feature selection). Precision also grows significantly, from 51.12% to 83.57%. Recall rises from 35.87% to 91.61%, and the F1 Score grows from 26.96% to 87.4%. These findings indicate that utilizing ETC for feature selection significantly boosts the Naive Bayes model’s performance on all primary metrics, particularly Recall and F1 Score.
Figure 19.
Comparison of NB performance with and without feature selection.
The ROC (Receiver Operating Characteristic) curves in Figure 20 compare the DDoS detection performance of the Gaussian Naive Bayes, Logistic Regression, and Random Forest classifiers. AUC values accompany each model to provide a numerical summary of performance. Random Forest proves superior to both Logistic Regression and Gaussian Naive Bayes in balancing the true positive rate against the false positive rate, achieving an area under the curve (AUC) of 0.92. Its curve lies farthest from the random-guessing diagonal, which indicates the strongest ability to distinguish between attack and normal traffic. Logistic Regression and Naive Bayes yield lower AUC values, with Logistic Regression holding a slight advantage over Naive Bayes.
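The meaning of the AUC value can be illustrated with a short sketch in plain Python. It uses the pairwise interpretation of AUC: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the example scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive-class and negative-class scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = positive class) and classifier scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(roc_auc(y_true, scores))  # 3 of the 4 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to the random-guessing diagonal, and 1.0 to perfect separation of the two classes. The quadratic pairwise loop is chosen for clarity; for large test sets a rank-based computation is more efficient.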
Figure 20.
Comparative performance measure of machine learning models.
The overall results illustrate that the application of the Extra Trees classifier for feature selection significantly enhances the detection of Distributed Denial of Service (DDoS) attacks targeting Internet of Things (IoT) devices. This approach not only optimizes feature representation but also leads to improved performance metrics of machine learning classifiers. Our study utilized the NSL KDD dataset, which is crucial for analyzing patterns associated with DDoS attacks.
Among the classifiers examined, the Random Forest model emerged as the most effective, outperforming Logistic Regression and Naive Bayes in terms of accuracy, precision, recall, and F1-score. Random Forest exhibited superior overall performance, characterized by reduced training and prediction durations, thereby affirming its suitability for real-time DDoS detection applications. Additionally, feature selection utilizing the Extra Trees methodology demonstrated enhanced efficiency compared to outdated techniques like the Pearson correlation coefficient, resulting in a decreased computational burden while maintaining high levels of accuracy.