Submitted: 03 September 2025
Posted: 04 September 2025
Abstract
Keywords:
1. Introduction
- A hybrid architecture integrating EFMS-KMeans pre-clustering with a sequential CNN-GRU classifier for unsupervised detection of anomalous traffic.
- A feature engineering pipeline including risk encoding, reputation-based transformations, temporal decomposition, and behavioral ratios.
- The application of SMOTE to mitigate class imbalance and improve generalization in large-scale traffic datasets.
- An empirical validation on real network traffic, demonstrating high accuracy and scalability in encrypted environments.
2. Related Work
3. Materials and Methods
3.1. Clustering and Anomaly Detection Based on EFMS-KMeans
3.2. Dimensionality Reduction and Feature Engineering
3.2.1. Dimensionality Reduction
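- $X \in \mathbb{R}^{n \times d}$ is the original feature matrix.
- $W_k \in \mathbb{R}^{d \times k}$ contains the eigenvectors of the covariance matrix of $X$ associated with the $k$ largest eigenvalues.
- $Z = X W_k \in \mathbb{R}^{n \times k}$ is the reduced projection, with $k < d$.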
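The projection described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pca_project`, the synthetic data, and the choice of `k=2` are hypothetical, and the features are mean-centered before the covariance matrix is computed.

```python
import numpy as np

def pca_project(X, k):
    """Project X (n samples x d features) onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)             # PCA works on zero-mean features
    cov = np.cov(X_centered, rowvar=False)      # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric input, eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
    W_k = eigvecs[:, order]                     # d x k matrix of leading eigenvectors
    Z = X_centered @ W_k                        # n x k reduced projection
    return Z, W_k, eigvals[order]

# Synthetic data (assumption): 200 sessions, 5 features, two of them correlated
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 3.0 * X[:, 0] + 0.1 * X[:, 1]
Z, W_k, top_eigvals = pca_project(X, k=2)
print(Z.shape)
```

The variance of each column of `Z` equals the corresponding eigenvalue, which is why keeping the eigenvectors with the largest eigenvalues retains the most variance.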
3.2.2. Feature Engineering
- bytes_total: Represents the total traffic volume in a session, summing sent and received bytes:
- pkt_ratio: The ratio between sent and received packets, useful for detecting unbalanced or suspicious sessions:
- distance_to_centroid: The Euclidean distance of each point to the centroid of its cluster (calculated after EFMS-KMeans), used as a predictive variable:
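As a rough sketch, the three derived variables can be computed per session as follows. The dictionary keys (`sent_bytes`, `rcvd_bytes`, `sent_pkts`, `rcvd_pkts`, `vector`) and the function name are hypothetical, and the `+ 1` in the packet-ratio denominator is an assumption added here to avoid division by zero; the source does not state how empty sessions are handled.

```python
import math

def engineer_features(session, centroid):
    """Derive bytes_total, pkt_ratio and distance_to_centroid for one session."""
    # Total traffic volume: sent plus received bytes
    bytes_total = session["sent_bytes"] + session["rcvd_bytes"]
    # Sent/received packet ratio; the +1 is an assumed guard against zero received packets
    pkt_ratio = session["sent_pkts"] / (session["rcvd_pkts"] + 1)
    # Euclidean distance between the session's feature vector and its cluster centroid
    distance_to_centroid = math.dist(session["vector"], centroid)
    return {
        "bytes_total": bytes_total,
        "pkt_ratio": pkt_ratio,
        "distance_to_centroid": distance_to_centroid,
    }

# Hypothetical session record and centroid
session = {
    "sent_bytes": 1200, "rcvd_bytes": 800,
    "sent_pkts": 12, "rcvd_pkts": 3,
    "vector": (3.0, 4.0),
}
features = engineer_features(session, centroid=(0.0, 0.0))
print(features)
```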
3.3. Class Balancing with SMOTE
- $x$ is a sample of the minority class,
- $x_{nn}$ is one of its $k$ nearest neighbors,
- $\lambda \in [0, 1]$ is a random coefficient.
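The interpolation step of SMOTE can be illustrated with a short sketch. It assumes the standard formulation in which a synthetic point is placed on the segment between a minority sample and one of its nearest neighbors; the nearest-neighbor search is omitted and the neighbors are passed in directly. The function name and the toy values are hypothetical.

```python
import random

def smote_sample(x, neighbors, rng):
    """Create one synthetic sample between x and a randomly chosen neighbor."""
    x_nn = rng.choice(neighbors)     # one of the k nearest minority neighbors
    lam = rng.random()               # random coefficient in [0, 1)
    # Linear interpolation: x_new = x + lam * (x_nn - x)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, x_nn)]

rng = random.Random(42)
x = [1.0, 2.0]                            # minority-class sample (toy values)
neighbors = [[2.0, 4.0], [0.0, 1.0]]      # assumed precomputed nearest neighbors
synthetic = [smote_sample(x, neighbors, rng) for _ in range(5)]
print(synthetic)
```

In practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used; the sketch only shows the arithmetic behind each synthetic point.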
3.4. Sequential Modeling with CNN and GRU
3.4.1. Convolutional Neural Network (CNN)
- $x$ represents the input sequence,
- $w^{k}$ are the weights of filter $k$,
- $b^{k}$ is the bias,
- $f$ is the activation function (ReLU in this case),
- $y_i^{k}$ is the output of neuron $i$ of filter $k$,
- and $m$ is the kernel size.
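A minimal sketch of a single one-dimensional convolutional filter follows. It assumes a single input channel, stride 1, and no padding, so that each output is the ReLU of the weighted sum over a window of `m` consecutive inputs plus the bias. The function names and numeric values are illustrative only.

```python
def relu(v):
    return max(0.0, v)

def conv1d_single_filter(x, w, b):
    """Apply one filter of size m = len(w) to sequence x (stride 1, no padding)."""
    m = len(w)
    # Output i is f(sum_j w[j] * x[i + j] + b), with f = ReLU
    return [
        relu(sum(w[j] * x[i + j] for j in range(m)) + b)
        for i in range(len(x) - m + 1)
    ]

x = [1.0, 2.0, 4.0, 3.0, 1.0]   # toy input sequence
w = [-1.0, 0.0, 1.0]            # toy filter weights (kernel size 3)
b = 0.5
y = conv1d_single_filter(x, w, b)
print(y)  # [3.5, 1.5, 0.0]
```

The last output is zero because the pre-activation value (-2.5) is negative and ReLU clips it.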
3.4.2. Gated Recurrent Unit (GRU)
- $x_t$ is the input vector at time $t$,
- $h_t$ is the updated hidden state,
- $z_t$ is the update gate,
- $r_t$ is the reset gate,
- $\odot$ denotes the element-wise product,
- and $W$, $U$ are weight matrices learned during training.
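One GRU time step can be sketched in NumPy as below. This assumes a common textbook formulation in which the new state is `(1 - z) * h_prev + z * h_candidate`; some references swap the roles of `z` and `1 - z`, and the source equations may differ. Parameter names such as `W_z` and `U_z` and the random initialization are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(input_dim, hidden_dim, rng):
    """Small random weights for the update (z), reset (r) and candidate (h) blocks."""
    p = {}
    for gate in ("z", "r", "h"):
        p[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        p[f"U_{gate}"] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        p[f"b_{gate}"] = np.zeros(hidden_dim)
    return p

def gru_step(x_t, h_prev, p):
    """Compute the hidden state h_t from input x_t and previous state h_prev."""
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])   # update gate
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])   # reset gate
    # Candidate state: the reset gate scales how much of h_prev is used
    h_cand = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r_t * h_prev) + p["b_h"])
    # Assumed convention: z_t interpolates between the old state and the candidate
    return (1.0 - z_t) * h_prev + z_t * h_cand

rng = np.random.default_rng(0)
params = init_params(input_dim=4, hidden_dim=3, rng=rng)
h = np.zeros(3)
for x_t in rng.normal(size=(5, 4)):   # toy sequence of 5 time steps
    h = gru_step(x_t, h, params)
print(h.shape)
```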
3.5. Comparison with Unsupervised Models
3.6. Application in Edge Computing and Distributed Environments
- Low computational cost. EFMS-KMeans has linear complexity and detects anomalies automatically, without supervised training or intensive computation, which makes it suitable for deployment on perimeter devices such as firewalls or routers. The CNN-GRU model uses lightweight structures (a single convolutional layer and GRU units instead of LSTM), reducing memory and processing requirements without compromising detection quality.
- Modular and federated designs. By separating the clustering, labeling, sequencing, and classification stages, the proposed design facilitates modular implementation and independent updating of each component. This modularity is critical for integration into federated learning architectures in distributed environments, preserving data privacy [30].
- Scalability in distributed environments. The architecture can be easily replicated across heterogeneous edge nodes, enabling the deployment of hierarchical solutions that adjust model execution according to available computational load, ensuring adaptability and efficiency under changing network traffic conditions.
3.7. Hybrid EFMS-KMeans + CNN-GRU Architecture
- Feature Engineering: New variables such as pkt_ratio and distance_to_centroid are generated, along with temporal variables derived from timestamps.
- EFMS-KMeans Clustering: Patterns are identified using density-based initialization with EFMS, which reduces the risk of converging to poor local minima and improves segmentation compared to traditional K-Means [31].
- SMOTE Balancing: The minority class is expanded to avoid bias during supervised training.
- Sequencing: The labeled data are transformed into temporal sequences using sliding windows.
- Final Classification: A dense layer with sigmoid activation classifies the sequences as anomalous or normal.
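The sequencing step in the list above can be sketched as a sliding-window transformation. The window length, the step size, and the rule of labeling each window with the label of its last record are assumptions made for illustration; the source does not specify them.

```python
def sliding_windows(rows, labels, window, step=1):
    """Turn time-ordered rows into overlapping sequences of length `window`."""
    X_seq, y_seq = [], []
    for start in range(0, len(rows) - window + 1, step):
        end = start + window
        X_seq.append(rows[start:end])      # `window` consecutive feature rows
        y_seq.append(labels[end - 1])      # assumption: window takes its last record's label
    return X_seq, y_seq

# Toy data: 6 time-ordered records with 2 features each, labels from clustering
rows = [[i, i * 10] for i in range(6)]
labels = [0, 0, 1, 0, 0, 1]
X_seq, y_seq = sliding_windows(rows, labels, window=3)
print(len(X_seq), y_seq)  # 4 [1, 0, 0, 1]
```

Each element of `X_seq` has shape (window, features), which is the input layout a one-dimensional convolutional layer followed by a recurrent layer expects.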
3.7.1. CNN-GRU Model Architecture
- A one-dimensional convolutional layer (Conv1D) with 64 filters and a kernel size of 3, for extracting local spatial patterns.
- A MaxPooling1D layer for intermediate dimensionality reduction.
- A GRU layer with 64 units, responsible for modeling long-term dependencies.
- A Dropout layer (rate 0.3) to mitigate overfitting.
- A final dense layer with sigmoid activation, providing the binary prediction (normal vs. anomalous).
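To make the layer stack concrete, the following sketch traces tensor shapes and parameter counts through the architecture. Several values are assumptions not given in the source: a window of 10 time steps, 8 input features, a pooling size of 2, no padding in the convolution, and a GRU with one bias vector per gate (some frameworks use two, which changes the count slightly). The function name is hypothetical.

```python
def cnn_gru_summary(timesteps, n_features, filters=64, kernel=3, pool=2, gru_units=64):
    """Trace output shapes and trainable parameters of the Conv1D-GRU stack."""
    conv_len = timesteps - kernel + 1                    # no padding, stride 1
    pool_len = conv_len // pool                          # MaxPooling1D shortens the sequence
    conv_params = (kernel * n_features + 1) * filters    # weights plus one bias per filter
    # GRU: three blocks (update, reset, candidate), each with input weights,
    # recurrent weights and a single bias vector (assumed textbook form)
    gru_params = 3 * (filters * gru_units + gru_units * gru_units + gru_units)
    dense_params = gru_units + 1                         # single sigmoid output unit
    # Dropout has no trainable parameters
    return {
        "conv_output": (conv_len, filters),
        "pool_output": (pool_len, filters),
        "gru_output": (gru_units,),
        "dense_output": (1,),
        "total_params": conv_params + gru_params + dense_params,
    }

summary = cnn_gru_summary(timesteps=10, n_features=8)
print(summary)
```

Under these assumptions the model has 26,433 trainable parameters, most of them in the GRU layer.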
3.8. Comparison and Experimental Validation
3.9. Dataset Preparation and EFMS-KMeans Clustering
3.10. Class Balancing with SMOTE
3.11. Proposed Algorithms (Pseudocode)
Algorithm 1: EFMS-KMeans Anomaly Detection
Algorithm 2: CNN-GRU Classification with SMOTE and Sliding Window
4. Results and Discussion
4.1. Clustering Performance with EFMS-KMeans
- Silhouette Score: 0.9452
- Calinski–Harabasz Index: 163,903.61
- Davies–Bouldin Index: 0.8181
4.2. Class Balancing and Feature Distribution
4.3. CNN-GRU Model Performance
- Accuracy: 0.98
- F1-Score (anomalous class): 0.98
- Precision (anomalous class): 1.00
- Recall (anomalous class): 0.95
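The F1-score is the harmonic mean of precision and recall. The short sketch below applies that formula to the rounded precision and recall listed above; because those inputs are rounded to two decimals, the recomputed value can differ from a reported F1-score in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(precision=1.00, recall=0.95)
print(round(f1, 3))  # 0.974
```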
4.4. Comparative Analysis with Hybrid Architectures
| Hybrid Model | Dataset / Context | Accuracy | F1-Score | AUC | Avg. Inference Time (s) |
|---|---|---|---|---|---|
| EFMS-KMeans + CNN-GRU (this study) | Real FortiGate logs (TLS/HTTPS) | 0.98 | 0.97 | 0.97 | 4.8 |
| CNN-GRU (IIoT Traffic) [26] | BoT-IoT (IIoT) | 0.949 | 0.944 | 0.96 | 5.1 |
| CNN-LSTM (Hybrid IDS) [24] | NF-BoT-IoT | 0.942 | 0.939 | 0.95 | 6.3 |
| CNN-BiLSTM (MindSpore) [45] | NF-BoT-IoT | 0.99 | 0.99 | — | — |
4.5. Critical Discussion
- Generalization capability: The model exhibits robust behavior against noisy traffic and multiclass scenarios, outperforming comparable hybrid architectures in both real-world and controlled environments [47].
- Scalability for edge computing: Its modularity and low inference cost facilitate implementation in distributed infrastructures, supporting its applicability in perimeter cybersecurity and edge-oriented architectures [48].
5. Conclusions
- Efficiency in detecting encrypted traffic: The architecture identifies anomalies without content inspection, ensuring privacy and reducing computational load.
- High generalization capability: The model exhibits robust performance in multiclass and noisy traffic scenarios, surpassing the results of previously reported hybrid approaches.
- Scalability and applicability in edge computing: Its modular design and low inference cost facilitate deployment in distributed infrastructures, aligning with emerging trends in perimeter cybersecurity.
- Integrating explainable AI (XAI) mechanisms to improve model transparency.
- Incorporating federated architectures to preserve privacy and enhance system resilience.
- Developing continuous learning schemes that allow dynamic updates to adapt to new traffic patterns.

References
- Fotiadou, K.; Velivassaki, T.H.; Voulkidis, A.; Skias, D.; Tsekeridou, S.; Zahariadis, T. Network traffic anomaly detection via deep learning. Information 2021, 12(5). [CrossRef]
- Sarker, I.H. Deep Cybersecurity: A Comprehensive Overview from Neural Network and Deep Learning Perspective. SN Computer Science 2021, 2(3). [CrossRef]
- Wang, Y.-C.; Houng, Y.-C.; Chen, H.-X.; Tseng, S.-M. Network Anomaly Intrusion Detection Based on Deep Learning Approach. Sensors 2023, 23(4). [CrossRef]
- Landauer, M.; Onder, S.; Skopik, F.; Wurzenberger, M. Deep Learning for Anomaly Detection in Log Data: A Survey. Mach. Learn. Appl. 2023, 12, 100470.
- Darban, Z.Z.; Webb, G.I.; Pan, S.; Aggarwal, C.C.; Salehi, M. Deep Learning for Time Series Anomaly Detection: A Survey. ACM Comput. Surv. 2023, 55, 1–42. arXiv:2211.05244v3 [cs.LG], 28 May 2024. [CrossRef]
- Fang, Z.; Gu, M.; Zhou, S.; Chen, J.; Tan, Q.; Wang, H.; Bu, J. Towards a Unified Framework of Clustering-based Anomaly Detection. arXiv 2024, arXiv:2406.00452.
- Miguel-Diez, A.; Campazas-Vega, A.; Álvarez-Aparicio, C.; Esteban-Costales, G.; Guerrero-Higueras, Á.M. A Systematic Literature Review of Unsupervised Learning Algorithms for Anomalous Traffic Detection Based on Flows. arXiv 2025, arXiv:2503.08293.
- Huo, Y.; Cao, Y.; Wang, Z.; Yan, Y.; Ge, Z.; Yang, Y. Traffic Anomaly Detection Method Based on Improved GRU and EFMS-KMeans Clustering. Comput. Model. Eng. Sci. 2021, 126(3), 1053–1085. [CrossRef]
- Komadina, A.; Kovačević, I.; Štengl, B.; Groš, S. Comparative Analysis of Anomaly Detection Approaches in Firewall Logs: Integrating Light-Weight Synthesis of Security Logs and Artificially Generated Attack Detection. Sensors 2024, 24, 2636. [CrossRef]
- Bacevicius, M.; et al. Comparative Analysis of Perturbation Techniques in LIME for Intrusion Detection Enhancement. Mach. Learn. Knowl. Extr. 2025, 7(1), 21. [CrossRef]
- Liu, Y.; Wang, Z.; Pang, S.; Ju, L. Distributed Malicious Traffic Detection. Electronics 2024, 13(23). [CrossRef]
- Mohammadpour, L.; Ling, T.C.; Liew, C.S.; Aryanfar, A. A Survey of CNN-Based Network Intrusion Detection. Appl. Sci. 2022, 12, 8162. [CrossRef]
- Chen, T.; Chen, Y.; Lv, M.; He, G.; Zhu, T.; Wang, T.; Weng, Z. A payload based malicious http traffic detection method using transfer semi-supervised learning. Applied Sciences 2021, 11(16). [CrossRef]
- Larriva-Novo, X.; et al. Post-Hoc Categorization Based on Explainable AI and Reinforcement Learning for Improved Intrusion Detection. Appl. Sci. 2024, 14(24), 11511. [CrossRef]
- Lv, H.; Ding, Y. A Hybrid Intrusion Detection System with K-Means and CNN+LSTM. EAI Endorsed Trans. Scalable Inf. Syst. 2024, 11, e5667. [CrossRef]
- Valavan, W.T.; Joseph, N. Intrusion Detection System Using K-Means SMOTE Algorithm with Multi-Dense Layer Bidirectional Long Short-Term Memory. Int. J. Intell. Eng. Syst. 2024, 17, 59. [CrossRef]
- Cao, B.; Li, C.; Song, Y.; Qin, Y.; Chen, C. Network Intrusion Detection Model Based on CNN and GRU. Appl. Sci. 2022, 12, 4184. [CrossRef]
- Huo, Y.; Cao, Y.; Wang, Z.; Yan, Y.; Ge, Z.; Yang, Y. Traffic Anomaly Detection Method Based on Improved GRU and EFMS-Kmeans Clustering. Comput. Model. Eng. Sci. 2021, 126, 1001–1017. [CrossRef]
- Zhai, F.; Yang, T.; Chen, H.; He, B.; Li, S. Intrusion Detection Method Based on CNN–GRU–FL in a Smart Grid Environment. Electronics 2023, 12(5), 1164. [CrossRef]
- Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.; Abuhaija, B.; Heming, J. K-means Clustering Algorithms: A Comprehensive Review, Variants Analysis, and Advances in the Era of Big Data. Inf. Sci. 2023, 624, 440–469. [CrossRef]
- Xia, H.; Zhou, Y.; Li, J.; Bai, L.; Li, J.; Zhou, F. Outlier Detection via Optimized Density Peaks Clustering and K-means-Derived Objective Function. Chaos Solitons Fractals 2025, 182, 116791. [CrossRef]
- Alshamrani, A.; Myneni, S.; Chowdhary, A.; Huang, D. A Survey on Advanced Persistent Threats: Techniques, Solutions, Challenges, and Research Opportunities. IEEE Commun. Surv. Tutor. 2019, 21, 1851–1877. [CrossRef]
- Khan, M.Z.; Reshi, A.A.; Shaf, S.; Aljubayri, I. An Adaptive Hybrid Framework for IIoT Intrusion Detection Using Neural Networks and Feature Optimization Using Genetic Algorithms. Discover Sustainability 2025, 6, Article 42. [CrossRef]
- Gadal, S.; Mokhtar, R.; Abdelhaq, M.; Alsaqour, R.; Ali, E.S.; Saeed, R. Machine Learning-Based Anomaly Detection Using K-Mean Array and Sequential Minimal Optimization. Electronics 2022, 11, 2158. [CrossRef]
- Aung, Y.Y.; Min, M.M. Hybrid Intrusion Detection System Using K-Means and K-Nearest Neighbors Algorithms. In Proceedings of the 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS), Singapore, 6–8 June 2018; IEEE: Singapore, 2018; pp. 34–38. [CrossRef]
- Ji, C.; Liu, H.; Dai, W. Hybrid Model for Network Traffic Anomaly Detection Based on Parallel Two-stage Feature Fusion. IEEE Access 2025. [CrossRef]
- ALMahadin, G.; Aoudni, Y.; Shabaz, M.; Agrawal, A.V.; Yasmin, G.; Alomari, E.S.; Al-Khafaji, H.M.R.; Dansana, D.; Maaliw, R.R. VANET Network Traffic Anomaly Detection Using GRU-Based Deep Learning Model. IEEE Trans. Consum. Electron. 2024, 70, 4548–4555. [CrossRef]
- Yin, C.; Zhu, Y.; Fei, J.; He, X. A Deep Learning Approach for Intrusion Detection Using Recurrent Neural Networks. IEEE Access 2017, 5, 21954–21961. [CrossRef]
- Wu, T.; et al. Intrusion Detection System Combined Enhanced Random Forest with SMOTE Algorithm. EURASIP J. Adv. Signal Process. 2022, 39. [CrossRef]
- Mohanarangan, A.; Yallamelli, A.R.G.; Devarajan, V.; Yalla, R.K.M.K.; Ganesan, T.; Mamidala, V.; Kumar, V.R. Hybrid CNN–GRU Network with Edge Computing for Efficient Malware Detection in IIoT. Int. J. Sci. Eng. Appl. 2025, 14(3), 70–76. [CrossRef]
- Andalib, A.; Babamir, S.M. Anomaly Detection of Policies in Distributed Firewalls Using Data Log Analysis. J. Supercomput. 2023, 79, 19473–19514. [CrossRef]
- Komadina, A.; Kovačević, I.; Štengl, B.; Groš, S. Comparative Analysis of Anomaly Detection Approaches in Firewall Logs: Integrating Light-Weight Synthesis of Security Logs and Artificially Generated Attack Detection. Sensors 2024, 24(8), 2636. [CrossRef]
- Xiang, H.; Zhang, X.; Xu, X.; Beheshti, A.; Qi, L.; Hong, Y.; Dou, W. Federated Learning-Based Anomaly Detection with Isolation Forest in the IoT-Edge Continuum. ACM Trans. Multimed. Comput. Commun. Appl. 2024. [CrossRef]
- Bakhshi, T.; Ghita, B. Anomaly Detection in Encrypted Internet Traffic Using Hybrid Deep Learning. Security and Communication Networks 2021. [CrossRef]
- Konatham, P.; Gunda, N.; Merugu, S. A Secure Hybrid Deep Learning Technique for Anomaly Detection in IIoT Edge Computing. TechRxiv 2024. [CrossRef]
- Zhao, Z.; Guo, H.; Wang, Y. A Multi-information Fusion Anomaly Detection Model Based on Convolutional Neural Network and AutoEncoder. Sci. Rep. 2024, 14, 16147. [CrossRef]
- Noor, A. Cloud-Based Deep Learning for Real-Time URL Anomaly Detection: LSTM/GRU and CNN/LSTM Models. Comput. Syst. Sci. Eng. 2025, 49, 259–286. [CrossRef]
- Rashid, A.; Siddique, M.J.; Ahmed, S.M. Machine and Deep Learning-Based Comparative Analysis Using Hybrid Approaches for Intrusion Detection System. In Proceedings of the ICACS, 2020. [CrossRef]
- Ma, M. Research and Application of Firewall Log and Intrusion Detection Log Data Visualization System. IET Softw. 7060. [CrossRef]
- Wang, C.; Zhou, H.; Hao, Z.; Hu, S.; Li, J.; Zhang, X.; Jiang, B.; Chen, X. Network Traffic Analysis over Clustering-Based Collective Anomaly Detection. Comput. Netw. 2022, 205, 108760. [CrossRef]
- Chen, L.; Gao, S.; Liu, B. An Improved Density Peaks Clustering Algorithm Based on Grid Screening and Mutual Neighborhood Degree for Network Anomaly Detection. Sci. Rep. 2022, 12, 1409. [CrossRef]
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357. [CrossRef]
- Joloudari, J.H.; Marefat, A.; Nematollahi, M.A.; Oyelere, S.S.; Hussain, S. Effective Class-Imbalance Learning Based on SMOTE and Convolutional Neural Networks. Appl. Sci. 2023, 13, 4006. [CrossRef]
- Imrana, Y.; Xiang, Y.; Ali, L.; Noor, A.; Sarpong, K.; Abdullah, M.A. CNN-GRU-FF: A Double-Layer Feature Fusion-Based Network Intrusion Detection System Using Convolutional Neural Network and Gated Recurrent Units. Complex Intell. Syst. 2024, 10, 3353–3370. [CrossRef]
- Xiang, Q.; Wu, S.; Wu, D.; Liu, Y.; Qin, Z. Research on CNN-BiLSTM Network Traffic Anomaly Detection Model Based on MindSpore. arXiv 2025, arXiv:2504.21008.
- Sattar, S.; Khan, S.; Khan, M.I.; Akhmediyarova, A.; Mamyrbayev, O.; Kassymova, D.; Oralbekova, D.; Alimkulova, J. Anomaly Detection in Encrypted Network Traffic Using Self-Supervised Learning. Sci. Rep. 2025, 15, 26585. [CrossRef]
- Ji, I.H.; Lee, J.H.; Kang, M.J.; Park, W.J.; Jeon, S.H.; Seo, J.T. Artificial Intelligence-Based Anomaly Detection Technology over Encrypted Traffic: A Systematic Literature Review. Sensors 2024, 24, 898. [CrossRef]
- Marfo, W.; Rico, E.A.; Tosh, D.K.; Moore, S.V. Network Anomaly Detection in Distributed Edge Computing Infrastructure. arXiv 2025, arXiv:2503.05700.
- Patel, J.; Reiner, J.; Stilwell, B.; Wahbeh, A.; Seetan, R. Leveraging K-Means Clustering and Z-Score for Anomaly Detection in Bitcoin Transactions. Informatics 2025, 12(2), 43. [CrossRef]
- Vanem, E.; Brandsæter, A. Unsupervised Anomaly Detection Based on Clustering Methods and Sensor Data on a Marine Diesel Engine. J. Mar. Eng. Technol. 2021, 20(4), 217–234. [CrossRef]

| Model | Accuracy | F1-Score | AUC | Execution Time (s) |
|---|---|---|---|---|
| EFMS-KMeans | 0.961 | 0.97 | 0.972 | 4.8 |
| Isolation Forest | 0.921 | 0.912 | 0.939 | 3.2 |
| Autoencoder | 0.934 | 0.926 | 0.948 | 6.1 |
| Random Forest | 0.949 | 0.939 | 0.956 | 3.5 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

