Submitted:
23 July 2025
Posted:
25 July 2025
Abstract
Keywords:
1. Introduction
- GaussianCopula
- CTGAN
- Tabular Variational Autoencoder (TVAE)
- CopulaGAN
2. Related Works
3. Methodology
3.1. Dataset and Preprocessing Steps
3.1.1. Dataset Description
3.1.2. Preprocessing Procedure
- Feature Selection and Cleaning: Non-informative columns, such as identifiers (Flow ID), source/destination IP addresses, port numbers, and timestamps, have been removed. These attributes were considered irrelevant for pattern-based analysis and could introduce unnecessary noise or a risk of memorization.
- Handling Missing and Infinite Values: Instances containing missing (NaN) or infinite values have been addressed by either imputation or removal, depending on their frequency and overall impact. Columns with excessive missing values have been excluded.
- Normalization of Numerical Features: Continuous numerical features have been normalized to a common scale using Min-Max scaling. This step ensures a balanced contribution of all features during generative model training.
- Final Dataset Format: The resulting dataset has been structured as a flat table of mixed numerical and categorical features, devoid of identifiers and timestamp dependencies. This standardized format ensures direct applicability to all selected synthesizers in a fair comparative setting.
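
The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration with pandas, not the pipeline used in the study: the identifier column names in `ID_COLUMNS`, the `max_missing_ratio` threshold, and the choice of row removal over imputation are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical identifier column names; the real ones depend on the dataset export.
ID_COLUMNS = ["Flow ID", "Src IP", "Dst IP", "Timestamp"]


def preprocess(df: pd.DataFrame, max_missing_ratio: float = 0.5) -> pd.DataFrame:
    """Drop identifiers, clean NaN/inf values, and Min-Max scale numeric columns."""
    out = df.drop(columns=[c for c in ID_COLUMNS if c in df.columns])
    # Treat infinite values as missing so both cases are handled in one pass.
    out = out.replace([np.inf, -np.inf], np.nan)
    # Exclude columns whose share of missing values exceeds the assumed threshold.
    out = out.loc[:, out.isna().mean() <= max_missing_ratio]
    # Remove the remaining incomplete rows (imputation would be the alternative).
    out = out.dropna().reset_index(drop=True)
    # Min-Max scaling: x' = (x - min) / (max - min), applied per numeric column.
    numeric = out.select_dtypes(include="number").columns
    col_min, col_max = out[numeric].min(), out[numeric].max()
    span = (col_max - col_min).replace(0, 1)  # constant columns are mapped to 0
    out[numeric] = (out[numeric] - col_min) / span
    return out
```

Categorical columns such as `protocol` and `label` pass through unchanged, so the output keeps the flat mixed-type layout described in the last bullet.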
3.2. Generative Models Description
3.2.1. Gaussian Copula Synthesizer
3.2.2. CTGAN Synthesizer
3.2.3. TVAE Synthesizer
3.2.4. CopulaGAN Synthesizer
3.2.5. Summary of Model Configurations
3.3. Experimental Setup
3.3.1. Training Steps and Practices
3.3.2. Hyperparameter Variation Experiments
3.4. Data Generation Procedure
- Sample Size Matching: The number of synthetic records generated for each configuration is set equal to the size of the real dataset used during training. This matching ensures a fair basis for statistical comparison and metric computation.
- Feature Space Consistency: The synthetic dataset maintains the same dimensionality and feature types (numerical and categorical) as the original, preprocessed one. This consistency ensures that all evaluation metrics, especially those based on statistical tests and deep learning (DL) models, can be applied directly without additional preprocessing.
- Sampling Repetition: To account for the randomness inherent in deep generative models, such as weight initialization and sampling variability, each configuration has been executed in three independent sampling runs. This approach mitigates the impact of outlier runs and supports the assessment of model robustness.
- Post-Generation Processes: The synthetic datasets are inspected for data integrity, ensuring that generated samples do not contain invalid values (e.g., NaN or infinite values). If such cases are detected, they are addressed following the same imputation or removal strategy applied during the preprocessing of the real data.
- Data Preservation for Evaluation: All synthetic datasets have been kept in their raw generated form, without additional transformations applied. These datasets serve as direct input for the evaluation metrics detailed in Section 3.5.
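
The generation procedure above can be illustrated with the following sketch. The `Sampler` callable is a hypothetical stand-in for a fitted synthesizer (for example, a thin wrapper around a synthesizer's sampling method), and using the run index as a seed is an assumption of the example rather than a detail reported in the study.

```python
from typing import Callable, List

import numpy as np
import pandas as pd

# Hypothetical sampler signature: (num_rows, seed) -> synthetic DataFrame.
Sampler = Callable[[int, int], pd.DataFrame]


def generate_runs(sampler: Sampler, real: pd.DataFrame, n_runs: int = 3) -> List[pd.DataFrame]:
    """Draw n_runs independent synthetic datasets matching the real data's size."""
    runs = []
    for seed in range(n_runs):
        # Sample size matching: as many synthetic records as real ones.
        synthetic = sampler(len(real), seed)
        # Feature space consistency: same columns, in the same order.
        if list(synthetic.columns) != list(real.columns):
            raise ValueError("synthetic columns differ from the real dataset")
        # Post-generation integrity check: drop rows with NaN or infinite values.
        synthetic = synthetic.replace([np.inf, -np.inf], np.nan).dropna()
        runs.append(synthetic.reset_index(drop=True))
    return runs
```

Each returned DataFrame is otherwise left untransformed, so it can be passed directly to the evaluation metrics.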
3.5. Evaluation Metrics
3.6. Summary of Experiment Execution Workflow
4. Discussion
4.1. Experimental Dataset Preparations
4.2. Gaussian Copula Evaluation and Results – Experiment 1 (E1)
4.3. CTGAN Evaluation and Results
4.3.1. CTGAN - Experiment 2 (E2)
4.3.2. CTGAN - Experiment 3 (E3)
4.4. TVAE Evaluation and Results
4.4.1. TVAE - Experiment 4 (E4)
4.4.2. TVAE - Experiment 5 (E5)
4.5. CopulaGAN Evaluation and Results
4.5.1. CopulaGAN - Experiment 6 (E6)
4.5.2. CopulaGAN - Experiment 7 (E7)
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| ADASYN | Adaptive Synthetic Sampling Approach |
| AI | Artificial Intelligence |
| CIDF | Common Intrusion Detection Framework |
| CNN | Convolutional Neural Network |
| CTGAN | Conditional Tabular Generative Adversarial Network |
| DDoS | Distributed Denial of Service |
| DNN | Deep Neural Network |
| DoS | Denial of Service |
| DQA | Data Quality Assessment |
| DP | Data Pre-processing |
| DT | Decision Tree |
| ELBO | Evidence Lower Bound |
| EM | Expectation Maximization |
| FTP | File Transfer Protocol |
| GAN | Generative Adversarial Network |
| GOA | Gazelle Optimization Algorithm |
| HBA | Honey Badger Algorithm |
| ICMP | Internet Control Message Protocol |
| IDS | Intrusion Detection System |
| IGAN | Imbalanced Generative Adversarial Network |
| IoT | Internet of Things |
| IP | Internet Protocol |
| LR | Logistic Regression |
| LSTM | Long Short-Term Memory |
| MitM | Man-in-the-Middle |
| ML | Machine Learning |
| MLP | Multi-Layer Perceptron |
| MSCNN | Multi-Scale Convolutional Neural Network |
| NaN | Not a Number |
| NIDS | Network Intrusion Detection System |
| PCAP | Packet Capture |
| RF | Random Forest |
| RNN | Recurrent Neural Network |
| ROS | Random Over-Sampling |
| SAPVAGAN | Self-Attention-based Provisional Variational Auto-encoder Generative Adversarial Network |
| SMOTE | Synthetic Minority Oversampling Technique |
| SSH | Secure Shell |
| SVM | Support Vector Machine |
| TCP | Transmission Control Protocol |
| TMG-GAN | Tabular Multi-Generator Generative Adversarial Network |
| TVAE | Tabular Variational Autoencoder |
| VAE | Variational Autoencoder |
| VAWGAN | Variational Autoencoder Wasserstein Generative Adversarial Network |
| WOGRU | Whale Optimized Gated Recurrent Unit |
| WSN | Wireless Sensor Network |
Appendix A
Experiment 1 (E1)


Experiment 2 (E2)


Experiment 3 (E3)


Experiment 4 (E4)


Experiment 5 (E5)


Experiment 6 (E6)


Experiment 7 (E7)



| Feature | Type | Description |
|---|---|---|
| src port | Numerical | Source port number |
| dst port | Numerical | Destination port number |
| protocol | Categorical | Protocol identifier |
| flow duration | Numerical | Duration of the flow session |
| total fwd packet | Numerical | Total number of packets in forward direction |
| total bwd packets | Numerical | Total number of packets in backward direction |
| total length of fwd packet | Numerical | Total length of forward packets |
| total length of bwd packet | Numerical | Total length of backward packets |
| flow bytes/s | Numerical | Data transfer rate in bytes per second |
| flow packets/s | Numerical | Packet transfer rate per second |
| fwd header length | Numerical | Header length in forward direction |
| bwd header length | Numerical | Header length in backward direction |
| fwd packets/s | Numerical | Forward packet transfer rate |
| bwd packets/s | Numerical | Backward packet transfer rate |
| down/up ratio | Numerical | Download to upload ratio |
| icmp code | Numerical | ICMP code |
| icmp type | Numerical | ICMP type |
| total tcp flow time | Numerical | Total duration of TCP flow |
| label | Categorical | Attack category label |

| Synthesizer | Model Type | Tunable Parameters |
|---|---|---|
| GaussianCopula Synthesizer | Statistical | - |
| CTGAN Synthesizer | Deep GAN | Batch Size, Epochs, Embedding Dimension, Discriminator Dimension |
| TVAE Synthesizer | Deep VAE | Batch Size, Epochs, Embedding Dimension, Decompress Dimension |
| CopulaGAN Synthesizer | Hybrid | Batch Size, Epochs, Embedding Dimension, Discriminator Dimension |

| Parameter | Tested Values | Description |
|---|---|---|
| Epochs | 100, 300 | Number of full passes over the training dataset - higher values allow longer learning but may increase overfitting risk |
| Batch Size | 500, 1000 | Number of samples processed per training iteration - affects convergence speed and stability |
| Network Dimensions | (128,128), (256,256) | Number of neurons in each hidden layer of the generator, discriminator, encoder, or decoder; higher dimensions enable learning of more complex patterns |
| Decompress Dimension | (128,128), (256,256) | Number of neurons in each hidden layer of the TVAE decoder, which reconstructs synthetic samples from the latent space |
| Embedding Dimension | 128, 256 | Size of the latent vector used by the generator or encoder to represent compressed data features during training |
| Model Type - Statistical | Gaussian Copula | Baseline statistical model - no variation applied |
| Model Type - DL | CTGAN, TVAE, CopulaGAN | Deep generative models - all tested under varied configurations |

| Category | Metric | Description |
|---|---|---|
| Sample Level Metrics | Random Forest Classifier | Binary classifier trained to distinguish real from synthetic samples - for realistic synthetic data, classification accuracy should approach random guessing (≈ 50%) |
| Dataset Level Metrics | Kolmogorov-Smirnov Two-Sample (ks_2samp) Test | Statistical test that compares the cumulative distributions of real and synthetic datasets to assess their overall similarity at the dataset level |
| Detection Metrics | Logistic Detection | Uses a logistic regression model to distinguish real from synthetic data; a lower detection score (closer to random guessing) indicates higher synthetic data quality |
| Detection Metrics | SVC Detection | Applies a Support Vector Classifier to detect synthetic data; high confusion between real and synthetic samples suggests better data realism |
| Utility (TSTR Framework) Metrics | Random Forest Regressor | Evaluates whether patterns learned from synthetic data generalize to real data by training a regression model on synthetic samples and testing it on real data (Train on Synthetic, Test on Real, TSTR) - performance close to real-to-real training suggests high utility |
| Statistical Metrics | Kolmogorov-Smirnov (KS) Test | Evaluates whether the univariate distributions of continuous features in the synthetic data match those of the real data - detects distributional shifts or mode collapse |
| SDV Diagnostic Metrics | Built-In SDV Diagnostic Test | Performs automated checks on synthetic data validity, such as uniqueness of primary keys, adherence to real data ranges, and categorical value consistency |
| SDV Diagnostic Metrics | Built-In SDV Data Quality/Statistical Similarity Test | Provides an overall statistical similarity score between real and synthetic datasets, expressed as a percentage (0–100%), reflecting how closely the synthetic data matches the real data's statistical properties |
| SDV Diagnostic Metrics | Column Shapes Sub-scores | Reports feature-wise similarity scores, indicating which columns have distributions that closely match the real data and highlighting those with the highest and lowest fidelity |
| Visual Metrics | Pairplots | Visualize bivariate distributions of features to compare relationships and spot discrepancies between real and synthetic data |
| Visual Metrics | UMAP | Dimensionality reduction technique used to visually assess structural similarity between real and synthetic datasets; overlapping clusters suggest high fidelity |
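
Three of the metrics summarized in the table above (the real-versus-synthetic Random Forest classifier, the per-feature two-sample KS test, and the TSTR utility check) can be sketched as follows. This is an illustrative sketch using scikit-learn and SciPy, not the evaluation code of the study: the 3-fold cross-validation, the 0.05 significance level, and the use of R² as the TSTR score are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score


def detection_accuracy(real: pd.DataFrame, synthetic: pd.DataFrame, seed: int = 0) -> float:
    """Cross-validated accuracy of a real-vs-synthetic classifier (0.5 is ideal)."""
    X = pd.concat([real, synthetic], ignore_index=True)
    y = np.r_[np.ones(len(real)), np.zeros(len(synthetic))]  # 1 = real, 0 = synthetic
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    return float(cross_val_score(clf, X, y, cv=3, scoring="accuracy").mean())


def ks_per_column(real: pd.DataFrame, synthetic: pd.DataFrame, alpha: float = 0.05) -> pd.DataFrame:
    """Two-sample KS test for every numeric column shared by both tables."""
    rows = {}
    for col in real.select_dtypes(include="number").columns:
        statistic, p_value = ks_2samp(real[col], synthetic[col])
        rows[col] = {
            "statistic": statistic,
            "p_value": p_value,
            "different": p_value < alpha,  # reject "same distribution" at level alpha
        }
    return pd.DataFrame.from_dict(rows, orient="index")


def tstr_r2(synthetic: pd.DataFrame, real: pd.DataFrame, target: str, seed: int = 0) -> float:
    """Train a regressor on synthetic data and report R^2 on the real data."""
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(synthetic.drop(columns=[target]), synthetic[target])
    predictions = model.predict(real.drop(columns=[target]))
    return float(r2_score(real[target], predictions))
```

A detection accuracy near 1.0 combined with p-values near zero, as in the result tables, indicates that the classifier and the KS test both separate real from synthetic records easily.
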
| Experiment ID | Synthesizer | Epochs | Batch Size | Embedding Dimension | Network Dimension |
|---|---|---|---|---|---|
| E1 | GaussianCopula | - | - | - | - |
| E2 | CTGAN | 100 | 500 | 128 | (128,128) |
| E3 | CTGAN | 300 | 1000 | 256 | (256,256) |
| E4 | TVAE | 100 | 500 | 128 | (128,128) |
| E5 | TVAE | 300 | 1000 | 256 | (256,256) |
| E6 | CopulaGAN | 100 | 500 | 128 | (128,128) |
| E7 | CopulaGAN | 300 | 1000 | 256 | (256,256) |
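
The experiment grid above can be written down as plain configuration objects, as in the sketch below. The `ExperimentConfig` class and its `synthesizer_kwargs` helper are hypothetical names introduced for illustration. The keyword names (`epochs`, `batch_size`, `embedding_dim`, `generator_dim`, `discriminator_dim`, `compress_dims`, `decompress_dims`) follow the SDV single-table synthesizers and should be checked against the installed SDV version. Applying the single "Network Dimension" value to both networks of each model is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ExperimentConfig:
    """One row of the experiment table (hypothetical helper, not from the study)."""
    experiment_id: str
    synthesizer: str
    epochs: Optional[int] = None
    batch_size: Optional[int] = None
    embedding_dim: Optional[int] = None
    network_dim: Optional[Tuple[int, int]] = None

    def synthesizer_kwargs(self) -> dict:
        """Translate the table row into keyword arguments for a synthesizer."""
        if self.synthesizer == "GaussianCopula":
            return {}  # statistical baseline: no tunable parameters varied
        kwargs = {
            "epochs": self.epochs,
            "batch_size": self.batch_size,
            "embedding_dim": self.embedding_dim,
        }
        if self.synthesizer == "TVAE":
            # Assumption: encoder and decoder share the same hidden-layer sizes.
            kwargs.update(compress_dims=self.network_dim, decompress_dims=self.network_dim)
        else:
            # Assumption: generator and discriminator share the same sizes.
            kwargs.update(generator_dim=self.network_dim, discriminator_dim=self.network_dim)
        return kwargs


EXPERIMENTS = [
    ExperimentConfig("E1", "GaussianCopula"),
    ExperimentConfig("E2", "CTGAN", 100, 500, 128, (128, 128)),
    ExperimentConfig("E3", "CTGAN", 300, 1000, 256, (256, 256)),
    ExperimentConfig("E4", "TVAE", 100, 500, 128, (128, 128)),
    ExperimentConfig("E5", "TVAE", 300, 1000, 256, (256, 256)),
    ExperimentConfig("E6", "CopulaGAN", 100, 500, 128, (128, 128)),
    ExperimentConfig("E7", "CopulaGAN", 300, 1000, 256, (256, 256)),
]
```

Keeping the grid in one place makes it straightforward to loop over all seven experiments with identical training and evaluation code.
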
| Attack Type | Total Records |
|---|---|
| Infiltration | 10,000 |
| DoS | 10,000 |
| Portscan | 10,000 |
| Brute-Force | 6,972 |
| Bot | 4,803 |
| Web Attack | 1,401 |

| Category | Metric | Result |
|---|---|---|
| Sample Level Metrics | Random Forest Classifier | Classification Accuracy: 0.999 |
| Dataset Level Metrics | Kolmogorov-Smirnov Two-Sample (ks_2samp) Test | src port: 0.0000 (Significantly different), dst port: 0.0000 (Significantly different), protocol: 0.0000 (Not significantly different), flow duration: 0.0000 (Significantly different), total fwd packet: 0.0000 (Significantly different), total bwd packets: 0.0000 (Significantly different), total length of fwd packet: 0.0000 (Significantly different), total length of bwd packet: 0.0000 (Significantly different), flow bytes/s: 0.0000 (Significantly different), flow packets/s: 0.0000 (Significantly different), fwd header length: 0.0000 (Significantly different), bwd header length: 0.0000 (Significantly different), fwd packets/s: 0.0000 (Significantly different), bwd packets/s: 0.0000 (Significantly different), down/up ratio: 0.0000 (Significantly different), icmp code: 1.0000 (Not significantly different), icmp type: 1.0000 (Not significantly different), total tcp flow time: 0.0000 (Significantly different) |
| Detection Metrics | Logistic Detection | Score: 0.788 |
| Detection Metrics | SVC Detection | Score: 0.877 |
| Utility (TSTR Framework) Metrics | Random Forest Regressor | TSTR Score: (0.012, -0.038) |
| Statistical Metrics | Kolmogorov-Smirnov (KS) Test | Score: 0.769 |
| SDV Diagnostic Metrics | Built-In SDV Diagnostic Test | Data Validity Score: 1.0; Data Structure Score: 1.0; Overall Score (Average): 1.0 |
| SDV Diagnostic Metrics | Built-In SDV Data Quality/Statistical Similarity Test | Column Shapes Score: 0.769; Column Pair Trends Score: 0.907; Overall Score (Average): 0.839 |
| SDV Diagnostic Metrics | Column Shapes Sub-scores | src port: 0.895868, dst port: 0.771725, protocol: 0.971859, flow duration: 0.761581, total fwd packet: 0.823652, total bwd packets: 0.820965, total length of fwd packet: 0.457083, total length of bwd packet: 0.381809, flow bytes/s: 0.451408, flow packets/s: 0.859251, fwd header length: 0.726746, bwd header length: 0.758477, fwd packets/s: 0.866129, bwd packets/s: 0.800537, down/up ratio: 0.753914, icmp code: 0.999768, icmp type: 0.999768, total tcp flow time: 0.757110 |

| Category | Metric | Result |
|---|---|---|
| Sample Level Metrics | Random Forest Classifier | Classification Accuracy: 0.999 |
| Dataset Level Metrics | Kolmogorov-Smirnov Two-Sample (ks_2samp) Test | src port: 0.0000 (Significantly different), dst port: 0.0000 (Significantly different), protocol: 1.0000 (Not significantly different), flow duration: 0.0000 (Significantly different), total fwd packet: 0.0000 (Significantly different), total bwd packets: 0.0000 (Significantly different), total length of fwd packet: 0.0000 (Significantly different), total length of bwd packet: 0.0000 (Significantly different), flow bytes/s: 0.0000 (Significantly different), flow packets/s: 0.0000 (Significantly different), fwd header length: 0.0000 (Significantly different), bwd header length: 0.0000 (Significantly different), fwd packets/s: 0.0000 (Significantly different), bwd packets/s: 0.0000 (Significantly different), down/up ratio: 0.0000 (Significantly different), icmp code: 1.0000 (Not significantly different), icmp type: 1.0000 (Not significantly different), total tcp flow time: 0.0000 (Significantly different) |
| Detection Metrics | Logistic Detection | Score: 0.788 |
| Detection Metrics | SVC Detection | Score: 0.877 |
| Utility (TSTR Framework) Metrics | Random Forest Regressor | TSTR Score: (-0.035, -0.002) |
| Statistical Metrics | Kolmogorov-Smirnov (KS) Test | Score: 0.811 |
| SDV Diagnostic Metrics | Built-In SDV Diagnostic Test | Data Validity Score: 1.0; Data Structure Score: 1.0; Overall Score (Average): 1.0 |
| SDV Diagnostic Metrics | Built-In SDV Data Quality/Statistical Similarity Test | Column Shapes Score: 0.809; Column Pair Trends Score: 0.932; Overall Score (Average): 0.871 |
| SDV Diagnostic Metrics | Column Shapes Sub-scores | src port: 0.924935, dst port: 0.745391, protocol: 0.999189, flow duration: 0.686238, total fwd packet: 0.940615, total bwd packets: 0.946359, total length of fwd packet: 0.885538, total length of bwd packet: 0.770196, flow bytes/s: 0.496317, flow packets/s: 0.698606, fwd header length: 0.850982, bwd header length: 0.797943, fwd packets/s: 0.754910, bwd packets/s: 0.697517, down/up ratio: 0.727603, icmp code: 0.999768, icmp type: 0.999768, total tcp flow time: 0.656105 |

| Category | Metric | Result |
|---|---|---|
| Sample Level Metrics | Random Forest Classifier | Classification Accuracy: 1.0 |
| Dataset Level Metrics | Kolmogorov-Smirnov Two-Sample (ks_2samp) Test | src port: 0.0000 (Significantly different), dst port: 0.0000 (Significantly different), protocol: 1.0000 (Not significantly different), flow duration: 0.0000 (Significantly different), total fwd packet: 0.0000 (Significantly different), total bwd packets: 0.0000 (Significantly different), total length of fwd packet: 0.0000 (Significantly different), total length of bwd packet: 0.0000 (Significantly different), flow bytes/s: 0.0000 (Significantly different), flow packets/s: 0.0000 (Significantly different), fwd header length: 0.0000 (Significantly different), bwd header length: 0.0000 (Significantly different), fwd packets/s: 0.0000 (Significantly different), bwd packets/s: 0.0000 (Significantly different), down/up ratio: 0.0000 (Significantly different), icmp code: 1.0000 (Not significantly different), icmp type: 1.0000 (Not significantly different), total tcp flow time: 0.0000 (Significantly different) |
| Detection Metrics | Logistic Detection | Score: 0.806 |
| Detection Metrics | SVC Detection | Score: 0.753 |
| Utility (TSTR Framework) Metrics | Random Forest Regressor | TSTR Score: (0.072, -0.003) |
| Statistical Metrics | Kolmogorov-Smirnov (KS) Test | Score: 0.797 |
| SDV Diagnostic Metrics | Built-In SDV Diagnostic Test | Data Validity Score: 1.0; Data Structure Score: 1.0; Overall Score (Average): 1.0 |
| SDV Diagnostic Metrics | Built-In SDV Data Quality/Statistical Similarity Test | Column Shapes Score: 0.797; Column Pair Trends Score: 0.929; Overall Score (Average): 0.863 |
| SDV Diagnostic Metrics | Column Shapes Sub-scores | src port: 0.910066, dst port: 0.744928, protocol: 0.999189, flow duration: 0.618399, total fwd packet: 0.938438, total bwd packets: 0.950204, total length of fwd packet: 0.854734, total length of bwd packet: 0.737308, flow bytes/s: 0.462757, flow packets/s: 0.648323, fwd header length: 0.899389, bwd header length: 0.829188, fwd packets/s: 0.647976, bwd packets/s: 0.777562, down/up ratio: 0.683111, icmp code: 0.999768, icmp type: 0.999768, total tcp flow time: 0.649018 |

| Category | Metric | Result |
|---|---|---|
| Sample Level Metrics | Random Forest Classifier | Classification Accuracy: 0.999 |
| Dataset Level Metrics | Kolmogorov-Smirnov Two-Sample (ks_2samp) Test | src port: 0.0000 (Significantly different), dst port: 0.0000 (Significantly different), protocol: 1.0000 (Not significantly different), flow duration: 0.0000 (Significantly different), total fwd packet: 0.0000 (Significantly different), total bwd packets: 0.0000 (Significantly different), total length of fwd packet: 0.0000 (Significantly different), total length of bwd packet: 0.0000 (Significantly different), flow bytes/s: 0.0000 (Significantly different), flow packets/s: 0.0000 (Significantly different), fwd header length: 0.0000 (Significantly different), bwd header length: 0.0000 (Significantly different), fwd packets/s: 0.0000 (Significantly different), bwd packets/s: 0.0000 (Significantly different), down/up ratio: 0.0000 (Significantly different), icmp code: 1.0000 (Not significantly different), icmp type: 1.0000 (Not significantly different), total tcp flow time: 0.0000 (Significantly different) |
| Detection Metrics | Logistic Detection | Score: 0.755 |
| Detection Metrics | SVC Detection | Score: 0.838 |
| Utility (TSTR Framework) Metrics | Random Forest Regressor | TSTR Score: (0.918, 0.564) |
| Statistical Metrics | Kolmogorov-Smirnov (KS) Test | Score: 0.826 |
| SDV Diagnostic Metrics | Built-In SDV Diagnostic Test | Data Validity Score: 1.0; Data Structure Score: 1.0; Overall Score (Average): 1.0 |
| SDV Diagnostic Metrics | Built-In SDV Data Quality/Statistical Similarity Test | Column Shapes Score: 0.826; Column Pair Trends Score: 0.925; Overall Score (Average): 0.876 |
| SDV Diagnostic Metrics | Column Shapes Sub-scores | src port: 0.902677, dst port: 0.774134, protocol: 0.999189, flow duration: 0.734528, total fwd packet: 0.909325, total bwd packets: 0.928062, total length of fwd packet: 0.764128, total length of bwd packet: 0.741824, flow bytes/s: 0.786316, flow packets/s: 0.701177, fwd header length: 0.852163, bwd header length: 0.806189, fwd packets/s: 0.728877, bwd packets/s: 0.841393, down/up ratio: 0.726214, icmp code: 0.999768, icmp type: 0.999768, total tcp flow time: 0.667315 |

| Category | Metric | Result |
|---|---|---|
| Sample Level Metrics | Random Forest Classifier | Classification Accuracy: 0.999 |
| Dataset Level Metrics | Kolmogorov-Smirnov Two-Sample (Ks_2amp) Test | src port: 0.0000 (Significantly different), dst port: 0.0000 (Significantly different), protocol: 1.0000 (Not significantly different), flow duration: 0.0000 (Significantly different), total fwd packet: 0.0000 (Significantly different), total bwd packets: 0.0000 (Significantly different), total length of fwd packet: 0.0000 (Significantly different), total length of bwd packet: 0.0000 (Significantly different), flow bytes/s: 0.0000 (Significantly different), flow packets/s: 0.0000 (Significantly different), fwd header length: 0.0000 (Significantly different), bwd header length: 0.0000 (Significantly different), fwd packets/s: 0.0000 (Significantly different), bwd packets/s: 0.0000 (Significantly different), down/up ratio: 0.0000 (Significantly different), icmp code: 1.0000 (Not significantly different), icmp type: 1.0000 (Not significantly different), total tcp flow time: 0.0000 (Significantly different) |
| Detection Metrics | Logistic Detection | Score: 0.708 |
| SVC Detection | Score: 0.841 | |
| Utility (TSTR Framework) Metrics | Random Forest Regressor | TSTR Score: (0.918, 0.564) |
| Statistical Metrics | Kolmogorov-Smirnov (KS) Test | Score: 0.849 |
| SDV Diagnostic Metrics | Built-In SDV Diagnostic Test | Data Validity Score: 1.0 Data Structure Score: 1.0 Overall Score (Average): 1.0 |
| Buit-In SDV Data Quality/Statistical Similarity Test | Column Shapes Score: 0.85 Column Pair Trends Score: 0.9234 Overall Score (Average): 0.8922 |
|
| Column Shapes Sub-scores | src port: 0.876390, dst port: 0.778025, protocol: 0.999189, flow duration: 0.733046, total fwd packet: 0.940986, total bwd packets: 0.957083, total length of fwd packet: 0.837711, total length of bwd packet: 0.772698, flow bytes/s: 0.771378, flow packets/s: 0.786386, fwd header length: 0.896262, bwd header length: 0.881161, fwd packets/s: 0.811145, bwd packets/s: 0.844659, down/up ratio: 0.724917, icmp code: 0.999768, icmp type: 0.999768, total tcp flow time: 0.689040 |

| Category | Metric | Description |
|---|---|---|
| Sample Level Metrics | Random Forest Classifier | Classification Accuracy: 0.999 |
| Dataset Level Metrics | Kolmogorov-Smirnov Two-Sample (ks_2samp) Test | src port: 0.0000 (Significantly different), dst port: 0.0000 (Significantly different), protocol: 1.0000 (Not significantly different), flow duration: 0.0000 (Significantly different), total fwd packet: 0.0000 (Significantly different), total bwd packets: 0.0000 (Significantly different), total length of fwd packet: 0.0000 (Significantly different), total length of bwd packet: 0.0000 (Significantly different), flow bytes/s: 0.0000 (Significantly different), flow packets/s: 0.0000 (Significantly different), fwd header length: 0.0000 (Significantly different), bwd header length: 0.0000 (Significantly different), fwd packets/s: 0.0000 (Significantly different), bwd packets/s: 0.0000 (Significantly different), down/up ratio: 0.0000 (Significantly different), icmp code: 1.0000 (Not significantly different), icmp type: 1.0000 (Not significantly different), total tcp flow time: 0.0000 (Significantly different) |
| Detection Metrics | Logistic Detection | Score: 0.788 |
| | SVC Detection | Score: 0.877 |
| Utility (TSTR Framework) Metrics | Random Forest Regressor | TSTR Score: (0.178, -0.074) |
| Statistical Metrics | Kolmogorov-Smirnov (KS) Test | Score: 0.893 |
| SDV Diagnostic Metrics | Built-In SDV Diagnostic Test | Data Validity Score: 1.0; Data Structure Score: 1.0; Overall Score (Average): 1.0 |
| | Built-In SDV Data Quality/Statistical Similarity Test | Column Shapes Score: 0.893; Column Pair Trends Score: 0.951; Overall Score (Average): 0.922 |
| | Column Shapes Sub-scores | src port: 0.903233, dst port: 0.739740, protocol: 0.999189, flow duration: 0.877455, total fwd packet: 0.930170, total bwd packets: 0.952890, total length of fwd packet: 0.944645, total length of bwd packet: 0.878914, flow bytes/s: 0.950065, flow packets/s: 0.873911, fwd header length: 0.892510, bwd header length: 0.845910, fwd packets/s: 0.866662, bwd packets/s: 0.914559, down/up ratio: 0.628034, icmp code: 0.999768, icmp type: 0.999768, total tcp flow time: 0.872846 |
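
The Logistic Detection and SVC Detection rows in the tables above measure how easily a classifier separates real records from synthetic ones. The sketch below illustrates the idea for the logistic case. It assumes the scoring convention used by SDMetrics detection metrics, in which the cross-validated ROC AUC is floored at 0.5, rescaled to the range 0 to 1, and subtracted from 1, so that a score near 1 means the classifier cannot tell the two sets apart. The function name `detection_score`, the fold count, the feature scaling step, and the toy data are illustrative assumptions and are not taken from the experiments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def detection_score(real, synthetic, n_splits=3, seed=0):
    """Return 1 - mean(rescaled AUC) of a real-vs-synthetic classifier.

    Hypothetical helper: a score near 1 means the two sets are hard to
    distinguish, a score near 0 means they are trivially separable.
    """
    features = np.vstack([real, synthetic])
    # Label real rows 1 and synthetic rows 0.
    labels = np.concatenate([np.ones(len(real)), np.zeros(len(synthetic))])

    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    penalties = []
    for train_idx, test_idx in folds.split(features, labels):
        model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        model.fit(features[train_idx], labels[train_idx])
        probabilities = model.predict_proba(features[test_idx])[:, 1]
        auc = roc_auc_score(labels[test_idx], probabilities)
        # Floor the AUC at chance level, then rescale [0.5, 1] to [0, 1].
        penalties.append(max(0.5, auc) * 2 - 1)
    return 1.0 - float(np.mean(penalties))


rng = np.random.default_rng(0)
real_rows = rng.normal(size=(2000, 4))
similar_rows = rng.normal(size=(2000, 4))            # same distribution
shifted_rows = rng.normal(loc=3.0, size=(2000, 4))   # clearly different

score_similar = detection_score(real_rows, similar_rows)
score_shifted = detection_score(real_rows, shifted_rows)
print(f"Score, same distribution:    {score_similar:.3f}")
print(f"Score, shifted distribution: {score_shifted:.3f}")
```

Under this convention, higher detection scores indicate synthetic data that is harder to distinguish from the real data. An SVC variant would follow the same pattern with a support vector classifier in place of the logistic regression.
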

| Category | Metric | Description |
|---|---|---|
| Sample Level Metrics | Random Forest Classifier | Classification Accuracy: 0.999 |
| Dataset Level Metrics | Kolmogorov-Smirnov Two-Sample (ks_2samp) Test | src port: 0.0000 (Significantly different), dst port: 0.0000 (Significantly different), protocol: 1.0000 (Not significantly different), flow duration: 0.0000 (Significantly different), total fwd packet: 0.0000 (Significantly different), total bwd packets: 0.0000 (Significantly different), total length of fwd packet: 0.0000 (Significantly different), total length of bwd packet: 0.0000 (Significantly different), flow bytes/s: 0.0000 (Significantly different), flow packets/s: 0.0000 (Significantly different), fwd header length: 0.0000 (Significantly different), bwd header length: 0.0000 (Significantly different), fwd packets/s: 0.0000 (Significantly different), bwd packets/s: 0.0000 (Significantly different), down/up ratio: 0.0000 (Significantly different), icmp code: 1.0000 (Not significantly different), icmp type: 1.0000 (Not significantly different), total tcp flow time: 0.0000 (Significantly different) |
| Detection Metrics | Logistic Detection | Score: 0.813 |
| | SVC Detection | Score: 0.858 |
| Utility (TSTR Framework) Metrics | Random Forest Regressor | TSTR Score: (0.075, -0.045) |
| Statistical Metrics | Kolmogorov-Smirnov (KS) Test | Score: 0.901 |
| SDV Diagnostic Metrics | Built-In SDV Diagnostic Test | Data Validity Score: 1.0; Data Structure Score: 1.0; Overall Score (Average): 1.0 |
| | Built-In SDV Data Quality/Statistical Similarity Test | Column Shapes Score: 0.903; Column Pair Trends Score: 0.958; Overall Score (Average): 0.931 |
| | Column Shapes Sub-scores | src port: 0.928826, dst port: 0.767348, protocol: 0.999189, flow duration: 0.856402, total fwd packet: 0.934825, total bwd packets: 0.960997, total length of fwd packet: 0.944043, total length of bwd packet: 0.891120, flow bytes/s: 0.925560, flow packets/s: 0.879655, fwd header length: 0.882944, bwd header length: 0.933250, fwd packets/s: 0.833148, bwd packets/s: 0.907449, down/up ratio: 0.720516, icmp code: 0.999768, icmp type: 0.999768, total tcp flow time: 0.882435 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).