Submitted:
22 July 2025
Posted:
24 July 2025
Abstract
Keywords:
1. Introduction
1.1. Datasets and Traffic Classification Models
1.2. Protocol-Specific Behavioral Analysis
1.3. Foundational and Contextual Studies
1.4. Gaps in Existing Work
1.5. Contribution of This Work
1.6. Paper Structure
2. Dataset Description
2.1. Dataset Collection Methodology

- Tor (The Onion Router): We configured Tor on a Raspberry Pi 5 (Kali Linux), Windows machines, and Azure-based VPNs. Captured behaviors included web browsing, email via Thunderbird (through proxychains), chat and VoIP via Telegram, file transfers, and audio and video streaming (including YouTube).
- Freenet: We used the Freenet platform to simulate browsing, file sharing, chatting (via FMS), video streaming (via FreeTube), and decentralized email (via Web of Trust).
- ZeroNet: Deployed on a Windows 10 environment, this protocol allowed the collection of traffic related to ZeroMail, ZeroChat, ZeroTube, decentralized file transfers (IFS), and general browsing behaviors through peer-to-peer hosting.
- VPN: An OpenVPN server was deployed on Microsoft Azure to simulate encrypted communication and traffic redirection. Services accessed included Spotify (audio), YouTube (video), and Discord, WhatsApp, and Telegram (chat, VoIP, and file transfer sessions).
- I2P (Invisible Internet Project): We installed I2P using Docker and simulated access to hidden services such as ramble.i2p and i2pforum.i2p. Behaviors recorded included browsing, email via i2pmail, chatting via i2pchat, video streaming (via an Invidious front end), and torrent-based file transfers.
2.2. Dataset Characteristics
2.3. Data Annotation and Labeling
3. Experimental Setup and Benchmarking
3.1. Data Preprocessing
- Feature Inspection: All dataset features across the three layers were initially examined.
- Handling Missing Values: Null values were identified and removed to ensure data completeness.
- Removing Duplicates: Duplicate records were identified and removed.
- Zero-Value Features: Features with only zero values were detected and excluded from the dataset.
- Filtering Misclassified and Non-Relevant Flows: (a) Traffic involving well-known DNS servers (e.g., Google, Cloudflare) was removed. (b) Communications between private IPs were also excluded.
- Eliminating String-Based Features: Features containing non-numeric (string) values unsuitable for machine learning were removed.
- Feature Selection via ANOVA: (a) Analysis of Variance (ANOVA) was applied to select the most relevant numerical features. (b) Features with a p-value ≤ 0.05 were retained as they showed statistically significant differences between label groups.
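The filtering and selection steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the resolver addresses in `DNS_RESOLVERS`, the function names, and the row layout (one dict per flow) are all hypothetical. The sketch computes only the one-way ANOVA F-statistic. Converting it to the p-value used for the 0.05 threshold requires the F distribution, which a statistics library such as SciPy (`scipy.stats.f_oneway`) provides.

```python
import ipaddress

# Hypothetical resolver list: the text names Google and Cloudflare only as examples.
DNS_RESOLVERS = {"8.8.8.8", "8.8.4.4", "1.1.1.1", "1.0.0.1"}


def keep_flow(src_ip, dst_ip):
    """Return False for flows that the filtering step would discard."""
    if src_ip in DNS_RESOLVERS or dst_ip in DNS_RESOLVERS:
        return False  # traffic involving a well-known DNS server
    both_private = (ipaddress.ip_address(src_ip).is_private
                    and ipaddress.ip_address(dst_ip).is_private)
    return not both_private  # drop private-to-private communication


def usable_features(rows, names):
    """Keep features that are numeric in every row and not zero everywhere."""
    kept = []
    for name in names:
        column = [row[name] for row in rows]
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                      for v in column)
        if numeric and any(v != 0 for v in column):
            kept.append(name)
    return kept


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across label groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf")  # no within-group variance
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy flows and one toy feature (all values are made up for illustration).
print(keep_flow("192.168.1.5", "10.0.0.2"))       # False: private to private
print(keep_flow("192.168.1.5", "93.184.216.34"))  # True
print(anova_f([1, 2, 3, 7, 8, 9], ["a", "a", "a", "b", "b", "b"]))  # 54.0
```

A large F-statistic means the feature's group means differ strongly relative to the spread inside each group, which is what a small p-value reflects.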
3.2. Machine Learning Training
- Gaussian Naive Bayes (GNB) is a Naive Bayes variant for continuous attributes that assumes each feature follows a Gaussian distribution within each class. The “naive” assumption that features are conditionally independent given the class simplifies calculations and makes the model fast and efficient. GNB is widely used because it performs well even with small datasets and is easy to implement and interpret.
- Decision Tree Classifier (DT) is a flowchart-like tree structure in which each internal node tests a feature (or attribute), each branch represents a decision rule, and each leaf node represents an outcome.
- Random Forest (RF) combines the predictions of multiple decision trees to produce a more accurate and stable result. It can be used for both classification and regression tasks.
- Logistic Regression is a supervised machine learning algorithm for classification that predicts the probability that an instance belongs to a given class. It models the relationship between the input features and the log-odds of that class.
- XGBoost is a gradient boosting implementation that builds an ensemble of decision trees sequentially, with each new tree correcting the errors of the previous ones. It is optimized for both predictive accuracy and computational speed.
- Multilayer Perceptron (MLP) is an artificial neural network consisting of multiple layers of interconnected nodes, each applying a computation to its inputs, that together process the input data and make predictions.
- Support Vector Machine (SVM) is a supervised machine learning algorithm for classification and regression tasks. It finds the optimal hyperplane (a line in 2D space) that separates the classes in a dataset while maximizing the margin between them.
- Bagging Decision Trees (Bag-DT) is an ensemble technique that trains multiple decision trees in parallel on bootstrap samples of the training data and aggregates their predictions to build a stronger model.
- Ensemble Learning (DT, RF, XGBoost) combines the predictions of several models instead of relying on a single one. Each model may err on its own, but combining their outputs usually gives a more accurate answer, much like asking a group of people for advice rather than one person.
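The way an ensemble merges its members' outputs can be illustrated with hard (majority) voting. The text does not state how the DT, RF, and XGBoost predictions were combined, so majority voting is an assumption here, and the function name and the toy predictions are hypothetical. Libraries such as scikit-learn offer this through `VotingClassifier`; the sketch below uses only the standard library.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    `predictions` maps a model name to its list of predicted labels.
    On a tie, the label voted first in model order wins.
    """
    combined = []
    for sample_votes in zip(*predictions.values()):
        winner, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(winner)
    return combined


# Made-up predictions from three models for three flows.
toy_predictions = {
    "DT": ["Tor", "VPN", "I2P"],
    "RF": ["Tor", "I2P", "I2P"],
    "XGBoost": ["VPN", "VPN", "Tor"],
}
print(majority_vote(toy_predictions))  # ['Tor', 'VPN', 'I2P']
```

Each flow receives the label that most models agree on, so a single model's mistake is outvoted when the other two are correct.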
4. Potential Applications
4.1. Intrusion Detection and Anomaly Detection
4.2. Cyber Threat Intelligence
4.3. Behavioral Traffic Classification
4.4. Network Traffic Analysis Research
4.5. Curriculum and Educational Use
5. Ethical Considerations and Limitations
- Ethical concerns related to darknet data collection.
- Steps taken to ensure privacy and compliance with regulations.
- Limitations of the dataset (e.g., biases, data representativeness).
5.1. Ethical Considerations
5.2. Limitations
- Synthetic Environment Bias: The traffic was generated through real applications and intentional behaviors but was still conducted in a controlled and partially simulated environment. This may not fully reflect the complexity, noise, or unpredictability of actual darknet communications in the wild, potentially affecting the robustness of models trained exclusively on this dataset.
- Representativeness and Overfitting Risk: The dataset, while diverse, may not capture the full spectrum of darknet traffic variations due to differences in user behavior, time, geography, and network conditions. As a result, machine learning models trained on this dataset may risk overfitting to its specific traffic patterns, particularly when applied to unseen or real-world environments. Researchers are encouraged to use this dataset alongside other sources or perform cross-dataset validation when possible.
- Incomplete Behavioral Coverage: In some darknet technologies, we could not simulate or access certain behaviors, especially those in deeper functional layers (e.g., decentralized email, P2P file sharing, or video streaming in inactive or partially deprecated networks). This was due to service unavailability, inactive peer networks, or technical restrictions in protocols like Freenet and I2P during the data collection.
- Protocol and Platform Scope: The dataset includes a targeted set of darknet platforms (Tor, Freenet, ZeroNet, I2P, VPN), but omits other technologies such as GNUnet, LokiNet, or RetroShare. This may limit its comprehensiveness in representing the entire darknet ecosystem.
- Temporal Constraints: All traffic was collected over a specific period, which may exclude evolving usage patterns or emerging darknet platforms. Periodic updates would be needed to maintain relevance and adaptability to current threats.
- Encrypted Traffic Complexity: As with many flow-based datasets, payload content is unavailable. While this enhances privacy and ethics compliance, it also restricts the depth of semantic analysis possible in applications like content-based filtering or deep behavioral profiling.
6. Conclusions and Future Work
- Expansion of Behavioral Coverage: Additional behaviors that could not be captured due to technical or network limitations—such as decentralized email or video streaming on inactive darknet services—can be included in future iterations.
- Inclusion of More Protocols: Future versions may incorporate other darknet technologies such as GNUnet, LokiNet, or RetroShare to broaden the representational coverage of darknet traffic.
- Long-Term Data Collection: Extending the capture period will enable the observation of temporal changes in darknet traffic, allowing for the study of long-term trends and protocol evolution.
- Machine Learning Benchmarking: While initial tests showed promising results, future work may include formal benchmarking of classification models across multiple behaviors, feature sets, and darknet technologies. This could also include comparisons with other datasets to evaluate generalization.
- Cross-Dataset Validation: To reduce overfitting risks and improve robustness, future studies may combine this dataset with others in joint evaluations.
- Payload-Inclusive Variant: If ethically and legally viable, a secure variant with limited payload samples could be considered for advanced analysis or encrypted traffic modeling.
- Public Release and Community Support: We aim to release the dataset with full documentation, usage guidelines, and version control to encourage reproducibility, community feedback, and collaborative improvements.
Data Availability Statement

| Darknet type | Behaviors captured |
| Tor | Browsing, Email, Chatting, VoIP, File Transfer, Streaming |
| Freenet | Browsing, Chatting, Email, File Sharing, Video |
| ZeroNet | Email, Chatting, File Sharing, Video |
| VPN | Streaming, File Transfer, Chatting, VoIP |
| I2P | Browsing, Email, Chatting, File Sharing, Video |
| Label | Count |
| Normal | 360,358 |
| Darknet | 91,404 |
| Label | Count |
| Freenet | 26,284 |
| Zeronet | 25,499 |
| I2P | 22,958 |
| Tor | 12,546 |
| VPN | 4,117 |
| Label | Count |
| Browsing | 33,586 |
| FTP | 20,214 |
| Video | 9,559 |
| P2P | 9,392 |
| Email | 7,873 |
| Audio | 5,953 |
| Chat | 3,489 |
| VOIP | 1,338 |
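The three label tables above are mutually consistent, which a short check can confirm. The sketch below copies the counts from the tables and assumes they correspond to the three labeling layers (Normal vs. Darknet, darknet type, behavior). The dictionary names and the `shares` helper are illustrative, and the 7,873 count is taken to be the Email class, the only captured behavior not otherwise listed.

```python
layer1 = {"Normal": 360_358, "Darknet": 91_404}
layer2 = {"Freenet": 26_284, "Zeronet": 25_499, "I2P": 22_958,
          "Tor": 12_546, "VPN": 4_117}
layer3 = {"Browsing": 33_586, "FTP": 20_214, "Video": 9_559, "P2P": 9_392,
          "Email": 7_873, "Audio": 5_953, "Chat": 3_489, "VOIP": 1_338}


def shares(counts):
    """Percentage of the total held by each label, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Both darknet breakdowns add up to the Darknet count of the first table.
print(sum(layer2.values()) == layer1["Darknet"])  # True
print(sum(layer3.values()) == layer1["Darknet"])  # True

# Class balance of the binary labels.
print(shares(layer1))  # {'Normal': 79.8, 'Darknet': 20.2}
```

The shares make the class imbalance explicit: roughly one flow in five is darknet traffic, and within the darknet flows VPN and VOIP are the smallest classes.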
| Layer | N Estimators | Maximum Leaf Nodes |
| Layer 1 | 300 | 50 |
| Layer 2 | 300 | 50 |
| Layer 3 | 100 | 50 |
| Model | Layer 1 Accuracy | Layer 2 Accuracy | Layer 3 Accuracy |
| Decision Trees | 0.994626903 | 0.953850251 | 0.849317619 |
| Ensemble (XGBoost, DT, RF) | 0.992257785 | 0.957730939 | 0.80936156 |
| XGBoost | 0.991808736 | 0.962068178 | 0.804557895 |
| Random Forests | 0.976448181 | 0.870453508 | 0.535347893 |
| MLP | 0.972066088 | 0.906787399 | 0.598763708 |
| Bagging-DT | 0.953809944 | 0.863491097 | 0.570835428 |
| SVM - Minmax Scaler | 0.882535111 | 0.731281388 | 0.476884693 |
| GaussianNB | 0.667456373 | 0.385063156 | 0.178573423 |
| Logistic Regression | 0.64958734 | 0.365659717 | 0.188013182 |
| Model | Layer 1 Time (s) | Layer 2 Time (s) | Layer 3 Time (s) |
| GaussianNB | 0.121964693 | 0.143498898 | 0.418830156 |
| Decision Trees | 0.045069218 | 0.024540663 | 0.057661057 |
| Random Forests | 0.279803038 | 0.203600407 | 0.223201990 |
| Bagging-DT | 9.853935242 | 2.273411274 | 6.706970453 |
| MLP | 0.655432940 | 0.320156097 | 0.702657700 |
| SVM - Minmax Scaler | 0.017041683 | 0.021026850 | 0.062469721 |
| XGBoost | 0.053077459 | 0.092761278 | 0.219833136 |
| Logistic Regression | 0.047414303 | 0.038285732 | 0.107613802 |
| Ensemble (XGBoost, DT, RF) | 1.206273079 | 0.638847589 | 1.271958113 |
| Model | Class | Precision | Recall | F1-score | Support |
| GaussianNB | Darknet | 0.97 | 0.35 | 0.51 | 32147 |
| GaussianNB | Normal | 0.6 | 0.99 | 0.75 | 32434 |
| Decision Trees | Darknet | 1.0 | 0.99 | 0.99 | 32147 |
| Decision Trees | Normal | 0.99 | 1.0 | 0.99 | 32434 |
| Random Forests | Darknet | 0.97 | 0.98 | 0.98 | 32147 |
| Random Forests | Normal | 0.98 | 0.97 | 0.98 | 32434 |
| Bagging-DT | Normal | 0.96 | 0.95 | 0.95 | 32147 |
| Bagging-DT | Darknet | 0.95 | 0.96 | 0.95 | 32434 |
| MLP | Normal | 0.98 | 0.97 | 0.97 | 32147 |
| MLP | Darknet | 0.97 | 0.98 | 0.97 | 32434 |
| SVM | Darknet | 0.97 | 0.8 | 0.87 | 32147 |
| SVM | Normal | 0.83 | 0.97 | 0.89 | 32434 |
| XGBoost | Normal | 0.99 | 0.99 | 0.99 | 32434 |
| XGBoost | Darknet | 0.99 | 0.99 | 0.99 | 32147 |
| Ensemble | Normal | 0.99 | 0.99 | 0.99 | 32434 |
| Ensemble | Darknet | 0.99 | 0.99 | 0.99 | 32147 |
| Logistic Regression | Darknet | 0.63 | 0.74 | 0.68 | 32147 |
| Logistic Regression | Normal | 0.69 | 0.57 | 0.62 | 32434 |
| Model | Class | Precision | Recall | F1-score | Support |
| GaussianNB | Freenet | 0.57 | 0.47 | 0.52 | 5290 |
| GaussianNB | I2P | 0.33 | 0.09 | 0.14 | 5315 |
| GaussianNB | Tor | 0.76 | 0.15 | 0.25 | 5249 |
| GaussianNB | VPN | 0.31 | 0.95 | 0.47 | 5157 |
| GaussianNB | Zeronet | 0.38 | 0.27 | 0.32 | 5273 |
| Decision Trees | Freenet | 0.96 | 0.96 | 0.96 | 5290 |
| Decision Trees | I2P | 0.93 | 0.93 | 0.93 | 5315 |
| Decision Trees | Tor | 0.92 | 0.92 | 0.92 | 5249 |
| Decision Trees | VPN | 0.99 | 0.99 | 0.99 | 5157 |
| Decision Trees | Zeronet | 0.96 | 0.97 | 0.96 | 5273 |
| Random Forests | Freenet | 0.94 | 0.91 | 0.93 | 5290 |
| Random Forests | I2P | 0.86 | 0.74 | 0.8 | 5315 |
| Random Forests | Tor | 0.81 | 0.83 | 0.82 | 5249 |
| Random Forests | VPN | 0.91 | 0.98 | 0.94 | 5157 |
| Random Forests | Zeronet | 0.83 | 0.89 | 0.86 | 5273 |
| Bagging-DT | Tor | 0.9 | 0.9 | 0.9 | 5290 |
| Bagging-DT | VPN | 0.86 | 0.75 | 0.8 | 5315 |
| Bagging-DT | Freenet | 0.78 | 0.86 | 0.82 | 5249 |
| Bagging-DT | I2P | 0.88 | 0.93 | 0.91 | 5157 |
| Bagging-DT | Zeronet | 0.88 | 0.85 | 0.87 | 5273 |
| MLP | Tor | 0.96 | 0.93 | 0.94 | 5290 |
| MLP | VPN | 0.91 | 0.8 | 0.85 | 5315 |
| MLP | Freenet | 0.83 | 0.91 | 0.87 | 5249 |
| MLP | I2P | 0.93 | 0.99 | 0.96 | 5157 |
| MLP | Zeronet | 0.89 | 0.89 | 0.89 | 5273 |
| SVM | Freenet | 0.71 | 0.87 | 0.78 | 5290 |
| SVM | I2P | 0.82 | 0.41 | 0.55 | 5315 |
| SVM | Tor | 0.74 | 0.67 | 0.7 | 5249 |
| SVM | VPN | 0.74 | 0.91 | 0.82 | 5157 |
| SVM | Zeronet | 0.7 | 0.79 | 0.74 | 5273 |
| XGBoost | Tor | 0.91 | 0.95 | 0.93 | 5249 |
| XGBoost | VPN | 0.99 | 1.0 | 0.99 | 5157 |
| XGBoost | Freenet | 0.98 | 0.96 | 0.97 | 5290 |
| XGBoost | I2P | 0.95 | 0.94 | 0.94 | 5315 |
| XGBoost | Zeronet | 0.98 | 0.97 | 0.97 | 5273 |
| Logistic Regression | Freenet | 0.37 | 0.44 | 0.4 | 5290 |
| Logistic Regression | I2P | 0.26 | 0.47 | 0.34 | 5315 |
| Logistic Regression | Tor | 0.49 | 0.34 | 0.41 | 5249 |
| Logistic Regression | VPN | 0.79 | 0.22 | 0.34 | 5157 |
| Logistic Regression | Zeronet | 0.34 | 0.36 | 0.35 | 5273 |
| Ensemble | Tor | 0.89 | 0.96 | 0.92 | 5249 |
| Ensemble | VPN | 0.98 | 1.0 | 0.99 | 5157 |
| Ensemble | Freenet | 0.98 | 0.95 | 0.97 | 5290 |
| Ensemble | I2P | 0.96 | 0.92 | 0.94 | 5315 |
| Ensemble | Zeronet | 0.98 | 0.96 | 0.97 | 5273 |
| Model | Class | Precision | Recall | F1-score | Support |
| GaussianNB | Audio | 0.47 | 0.07 | 0.12 | 6674 |
| GaussianNB | Browsing | 0.2 | 0.03 | 0.06 | 6593 |
| GaussianNB | Chat | 0.27 | 0.04 | 0.07 | 6809 |
| GaussianNB | Email | 0.18 | 0.24 | 0.21 | 6786 |
| GaussianNB | FTP | 0.24 | 0 | 0.01 | 6681 |
| GaussianNB | P2P | 0.35 | 0.03 | 0.05 | 6745 |
| GaussianNB | VOIP | 0.16 | 0.92 | 0.27 | 6798 |
| GaussianNB | Video | 0.34 | 0.08 | 0.13 | 6623 |
| Decision Trees | Audio | 0.92 | 0.93 | 0.93 | 6674 |
| Decision Trees | Browsing | 0.8 | 0.8 | 0.8 | 6593 |
| Decision Trees | Chat | 0.86 | 0.85 | 0.86 | 6809 |
| Decision Trees | Email | 0.8 | 0.81 | 0.8 | 6786 |
| Decision Trees | FTP | 0.75 | 0.75 | 0.75 | 6681 |
| Decision Trees | P2P | 0.87 | 0.85 | 0.86 | 6745 |
| Decision Trees | VOIP | 0.97 | 0.97 | 0.97 | 6798 |
| Decision Trees | Video | 0.82 | 0.83 | 0.83 | 6623 |
| Random Forests | Audio | 0.69 | 0.75 | 0.72 | 6674 |
| Random Forests | Browsing | 0.53 | 0.41 | 0.46 | 6593 |
| Random Forests | Chat | 0.49 | 0.37 | 0.42 | 6809 |
| Random Forests | Email | 0.44 | 0.36 | 0.4 | 6786 |
| Random Forests | FTP | 0.44 | 0.26 | 0.32 | 6681 |
| Random Forests | P2P | 0.45 | 0.6 | 0.52 | 6745 |
| Random Forests | VOIP | 0.73 | 0.94 | 0.83 | 6798 |
| Random Forests | Video | 0.43 | 0.59 | 0.49 | 6623 |
| Bagging-DT | Audio | 0.73 | 0.76 | 0.74 | 6674 |
| Bagging-DT | Chat | 0.53 | 0.43 | 0.47 | 6593 |
| Bagging-DT | FTP | 0.62 | 0.37 | 0.46 | 6809 |
| Bagging-DT | Browsing | 0.46 | 0.37 | 0.41 | 6786 |
| Bagging-DT | Email | 0.51 | 0.35 | 0.41 | 6681 |
| Bagging-DT | Video | 0.49 | 0.69 | 0.57 | 6745 |
| Bagging-DT | P2P | 0.7 | 0.95 | 0.81 | 6798 |
| Bagging-DT | VOIP | 0.47 | 0.6 | 0.53 | 6623 |
| MLP | Audio | 0.76 | 0.75 | 0.76 | 6674 |
| MLP | Chat | 0.5 | 0.57 | 0.54 | 6593 |
| MLP | FTP | 0.55 | 0.53 | 0.54 | 6809 |
| MLP | Browsing | 0.55 | 0.46 | 0.5 | 6786 |
| MLP | Email | 0.47 | 0.37 | 0.41 | 6681 |
| MLP | Video | 0.58 | 0.58 | 0.58 | 6745 |
| MLP | P2P | 0.8 | 0.96 | 0.87 | 6798 |
| MLP | VOIP | 0.54 | 0.59 | 0.57 | 6623 |
| SVM | Audio | 0.64 | 0.71 | 0.67 | 6674 |
| SVM | Browsing | 0.36 | 0.32 | 0.34 | 6593 |
| SVM | Chat | 0.43 | 0.26 | 0.33 | 6809 |
| SVM | Email | 0.42 | 0.16 | 0.23 | 6786 |
| SVM | FTP | 0.37 | 0.23 | 0.28 | 6681 |
| SVM | P2P | 0.4 | 0.62 | 0.49 | 6745 |
| SVM | VOIP | 0.66 | 0.97 | 0.78 | 6798 |
| SVM | Video | 0.38 | 0.54 | 0.45 | 6623 |
| XGBoost | Audio | 0.91 | 0.89 | 0.9 | 6674 |
| XGBoost | Chat | 0.8 | 0.81 | 0.8 | 6809 |
| XGBoost | FTP | 0.75 | 0.66 | 0.7 | 6681 |
| XGBoost | Browsing | 0.82 | 0.73 | 0.77 | 6593 |
| XGBoost | Email | 0.78 | 0.7 | 0.74 | 6786 |
| XGBoost | Video | 0.72 | 0.81 | 0.76 | 6623 |
| XGBoost | P2P | 0.76 | 0.84 | 0.79 | 6745 |
| XGBoost | VOIP | 0.92 | 0.99 | 0.95 | 6798 |
| Logistic Regression | Audio | 0.21 | 0.18 | 0.2 | 6674 |
| Logistic Regression | Browsing | 0.3 | 0 | 0.01 | 6593 |
| Logistic Regression | Chat | 0.21 | 0.1 | 0.14 | 6809 |
| Logistic Regression | Email | 0.15 | 0.28 | 0.19 | 6786 |
| Logistic Regression | FTP | 0.23 | 0.33 | 0.27 | 6681 |
| Logistic Regression | P2P | 0.14 | 0.34 | 0.2 | 6745 |
| Logistic Regression | VOIP | 0.35 | 0 | 0 | 6798 |
| Logistic Regression | Video | 0.29 | 0.26 | 0.27 | 6623 |
| Ensemble | Audio | 0.85 | 0.91 | 0.88 | 6674 |
| Ensemble | Chat | 0.79 | 0.83 | 0.81 | 6809 |
| Ensemble | FTP | 0.72 | 0.7 | 0.71 | 6681 |
| Ensemble | Browsing | 0.83 | 0.72 | 0.77 | 6593 |
| Ensemble | Email | 0.83 | 0.69 | 0.75 | 6786 |
| Ensemble | Video | 0.75 | 0.8 | 0.77 | 6623 |
| Ensemble | P2P | 0.78 | 0.84 | 0.81 | 6745 |
| Ensemble | VOIP | 0.92 | 0.99 | 0.95 | 6798 |
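The F1-score column in the per-class tables above is the harmonic mean of precision and recall. The following is a minimal sketch of that computation; the function names are hypothetical, and the confusion counts in the last example are made up. Because the tables report precision and recall rounded to two decimals, a recomputed F1 can differ from the printed value in the last digit.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# GaussianNB rows of the binary (Normal vs. Darknet) table, as printed.
print(round(f1_score(0.97, 0.35), 2))  # 0.51 (Darknet)
print(round(f1_score(0.60, 0.99), 2))  # 0.75 (Normal)

# Made-up counts, only to show the first helper.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(p, r, round(f1_score(p, r), 3))  # 0.9 0.75 0.818
```

The GaussianNB Darknet row shows why F1 is reported alongside precision: a precision of 0.97 looks strong, but a recall of 0.35 pulls the harmonic mean down to 0.51.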
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).