Submitted:
03 January 2025
Posted:
03 January 2025
Abstract
Keywords:
1. Introduction
- How can ML be leveraged to improve the real-time detection and identification of insider threats in organizational environments? Answered in Subsection 3.1.
- What limitations of existing security tools can be addressed through real-time data analysis and dynamic user profiling using ML? Answered in Section 2.
- How can ML models effectively classify users based on their behavior and assign risk levels to detect and mitigate insider threats in real-time? Answered in Section 5.
- What unique capabilities does the proposed ML tool provide over traditional security measures, especially in terms of automating real-time threat detection and user risk profiling? Answered in Subsection 3.2, Subsection 3.3, and Subsection 7.1.
- What are the key vulnerabilities of the proposed tool, and what measures can be implemented to mitigate these threats effectively? Answered in Section 6.
- Introduces an ML tool that continuously monitors employee activity in real time, enabling rapid detection of insider threats.
- Implements dynamic user profiling that classifies individuals into risk categories (low, medium, high) based on their behavior, ensuring accurate identification of risky users.
- Automates immediate alert generation, notifying cybersecurity teams promptly when abnormal activities are detected, reducing response time.
- Operates as a fully automated, non-interactive system, eliminating the need for manual intervention while enhancing efficiency.
- Allows for customizable configurations, enabling organizations to adjust parameters like feature weights and risk thresholds according to their specific security needs.
- Combines real-time detection and user classification into a unified solution, addressing the shortcomings of traditional tools lacking these capabilities.
2. Related Work
2.1. Traditional-Based
2.2. ML-Based
2.3. Limitations and Gaps
3. The Proposed Tool
3.1. Workflow
3.1.1. Real-Time Analysis Tool
1. Continuous Activities Monitoring: The proposed tool continuously monitors the organization’s network, capturing real-time data on employees’ daily activities.
2. Abnormality Identification: Utilizing ML, the system examines employees’ daily activities on the organization’s network and flags those that are anomalous.
3. Immediate Alert Generation: Upon identifying abnormalities, the proposed tool promptly issues detailed alerts to the cybersecurity team for immediate action. Figure 3 shows an example of the generated alert.
3.1.2. Employee Risk Classification Tool
4. Risk-score Calculation: Each employee is assigned a risk-score, which is determined by their daily activities within the organization’s network, as calculated according to Equation (1):

   $$\text{RiskScore}_i = \sum_{j=1}^{n} w_j \cdot I(x_{ij} = 1) \qquad (1)$$

   Where:
   - $\text{RiskScore}_i$ is the RiskScore for the $i$-th record.
   - The summation indicates that we are summing over all features from 1 to $n$.
   - $w_j$ is the weight associated with the $j$-th feature.
   - $I(x_{ij} = 1)$ is the indicator function that equals 1 if the $j$-th feature for the $i$-th record, $x_{ij}$, is 1, indicating first-time abnormal daily activity, and 0 otherwise, indicating no abnormal activity.
5. Dynamic Employee Profiling: Following step 4, employees are dynamically profiled, with their profiles undergoing continuous updates to reflect behaviors within the organization’s networks as well as their calculated risk-scores.
6. Classification of Employees: Each employee is classified into one of the risk levels (i.e., Low, Moderate, High) based on their data updated in step 5, utilizing ML.
7. Administration Notification: Employees identified as moderate or high risk are reported to the administration for necessary interventions, which may include additional training or enhanced monitoring. Figure 4 shows an example of the generated messages that will be sent to the administration.
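As a minimal sketch of the risk-score calculation in Equation (1), the following Python sums the weights of all features whose indicator equals 1. The weights are copied from the activity-weight table of this paper; the function name `risk_score`, the dictionary layout, and the sample record are illustrative assumptions rather than the authors' implementation. The record is assumed to have already passed data validation, so every feature is present and is either 0 or 1.

```python
# Weights per activity type, copied from the activity-weight table of this paper.
WEIGHTS = {
    "Login Attempts": 4,
    "Sensitive Files Access": 7,
    "Unauthorized Software": 9,
    "Data Transfer": 6,
    "Non-Work Websites Visited": 5,
    "Physical Access": 8,
    "Social Engineering Attacks": 8,
    "Previous Incidents": 10,
    "Public Info Shared": 5,
    "Interaction With Malicious Accounts": 8,
    "Behavior Change": 6,
    "Network Interaction": 7,
    "Poor InfoSec Practices": 9,
    "Upload Sensitive Information": 8,
    "Send Sensitive Information": 8,
    "Attempted Thumb Drive Insertion": 10,
    "Secure Printing": 6,
}


def risk_score(record, weights=WEIGHTS):
    """Equation (1): add w_j for every feature j whose value x_ij equals 1."""
    return sum(w for feature, w in weights.items() if record[feature] == 1)


# Hypothetical validated record: all features 0 except two first-time anomalies.
record = dict.fromkeys(WEIGHTS, 0)
record["Login Attempts"] = 1
record["Data Transfer"] = 1

print(risk_score(record))  # 4 + 6 = 10
```

With these weights the score ranges from 0 (no abnormal activity) to 124 (every activity type flagged), which is the range over which any organization-specific risk thresholds would be set.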
3.1.3. Continuous
8. Recurrence: The proposed tool then restarts its monitoring process, ensuring continuous adaptation and up-to-date security maintenance.
3.2. The Proposed Tool Features
3.2.1. Instantaneous Data
3.2.2. Real-Time Analysis
3.2.3. Real-Time Classification
3.2.4. Non-Interactive
3.2.5. Continuous
3.2.6. Adjustable
3.2.7. Detection Time
3.2.8. Classification Time
3.3. Qualitative Comparison with the Discussed Works
4. Dataset
4.1. Data Acquisition
Algorithm 1: Data Generation
4.2. Feature Engineering
4.3. Data Validation
- Handling Missing Values: Missing values were imputed with 1, aligning with the goal of detecting the first instance of anomalous activity. This conservative approach minimizes the risk of false negatives by assuming that missing values may indicate potential anomalous activity.
- Outlier Detection and Treatment: Frequency values outside the {0, 1} range were adjusted to 1, treating these anomalies as indicators of potentially risky behavior. Our dataset did not exhibit outliers outside this range, as shown in Figure 6.
- Addressing Dataset Imbalance: We employed the Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors (SMOTEENN) to balance the representation across anomalous behaviors, which is crucial for effective model training.
Algorithm 2: Data Validation
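The first two validation rules described above (imputing missing values with 1 and mapping values outside {0, 1} to 1) can be sketched in plain Python as follows. The function name `validate_record`, the feature subset, and the sample record are hypothetical and chosen only for illustration; this is not the paper's Algorithm 2. Rebalancing is not shown: SMOTEENN is available in the imbalanced-learn package as `imblearn.combine.SMOTEENN`, and would be applied to the training data after this cleaning step.

```python
def validate_record(record, features):
    """Return a cleaned copy of `record` in which every feature is 0 or 1."""
    cleaned = {}
    for name in features:
        value = record.get(name)   # None when the feature is missing
        if value is None:
            cleaned[name] = 1      # missing value -> 1 (conservative imputation)
        elif value in (0, 1):
            cleaned[name] = int(value)
        else:
            cleaned[name] = 1      # value outside {0, 1} -> 1 (treated as risky)
    return cleaned


# Hypothetical raw record with one missing value and one out-of-range value.
FEATURES = ["Login Attempts", "Data Transfer", "Secure Printing"]
raw = {"Login Attempts": None, "Data Transfer": 3, "Secure Printing": 0}

cleaned = validate_record(raw, FEATURES)
print(cleaned)
```

Both rules err on the side of flagging: an unknown or malformed value is counted as an anomaly, which trades some false positives for fewer false negatives, matching the rationale given above.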
4.4. Data Privacy
- Pseudonymization in Alerts: Users remain pseudonymized during the alert and monitoring phase, allowing for risk assessment without revealing sensitive information; see Figure 3.
- Controlled Access for De-anonymization: Full identification is restricted to authorized personnel when corrective action is necessary, maintaining privacy until intervention is required; see Figure 4.
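One way to realize the two privacy measures above is a keyed hash for pseudonyms plus a lookup table held only by authorized personnel. The sketch below is an assumption for illustration: the paper does not state how pseudonyms are generated, and HMAC-SHA256, the key, the `user-` prefix, and the function names `pseudonymize` and `build_alert` are all hypothetical choices.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a user ID with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user-" + digest[:12]


def build_alert(user_id: str, activity: str, secret_key: bytes) -> dict:
    """Build an alert that carries only the pseudonym, never the real identity."""
    return {"user": pseudonymize(user_id, secret_key), "activity": activity}


# Hypothetical key and employee list, for demonstration only.
SECRET_KEY = b"demo-key-not-for-production"
employees = ["alice@example.com", "bob@example.com"]

# Lookup table for de-anonymization, to be kept under controlled access.
registry = {pseudonymize(uid, SECRET_KEY): uid for uid in employees}

alert = build_alert("alice@example.com", "Data Transfer", SECRET_KEY)
print(alert)
print(registry[alert["user"]])  # resolved only when intervention is required
```

Because the pseudonym is deterministic for a given key, repeated alerts about the same employee can be correlated during monitoring without exposing the identity; only a holder of the registry (or the key) can map the pseudonym back.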
5. Tool Validation
5.1. Real-Time Simulation
- The experiments were run on an HP laptop running Windows 11 Pro (64-bit), equipped with an Intel(R) Core(TM) i5-10210U CPU (base speed 1.60 GHz, maximum clock speed 2.11 GHz) and 8 GB of RAM. This configuration was sufficient for conducting the experiments in this study.
- The laptop is equipped with Intel(R) UHD Graphics, provided by Intel Corporation, featuring an internal DAC type. It offers a total memory of 4147 MB, including 128 MB of dedicated VRAM. The display operates at a resolution of 1366 x 768 with 32-bit color depth and a 60 Hz refresh rate.
- Anaconda was utilized as the primary environment manager, enabling the installation and management of required Python libraries. Python served as the main programming language, with key libraries such as Pandas and NumPy for data manipulation, Scikit-learn for implementing ML models (RandomForest, LogisticRegression, and SVM), and XGBoost for advanced gradient boosting. The Faker library was employed to generate synthetic employee data, such as names, emails, and behaviors, simulating various anomalous activities.
- Flask was used to set up a REST API for simulating the injection of employee behaviors, and Postman was the API testing platform used to inject behaviors into the simulation and retrieve results.
Algorithm 3: Continuous Activities, Identification, and Alert
Algorithm 4: Employee Risk Classification and Dynamic Profiling
Algorithm 5: Administration Notification and Recurrence
5.2. ML Models
- Random Forest (RF): A robust ensemble learning method that builds multiple decision trees and aggregates their results. It is well-suited for this system due to its ability to handle large datasets with a mixture of feature types and its strength in estimating feature importance.
- XGBoost: Like RF, XGBoost is an ensemble method but uses a gradient boosting framework, building trees sequentially to improve model accuracy. It is known for its high performance, speed, and ability to handle complex patterns, which is crucial for accurately classifying user risk.
- Support Vector Machine (SVM): A powerful model for classification problems, particularly when data points are not linearly separable. It works well in high-dimensional spaces, making it effective for identifying risky behavior based on a variety of input features.
- Logistic Regression (LR): An interpretable model that provides clear probabilities for classification. Given its simplicity and ease of implementation, it serves as a baseline to compare with more complex models like RF and XGBoost.
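To illustrate how such a model comparison might be set up with Scikit-learn, the sketch below trains three of the four models on synthetic data and reports accuracy and macro-averaged F1. Everything here is an assumption for demonstration: the data comes from `make_classification` (17 features and 3 classes, chosen to mirror the 17 activity types and three risk levels), the hyperparameters are library defaults or arbitrary, and the resulting scores are not comparable to those reported in this paper. XGBoost is omitted to keep the example dependent on Scikit-learn only; `xgboost.XGBClassifier` follows the same `fit`/`predict` interface and could be added to the dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data: 17 features, 3 classes (low / moderate / high risk).
X, y = make_classification(
    n_samples=600, n_features=17, n_informative=8, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models; hyperparameters are illustrative, not the paper's settings.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "f1_macro": f1_score(y_test, predictions, average="macro"),
    }

for name, scores in results.items():
    print(name, scores)
```

Keeping the models in one dictionary makes it straightforward to evaluate all of them on the same split, so that differences in the metrics reflect the models rather than the data partition.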
5.3. Evaluation Metrics
5.4. ML Results
5.5. Detection and Classification Time Evaluation
5.5.1. Detection Time
5.5.2. Classification Time
6. Threat Models
6.1. Internal Threats
6.2. External Threats
6.3. Systemic Vulnerabilities
6.4. Communication and Infrastructure Threats
7. Discussion
7.1. Quantitative Comparison with Recent Studies
7.2. Limitations
7.3. Future Research Direction
8. Conclusion
References
- Verizon. 2024 Data Breach Investigations Report. Technical report, Verizon, 2024. Accessed September 10, 2024.
- IBM. Cost of a Data Breach Report 2024. Technical report, IBM, 2024. Accessed September 10, 2024.
- Le, D.C.; Zincir-Heywood, N. Exploring anomalous behaviour detection and classification for insider threat identification. International Journal of Network Management 2021, 31, e2109. [CrossRef]
- Al-Shehari, T.; Rosaci, D.; Al-Razgan, M.; Alfakih, T.; Kadrie, M.; Afzal, H.; Nawaz, R. Enhancing Insider Threat Detection in Imbalanced Cybersecurity Settings Using the Density-Based Local Outlier Factor Algorithm. IEEE Access 2024. [CrossRef]
- Chaabouni, N.; Mosbah, M.; Zemmari, A.; Sauvignac, C.; Faruki, P. Network intrusion detection for IoT security based on learning techniques. IEEE Communications Surveys & Tutorials 2019, 21, 2671–2701. [CrossRef]
- Khraisat, A.; Gondal, I.; Vamplew, P.; Kamruzzaman, J. Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity 2019, 2, 1–22.
- Chandel, S.; Yu, S.; Yitian, T.; Zhili, Z.; Yusheng, H. Endpoint protection: Measuring the effectiveness of remediation technologies and methodologies for insider threat. In Proceedings of the 2019 international conference on cyber-enabled distributed computing and knowledge discovery (cyberc). IEEE, 2019, pp. 81–89.
- Zargar, A.; Nowroozi, A.; Jalili, R. XABA: A zero-knowledge anomaly-based behavioral analysis method to detect insider threats. In Proceedings of the 2016 13th International Iranian society of cryptology conference on information security and cryptology (ISCISC). IEEE, 2016, pp. 26–31.
- Fujii, S.; Kurima, I.; Isobe, Y. Scoring Method for Detecting Potential Insider Threat based on Suspicious User Behavior using Endpoint Logs. In Proceedings of the International Conference on Artificial Intelligence (ICAI). The Steering Committee of The World Congress in Computer Science, Computer …, 2019, pp. 291–297.
- Pramudya, P.B.; Alamsyah, A. Implementation of signature-based intrusion detection system using SNORT to prevent threats in network servers. Journal of Soft Computing Exploration 2022, 3, 93–98. [CrossRef]
- Díaz-Verdejo, J.; Muñoz-Calle, J.; Estepa Alonso, A.; Estepa Alonso, R.; Madinabeitia, G. On the detection capabilities of signature-based intrusion detection systems in the context of web attacks. Applied Sciences 2022, 12, 852. [CrossRef]
- Asad, H.; Adhikari, S.; Gashi, I. A perspective–retrospective analysis of diversity in signature-based open-source network intrusion detection systems. International Journal of Information Security 2023, pp. 1–16. [CrossRef]
- Gupta, A.; Sharma, L.S. Performance evaluation of snort and suricata intrusion detection systems on ubuntu server. In Proceedings of ICRIC 2019: Recent Innovations in Computing. Springer, 2020, pp. 811–821.
- Kumar, A.; Tanwar, A.; Malhotra, V. A Comparative Analysis of Different Intrusion Detection Systems. International Research Journal of Modernization in Engineering Technology and Science 2023.
- Guo, Y. A review of Machine Learning-based zero-day attack detection: Challenges and future directions. Computer Communications 2023, 198, 175–185. [CrossRef]
- Singh, U.K.; Joshi, C.; Kanellopoulos, D. A framework for zero-day vulnerabilities detection and prioritization. Journal of Information Security and Applications 2019, 46, 164–172. [CrossRef]
- Alsharabi, N.; Alqunun, M.; Murshed, B.A.H. Detecting Unusual Activities in Local Network Using Snort and Wireshark Tools. Journal of Advances in Information Technology 2023, 14. [CrossRef]
- Legg, P.A.; Buckley, O.; Goldsmith, M.; Creese, S. Caught in the act of an insider attack: detection and assessment of insider threat. In Proceedings of the 2015 IEEE International Symposium on Technologies for Homeland Security (HST), 2015, pp. 1–6. [CrossRef]
- Legg, P.; Buckley, O.; Goldsmith, M.; Creese, S. Automated Insider Threat Detection System Using User and Role-Based Profile Assessment. IEEE Systems Journal 2017, 11, 503–512. [CrossRef]
- Joshi, C.; Aliaga, J.R.; Insua, D.R. Insider Threat Modeling: An Adversarial Risk Analysis Approach. IEEE Transactions on Information Forensics and Security 2021, 16, 1131–1142. [CrossRef]
- Rios Insua, D.; Couce-Vieira, A.; Rubio, J.A.; Pieters, W.; Labunets, K.; G. Rasines, D. An adversarial risk analysis framework for cybersecurity. Risk Analysis 2021, 41, 16–36. [CrossRef]
- Kaushik, K. A systematic approach to develop an advanced insider attacks detection module. Journal of Engineering and Applied Sciences 2021, 8, 33. [CrossRef]
- Mehnaz, S.; Bertino, E. A Fine-Grained Approach for Anomaly Detection in File System Accesses With Enhanced Temporal User Profiles. IEEE Transactions on Dependable and Secure Computing 2021, 18, 2535–2550. [CrossRef]
- Pham, N.; Guo, J.; Wang, Z. Abnormality Detection in Network Traffic by Classification and Graph Data Analysis. In Proceedings of the 2022 IEEE 13th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), 2022, pp. 0041–0047. [CrossRef]
- Teymourlouei, H.; Harris, V.E. Preventing Data Breaches: Utilizing Log Analysis and Machine Learning for Insider Attack Detection. In Proceedings of the 2022 International Conference on Computational Science and Computational Intelligence (CSCI), 2022, pp. 1022–1027. [CrossRef]
- Abdulhammed, R.; Faezipour, M.; Abuzneid, A.; AbuMallouh, A. Deep and machine learning approaches for anomaly-based intrusion detection of imbalanced network traffic. IEEE Sensors Letters 2018, 3, 1–4. [CrossRef]
- Le, D.C.; Zincir-Heywood, A.N. Evaluating insider threat detection workflow using supervised and unsupervised learning. In Proceedings of the 2018 IEEE Security and Privacy Workshops (SPW). IEEE, 2018, pp. 270–275. [CrossRef]
- Park, H.; Kim, K.; Shin, D.; Shin, D. BGP Dataset-Based Malicious User Activity Detection Using Machine Learning. Information 2023, 14, 501.
- Alshamy, R.; Ghurab, M.; Othman, S.; Alshami, F. Intrusion detection model for imbalanced dataset using SMOTE and random forest algorithm. In Proceedings of the Advances in Cyber Security: Third International Conference, ACeS 2021, Penang, Malaysia, August 24–25, 2021, Revised Selected Papers 3. Springer, 2021, pp. 361–378.
- Padmavathi, G.; Shanmugapriya, D.; Asha, S. A framework to detect the malicious insider threat in cloud environment using supervised learning methods. In Proceedings of the 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom). IEEE, 2022, pp. 354–358.
- Le, D.C.; Zincir-Heywood, N. Anomaly Detection for Insider Threats Using Unsupervised Ensembles. IEEE Transactions on Network and Service Management 2021, 18, 1152–1164. [CrossRef]
- Ahmadi-Assalemi, G.; Al-Khateeb, H.; Epiphaniou, G.; Aggoun, A. Super Learner Ensemble for Anomaly Detection and Cyber-Risk Quantification in Industrial Control Systems. IEEE Internet of Things Journal 2022, 9, 13279–13297. [CrossRef]
- Diop, A.; Emad, N.; Winter, T.; Hilia, M. Design of an ensemble learning behavior anomaly detection framework. International Journal of Computer and Information Engineering 2019, 13, 547–555.
- Yi, J.; Tian, Y. Insider Threat Detection Model Enhancement Using Hybrid Algorithms between Unsupervised and Supervised Learning. Electronics 2024, 13, 973. [CrossRef]
- Alshuaibi, F.; Alshamsi, F.; Saeed, A.; Kaddoura, S. Machine Learning-Based Classification Approach for Network Intrusion Detection System. In Proceedings of the 2024 15th Annual Undergraduate Research Conference on Applied Computing (URC). IEEE, 2024, pp. 1–6.
- Al Lail, M.; Garcia, A.; Olivo, S. Machine learning for network intrusion detection—a comparative study. Future Internet 2023, 15, 243. [CrossRef]
- Nikiforova, O.; Romanovs, A.; Zabiniako, V.; Kornienko, J. Detecting and Identifying Insider Threats Based on Advanced Clustering Methods. IEEE Access 2024, 12, 30242–30253. [CrossRef]
- Mehmood, M.; Amin, R.; Muslam, M.M.A.; Xie, J.; Aldabbas, H. Privilege Escalation Attack Detection and Mitigation in Cloud Using Machine Learning. IEEE Access 2023, 11, 46561–46576. [CrossRef]
- Nandini, K.; Girisha, G.; Reddy, S. CGBA: A Efficient Insider Attacker Detection Technique in Machine Learning. In Proceedings of the 2024 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI). IEEE, 2024, pp. 1–8.
- Li, Y.; Su, Y. The Insider Threat Detection Method of University Website Clusters Based on Machine Learning. In Proceedings of the 2023 6th International Conference on Artificial Intelligence and Big Data (ICAIBD), 2023, pp. 560–565. [CrossRef]
- Suresh, P.V.; Madhavu, M.L. Insider attack: Internal cyber attack detection using machine learning. In Proceedings of the 2021 12th International conference on computing communication and networking technologies (ICCCNT). IEEE, 2021, pp. 1–7.
- Peccatiello, R.B.; Gondim, J.J.C.; Garcia, L.P.F. Applying One-Class Algorithms for Data Stream-Based Insider Threat Detection. IEEE Access 2023, 11, 70560–70573. [CrossRef]
- Böse, B.; Avasarala, B.; Tirthapura, S.; Chung, Y.Y.; Steiner, D. Detecting Insider Threats Using RADISH: A System for Real-Time Anomaly Detection in Heterogeneous Data Streams. IEEE Systems Journal 2017, 11, 471–482. [CrossRef]
- Verma, A.; Ranga, V. Statistical analysis of CIDDS-001 dataset for Network Intrusion Detection Systems using Distance-based Machine Learning. Procedia Computer Science 2018, 125, 709–716. The 6th International Conference on Smart Computing and Communications. [CrossRef]
- Zhang, F.; Kodituwakku, H.A.D.E.; Hines, J.W.; Coble, J. Multilayer Data-Driven Cyber-Attack Detection System for Industrial Control Systems Based on Network, System, and Process Data. IEEE Transactions on Industrial Informatics 2019, 15, 4362–4369. [CrossRef]
- Begli, M.; Derakhshan, F.; Karimipour, H. A layered intrusion detection system for critical infrastructure using machine learning. In Proceedings of the 2019 IEEE 7th International Conference on Smart Energy Grid Engineering (SEGE). IEEE, 2019, pp. 120–124.
- Kim, J.; Park, M.; Kim, H.; Cho, S.; Kang, P. Insider threat detection based on user behavior modeling and anomaly detection algorithms. Applied Sciences 2019, 9, 4018.
- Le, D.C.; Zincir-Heywood, N.; Heywood, M.I. Analyzing Data Granularity Levels for Insider Threat Detection Using Machine Learning. IEEE Transactions on Network and Service Management 2020, 17, 30–44. [CrossRef]
- Khan, A.Y.; Latif, R.; Latif, S.; Tahir, S.; Batool, G.; Saba, T. Malicious Insider Attack Detection in IoTs Using Data Analytics. IEEE Access 2020, 8, 11743–11753. [CrossRef]
- Zou, S.; Sun, H.; Xu, G.; Quan, R. Ensemble Strategy for Insider Threat Detection from User Activity Logs. Computers, Materials & Continua 2020, 65, 1321–1334. [CrossRef]
- Janjua, F.; Masood, A.; Abbas, H.; Rashid, I. Handling insider threat through supervised machine learning techniques. Procedia Computer Science 2020, 177, 64–71. [CrossRef]
- Shaver, A.; Liu, Z.; Thapa, N.; Roy, K.; Gokaraju, B.; Yuan, X. Anomaly based intrusion detection for iot with machine learning. In Proceedings of the 2020 IEEE applied imagery pattern recognition workshop (AIPR). IEEE, 2020, pp. 1–6.
- Abhale, A.B.; Manivannan, S. Supervised machine learning classification algorithmic approach for finding anomaly type of intrusion detection in wireless sensor network. Optical Memory and Neural Networks 2020, 29, 244–256. [CrossRef]
- Oliveira, N.; Praça, I.; Maia, E.; Sousa, O. Intelligent Cyber Attack Detection and Classification for Network-Based Intrusion Detection Systems. Applied Sciences 2021, 11. [CrossRef]
- Al-Shehari, T.; Alsowail, R.A. An insider data leakage detection using one-hot encoding, synthetic minority oversampling and machine learning techniques. Entropy 2021, 23, 1258.
- Almomani, O.; Almaiah, M.A.; Alsaaidah, A.; Smadi, S.; Mohammad, A.H.; Althunibat, A. Machine learning classifiers for network intrusion detection system: comparative study. In Proceedings of the 2021 International Conference on Information Technology (ICIT). IEEE, 2021, pp. 440–445.
- Taghavirashidizadeh, A.; Zavvar, M.; Moghadaspour, M.; Jafari, M.; Garoosi, H.; Zavvar, M.H. Anomaly Detection In IoT Networks Using Hybrid Method Based On PCA-XGBoost. In Proceedings of the 2022 8th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS). IEEE, 2022, pp. 1–5.
- Manoharan, P.; Yin, J.; Wang, H.; Zhang, Y.; Ye, W. Insider threat detection using supervised machine learning algorithms. Telecommunication Systems 2023, pp. 1–17. [CrossRef]
- Inuwa, M.M.; Das, R. A comparative analysis of various machine learning methods for anomaly detection in cyber attacks on IoT networks. Internet of Things 2024, 26, 101162. [CrossRef]
- Faysal, J.A.; Mostafa, S.T.; Tamanna, J.S.; Mumenin, K.M.; Arifin, M.M.; Awal, M.A.; Shome, A.; Mostafa, S.S. XGB-RF: A hybrid machine learning approach for IoT intrusion detection. In Proceedings of the Telecom. MDPI, 2022, Vol. 3, pp. 52–69.
- Oyelakin, A.M. A Learning Approach for The Identification of Network Intrusions Based on Ensemble XGBoost Classifier. Indonesian Journal of Data and Science 2023, 4, 190–197. [CrossRef]
- Khan, N.; Mohmand, M.I.; Rehman, S.u.; Ullah, Z.; Khan, Z.; Boulila, W. Advancements in intrusion detection: A lightweight hybrid RNN-RF model. PLOS ONE 2024, 19, e0299666. [CrossRef]
- Onyebueke, A.E.; David, A.A.; Munu, S. Network Intrusion Detection System Using XGBoost and Random Forest Algorithms. Asian Journal of Pure and Applied Mathematics 2023, pp. 321–335.







| Study | Instantaneous Data | Real-Time Analysis | Real-Time User Classification | Non-Interactive | Continuous | Adjustability | Detection Time | Classification Time |
|---|---|---|---|---|---|---|---|---|
| [26] | N/D | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| [27] | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✓ | ✗ |
| [28] | ✓ | ✓ | ✗ | N/D | ✓ | ✓ | ✗ | ✗ |
| [29] | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ |
| [30] | ✓ | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ |
| [31] | ✗ | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ |
| [32] | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ |
| [33] | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ |
| [34] | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ |
| [35] | ✓ | ✓ | ✗ | N/D | ✗ | ✓ | ✗ | ✗ |
| [36] | ✓ | ✓ | ✗ | N/D | N/D | N/D | ✗ | ✗ |
| [37] | ✓ | ✓ | ✗ | ✗ | ✓ | N/D | ✗ | ✗ |
| [38] | ✓ | ✓ | ✗ | ✗ | N/D | ✗ | ✗ | ✗ |
| [39] | N/D | ✗ | ✗ | ✓ | N/D | N/D | ✗ | ✗ |
| [40] | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| [41] | N/D | ✓ | ✗ | ✓ | N/D | N/D | ✓ | ✗ |
| [42] | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ |
| [43] | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ |
| [44] | ✗ | ✓ | ✗ | N/D | N/D | N/D | ✗ | ✗ |
| [45] | ✓ | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ |
| [46] | ✓ | ✓ | ✗ | ✓ | ✓ | ✗ | ✓ | ✗ |
| [47] | ✓ | ✗ | ✗ | ✓ | ✓ | N/D | ✗ | ✗ |
| [48] | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ |
| [49] | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ |
| [50] | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ |
| [51] | ✓ | ✓ | ✗ | ✓ | ✗ | N/D | ✗ | ✗ |
| [52] | ✓ | ✓ | ✗ | N/D | N/D | ✓ | ✓ | ✗ |
| [53] | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| [54] | ✓ | ✓ | ✗ | N/D | ✓ | ✗ | ✗ | ✗ |
| [55] | ✓ | ✗ | ✗ | N/D | ✓ | ✓ | ✗ | ✗ |
| [56] | ✓ | N/D | ✗ | ✓ | N/D | ✗ | ✗ | ✗ |
| [57] | ✗ | ✓ | ✗ | N/D | N/D | ✓ | ✗ | ✗ |
| [58] | ✗ | ✓ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ |
| [59] | ✓ | ✓ | ✗ | ✓ | N/D | ✓ | ✗ | ✗ |
| [60] | ✗ | ✓ | ✗ | N/D | N/D | ✗ | ✓ | ✗ |
| [61] | ✗ | ✗ | ✗ | ✓ | N/D | N/D | ✗ | ✗ |
| [62] | ✗ | ✓ | ✗ | N/D | N/D | ✓ | ✓ | ✗ |
| [63] | ✗ | ✓ | ✗ | N/D | N/D | N/D | ✗ | ✗ |
| Ours | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Activity Type | Description | Weight |
|---|---|---|
| Login Attempts | Logging in outside the normal business hours | 4 |
| Sensitive Files Access | Unauthorized entry into confidential data | 7 |
| Unauthorized Software | Installation or use of unapproved software within an organization | 9 |
| Data Transfer | Unauthorized or unmonitored transfer of sensitive or confidential data within an organization | 6 |
| Non-Work Websites Visited | Unauthorized access or frequent visitation of websites unrelated to work duties | 5 |
| Physical Access | Unauthorized entry or access to restricted areas, equipment, or sensitive information | 8 |
| Social Engineering Attacks | Deceive individuals into divulging confidential information or performing actions that compromise security protocols | 8 |
| Previous Incidents | Past security breaches, data leaks, or unauthorized activities within an organization’s information systems | 10 |
| Public Info Shared | Disclosing sensitive or confidential information to the public domain | 5 |
| Interaction With Malicious Accounts | Engaging with fraudulent or compromised online entities | 8 |
| Behavior Change | Significant alterations in an individual’s actions or habits, often signaling potential security concerns | 6 |
| Network Interaction | Illegal engagement and communication activities that occur within a networked environment | 7 |
| Poor InfoSec Practices | Inadequate or careless information security practices | 9 |
| Upload Sensitive Information | Upload time, file type, encryption status, and user privilege | 8 |
| Send Sensitive Information | Transmitting confidential or proprietary data through various communication channels | 8 |
| Attempted Thumb Drive Insertion | Unauthorized or suspicious insertion of external storage devices, such as USB thumb drives, into computer systems or network devices | 10 |
| Secure Printing | Printing documents without adequate safeguards to protect the confidentiality and integrity of the printed information | 6 |
| Activity Type | Features |
|---|---|
| Login Attempts | Login time, number of failed attempts, login location, and device type |
| Sensitive Files Access | Access time, file type, access location, and user privilege |
| Unauthorized Software | Installation time, user permission, and location |
| Data Transfer | Transfer time, file size, and destination |
| Non-Work Websites Visited | Visit time, website category, and visit frequency |
| Physical Access | Entry time, location accessed, and badge type |
| Social Engineering Attacks | Attack type, response time, and sensitivity |
| Previous Incidents | Incident type, incident date, user involvement, and incident severity |
| Public Info Shared | Sharing time, information type, and platform location |
| Interaction With Malicious Accounts | Interaction time, malicious flag, and user reaction |
| Behavior Change | Change type, frequency, time of change, and user motivation |
| Network Interaction | Protocol type, data volume, and frequency |
| Poor InfoSec Practices | Practice type, frequency, user awareness, and severity |
| Upload Sensitive Information | Upload time, file type, encryption status, and user privilege |
| Send Sensitive Information | Send time, file type, and user privilege |
| Attempted Thumb Drive Insertion | Insert time, device type, and location |
| Secure Printing | Print time, document type, location, and user privilege |
| Variable | Encoded Values |
|---|---|
| Login Time | Working Hours (0), Non-Working Hours (1) |
| Login Location | Office (0), Remote (1) |
| Device Type | Desktop (0), Laptop (1), Mobile (2) |
| Access Time | Working Hours (0), Non-Working Hours (1) |
| Access Location | Office (0), Remote (1) |
| User Privilege | Normal (0), Admin (1) |
| File Type | Document (0), Media (1), Executable (2) |
| Installation Time | Working Hours (0), Non-Working Hours (1) |
| User Permission | Normal (0), Admin (1) |
| Location | Office (0), Remote (1) |
| Transfer Time | Working Hours (0), Non-Working Hours (1) |
| File Size | Small (0), Medium (1), Large (2) |
| Destination | Internal (0), External Trusted (1), External Untrusted (2) |
| Visit Time | Working Hours (0), Non-Working Hours (1) |
| Website Category | Social Media (0), Shopping (1), News (2), Gaming (3) |
| Entry Time | Working Hours (0), Non-Working Hours (1) |
| Location Accessed | Office (0), Remote (1) |
| Badge Type | Visitor (0), Employee (1), Contractor (2) |
| Attack Type | Phishing (0), Baiting (1), Pretexting (2) |
| Response Time | Working Hours (0), Non-Working Hours (1) |
| Sensitivity | No Response (0), Minimal Disclosure (1), Sensitive Disclosure (2) |
| Incident Type | Low Risk (0), Medium Risk (1), High Risk (2) |
| User Involvement | None (0), Indirect (1), Direct (2) |
| Incident Severity | Low (0), Medium (1), High (2) |
| Sharing Time | Working Hours (0), Non-Working Hours (1) |
| Information Type | Personal (0), Professional (1), Sensitive (2) |
| Platform Location | Internal (0), External Public (1), External Private (2) |
| Interaction Time | Working Hours (0), Non-Working Hours (1) |
| Malicious Flag | Not Malicious (0), Malicious (1) |
| User Reaction | None (0), Minimal (1), Full (2) |
| Change Type | Behavioral (0), Habitual (1), Sudden (2) |
| Time of Change | Working Hours (0), Non-Working Hours (1) |
| User Motivation | Work Related (0), Personal (1), Suspicious (2) |
| Protocol Type | HTTP (0), HTTPS (1), FTP (2), SMTP (3) |
| User Awareness | Fully Aware (0), Partially Aware (1), Unaware (2) |
| Practice Type | Weak Passwords (0), Sharing Credentials (1), Lack of Updates (2) |
| Severity | Low (0), Medium (1), High (2) |
| Upload Time | Working Hours (0), Non-Working Hours (1) |
| Encryption Status | Not Encrypted (0), Encrypted (1) |
| Send Time | Working Hours (0), Non-Working Hours (1) |
| Insert Time | Working Hours (0), Non-Working Hours (1) |
| Print Time | Working Hours (0), Non-Working Hours (1) |
| Document Type | Personal (0), Official (1), Confidential (2) |
| Metric | Logistic Regression (LR) | Random Forest (RF) | XGBoost | Support Vector Machine (SVM) |
|---|---|---|---|---|
| Accuracy | 0.99 | 0.99 | 1.00 | 0.99 |
| Precision | 0.996 | 0.986 | 1.00 | 0.996 |
| Recall | 0.996 | 0.986 | 1.00 | 0.996 |
| F1-score | 0.996 | 0.986 | 1.00 | 1.00 |
| Metric | Logistic Regression (LR) | Random Forest (RF) | XGBoost | Support Vector Machine (SVM) |
|---|---|---|---|---|
| Detection (s) | 0.014 | 0.15 | 0.056 | 0.046 |
| Classification (s) | 0.071 | 0.34 | 0.102 | 0.1051 |
| Study | Accuracy | Recall | Precision | F1-Score | Detection (s) | Classification (s) | Study Date |
|---|---|---|---|---|---|---|---|
| Logistic Regression (LR) | | | | | | | |
| [29] | 0.97 | 0.97 | 0.98 | 0.97 | N/D | N/D | 2021 |
| [35] | 0.93 | 0.961 | 0.912 | 0.936 | N/D | N/D | 2024 |
| [36] | 0.90 | 0.25 | 0.24 | 0.24 | N/D | N/D | 2023 |
| [52] | 0.913 | 0.91 | 0.91 | 0.90 | N/D | N/D | 2020 |
| [53] | 0.80 | 0.86 | 0.81 | 0.83 | N/D | N/D | 2020 |
| [56] | 0.70 | N/D | 0.90 | 0.54 | N/D | N/D | 2021 |
| [57] | 0.946 | 0.973 | 0.969 | 0.971 | N/D | N/D | 2022 |
| Ours | 0.99 | 0.996 | 0.996 | 0.996 | 0.014 | 0.071 | N/A |
| Random Forest (RF) | | | | | | | |
| [29] | 0.99 | 0.99 | 0.99 | 0.99 | N/D | N/D | 2021 |
| [35] | 0.993 | 0.996 | 0.992 | 0.994 | N/D | N/D | 2024 |
| [36] | 0.99 | 0.97 | 0.97 | 0.97 | N/D | N/D | 2023 |
| [52] | 0.996 | 1.00 | 1.00 | 1.00 | N/D | N/D | 2020 |
| [53] | 0.83 | 0.91 | 0.81 | 0.86 | N/D | N/D | 2020 |
| [56] | 0.87 | N/D | 0.98 | 0.84 | N/D | N/D | 2021 |
| [57] | N/D | N/D | N/D | N/D | N/D | N/D | 2022 |
| Ours | 0.99 | 0.986 | 0.986 | 0.986 | 0.15 | 0.34 | N/A |
| XGBoost | | | | | | | |
| [29] | N/D | N/D | N/D | N/D | N/D | N/D | 2021 |
| [35] | 0.993 | 0.995 | 0.992 | 0.994 | N/D | N/D | 2024 |
| [36] | N/D | N/D | N/D | N/D | N/D | N/D | 2023 |
| [52] | 0.992 | 0.99 | 0.99 | 0.99 | N/D | N/D | 2020 |
| [53] | N/D | N/D | N/D | N/D | N/D | N/D | 2020 |
| [56] | N/D | N/D | N/D | N/D | N/D | N/D | 2021 |
| [57] | 0.999 | 0.999 | 0.999 | 0.999 | N/D | N/D | 2022 |
| Ours | 1.00 | 1.00 | 1.00 | 1.00 | 0.056 | 0.102 | N/A |
| Support Vector Machine (SVM) | | | | | | | |
| [29] | 0.97 | 0.97 | 0.98 | 0.98 | N/D | N/D | 2021 |
| [35] | 0.969 | 0.982 | 0.96 | 0.971 | N/D | N/D | 2024 |
| [36] | 0.70 | 0.14 | 0.14 | 0.14 | N/D | N/D | 2023 |
| [52] | 0.874 | 0.87 | 0.76 | 0.82 | N/D | N/D | 2020 |
| [53] | 0.84 | 0.86 | 0.87 | 0.87 | N/D | N/D | 2020 |
| [56] | N/D | N/D | N/D | N/D | N/D | N/D | 2021 |
| [57] | 0.786 | 0.896 | 0.722 | 0.80 | N/D | N/D | 2022 |
| Ours | 0.99 | 0.996 | 0.996 | 1.00 | 0.046 | 0.1051 | N/A |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).