Submitted: 20 September 2024
Posted: 23 September 2024
Abstract
Keywords:
1. Introduction
- To detect critical malicious behavior, we extended the APT alert dataset with more alert types using Splunk.
- To reduce false positive APT alarms, we re-engineered existing processes and developed an advanced learning framework with an accuracy of 99.6%.
- To automate the incident response process, we used ML techniques to reduce manual intervention in the post-alert decision.
2. Preliminaries and Related Literature
2.1. Overview of Security Information and Event Management
- Log Collection: Network devices such as switches, hosts, and firewalls generate log data. Two methods of log data collection are defined: agent-based and agent-less [14]. In the agent-based method, an intermediary agent collects the log data and forwards it to the next module. In the agent-less method, there is no intermediary agent and the server receives the data directly from the devices.
- Log Aggregation: The procedure of gathering, analyzing, and capturing structured data from the logs generated by network devices. The literature [14] describes two methods for transferring log data: i) Push: network devices such as switches and firewalls send their logs to the servers. ii) Pull: the servers retrieve the log data from the network devices.
- Parsing: The procedure of transforming log data into structured data. In a network environment, multiple parsers are used to transform the log data received from different devices such as firewalls, servers, and switches.
- Normalization: Log data received from multiple sources is converted into one standard, reduced format, and redundant log entries are removed.
- Threat analysis: SIEM identifies adversaries by comparing log data with the records in a known threat database. During threat analysis, SIEM uses statistical methods to identify different threat patterns.
- Response: SIEM can detect malicious activity quickly, before it causes serious damage. It has built-in, pre-defined real-time alerts and a notification system.
- Reporting: SIEM tools contain different custom reporting templates for data visualization. Moreover, the data collected by SIEM is available to data analysts and experts for investigation.
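
The parsing, normalization, and threat-analysis steps listed above can be sketched in a few lines of Python. This is a minimal illustration only, not the pipeline used by any particular SIEM product: the two raw log formats, the `Event` schema, and the names `parse`, `normalize`, and `analyze` are all assumptions introduced here.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Set


class Event(NamedTuple):
    """Common schema shared by all log sources (hypothetical)."""
    timestamp: str
    device: str
    src_ip: str
    dst_ip: str


# Hypothetical raw formats for two device types; real devices differ.
PARSERS = {
    "firewall": re.compile(r"(?P<ts>\S+) FW src=(?P<src>\S+) dst=(?P<dst>\S+)"),
    "ftp": re.compile(r"(?P<ts>\S+) FTP client=(?P<src>\S+) server=(?P<dst>\S+)"),
}


def parse(line: str) -> Optional[dict]:
    """Parsing: turn one raw log line into a structured record."""
    for device, pattern in PARSERS.items():
        match = pattern.fullmatch(line.strip())
        if match:
            return {"device": device, **match.groupdict()}
    return None  # no parser recognizes this line


def normalize(record: dict) -> Event:
    """Normalization: map device-specific fields onto the common schema."""
    return Event(record["ts"], record["device"], record["src"], record["dst"])


def analyze(lines: Iterable[str], known_bad_ips: Set[str]) -> List[Event]:
    """Threat analysis: flag events whose destination is in a threat list."""
    seen: Set[Event] = set()
    alerts: List[Event] = []
    for line in lines:
        record = parse(line)
        if record is None:
            continue
        event = normalize(record)
        if event in seen:
            continue  # redundant log entry, dropped during normalization
        seen.add(event)
        if event.dst_ip in known_bad_ips:
            alerts.append(event)
    return alerts


if __name__ == "__main__":
    sample = [
        "2024-09-20T10:00:00 FW src=10.0.0.5 dst=203.0.113.9",
        "2024-09-20T10:00:00 FW src=10.0.0.5 dst=203.0.113.9",  # duplicate
        "2024-09-20T10:00:05 FTP client=10.0.0.6 server=198.51.100.1",
    ]
    for alert in analyze(sample, {"203.0.113.9"}):
        print(alert)
```

Keeping one parser per device type and a single shared schema mirrors the separation between the parsing and normalization steps; the threat list here is a plain set of IP addresses, whereas a production SIEM would typically consult a maintained threat database.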
2.1.1. SIEM’s Architecture and Components
2.1.2. Splunk: Implementations of SIEM
2.2. Introduction to Machine Learning
2.3. Random Forest
2.4. XGBoost
2.5. Related Works
3. Overview of the Proposed Framework
3.1. Notations
3.2. Event Logging
3.3. Event Detection Module
3.3.1. Unusual Network Traffic (UNT) Behavior Detection
3.3.2. Unusual System Behavior Detection
3.3.3. Unusual User Behavior Detection
3.3.4. File Behavior Detection
3.3.5. Malicious IP Detection
3.3.6. Malicious Domain Detection
3.3.7. Alert ()
3.3.8. Alert Set ()
3.3.9. APT Life Cycle Graph ()
3.3.10. APT Attack Sequence ()
3.3.11. APT Incomplete Attack Sequence ()
3.3.12. APT Incomplete Attack Sequence
3.3.13. Event Detection Function
3.3.14. Implementation of Malicious Behavior Detection Algorithm
Algorithm 1 Malicious Behavior Detection
3.4. Alert Correlation Module
3.4.1. Alert Function
3.4.2. Alert Correlation Function
3.4.3. Implementation of Alerts Correlation Algorithm
Algorithm 2 Alerts Correlation Algorithm
3.4.4. APT Reporting
4. Experimental Setup and Results Evaluation
4.1. System Setup
4.2. Dataset Collection
4.3. Data Pre-Processing and Encoding
4.4. Results and Discussion
4.5. Comparisons among Proposed and State-of-the-Art Framework
5. Conclusion
Author Contributions
Acknowledgments
Conflicts of Interest
References
1. Annual Cybersecurity Report. https://www.cisco.com/c/m/en_au/products/security/offers/annual-cybersecurity-report-2017.htm. [Accessed 10-03-2024].
2. Schlette, D.; Caselli, M.; Pernul, G. A comparative study on cyber threat intelligence: The security incident response perspective. IEEE Communications Surveys & Tutorials 2021, 23, 2525–2556.
3. Bromiley, M. SANS 2019 incident response (IR) survey: It’s time for a change. Technical report, SANS Institute, 2019.
4. Cichonski, P.; Millar, T.; Grance, T.; Scarfone, K.; others. Computer security incident handling guide. NIST Special Publication 2012, 800, 1–147.
5. Morgus, R.; Skierka, I.; Hohmann, M.; Maurer, T. National CSIRTs and their role in computer security incident response; New America, 2022.
6. Ahmad, A.; Hadgkiss, J.; Ruighaver, A.B. Incident response teams–Challenges in supporting the organisational security function. Computers & Security 2012, 31, 643–652.
7. González-Granadillo, G.; González-Zarzosa, S.; Diaz, R. Security information and event management (SIEM): analysis, trends, and usage in critical infrastructures. Sensors 2021, 21, 4759.
8. Splunk. https://www.splunk.com. [Accessed 10-03-2024].
9. QRadar. https://www.ibm.com/qradar. [Accessed 10-03-2024].
10. Islam, C.; Babar, M.A.; Croft, R.; Janicke, H. SmartValidator: A framework for automatic identification and classification of cyber threat data. Journal of Network and Computer Applications 2022, 202, 103370.
11. Ghafir, I.; Hammoudeh, M.; Prenosil, V.; Han, L.; Hegarty, R.; Rabie, K.; Aparicio-Navarro, F.J. Detection of advanced persistent threat using machine-learning correlation analysis. Future Generation Computer Systems 2018, 89, 349–359.
12. Williams, A.; Nicolett, M. Improve IT security with vulnerability management. Gartner ID 2005.
13. Miller, D.R.; Harris, S.; Harper, A.; VanDyke, S.; Blask, C. Security information and event management (SIEM) implementation; McGraw Hill Professional, 2010.
14. Sheeraz, M.; Paracha, M.A.; Haque, M.U.; Durad, M.H.; Mohsin, S.M.; Band, S.S.; Mosavi, A. Effective security monitoring using efficient SIEM architecture. Hum.-Centric Comput. Inf. Sci 2023, 13, 1–18.
15. Gartner Magic Quadrant for Security Information and Event Management. https://www.gartner.com/en/documents/4019750?ref=null. [Accessed 25-02-2024].
16. Hwoij, A.; Khamaiseh, A.h.; Ababneh, M. SIEM architecture for the Internet of Things and smart city. International Conference on Data Science, E-learning and Information Systems 2021, 2021, pp. 147–152.
17. Hristov, M.; Nenova, M.; Iliev, G.; Avresky, D. Integration of Splunk Enterprise SIEM for DDoS Attack Detection in IoT. 2021 IEEE 20th International Symposium on Network Computing and Applications (NCA). IEEE, 2021, pp. 1–5.
18. Balaji, N.; Pai, B.K.; Bhat, B.; Praveen, B. Data visualization in Splunk and Tableau: a case study demonstration. Journal of Physics: Conference Series. IOP Publishing, 2021, Vol. 1767, p. 012008.
19. Dunsin, D.; Ghanem, M.C.; Ouazzane, K.; Vassilev, V. A comprehensive analysis of the role of artificial intelligence and machine learning in modern digital forensics and incident response. Forensic Science International: Digital Investigation 2024, 48, 301675.
20. ElSahly, O.; Abdelfatah, A. An incident detection model using random forest classifier. Smart Cities 2023, 6, 1786–1813.
21. Pothumani, P.; Reddy, E.S. Network intrusion detection using ensemble weighted voting classifier based honeypot framework. Journal of Autonomous Intelligence 2024, 7.
22. Ibrahim, W.N.H.; Anuar, S.; Selamat, A.; Krejcar, O.; Crespo, R.G.; Herrera-Viedma, E.; Fujita, H. Multilayer framework for botnet detection using machine learning algorithms. IEEE Access 2021, 9, 48753–48768.
23. Nguyen, T.T.; Reddi, V.J. Deep reinforcement learning for cyber security. IEEE Transactions on Neural Networks and Learning Systems 2021, 34, 3779–3795.
24. Almasoud, A.S. Enhanced Metaheuristics with Machine Learning Enabled Cyberattack Detection Model. Intelligent Automation & Soft Computing 2023, 37.
25. Sadia, H.; Farhan, S.; Haq, Y.U.; Sana, R.; Mahmood, T.; Bahaj, S.A.O.; Rehman, A. Intrusion Detection System for Wireless Sensor Networks: A Machine Learning based Approach. IEEE Access 2024.
26. Brogi, G.; Tong, V.V.T. Terminaptor: Highlighting advanced persistent threats through information flow tracking. 2016 8th IFIP International Conference on New Technologies, Mobility and Security (NTMS). IEEE, 2016, pp. 1–5.
27. Giura, P.; Wang, W. A context-based detection framework for advanced persistent threats. 2012 International Conference on Cyber Security. IEEE, 2012, pp. 69–74.
28. Sopan, A.; Berninger, M.; Mulakaluri, M.; Katakam, R. Building a Machine Learning Model for the SOC, by the Input from the SOC, and Analyzing it for the SOC. 2018 IEEE Symposium on Visualization for Cyber Security (VizSec). IEEE, 2018, pp. 1–8.
29. Farooq, H.M.; Otaibi, N.M. Optimal machine learning algorithms for cyber threat detection. 2018 UKSim-AMSS 20th International Conference on Computer Modelling and Simulation (UKSim). IEEE, 2018, pp. 32–37.
30. Sethi, K.; Sai Rupesh, E.; Kumar, R.; Bera, P.; Venu Madhav, Y. A context-aware robust intrusion detection system: a reinforcement learning-based approach. International Journal of Information Security 2020, 19, 657–678.
31. Nilă, C.; Apostol, I.; Patriciu, V. Machine learning approach to quick incident response. 2020 13th International Conference on Communications (COMM). IEEE, 2020, pp. 291–296.
32. Chandra, J.V.; Challa, N.; Pasupuleti, S.K. A practical approach to E-mail spam filters to protect data from advanced persistent threat. 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT). IEEE, 2016, pp. 1–5.
33. Saini, N.; Bhat Kasaragod, V.; Prakasha, K.; Das, A.K. A hybrid ensemble machine learning model for detecting APT attacks based on network behavior anomaly detection. Concurrency and Computation: Practice and Experience 2023, 35, e7865.











| References | APT Life Cycle Detection | AI Models | Lowest FPR | Highest Accuracy | Limitations |
|---|---|---|---|---|---|
| [11] | Complete | Decision Tree, SVM, KNN, Ensemble Classifier | 4.5% | 84.8% | The blacklist-based detection modules require continuous updates. Moreover, the accuracy is lower. |
| [26] | Complete | No ML model used | High | Not calculated | Information flow tracking is used to detect APTs; however, the FPR is high. |
| [27] | Complete | No ML model used | 27.88% | Not calculated | Sufficient knowledge is required to set up the mechanism. |
| [28] | No | Random Forest | Not calculated | 98.5% | The post-alert decision is not automated; it involves the intervention of security experts. |
| [29] | Partial | SVM, Random Forest | Not calculated | Not calculated | Process anomaly detection is presented formally using One-Class SVM; however, the paper lacks ML-based experimental results. |
| [30] | No | Deep Q-network | 0.35% | 96.12% | The proposed model is unable to detect APTs. |
| [31] | No | ZeroR, OneR, NaiveBayes, SVM, J48, RandomForest | Not calculated | 95.5% | The authors did not discuss the FPR of the proposed model. |
| [32] | Partial | NaiveBayes | Not calculated | 87% | The authors did not discuss the FPR of the proposed model. |
| [33] | Partial | Random Forest, XGBoost | 0.12% | 99.91% | The study relies on existing datasets, which might not contain the latest attack scenarios. |

| Notation | Description |
|---|---|
| | IP address of the jth host |
| | ith user account |
| | kth event recorded in the log file |
| | Alert generated by the ith infected host |
| | Malicious domain |
| | Malicious IP address |
| | Malicious DNS |
| | Malicious SSL certificate |
| | Malicious execution file |
| | Unusual file transfer |
| | Malicious hash |
| | Several failed attempts before a successful login |
| | Sensitive data access |
| | Change of user privileges |
| | Network scan |
| | Unusual network traffic |
| | Tor connection with the remote server |
| | Unusual user behavior |
| | Set of malicious domains |
| | Set of malicious IP addresses |
| | Set of unusual user behaviors |
| | Set of unusual file behaviors |
| | Set of unusual network traffic |
| | Set of unusual system behaviors |
| | Source IP address |
| | Destination IP address |
| | Source port number |
| | Destination port number |

    index=* source="C:\\firewall.log"
    ((NOT S_Port IN ("21", "22", "80", "443", "445", "135")) OR
    (NOT proto IN ("TCP", "UDP"))) AND
    D_IP IN (compromised_IPs.txt)
    | stats count by S_IP, D_IP, D_Port, proto
    | where count > 5
    | sort - count
    | head 10

    index=* source="C:\\xampp\\FileZillaFTP\\Logs\\FileZilla Server.log"
    | stats count(eval(login=530)) as failures count(eval(login=230)) as successes by host
    | where (successes<=1 AND failures>=5)

    index=* source="Microsoft-windows-sysmon/operational"
    (Eventcode IN ("11", "23", "29") AND host="fileserver")
    | sort _time
    | table host file_creation_time location

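
For readers less familiar with SPL, the logic of the FTP login query can be sketched in plain Python. This is an illustration under stated assumptions rather than part of the framework: the function name `flag_hosts` and the `(host, reply_code)` input shape are hypothetical, and the condition on `successes` is read here as "at most one successful login". The reply codes follow the FTP standard, where 530 means "not logged in" and 230 means "user logged in".

```python
from collections import Counter
from typing import Iterable, List, Tuple

FTP_LOGIN_FAILED = 530     # FTP reply: not logged in
FTP_LOGIN_SUCCEEDED = 230  # FTP reply: user logged in


def flag_hosts(
    events: Iterable[Tuple[str, int]],
    min_failures: int = 5,
    max_successes: int = 1,
) -> List[str]:
    """Return hosts with many failed logins and few successful ones.

    `events` is an iterable of (host, ftp_reply_code) pairs; this input
    shape is an assumption made for the sketch.
    """
    failures: Counter = Counter()
    successes: Counter = Counter()
    for host, code in events:
        if code == FTP_LOGIN_FAILED:
            failures[host] += 1
        elif code == FTP_LOGIN_SUCCEEDED:
            successes[host] += 1
    # Counter returns 0 for hosts that never logged in successfully.
    return sorted(
        host
        for host in failures
        if failures[host] >= min_failures and successes[host] <= max_successes
    )


if __name__ == "__main__":
    log = [("ftp01", 530)] * 6 + [("ftp01", 230)] + [("ftp02", 530), ("ftp02", 230)]
    print(flag_hosts(log))
```

The two thresholds are exposed as parameters so that the sensitivity of the rule can be tuned without touching the counting logic.
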
| APT Stages | Unusual Behavior Events | Alerts | Alert Description |
|---|---|---|---|
| S1: Initial Compromise | Malicious domain | | Malicious domain alert |
| S1: Initial Compromise | Malicious file execution | | Malicious file execution alert |
| S1: Initial Compromise | Malicious hashes | | Malicious hash alert |
| S1: Initial Compromise | Malicious DNS | | Alert generated in response to a malicious DNS detection |
| S1: Initial Compromise | Successful login after several failed attempts | | Alert generated in response to illegal login attempts |
| S2: C&C Communication | Malicious IP addresses | | Alert generated when a malicious IP is detected |
| S2: C&C Communication | Malicious SSL certificate | | Alert generated when a malicious SSL certificate is detected |
| S3: Asset Discovery | Network scanning | | Network port scanning |
| S3: Asset Discovery | Accessing sensitive data | | Illegal access to sensitive data |
| S3: Asset Discovery | Change to user privileges | | Alert generated when there is a change in user privileges |
| S4: Data Exfiltration | Unusual network traffic | | Alert generated due to a large increase in network traffic |
| S4: Data Exfiltration | Unexpected system errors | | Alert generated due to system errors |
| S4: Data Exfiltration | Tor connection with C&C server | | Alert generated when a Tor connection is detected |
| S4: Data Exfiltration | File transfer, creation, or deletion | | Alert generated due to file transfer, deletion, or creation |

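
The stage mapping in the table above lends itself to a small lookup structure. The sketch below groups alerts by host and reports which of the four APT stages have been observed. It is illustrative only: the alert identifiers are hypothetical names chosen here, and treating a sequence as "complete" when all four stages are seen for one host is an assumption of this sketch, not the formal definition used in the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical alert identifiers mapped to the APT stage (1-4) of the table.
STAGE_OF_ALERT: Dict[str, int] = {
    "malicious_domain": 1,
    "malicious_file_execution": 1,
    "malicious_hash": 1,
    "malicious_dns": 1,
    "login_after_failed_attempts": 1,
    "malicious_ip": 2,
    "malicious_ssl_certificate": 2,
    "network_scan": 3,
    "sensitive_data_access": 3,
    "privilege_change": 3,
    "unusual_network_traffic": 4,
    "unexpected_system_errors": 4,
    "tor_connection": 4,
    "file_activity": 4,
}

ALL_STAGES: Set[int] = {1, 2, 3, 4}


def stages_by_host(alerts: Iterable[Tuple[str, str]]) -> Dict[str, Set[int]]:
    """Collect the set of APT stages observed for each host."""
    observed: Dict[str, Set[int]] = defaultdict(set)
    for host, alert in alerts:
        observed[host].add(STAGE_OF_ALERT[alert])
    return dict(observed)


def classify(alerts: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Label each host's alert sequence as complete or incomplete.

    Assumption: "complete" means every one of the four stages was seen.
    """
    return {
        host: "complete" if stages == ALL_STAGES else "incomplete"
        for host, stages in stages_by_host(alerts).items()
    }


if __name__ == "__main__":
    sample = [
        ("10.0.0.5", "malicious_domain"),
        ("10.0.0.5", "malicious_ip"),
        ("10.0.0.5", "network_scan"),
        ("10.0.0.5", "tor_connection"),
        ("10.0.0.7", "malicious_hash"),
    ]
    print(classify(sample))
```

This sketch ignores the order and timing of alerts; a correlation step that enforces stage order or a time window would need timestamps on each alert.
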

| Classifier | Accuracy (%) | TPR (%) | TNR (%) | FNR (%) | FPR (%) |
|---|---|---|---|---|---|
| Random Forest | 98.54 | 98.55 | 98.53 | 1.44 | 1.46 |
| XGBoost | 99.6 | 98.55 | 100.0 | 1.44 | 0.0 |
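
The five metrics reported in the table are all derived from the four cells of a confusion matrix, and they are linked: TPR + FNR = 100% and TNR + FPR = 100%, so an FPR of 0.0% corresponds to a TNR of 100%. The sketch below shows the standard definitions. The function name `rates` and the example counts are hypothetical and are not taken from the paper's experiments.

```python
from typing import Dict


def rates(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Compute percentage metrics from confusion-matrix counts.

    tp/fn: attack samples classified correctly/incorrectly.
    tn/fp: benign samples classified correctly/incorrectly.
    """
    positives = tp + fn
    negatives = tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / (positives + negatives),
        "tpr": 100.0 * tp / positives,  # true positive rate (recall)
        "fnr": 100.0 * fn / positives,  # missed attacks
        "tnr": 100.0 * tn / negatives,  # true negative rate (specificity)
        "fpr": 100.0 * fp / negatives,  # false alarms
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the relationships.
    metrics = rates(tp=985, fn=15, tn=1000, fp=0)
    for name, value in metrics.items():
        print(f"{name}: {value:.2f}")
```
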
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).