Version 1
Received: 4 January 2024 / Approved: 4 January 2024 / Online: 5 January 2024 (02:09:06 CET)
How to cite:
Fengrui, Y.; Du, Y. Cyber Threat Intelligence-Driven ATT&CK Recognition based on Large Language Models. Preprints 2024, 2024010372. https://doi.org/10.20944/preprints202401.0372.v1
APA Style
Fengrui, Y., & Du, Y. (2024). Cyber Threat Intelligence-Driven ATT&CK Recognition based on Large Language Models. Preprints. https://doi.org/10.20944/preprints202401.0372.v1
Chicago/Turabian Style
Fengrui, Y., and Yanhui Du. 2024. "Cyber Threat Intelligence-Driven ATT&CK Recognition based on Large Language Models." Preprints. https://doi.org/10.20944/preprints202401.0372.v1
Abstract
Tactics, Techniques, and Procedures (TTPs) are the most valuable component of Cyber Threat Intelligence (CTI). However, TTPs are often implicit in unstructured text and must be extracted manually by domain experts, so automating TTP classification from unstructured text is an important research task. MITRE ATT&CK serves as the de facto standard for studying TTPs, and existing research constructs classification datasets from its procedure examples for tactics and techniques. Because a large proportion of categories have very few samples, these datasets exhibit a long-tail phenomenon with a highly imbalanced sample distribution. Consequently, most research concentrates on categories with relatively abundant samples. This paper proposes a method that combines ChatGPT-based data augmentation with instruction-based supervised fine-tuning (SFT) of open large language models. The approach offers a solution for TTP classification in few-shot learning scenarios and covers 625 technique categories, with Precision, Recall, and F1 scores of 86.2%, 89.9%, and 87.3%, respectively.
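To illustrate why a long-tailed label distribution matters for the reported metrics, the following minimal sketch computes macro-averaged precision, recall, and F1 on a toy set of ATT&CK technique labels. The abstract does not state which averaging scheme the authors use, so macro-averaging is an assumption here, and the function name `macro_prf` and the toy labels are hypothetical.

```python
from collections import Counter


def macro_prf(y_true, y_pred):
    """Macro-averaged (precision, recall, F1) over the labels present in y_true.

    Assumption: every class counts equally, regardless of how many samples it has.
    """
    labels = sorted(set(y_true))
    true_count = Counter(y_true)
    pred_count = Counter(y_pred)
    # Count correct predictions per class.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        prec = tp[label] / pred_count[label] if pred_count[label] else 0.0
        rec = tp[label] / true_count[label]
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy long-tailed data: one frequent technique, one mid-frequency, one rare.
y_true = ["T1059"] * 4 + ["T1566"] * 2 + ["T1003"]
y_pred = ["T1059", "T1059", "T1059", "T1566", "T1566", "T1566", "T1059"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
p, r, f = macro_prf(y_true, y_pred)
print(f"accuracy={accuracy:.3f} macro-P={p:.3f} macro-R={r:.3f} macro-F1={f:.3f}")
```

In this toy case, five of seven samples are classified correctly, yet the macro scores are much lower because the single rare-class sample is missed and that class contributes zero to each average. This is the kind of effect that makes few-shot categories hard to cover.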
Keywords
cyber threat intelligence (CTI); TTPs; ATT&CK; data augmentation; large language models (LLMs); supervised fine-tuning (SFT)
Subject
Computer Science and Mathematics, Computer Networks and Communications
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.