Submitted: 22 May 2026
Posted: 29 May 2026
Abstract
Keywords:
1. Introduction
1.1. Background and Necessity
1.2. Research Objectives and Contributions
2. Background
2.1. Research on Automated Incident Analysis
2.2. LLM-Based Security Analysis
2.3. Limitations of LLM Utilization in Closed-Network Environments
2.4. Research on Utilization in Local LLM and Closed-Network Environments
2.5. Research on Evaluation Metrics for Automated Incident Analysis
3. Local LLM-Based Incident Analysis Framework
3.1. External Knowledge Layer
| Analysis Stage | Function in Knowledge Distillation | Practical Example in Training Dataset |
|---|---|---|
| Stage 1: Evidence Identification | Extracts specific indicators of compromise (IoC) or suspicious artifacts from raw logs. | Identified suspicious use of certutil.exe with the -urlcache flag in the command line. |
| Stage 2: Threat Hypothesis | Formulates a potential attack scenario based on identified evidence. | Hypothesis: The attacker is attempting 'Ingress Tool Transfer' to download secondary malware. |
| Stage 3: Technical Verification | Cross-references related logs to validate the hypothesis. | Verified outgoing network connection to an external IP immediately following the process execution. |
| Stage 4: Final Determination | Finalizes the mapping to a standardized framework. | Conclusion: Mapping to MITRE ATT&CK T1105 |
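
The four stages in the table above can be pictured as one structured training record. The following is a minimal sketch, not the authors' data format: the class name `ReasoningTrace`, its field names, and the bracketed serialization are assumptions made for illustration. The example values are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningTrace:
    """One four-stage chain-of-thought record (field names are illustrative)."""
    evidence: str       # Stage 1: Evidence Identification
    hypothesis: str     # Stage 2: Threat Hypothesis
    verification: str   # Stage 3: Technical Verification
    determination: str  # Stage 4: Final Determination

    def to_target_text(self) -> str:
        """Serialize the stages in order as a single training target string."""
        stages = [
            ("Evidence", self.evidence),
            ("Hypothesis", self.hypothesis),
            ("Verification", self.verification),
            ("Determination", self.determination),
        ]
        return "\n".join(f"[{name}] {text}" for name, text in stages)


# Example values copied from the table; the layout itself is an assumption.
trace = ReasoningTrace(
    evidence="Suspicious use of certutil.exe with the -urlcache flag.",
    hypothesis="Ingress Tool Transfer to download secondary malware.",
    verification="Outgoing connection to an external IP right after execution.",
    determination="MITRE ATT&CK T1105",
)
print(trace.to_target_text())
```

Keeping the stages as separate fields makes it easy to check that no stage is skipped before a record is serialized for training.
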
3.2. Internal Learning & Model Layer
3.3. AI Agent Orchestration Layer
| Correlation Type | Fragmented Clues (Input Telemetry) | Reconstructed Threat Scenario (Inference) | MITRE ATT&CK Mapping (Output) |
|---|---|---|---|
| Temporal (Time-series) | ㆍw3wp.exe spawns cmd.exe ㆍ10 mins later: net.exe used to scan network. | The attacker exploited a web vulnerability to gain a shell and performed internal reconnaissance. | ㆍT1190: Exploit Public-Facing Application ㆍT1018: Remote System Discovery |
| Spatial (Cross-asset) | ㆍFailed logins on Host A. ㆍSuccessful login on Host B from Host A via RDP. | The attacker attempted brute-force on Host A and successfully moved laterally to Host B. | ㆍT1110: Brute Force ㆍT1021.001: Remote Desktop Protocol |
| Behavioral (Causal Chain) | ㆍPowerShell with Base64 string. ㆍOutbound connection to unknown IP. | An obfuscated script was executed to establish a Command and Control (C2) channel. | ㆍT1059.001: PowerShell ㆍT1132: Data Encoding |
| Objective (Final Goal) | ㆍAccessing ntds.dit file. ㆍCompressed file created in \Temp. | The attacker is attempting to dump credentials for full domain takeover. | ㆍT1003.003: NTDS Dumping ㆍT1560: Archive Collected Data |
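
The temporal correlation row can be illustrated with a small windowed pairing routine. This is a sketch under stated assumptions rather than the framework's actual logic: the `Event` type, the 15-minute window, the timestamps, and the host labels are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    host: str
    time: datetime
    description: str


def temporal_pairs(events, window=timedelta(minutes=15)):
    """Return (earlier, later) pairs on the same host within `window`."""
    ordered = sorted(events, key=lambda e: e.time)
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.time - first.time > window:
                break  # events are sorted, so later ones are even further away
            if second.host == first.host:
                pairs.append((first, second))
    return pairs


# Hypothetical timestamps and host labels, modelled on the temporal row.
events = [
    Event("ENDPOINT_1", datetime(2026, 1, 1, 10, 0), "w3wp.exe spawns cmd.exe"),
    Event("ENDPOINT_2", datetime(2026, 1, 1, 10, 5), "scheduled backup job"),
    Event("ENDPOINT_1", datetime(2026, 1, 1, 10, 10), "net.exe used to scan network"),
]
pairs = temporal_pairs(events)
```

Only the two `ENDPOINT_1` events are paired here; the unrelated event on the other host is ignored even though it falls inside the window.
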
4. Experiment and Results Analysis
4.1. Experimental Environment
| Category | Item | Specification |
|---|---|---|
| Hardware | CPU | Intel Xeon Gold 6226R @ 2.90GHz |
| Hardware | GPU | NVIDIA A100-PCIE-80GB (1 ea) |
| Hardware | RAM/Storage | 256 GB DDR4 / 2 TB NVMe SSD |
| Software | OS/Language | Ubuntu 22.04 LTS / Python 3.10 |
| Software | Framework | PyTorch 2.1.2 + CUDA 11.8 |
| Software | Libraries | Transformers 4.36.0, PEFT 0.7.0, Accelerate 0.25.0 |
| Inference | Precision | FP16 (Half-Precision) |
4.2. Dataset Composition
4.2.1. Scenario Reconstruction: Hierarchical Design Based on Analysis Depth
| Tier | Scenario Type | Fragmented Clues (Log Inputs) | Target Inference Result (Chain of Thought) |
|---|---|---|---|
| Single | Credential Access | Log A: procdump.exe -ma lsass.exe lsass.dmp | Identify: T1003.001 (LSASS Dumping). Analysis: Technical risk of memory dumping for password extraction. |
| Single | Persistence | Log A: net user /add attacker_account /password123 | Identify: T1136.001 (Local Account). Analysis: Creation of a backdoor account for persistent access. |
| Multi | Full Intrusion Chain | Log A: powershell.exe -enc ... Log B: mstsc.exe /v:10.0.1.50 Log C: vssadmin.exe delete shadows | Inference: 1. Initial C2 establishment (A) 2. Lateral jump to critical asset (B) 3. Ransomware preparation by inhibiting recovery (C). Conclusion: Macro-level scenario reconstruction of an active Ransomware attack. |
| Multi | Information Stealing | Log A: reg.exe add HKCU\Software\... Log B: net view /all Log C: 7z.exe a data.zip C:\Confidential | Inference: 1. Establishing persistence (A) 2. Internal reconnaissance (B) 3. Staging for data exfiltration (C). Conclusion: Comprehensive mapping of an Information Stealing objective. |
4.2.2. Dataset Partitioning and Learning Contamination Prevention Strategy
4.3. Performance Comparison by Baseline Model
4.4. Definition of Evaluation Metrics
4.5. Experimental Results Analysis and Resource Efficiency Evaluation
4.5.1. Comprehensive Performance Comparison Analysis
4.5.2. Resource Efficiency and Closed-Network Practicality Analysis
| Model | VRAM Usage (GB) | Throughput (tok/sec) | Avg. Power (W) | Inference Time (s/case) |
|---|---|---|---|---|
| Llama-3-8B-Instruct | 15.2 | 52.4 | 148 | 12.5 |
| Mistral-7B-Instruct-v0.3 | 14.5 | 55.1 | 142 | 11.8 |
| Qwen2.5-7B-Instruct | 14.8 | 54.8 | 145 | 11.9 |
| DeepSeek-R1 | 15.4 | 48.2 | 155 | 13.6 |
| Distilled Llama-3-8B (Ours) | 15.3 | 51.8 | 150 | 12.7 |
4.5.3. Ablation Study on Framework Components
| Configuration | Detection Accuracy (%) | ATT&CK Mapping (F1) | Hallucination Rate (%) |
|---|---|---|---|
| Full Framework (Proposed) | 88.4 | 0.91 | 6.2 |
| w/o AI Agent (Direct) | 74.2 | 0.72 | 18.5 |
| w/o Verification Layer | 86.1 | 0.88 | 14.1 |
| w/o Classification (No-Filtering) | 81.5 | 0.83 | 9.8 |
5. In-Depth Verification via Hierarchical Threat Scenarios
5.1. Verification of Technical Identification via Single-Step Scenarios
| Case ID | Targeted Technique (MITRE ATT&CK) | Input Source (Raw Telemetry) | Extracted Critical Artifacts | Inference Result (Mapping) | Accuracy |
|---|---|---|---|---|---|
| S-01 | T1003.001 (LSASS Memory Dump) | Sysmon Event ID 10 (ProcessAccess) | Target: lsass.exe, Source: procdump.exe, AccessMask: 0x1ff1ff | OS Credential Dumping | 100% |
| S-02 | T1543.003 (Windows Service) | Sysmon Event ID 1 (ProcessCreate) | Image: sc.exe, Command: create, binPath=, start=auto | Create or Modify System Process | 100% |
| S-03 | T1059.001 (PowerShell) | Sysmon Event ID 1 (ProcessCreate) | Image: powershell.exe, Arguments: -enc, -nop, -w hidden | Command and Scripting Interpreter | 100% |
| S-04 | T1070.004 (File Deletion) | Sysmon Event ID 26 (FileDelete) | Target: C:\Windows\System32\winevt\Logs\*, Command: wevtutil cl | Indicator Removal on Host | 100% |
| S-05 | T1547.001 (Registry Run Keys) | Sysmon Event ID 13 (RegistryValueSet) | Target: ...\CurrentVersion\Run, Details: malicious.exe | Boot or Logon Autostart Execution | 100% |
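
To make the "Extracted Critical Artifacts" column concrete, the sketch below pulls the S-03 flags out of a process-creation event. It is an illustrative rule-based stand-in, not the model's inference path. Representing the event as a plain dictionary is an assumption, as is the function name `extract_powershell_flags`; the `<payload>` placeholder stands in for content the source does not provide.

```python
def extract_powershell_flags(event: dict) -> list[str]:
    """Return suspicious PowerShell flags found in a ProcessCreate event."""
    suspicious = ("-enc", "-nop", "-w hidden")
    if event.get("EventID") != 1:  # Sysmon Event ID 1 = ProcessCreate
        return []
    image = event.get("Image", "").lower()
    if not image.endswith("powershell.exe"):
        return []
    command_line = event.get("CommandLine", "").lower()
    # Naive substring matching; a real parser would tokenize the arguments.
    return [flag for flag in suspicious if flag in command_line]


# Hypothetical event shaped like case S-03; <payload> is a placeholder.
sample = {
    "EventID": 1,
    "Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
    "CommandLine": "powershell.exe -nop -w hidden -enc <payload>",
}
flags = extract_powershell_flags(sample)
```

A deterministic extractor like this could also serve as a cross-check on artifacts reported by the model, since it only returns flags that literally occur in the command line.
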
5.2. Verification of Multi-Stage Scenario-Based Contextual Inference
5.3. Hallucination Suppression and Factual Accuracy Verification
6. Conclusions
6.1. Research Summary and Implications
6.2. Significance of Research Findings
6.3. Limitations and Future Research Directions
Author Contributions
Acknowledgments



| Category | Large-scale LLMs | Local LLMs (7B–8B) |
|---|---|---|
| Parameters | 70B–400B+ | 7B–8B |
| GPU Requirements | 10–100 GPUs | 1–2 GPUs |
| Memory Requirements | Hundreds of GB | 40–80 GB |
| Deployment Environment | Cloud | On-premise Server |
| External Data Transfer | Required | Not Required |
| Air-gapped Operation | Impossible | Supported |
| Operating Cost | Very High | Relatively Low |
| Category | Item | Sanitization & Masking Policy | Rationale |
|---|---|---|---|
| Network | IP Address | Replaced with INT_IP_[N] or EXT_IP_[N] | Preserve traffic directionality while masking topology. |
| Identity | Username | Generalized to SEC_USER, ADMIN_USER, etc. | Protect PII while maintaining privilege context. |
| System | Hostname | Masked as ENDPOINT_[N] | Maintain multi-stage lateral movement context. |
| Path | File Directory | Generalized to C:\Users\MASKED_USER\... | Protect user privacy during forensic analysis. |
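
The IP masking rule in the table above can be sketched with the standard library. This is an assumed implementation, not the authors' sanitizer: the function name `mask_ips`, the IPv4-only regex, and the use of `ipaddress.ip_address(...).is_private` to decide between internal and external are illustrative choices.

```python
import ipaddress
import re

IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def mask_ips(text: str) -> str:
    """Replace IPv4 addresses with INT_IP_N / EXT_IP_N placeholders."""
    mapping: dict[str, str] = {}
    counters = {"INT": 0, "EXT": 0}

    def replace(match: re.Match) -> str:
        raw = match.group(0)
        try:
            address = ipaddress.ip_address(raw)
        except ValueError:
            return raw  # not a valid IPv4 address; leave the text untouched
        if raw not in mapping:
            kind = "INT" if address.is_private else "EXT"
            counters[kind] += 1
            mapping[raw] = f"{kind}_IP_{counters[kind]}"
        return mapping[raw]

    return IPV4_PATTERN.sub(replace, text)


masked = mask_ips("10.0.1.5 -> 8.8.8.8:443, 10.0.1.5 -> 10.0.1.50")
```

Reusing the same placeholder for a repeated address keeps traffic direction and lateral-movement context intact, which is the rationale the table gives for this policy.
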
| Stage | Process Name | Technical Specification & Rationale |
|---|---|---|
| Stage 1: Data Acquisition | Reasoning Triplets | ㆍIntegration of [Log+CoT+Label] with hash-based integrity checks. ㆍPurpose: Establishing a ground-truth dataset for causal reasoning. |
| Stage 2: Model Adaptation | LoRA Configuration | ㆍParameter-efficient updates (r=16, α=32) on attention layers. ㆍPurpose: Mitigating VRAM constraints while preventing knowledge drift. |
| Stage 3: Optimization | Weight Internalization | ㆍCross-entropy loss minimization of the reasoning path. ㆍPurpose: Enabling standalone expert-level inference in air-gapped zones. |
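
As a rough illustration of why the LoRA setting in Stage 2 eases VRAM pressure, the arithmetic below counts the parameters one adapter adds to a single projection matrix. It follows the standard LoRA formulation, in which the update is the product of two low-rank matrices scaled by α/r. The projection width of 4096 is an assumption for illustration, not a figure from the source.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def lora_scaling(alpha: float, r: int) -> float:
    """Scaling factor applied to the low-rank update B @ A."""
    return alpha / r


R, ALPHA = 16, 32   # r and alpha as listed in the table
D_MODEL = 4096      # assumed projection width, for illustration only

added = lora_trainable_params(D_MODEL, D_MODEL, R)
full = D_MODEL * D_MODEL
print(added, full, f"{added / full:.2%}", lora_scaling(ALPHA, R))
```

Under these assumptions the adapter trains well under one percent of the weights of the matrix it augments, while the base weights stay frozen.
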
| Split | Scenarios | Ratio | Usage Purpose |
|---|---|---|---|
| Total | 3,500 | 100% | - |
| Training | 2,500 | 70% | Generating CoT knowledge via Teacher model & Student model training |
| Validation | 500 | 15% | Hyperparameter tuning & Early stopping decision |
| Test | 500 | 15% | Final performance evaluation (Completely held-out from training) |
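
One way to check that the test split is truly held out is to fingerprint every scenario and look for digests shared across splits. The sketch below assumes exact-match hashing of normalized log text. The function names and the normalization rule are hypothetical, and near-duplicate scenarios would need a fuzzier comparison than this.

```python
import hashlib


def fingerprint(log_text: str) -> str:
    """SHA-256 of whitespace-normalized, lower-cased log text."""
    normalized = " ".join(log_text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_leaks(train, validation, test):
    """Return fingerprints that appear in more than one split."""
    seen = {}
    leaks = set()
    for name, split in (("train", train), ("validation", validation), ("test", test)):
        for text in split:
            digest = fingerprint(text)
            # setdefault records the first split that contained this digest.
            if seen.setdefault(digest, name) != name:
                leaks.add(digest)
    return leaks


# Toy splits: the test item differs from the training item only in case and spacing.
train = ["procdump.exe -ma lsass.exe lsass.dmp"]
validation = ["net view /all"]
test = ["PROCDUMP.EXE  -ma lsass.exe lsass.dmp"]
leaks = find_leaks(train, validation, test)
```

An empty result is a necessary condition for a clean split, not a sufficient one, because paraphrased or re-masked variants of the same scenario hash differently.
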
| Model | Parameters | Context Window | Key Characteristics |
|---|---|---|---|
| Llama-3-8B-Instruct | 8B | 8K/128K | General-purpose instruction following & balanced performance |
| Mistral-7B-Instruct-v0.3 | 7B | 32K | Efficient long-context processing & lightweight architecture |
| Qwen2.5-7B-Instruct | 7B | 32K/128K | Specialized in code analysis & mathematical reasoning |
| DeepSeek-R1 | 8B | 8K~128K | Advanced reasoning capabilities based on Reinforcement Learning |
| Distilled Llama-3-8B (Ours) | 8B | 8K~128K | Security domain-specific knowledge distillation & agent optimization |
| Model | Detection Accuracy | ATT&CK Mapping Score | Script Analysis Score | Hallucination Rate |
|---|---|---|---|---|
| Llama-3-8B-Instruct | 71.5% | 0.68 | 72.0% | 28.4% |
| Mistral-7B-Instruct-v0.3 | 69.8% | 0.65 | 68.5% | 31.2% |
| Qwen2.5-7B-Instruct | 76.2% | 0.74 | 85.4% | 19.8% |
| DeepSeek-R1 | 79.5% | 0.78 | 83.1% | 15.6% |
| Distilled Llama-3-8B (Ours) | 88.4% | 0.91 | 84.8% | 6.2% |
| Attack Stage | Log Source (Event ID) | Key Correlators (Pivot) | Logical Inference (Contextual Link) | Result |
|---|---|---|---|---|
| Phase 1: PrivEsc | Sysmon 1 / 10 | User: Admin-Svc / PID: 4820 | Identification of token manipulation for privilege gain | Linked |
| Phase 2: Discovery | Sysmon 1 (net.exe) | User: Admin-Svc / Parent: 4820 | Enumeration of network shares following successful escalation | Linked |
| Phase 3: Exfiltration | Firewall / Sysmon 3 | Src IP: 10.0.1.5 / Dest Port: 443 | Correlation between discovered assets and outbound data flow | Linked |
| Noise Filtering | Benign Sysmon Logs | No correlation found | Automatic exclusion of 1,240+ background event lines | Filtered |
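
The pivot between Phase 1 and Phase 2 in the table (same user, parent PID 4820) can be sketched as a simple parent-child join. This is an illustrative reduction of the correlation step, not the agent's implementation: the dictionary field names are assumptions, and the child PID 5112 is a hypothetical value that the source does not report.

```python
def link_by_parent(events):
    """Link events whose ParentPID equals an earlier event's PID under the same user."""
    by_pid = {}
    links = []
    for event in events:  # assumed to be in chronological order
        parent = by_pid.get(event.get("ParentPID"))
        if parent is not None and parent["User"] == event["User"]:
            links.append((parent["Phase"], event["Phase"]))
        by_pid[event["PID"]] = event
    return links


events = [
    {"Phase": "Phase 1: PrivEsc", "User": "Admin-Svc", "PID": 4820},
    # PID 5112 is hypothetical; only the parent PID appears in the table.
    {"Phase": "Phase 2: Discovery", "User": "Admin-Svc", "PID": 5112, "ParentPID": 4820},
    {"Phase": "Background", "User": "SYSTEM", "PID": 900, "ParentPID": 4},
]
links = link_by_parent(events)
```

Events that share no pivot with an earlier event produce no link, which mirrors the "No correlation found" outcome of the noise-filtering row.
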
| Metric Category | Evaluation Metric | Value | Interpretation & Impact |
|---|---|---|---|
| Integrity | Hallucination Rate | 6.2% | Minimizes false positives and operational confusion. |
| Traceability | Evidence Citation Rate | 93.8% | 93.8% of claims are backed by a verifiable log entry. |
| Precision | Technique Mapping Accuracy | 95.5% | Precise alignment with the MITRE ATT&CK framework. |
| Consistency | Logic Consistency Score | 98.1% | Internal logical flow remains coherent across reports. |
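
The hallucination rate and evidence citation rate in the table sum to 100%, which is consistent with treating them as complementary fractions of the same set of claims. The sketch below computes both under that assumed definition: a claim counts as supported only if it cites at least one log and every cited log ID exists in the input. The source does not spell out the metric definitions here, so the function name, the record layout, and the toy data are hypothetical.

```python
def citation_stats(claims, log_ids):
    """Return (evidence_citation_rate, hallucination_rate) as fractions."""
    if not claims:
        raise ValueError("at least one claim is required")
    supported = sum(
        1 for claim in claims
        if claim["cited_logs"] and set(claim["cited_logs"]) <= log_ids
    )
    citation_rate = supported / len(claims)
    return citation_rate, 1.0 - citation_rate


# Toy report: the last claim cites a log ID that is not in the input.
log_ids = {"log-1", "log-2", "log-3"}
claims = [
    {"text": "LSASS memory was dumped.", "cited_logs": ["log-1"]},
    {"text": "A service was created.", "cited_logs": ["log-2"]},
    {"text": "Shadow copies were deleted.", "cited_logs": ["log-3"]},
    {"text": "Data left the network.", "cited_logs": ["log-9"]},
]
citation_rate, hallucination_rate = citation_stats(claims, log_ids)
```
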
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).