Submitted:
25 February 2026
Posted:
27 February 2026
Abstract
Keywords:
1. Introduction
- The semantic similarity between NIST controls and ATT&CK mitigations is clearer and more robust than the direct similarity between controls and techniques; therefore, leveraging the hard-coded relationships between mitigations and techniques will drastically improve mapping accuracy.
2. Related Works
2.1. Embedding-based Semantic Mapping Pipelines
| Mapping Method | Mapping Target | Mapping Approach |
|---|---|---|
| MITRE CTID Mapping | RMF ↔ ATT&CK | Manual, community-based large-scale mapping |
| Wudali [6] NIST 800-160 v2 ↔ ATT&CK | NIST 800-160 v2 → ATT&CK | Theory-based mapping utilizing PETE analysis |
| Abderehman [7] RAM (Rule ATT&CK Mapper) | SIEM Rules → ATT&CK | LLM prompt chaining-based automation |
| CVE ↔ ATT&CK [8] | CVE Descriptions → ATT&CK Tactics | Transformer-based classification models |
| rcATT Tool [8] | Report Text → ATT&CK | NLP classifier-based automated extraction |
2.2. NIST SP 800-53 and MITRE ATT&CK
2.3. Knowledge Graph-based Mapping and Mitigation Parameters
3. Proposed Method

3.1. Problem Definition and Performance Metric Refinement
| Metric | Definition |
|---|---|
| Precision | Precision represents the proportion of valid linkages within the Top-K mappings proposed by the model; achieving high precision is essential to mitigating the cognitive burden and alert fatigue of security operators. |
| Recall | Recall represents the proportion of valid ground-truth mappings that the model successfully identifies within its Top-K candidates. This metric is critical for identifying potential security gaps, as it measures the model's ability to ensure that no essential defensive links are overlooked. |
| F1-score | By calculating the harmonic mean of Precision and Recall, the F1-score provides a singular, balanced indicator of a model’s mapping accuracy and retrieval capability. This ensures a rigorous assessment of the model's effectiveness in aligning RMF controls with ATT&CK techniques under identical experimental conditions. |
- Numerator: The number of control-technique pairs correctly predicted by the model.
- Denominator: The total number of existing ground-truth pairs for the controls where predictions were generated.
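The metric definitions above can be made concrete with a small sketch. It assumes predictions are stored as a ranked list of technique IDs per control and the ground truth as a set of (control, technique) pairs; the function name `evaluate_top_k`, the key `recall_restricted`, and the toy IDs are illustrative choices, not taken from the paper's code.

```python
from collections import defaultdict


def evaluate_top_k(predictions, gold, k=5):
    """Compute Top-K precision, recall, restricted recall, and F1.

    predictions: dict mapping control ID -> ranked list of unique technique IDs
    gold: set of (control ID, technique ID) ground-truth pairs
    """
    gold_by_control = defaultdict(set)
    for control, technique in gold:
        gold_by_control[control].add(technique)

    hits = 0             # correctly predicted control-technique pairs (numerator)
    proposed = 0         # all pairs proposed within the Top-K lists
    restricted_gold = 0  # gold pairs of controls that received predictions
    for control, ranked in predictions.items():
        top_k = ranked[:k]
        if not top_k:
            continue  # control produced no prediction: excluded from the restricted denominator
        proposed += len(top_k)
        hits += len(set(top_k) & gold_by_control[control])
        restricted_gold += len(gold_by_control[control])

    precision = hits / proposed if proposed else 0.0
    recall = hits / len(gold) if gold else 0.0
    recall_restricted = hits / restricted_gold if restricted_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "recall_restricted": recall_restricted,
        "f1": f1,
    }


# Toy data (assumed for illustration only).
gold_pairs = {("AC-2", "T1078"), ("AC-2", "T1136"), ("SI-7", "T1542"), ("CM-7", "T1059")}
toy_predictions = {"AC-2": ["T1078", "T1003"], "SI-7": ["T1542"], "CM-7": []}

metrics = evaluate_top_k(toy_predictions, gold_pairs, k=5)
print(metrics)
```

In this toy case CM-7 receives no prediction, so its ground-truth pair counts against standard recall but is left out of the restricted denominator.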
3.2. Data Preprocessing and Embedding Design
3.3. Model Pipeline Architecture
- Efficient Vector Search via Faiss: For Model 1 and Model 3, which involve extensive semantic similarity calculations between RMF controls ($N=1,189$) and ATT&CK mitigations ($K=44$), we utilized the Faiss (Facebook AI Similarity Search) library. The SBERT-generated dense vectors were indexed in a flat (exhaustive) FlatL2 or Inner Product index; on L2-normalized vectors the inner product equals cosine similarity, so the index returns exact cosine rankings with low latency, reducing the computational overhead as noted in Table 6.
- Deterministic Knowledge Graph Parsing: The structural linkages between mitigations and techniques were extracted using the stix2 Python library. This allowed for precise traversal of the "mitigates" relationship types within the STIX 2.1 formatted enterprise-attack.json. By automating the parsing of the STIX knowledge graph, we ensured that the deterministic mapping segment ($M \to T$) maintains high fidelity and technical reproducibility.
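The deterministic $M \to T$ lookup described above can be sketched without any third-party dependency. The paper uses the stix2 library; the version below swaps in the standard `json` module and plain dictionaries to show the same traversal of "mitigates" relationships. The bundle is a hand-written toy with placeholder UUIDs, and the helper names (`load_bundle`, `attack_id`, `mitigation_to_techniques`) are hypothetical. Revoked or deprecated objects are not filtered in this sketch.

```python
import json
from collections import defaultdict


def load_bundle(path):
    """Load a STIX bundle such as enterprise-attack.json from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def attack_id(stix_object):
    """Return the ATT&CK external ID (e.g. a T- or M-number) of a STIX object, if any."""
    for ref in stix_object.get("external_references", []):
        if ref.get("source_name") == "mitre-attack":
            return ref.get("external_id")
    return None


def mitigation_to_techniques(bundle):
    """Build {mitigation ID: sorted technique IDs} from 'mitigates' relationships."""
    id_by_ref = {
        obj["id"]: attack_id(obj)
        for obj in bundle["objects"]
        if obj["type"] in ("course-of-action", "attack-pattern")
    }
    mapping = defaultdict(set)
    for obj in bundle["objects"]:
        if obj["type"] == "relationship" and obj.get("relationship_type") == "mitigates":
            mitigation = id_by_ref.get(obj["source_ref"])
            technique = id_by_ref.get(obj["target_ref"])
            if mitigation and technique:
                mapping[mitigation].add(technique)
    return {m: sorted(ts) for m, ts in mapping.items()}


def _obj(kind, suffix, external_id):
    """Create a minimal toy STIX object with a placeholder UUID."""
    return {
        "type": kind,
        "id": f"{kind}--00000000-0000-4000-8000-{suffix:012d}",
        "external_references": [{"source_name": "mitre-attack", "external_id": external_id}],
    }


def _rel(suffix, rel_type, source, target):
    """Create a minimal toy STIX relationship object."""
    return {
        "type": "relationship",
        "id": f"relationship--00000000-0000-4000-8000-{suffix:012d}",
        "relationship_type": rel_type,
        "source_ref": source["id"],
        "target_ref": target["id"],
    }


# Toy bundle (illustrative only; real links come from enterprise-attack.json).
m1 = _obj("course-of-action", 1, "M1046")
t1 = _obj("attack-pattern", 2, "T1542")
t2 = _obj("attack-pattern", 3, "T1495")
toy_bundle = {
    "type": "bundle",
    "objects": [
        m1, t1, t2,
        _rel(4, "mitigates", m1, t1),
        _rel(5, "mitigates", m1, t2),
        _rel(6, "subtechnique-of", t2, t1),  # ignored: not a 'mitigates' link
    ],
}

m_to_t = mitigation_to_techniques(toy_bundle)
print(m_to_t)
```

Because the lookup is a plain dictionary built once from the bundle, the $M \to T$ step adds no model uncertainty: any error in the final mapping originates in the control-to-mitigation similarity stage.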

| Algorithm 1: Hybrid RMF-to-ATT&CK Mapping via Mitigation Chain |
| Input: RMF Controls C, ATT&CK Mitigations M, STIX Knowledge Graph GSTIX |
| Output: Top-K Mapped Techniques TTopK for each c ∈ C |
| 1: procedure GENERATE_MAPPING (C, M, GSTIX) |
| 2: EC = SBERT_Encode(C) // Generate 384-dim dense vectors for controls |
| 3: EM = SBERT_Encode(M) // Generate 384-dim dense vectors for mitigations |
| 4: Index = Faiss_IndexFlatIP(384) // Initialize Inner Product index |
| 5: Index.add(EM) // Index mitigation vectors for efficient search |
| 6: for each c in C do |
| 7: Scores, Indices = Index.search(EC[c], top_n) // Rapid retrieval |
| 8: Mcandidates = {m ∈ M | similarity(c, m) > τ} // Apply mitigation parameter τ=0.45 |
| 9: Tcandidates = ∅ |
| 10: for each m in Mcandidates do |
| 11: Tmapped = GSTIX.get_mitigates(m) // Deterministic lookup via stix2 library |
| 12: Tcandidates = Tcandidates ∪ Tmapped |
| 13: end for |
| 14: TTopK = Rank_and_Filter(Tcandidates, top=5) // Keep the five highest-ranked techniques |
| 15: end for |
| 15: end for |
| 16: return All TTopK |
| 17: end procedure |
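Algorithm 1 can be traced end to end with a dependency-free sketch. It replaces SBERT embeddings with hand-written 3-dimensional toy vectors and the Faiss index with an exhaustive inner-product scan over L2-normalized vectors, which yields the same ranking a flat inner-product index would. Ranking techniques by the best score among their linked mitigations is an assumption about `Rank_and_Filter`; the function and variable names are illustrative.

```python
import math


def normalize(vector):
    """L2-normalize a vector so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def map_control(control_vec, mitigation_vecs, mitigates, tau=0.45, top_n=10, top_k=5):
    """Map one control to its Top-K techniques via the mitigation chain.

    mitigation_vecs: dict mitigation ID -> embedding
    mitigates: dict mitigation ID -> iterable of technique IDs (from STIX)
    """
    c = normalize(control_vec)
    # Steps 4-7: exhaustive inner-product search, keeping the top_n mitigations.
    scored = sorted(
        ((dot(c, normalize(vec)), m) for m, vec in mitigation_vecs.items()),
        reverse=True,
    )[:top_n]

    technique_scores = {}
    for score, mitigation in scored:
        if score <= tau:  # Step 8: apply the mitigation parameter tau
            continue
        for technique in mitigates.get(mitigation, ()):  # Steps 10-12
            best = technique_scores.get(technique, score)
            technique_scores[technique] = max(best, score)

    # Step 14: rank by score (ties broken by ID) and keep the Top-K.
    ranked = sorted(technique_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy inputs (assumed for illustration only).
toy_mitigation_vecs = {
    "M_A": [1.0, 0.0, 0.0],
    "M_B": [0.0, 1.0, 0.0],
    "M_C": [0.6, 0.8, 0.0],
}
toy_mitigates = {"M_A": ["T_1", "T_2"], "M_B": ["T_3"], "M_C": ["T_2", "T_4"]}
toy_control = [0.9, 0.1, 0.0]

top_techniques = map_control(toy_control, toy_mitigation_vecs, toy_mitigates)
print(top_techniques)
```

Here M_B falls below τ, so T_3 never enters the candidate set, which mirrors how controls without any mitigation above the threshold produce no prediction at all.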
3.4. Model Architectures

- C: The set of NIST Security Controls;
- M: The set of ATT&CK Mitigations;
- T: The set of ATT&CK Techniques;
- Sim(c, m) is the cosine similarity between a control and a mitigation calculated using Sentence-BERT (SBERT) [4].
- τ is the Threshold (Mitigation Parameter).
- m → t is the Deterministic Relationship defined in the STIX dataset.
where α and β represent optimized weights determined through validation subset tuning, ensuring α + β = 1 to maintain scoring consistency.
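Read together with the Scoring Method row of the model comparison table, the symbol definitions above suggest the following form for the Model 3 and Model 4 scores. This is a reconstruction under those assumptions, not the authors' typeset equations; the candidate set $M_\tau(c)$ is a name introduced here for readability.

$$
\begin{aligned}
M_\tau(c) &= \{\, m \in M \mid \mathrm{Sim}(c, m) > \tau \,\} \\
S_{M3}(c, t) &= \max_{m \in M_\tau(c),\; m \to t} \mathrm{Sim}(c, m) \\
S_{M4}(c, t) &= \alpha\, S_{M1}(c, t) + \beta\, S_{M3}(c, t), \qquad \alpha + \beta = 1
\end{aligned}
$$

Under this reading, a technique $t$ receives a Model 3 score only if at least one mitigation in $M_\tau(c)$ has a deterministic link $m \to t$; otherwise $S_{M3}(c, t)$ is undefined and only the Model 1 term can contribute to the ensemble.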

4. Experiments
4.1. Experimental Workflow Overview
- Data Input and Acquisition: The cycle begins with the ingestion of heterogeneous datasets. This includes the manual upload of RMF mapping data (NIST SP 800-53 controls) and the CTID (Center for Threat-Informed Defense) mapping data, which serves as the ground truth. Additionally, the latest ATT&CK datasets are fetched via online repositories to ensure the experimental environment reflects the current threat landscape.
- Preprocessing: To harmonize the disparate data formats, a preprocessing layer is implemented. This involves Data Cleaning to remove noise, Normalization to standardize textual representations, and Alignment Prep to structure the control-technique pairs for vectorization.
- Mapping Pipelines: The core of the experiment utilizes four distinct methodological approaches to determine the relationship between security controls and adversarial techniques:
  - Model 1 (SBERT Direct): Utilizes a Bi-Encoder architecture based on Sentence-BERT (SBERT) to independently embed RMF control descriptions and ATT&CK technique summaries into high-dimensional vectors, determining relevance through cosine similarity calculations.
  - Model 2 (Cross-Encoder): Employs a Cross-Encoder re-ranking architecture that processes concatenated control–technique pairs to capture token-level interactions, refining the initial Top-M candidates generated by Model 1 for improved semantic precision.
  - Model 3 (Mitigation Mapping): Leverages a structural layered mapping approach that uses ATT&CK Mitigations as an intermediate semantic bridge; it calculates similarity between controls and mitigations and then expands to techniques via deterministic "mitigates" relationships in the STIX knowledge graph.
  - Model 4 (Ensemble Mapping): Implements a hybrid ensemble strategy that integrates probabilistic semantic signals from Model 1 with structural mapping signals from Model 3 using weighted sum ensemble and gating mechanisms to optimize overall mapping stability.
- Evaluation and Output: In the final phase, the generated mappings are validated by comparing them against the CTID Gold Standard. An Accuracy Analysis is performed using metrics such as Precision, Recall, and F1-Score. The validated results are then exported into structured CSV and XLSX formats for further forensic analysis and visualization.
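The Model 4 step in the workflow above combines two score sources and then filters by tactic. A minimal sketch of that idea follows, assuming a weighted sum with α + β = 1 and a gate that keeps a technique only if it shares at least one tactic with the set allowed for the control's RMF family. The weight value, the gating rule, the treatment of a missing score as 0, and all names and toy data are assumptions for illustration; the paper's tuned weights and gating tables are not reproduced here.

```python
def ensemble_scores(s_m1, s_m3, alpha=0.5, allowed_tactics=None, technique_tactics=None):
    """Combine Model 1 and Model 3 scores for one control.

    s_m1, s_m3: dict technique ID -> score from each model
    alpha: weight of the semantic (Model 1) signal; beta = 1 - alpha
    allowed_tactics: optional set of tactics permitted for the control's family
    technique_tactics: dict technique ID -> set of tactics (needed when gating)
    """
    beta = 1.0 - alpha
    technique_tactics = technique_tactics or {}
    combined = {}
    for technique in set(s_m1) | set(s_m3):
        if allowed_tactics is not None:
            # Gate: drop techniques whose tactics do not overlap the allowed set.
            if not (technique_tactics.get(technique, set()) & allowed_tactics):
                continue
        # A technique missing from one model contributes 0 from that model.
        combined[technique] = alpha * s_m1.get(technique, 0.0) + beta * s_m3.get(technique, 0.0)
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy scores and tactic tables (assumed for illustration only).
toy_s_m1 = {"T_a": 0.50, "T_b": 0.40, "T_c": 0.70}
toy_s_m3 = {"T_a": 0.70, "T_b": 0.60}
toy_tactics = {
    "T_a": {"persistence"},
    "T_b": {"defense-evasion"},
    "T_c": {"exfiltration"},
}
toy_allowed = {"persistence", "defense-evasion"}

gated = ensemble_scores(toy_s_m1, toy_s_m3, alpha=0.5,
                        allowed_tactics=toy_allowed, technique_tactics=toy_tactics)
ungated = ensemble_scores(toy_s_m1, toy_s_m3, alpha=0.5)
print(gated)
print(ungated)
```

In the toy run, T_c has the highest Model 1 score but is removed by the gate, while without gating it is merely ranked last because it has no structural support from Model 3.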

4.2. Datasets
4.3. Experimental Environment
| Category | Details |
|---|---|
| Base Environment | Google Colab Pro+ (Python 3.12) |
| Type of Runtime (HW) | T4 / A100 GPU |
| Core Libraries | NumPy 2.2.1, Pandas 2.2.2, Scikit-learn 1.5.2, Sentence-transformers 3.0.1, Faiss-cpu 1.9.0, Matplotlib 3.8.4 |
| Model Configuration | Baseline/M2: Sentence-BERT (SBERT) family embedding (all-MiniLM-L6-v2); Re-ranking: Cross-Encoder architecture |
| Gating Mechanism | RMF family → Tactic gating (set of allowed tactics); Keyword overlap filter (DS/MIT) |
| Indexing/Search | Cosine Similarity (Normalized Inner Product) + Top-M candidate generation |
| Dataset | CTID: attack-control-framework-mappings; ATT&CK: enterprise-attack-v14.1.json; RMF R5: nist800-53-r5-controls.tsv; Gold Standard: attack-10-1-to-nist800-53-r5-mappings.tsv |
4.4. Evaluation Procedure

4.5. Experimental Results and Discussion
4.5.1. Direct Matching Models (Models 1 and 2)
4.5.2. Mitigation Chain Model (Model 3)
4.5.3. Ensemble Model (Model 4)
| Model | Expected Controls (C) | Active Controls (C’) | Hit@5 Count | Standard Recall | Recall@restricted |
|---|---|---|---|---|---|
| M1: SBERT Direct | 1,189 | 1,189 | 428 | 0.0174 | 0.36 |
| M2: Cross-Encoder | 1,189 | 1,189 | 451 | 0.0181 | 0.38 |
| M3: Mitigation Chain | 1,189 | 419 | 247 | 0.0097 | 0.59 |
| M4: Proposed Ensemble | 1,189 | 1,189 | 880 | 0.0211 | 0.74 |
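The Recall@restricted column is numerically consistent with dividing the Hit@5 count by the number of active controls. The short check below reproduces the four reported values from the table; reading the column as this ratio is an inference from the numbers, not a definition stated in the table itself.

```python
# (active controls, Hit@5 count, reported Recall@restricted), copied from the table.
reported = {
    "M1": (1189, 428, 0.36),
    "M2": (1189, 451, 0.38),
    "M3": (419, 247, 0.59),
    "M4": (1189, 880, 0.74),
}

# Recompute the ratio and round to the two decimals used in the table.
recomputed = {
    model: round(hits / active, 2)
    for model, (active, hits, _) in reported.items()
}

for model, (active, hits, value) in reported.items():
    print(f"{model}: {hits}/{active} = {hits / active:.4f} (reported {value})")
```

Under this reading, Model 3 ranks second on the restricted metric despite the lowest Hit@5 count, because its denominator covers only the 419 controls for which it produced a prediction.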
4.6. Analysis of Model Performance and Error Modes

4.7. Validation of Mappings Outside the CTID Ground Truth
| Control ID | Technique ID | Score | Rank |
|---|---|---|---|
| SI-7(10) | T1542 | 0.7081 | 1 |
| SI-7(10) | T1495 | 0.7060 | 2 |
| SI-7(10) | T1542.001 | 0.7020 | 3 |
| SI-7(10) | T1542.003 | 0.6939 | 4 |
| SI-7(9) | T1542 | 0.6641 | 1 |
| Feature | Model 1 (M1) | Model 2 (M2) | Model 3 (M3) | Model 4 (M4) |
|---|---|---|---|---|
| Description | Direct Semantic Mapping | Pairwise Sentence Re-scoring | Structural Layered Mapping | Hybrid Ensemble Mapping |
| Comparison Path | Control ↔ Technique | Control ↔ Technique (Top-M) | Control ↔ Mitigation ↔ Technique | M1 (Semantic) + M3 (Structural) |
| Architecture | SBERT Bi-Encoder | SBERT + Cross-Encoder | SBERT + STIX Graph | Weighted Ensemble + Gating |
| Scoring Method | Cosine Similarity | $\lambda S_{embed} + (1-\lambda) S_{CE}$ | $S_{CT} = \max(S_{CM})$ | $\alpha S_{M1} + \beta S_{M3}$ |
| Explainability | Medium | Low (Black-box CE) | Very High (Path-based) | High (Reasoning-based) |
| Computational Cost | Low | High | Medium | Medium |
| Key Advantage | High speed & Scalability | Improved Precision | Semantic gap reduction | Structural consistency |
| Limitations | High False Positives | Limited by Candidate pool | Mitigation link dependent | Weight optimization required |
5. Conclusions
Contributions
Author Contributions
Funding
Data Availability Statement
References
- Joint Task Force. Security and Privacy Controls for Information Systems and Organizations; NIST Special Publication 800-53, Revision 5; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2020. [CrossRef]
- The MITRE Corporation. MITRE ATT&CK®. Available online: https://attack.mitre.org (accessed on 20 January 2026).
- Center for Threat-Informed Defense. NIST 800-53 Control Mappings to MITRE ATT&CK. Available online: https://ctid.mitre.org/projects/nist-800-53-control-mappings/ (accessed on 20 January 2026).
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3982–3992.
- OASIS Cyber Threat Intelligence (CTI) TC. STIX™ Version 2.1; OASIS Standard: Burlington, MA, USA, 2021.
- Wudali, P.N.; Kravchik, M.; Malul, E.; Gandhi, P.A. Rule-ATT&CK Mapper (RAM): Mapping SIEM Rules to TTPs Using LLMs. arXiv 2025, arXiv:2502.02337.
- Usengimana, R.C.; Abderehman, M. VMTT&RP: Automated Vulnerability Mapping with MITRE ATT&CK TTPs and Risk Prioritization. Research Square 2025. (Preprint). [CrossRef]
- Zhang, C.; Wang, L.; Fan, D.; Zhou, J.; Zeng, L.; Le, Z. VTT-LLM: Advancing Vulnerability-to-Tactic-and-Technique Mapping through Fine-Tuning of Large Language Model. Mathematics 2024, 12, 1286. [CrossRef]
- Rafiey, P.; Jafarnejad, S. Mapping Vulnerability Description to MITRE ATT&CK Framework by LLM. Research Square 2025. (Preprint). [CrossRef]
- Ampel, B.M.; Samtani, S.; Zhu, H.; Chen, H.; Nunamaker, J.F. Improving threat mitigation through a cybersecurity risk management framework: A computational design science approach. J. Manag. Inf. Syst. 2024, 41, 236–265. [CrossRef]
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Swedish Defence Research Agency (FOI). Bridging Semantic Interoperability Gaps with SILF. FOI Report No. FOI-R--5082--SE. Available online: https://www.foi.se/download/18.7fd35d7f166c56ebe0bffdf/1542623723499/Bridging-semantic-interoperability_FOI-S--5082--SE.pdf (accessed on 11 February 2026).
- National Institute of Standards and Technology (NIST). Knowledge Mining in Cybersecurity: From Attack to Defense. NISTIR 8450. Available online: https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=934782 (accessed on 11 February 2026).
- OASIS Cyber Threat Intelligence Technical Committee. STIX Version 2.1. Committee Specification 01. Available online: https://docs.oasis-open.org/cti/stix/v2.1/cs01/stix-v2.1-cs01.html (accessed on 11 February 2026).
- Elmorshidy, S. Sentence Transformers, Bi-Encoders And Cross-Encoders. Medium. Available online: https://medium.com/@shazaelmorsh/sentence-transformers-bi-encoders-and-cross-encoders-a82cba125abd (accessed on 11 February 2026).
- Alotaibi, M.; Alharbi, B.; Alshahrani, M. Enhancing Query Relevance: Leveraging SBERT and Cosine Similarity for Optimal Information Retrieval. Available online: https://www.researchgate.net/publication/383179880_Enhancing_query_relevance_leveraging_SBERT_and_cosine_similarity_for_optimal_information_retrieval (accessed on 11 February 2026).
- eccenca GmbH. Build a Knowledge Graph from STIX 2.1 Data Such as the MITRE ATT&CK® Datasets. eccenca Documentation. Available online: https://documentation.eccenca.com/23.3/build/tutorial-how-to-link-ids-to-osint/lift-data-from-STIX-2.1-data-of-mitre-attack/ (accessed on 11 February 2026).
- Bury, M.; Konrad, C. On Estimating Maximum Matching Size in Graph Streams. Available online: https://www.researchgate.net/publication/312252274_On_Estimating_Maximum_Matching_Size_in_Graph_Streams (accessed on 11 February 2026).
- MITRE. STIX™ 2.1 Representation of the ATT&CK® Knowledge Base; MITRE Corporation: McLean, VA, USA, 2021. Available online: https://ctid.mitre-projects.org/ (accessed on 19 February 2026).
- Kwon, R.; Ashley, T.; Castleberry, J.; Maughan, P. Cyber Threat Intelligence Modeling Based on STIX 2.1 Using Knowledge Graph Embedding. IEEE Access 2022, 10, 56123–56135.
- Kim, J.; Lee, H.; Park, N. Automated Mapping of Security Controls to Adversarial Techniques Using Transformer-based Language Models. Applied Sciences 2023, 13, 4021.
- Lee, H.; Yoon, S.; Lee, Y.-K.; Kang, J. Evaluating BERT-Based Models for Mapping RMF Security Controls to MITRE ATT&CK Techniques. Journal of Convergence Security 2025, 25, 11–20. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).