Submitted: 30 May 2025
Posted: 10 June 2025
Abstract

Keywords:
1. Introduction
2. Background
2.1. Rust’s Safety Mechanism
2.2. Unsafe Rust
2.3. RustSec and OSV Vulnerability Databases
2.4. Language-Independent Intermediate Representation (LLVM IR)
2.5. Embedding-Based Vulnerability Classifier
2.6. Vulnerability Detection Tools
3. System Architecture
4. Research Methodology
4.1. Research Questions
4.2. Data Collection and Pre-processing
4.3. Rust Snippet Wrapping
- Missing main() Functions: Many examples consist solely of library functions. Our wrapper detects any snippet lacking a main function and appends a minimal `fn main()` so that `rustc --emit=llvm-ir` succeeds.
- Missing Contextual Definitions: Snippets sometimes refer to types or traits defined elsewhere. We include lightweight dummy modules (e.g., `mod reactor`) to satisfy these external references.
- Feature and Flag Variability: Different crates target varying Rust editions or feature sets. By standardizing on the 2018 edition and using a uniform stub approach, we avoid per-snippet compiler flag adjustments.
- Project-Level Dependencies: Some code relies on broader project settings or build scripts. Where isolated compilation fails, our script logs and skips those cases, ensuring only self-contained snippets proceed.
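A minimal Python sketch of the wrapping rules listed above follows. It is an illustration under stated assumptions, not the authors' script: `wrap_snippet`, `rustc_command`, and the stub format (a module name mapped to placeholder items) are hypothetical, and `main` detection is a regex heuristic rather than a Rust parser. Stubs are appended after the snippet because rustc requires crate-level inner attributes (`#![...]`) to precede all items.

```python
import re
from typing import Dict, List, Optional

# Heuristic: a line that declares `fn main(` (optionally `pub`). Not a Rust parser.
MAIN_RE = re.compile(r"^\s*(?:pub\s+)?fn\s+main\s*\(", re.MULTILINE)


def wrap_snippet(src: str, stubs: Optional[Dict[str, str]] = None) -> str:
    """Return a version of a Rust snippet that rustc can compile as a binary.

    `stubs` maps a hypothetical module name (e.g. "reactor") to placeholder
    items (e.g. "pub struct Handle;") for references the snippet leaves undefined.
    """
    parts: List[str] = [src.rstrip() + "\n"]
    for name, body in (stubs or {}).items():
        # Skip stubs for modules the snippet already declares itself.
        if re.search(rf"\bmod\s+{re.escape(name)}\b", src):
            continue
        # Appended after the snippet so its crate-level `#![...]` attributes stay first.
        parts.append(f"\n#[allow(dead_code)]\nmod {name} {{\n    {body}\n}}\n")
    if not MAIN_RE.search(src):
        # Minimal entry point so a library-only snippet builds as a binary crate.
        parts.append("\nfn main() {}\n")
    return "".join(parts)


def rustc_command(path: str, edition: str = "2018") -> List[str]:
    """One fixed edition for every snippet, so no per-snippet flag tuning."""
    return ["rustc", f"--edition={edition}", "--emit=llvm-ir", path]
```

A driver could run the returned command with `subprocess.run`, log and skip any snippet whose return code is non-zero (the project-level dependency rule), and otherwise pick up the `.ll` file that rustc names after the source file.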
4.4. Data Labeling
4.5. Code Pre-Processing and Representation
Listing 1. Pre-processing .ll Files.
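The cleaning rules themselves are not given in the text. As one plausible version, assumed here for illustration rather than taken from the paper, the Python sketch below drops comments, module-level target lines, attribute groups, metadata definitions, debug-info intrinsics, and metadata attachments, leaving the instructions that carry program logic. Comment stripping ignores `;` inside `c"..."` string constants, which LLVM IR prints verbatim. `clean_ll` and `strip_comment` are hypothetical names.

```python
import re
from typing import List

# Module-level lines treated as noise in this sketch (an assumed list).
DROP_PREFIXES = ("source_filename", "target datalayout", "target triple", "attributes #")
# Metadata node definitions such as `!5 = !{}` or `!llvm.module.flags = !{...}`.
METADATA_DEF = re.compile(r"^!\S+\s*=")
# Metadata attachments such as `, !dbg !9` or ` !dbg !5` on a `define` line.
METADATA_ATTACH = re.compile(r",?\s*![A-Za-z][\w.]*\s+!\d+")


def strip_comment(line: str) -> str:
    """Drop a trailing `; ...` comment, ignoring `;` inside c"..." string constants."""
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string  # IR escapes embedded quotes as \22
        elif ch == ";" and not in_string:
            return line[:i].rstrip()
    return line


def clean_ll(text: str) -> str:
    kept: List[str] = []
    for raw in text.splitlines():
        line = strip_comment(raw.strip())
        if not line or line.startswith(DROP_PREFIXES) or METADATA_DEF.match(line):
            continue
        if "@llvm.dbg." in line:  # debug-info intrinsics carry no program logic
            continue
        kept.append(METADATA_ATTACH.sub("", line))
    return "\n".join(kept)
```

Removing attributes and debug locations is a design trade-off: it shrinks the input toward the encoder's token limit but discards information such as the mapping back to source lines.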
4.6. Embedding-Based Feature Extraction
Listing 2. Extraction of CLS Embedding from LLVM IR.
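A CLS embedding is the encoder's final-layer hidden state at the first input position. The stdlib-only sketch below makes the two steps explicit under its own assumptions: cleaned IR token ids are wrapped as CLS, body, SEP (for GraphCodeBERT's RoBERTa vocabulary these roles are played by `<s>` and `</s>`), then truncated and padded to 512 positions. Truncating rather than chunking long files is an assumption, and `build_encoder_input` and `cls_embedding` are hypothetical names.

```python
from typing import List, Sequence, Tuple

MAX_LEN = 512  # position limit of RoBERTa-style encoders such as GraphCodeBERT


def build_encoder_input(token_ids: Sequence[int], cls_id: int, sep_id: int,
                        pad_id: int, max_len: int = MAX_LEN) -> Tuple[List[int], List[int]]:
    """Wrap IR token ids as [CLS] body [SEP], truncate and pad to max_len.

    Returns (input_ids, attention_mask); the mask is 0 on padding positions.
    """
    body = list(token_ids)[: max_len - 2]  # leave room for the two special tokens
    ids = [cls_id] + body + [sep_id]
    mask = [1] * len(ids)
    pad = max_len - len(ids)
    return ids + [pad_id] * pad, mask + [0] * pad


def cls_embedding(last_hidden_state: Sequence[Sequence[float]]) -> List[float]:
    """The CLS vector is the final-layer hidden state at position 0."""
    return list(last_hidden_state[0])
```

With Hugging Face `transformers`, the equivalent is to load `AutoTokenizer` and `AutoModel` from the `microsoft/graphcodebert-base` checkpoint, call the tokenizer with `truncation=True, max_length=512, padding="max_length", return_tensors="pt"`, and take `model(**inputs).last_hidden_state[:, 0, :]`; the tokenizer inserts the special tokens itself.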
4.7. Experimental Setup
Listing 3. Threshold Tuning on Validation Set.
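Threshold tuning can be sketched as a grid search over the validation set's predicted probabilities. The sketch below is an assumption about the procedure, not the authors' listing: it maximizes F1 of the vulnerable class over thresholds 0.01 to 0.99, and `f1_at` and `tune_threshold` are hypothetical names. With CatBoost, the probabilities would come from `model.predict_proba(X_val)[:, 1]`.

```python
from typing import Iterable, Optional, Sequence, Tuple


def f1_at(threshold: float, probs: Sequence[float], labels: Sequence[int]) -> float:
    """F1 of the positive (vulnerable) class when predicting 1 iff prob >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        predicted = p >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(probs: Sequence[float], labels: Sequence[int],
                   grid: Optional[Iterable[float]] = None) -> Tuple[float, float]:
    """Return (threshold, F1) maximising validation F1; ties keep the lower threshold."""
    candidates = grid if grid is not None else [i / 100 for i in range(1, 100)]
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        score = f1_at(t, probs, labels)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1
```

Tuning on the validation split only, and then freezing the chosen threshold, keeps the test-set metrics free of selection bias.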
5. Experimental Output
- Safe example (alignment.ll): The model processed the uploaded alignment.ll file, cleaned and embedded it via GraphCodeBERT, and correctly output “NOT VULNERABLE”, consistent with its low false-alarm rate on benign code (see Figure 7).
- Vulnerable example (error.ll): In contrast, when given error.ll, which contains a known flaw, the classifier returned “VULNERABLE” and automatically assigned CVE-2023-41317, matching the ground truth (see Figure 8).
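The text does not state how a CVE identifier is attached to a positive prediction. Purely as an illustration, the sketch below assumes a nearest-neighbour lookup: the tuned threshold turns the classifier's probability into a verdict, and a vulnerable input inherits the CVE id of the most similar labelled training embedding by cosine similarity. `report`, `cosine`, and the advisory index are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def report(prob: float, threshold: float, embedding: Vector,
           advisories: List[Tuple[str, Vector]]) -> Tuple[str, Optional[str]]:
    """Map a classifier probability to a verdict and, if vulnerable, a CVE id.

    `advisories` is a hypothetical index of (CVE id, embedding) pairs built from
    labelled vulnerable training samples; the nearest one by cosine similarity wins.
    """
    if prob < threshold:
        return "NOT VULNERABLE", None
    nearest = max(advisories, key=lambda item: cosine(embedding, item[1]), default=None)
    return "VULNERABLE", (nearest[0] if nearest is not None else None)
```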


6. Discussion
7. Threats to Validity
8. Conclusions and Future Work
8.1. Future Work
Funding
References
- Kishiyama, B.; Lee, Y.; Yang, J. Improving VulRepair’s Perfect Prediction by Leveraging the LION Optimizer. Applied Sciences 2024, 14, 5750.
- Yang, J.; Lodgher, A. Fundamental Defensive Programming Practices with Secure Coding Modules. In Proceedings of the International Conference on Security and Management, 2019.
- Zheng, X.; Wan, Z.; Zhang, Y.; Chang, R.; Lo, D. A Closer Look at the Security Risks in the Rust Ecosystem. ACM Trans. Softw. Eng. Methodol. 2023, 33, 34–1.
- Bugden, W.; Alahmar, A. The Safety and Performance of Prominent Programming Languages. International Journal of Software Engineering and Knowledge Engineering 2022, 32, 713–744.
- Zhu, S.; Zhang, Z.; Qin, B.; Xiong, A.; Song, L. Learning and programming challenges of Rust: a mixed-methods study. In Proceedings of the 44th International Conference on Software Engineering (ICSE ’22), New York, NY, USA, 2022; pp. 1269–1281.
- Memory-Safety Challenge Considered Solved? An In-Depth Study with All Rust CVEs. ACM Transactions on Software Engineering and Methodology.
- Bae, Y.; Kim, Y.; Askar, A.; Lim, J.; Kim, T. Rudra: Finding Memory Safety Bugs in Rust at the Ecosystem Scale. In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, Virtual Event, Germany, 2021; pp. 84–99.
- Qin, B.; Chen, Y.; Yu, Z.; Song, L.; Zhang, Y. Understanding memory and thread safety practices and issues in real-world Rust programs. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, London, UK, 2020; pp. 763–779.
- Yu, Z.; Song, L.; Zhang, Y. Fearless Concurrency? Understanding Concurrent Programming Safety in Real-World Rust Software. arXiv 2019, arXiv:1902.01906.
- Hassnain, M.; Stanford, C. Counterexamples in Safe Rust. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW ’24), New York, NY, USA, 2024; pp. 128–135.
- How to write a timing-attack-proof comparison function (`Ord::cmp`, lexicographic) for byte arrays? Help section, 2023.
- Park, S.; Cheng, X.; Kim, T. Unsafe’s Betrayal: Abusing Unsafe Rust in Binary Reverse Engineering via Machine Learning. 2022.
- Li, Z.; Zou, D.; Xu, S.; Jin, H.; Zhu, Y.; Chen, Z. SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities. IEEE Transactions on Dependable and Secure Computing 2022, 19, 2244–2258.
- Li, Z.; Zou, D.; Xu, S.; Ou, X.; Jin, H.; Wang, S.; Deng, Z.; Zhong, Y. VulDeePecker: A Deep Learning-Based System for Vulnerability Detection. In Proceedings of the 2018 Network and Distributed System Security Symposium, 2018. arXiv:1801.01681 [cs].
- Hanif, H.; Maffeis, S. VulBERTa: Simplified Source Code Pre-Training for Vulnerability Detection. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 2022; IEEE; pp. 1–8.
- Guo, D.; Ren, S.; Lu, S.; Feng, Z.; Tang, D.; Liu, S.; Zhou, L.; Duan, N.; Svyatkovskiy, A.; Fu, S.; et al. GraphCodeBERT: Pre-training Code Representations with Data Flow, 2021. arXiv:2009.08366 [cs].
- Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: unbiased boosting with categorical features. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc., 2018; Vol. 31.
- Jung, R.; Jourdan, J.H.; Krebbers, R.; Dreyer, D. RustBelt: securing the foundations of the Rust programming language. Proceedings of the ACM on Programming Languages 2017, 2, 1–34.
- Yanovski, J.; Dang, H.H.; Jung, R.; Dreyer, D. GhostCell: separating permissions from data in Rust. Proceedings of the ACM on Programming Languages 2021, 5, 1–30.
- Jung, R.; Jourdan, J.H.; Krebbers, R.; Dreyer, D. Safe systems programming in Rust. Communications of the ACM 2021, 64, 144–152.
- AWS’ sponsorship of the Rust project. AWS Open Source Blog, 2019.
- Klabnik, S. The Rust Programming Language, 2nd Edition; No Starch Press: New York, 2023.
- About RustSec. RustSec Advisory Database.
- OSV: Open Source Vulnerabilities.
- Computer Security Division. NIST, 2008. Last modified 11 April 2022.
- Cheng, X.; Zhang, G.; Wang, H.; Sui, Y. Path-sensitive code embedding via contrastive learning for software vulnerability detection. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual, South Korea, 2022; pp. 519–531.
- Zhou, Y.; Liu, S.; Siow, J.; Du, X.; Liu, Y. Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks, 2019. arXiv:1909.03496 [cs].
- Lattner, C.; Adve, V. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization (CGO 2004), San Jose, CA, USA, 2004; pp. 75–86.
- Moses, W.S. Understanding High-Level Properties of Low-Level Programs Through Transformers. 2022.
- Mahyari, A. A Hierarchical Deep Neural Network for Detecting Lines of Codes with Vulnerabilities, 2022. arXiv:2211.08517 [cs].
- Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: unbiased boosting with categorical features, 2019. arXiv:1706.09516 [cs].
- Fu, M.; Tantithamthavorn, C.; Le, T.; Kume, Y.; Nguyen, V.; Phung, D.; Grundy, J. AIBugHunter: A Practical tool for predicting, classifying and repairing software vulnerabilities. Empirical Software Engineering 2024, 29, 4.
- HALURust: Exploiting Hallucinations of Large Language Models to Detect Vulnerabilities in Rust.
- Suneja, S.; Zheng, Y.; Zhuang, Y.; Laredo, J.; Morari, A. Learning to map source code to software vulnerability using code-as-a-graph. arXiv 2020, arXiv:2006.08614.
- Cipollone, D.; Wang, C.; Scazzariello, M.; Ferlin, S.; Izadi, M.; Kostic, D.; Chiesa, M. Automating the Detection of Code Vulnerabilities by Analyzing GitHub Issues, 2025. arXiv:2501. 0525.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), New York, NY, USA, 2016; pp. 785–794.
- Wang, R.; Xu, S.; Tian, Y.; Ji, X.; Sun, X.; Jiang, S. SCL-CVD: Supervised contrastive learning for code vulnerability detection via GraphCodeBERT. Computers & Security 2024, 145, 103994.
- K, V.K.; P, S.K.; S, D.; C, G.K.; S, R. Design and Development of Android App Malware Detector API Using Androguard and Catboost. International Journal for Research in Applied Science and Engineering Technology 2024, 12, 5121–5128.
- Ullah, S.; Han, M.; Pujar, S.; Pearce, H.; Coskun, A.; Stringhini, G. LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks, 2024. arXiv:2312.12575 [cs].
- Mittal, A. Code Embedding: A Comprehensive Guide, 2024.
- Suneja, S.; Zheng, Y.; Zhuang, Y.; Laredo, J.; Morari, A. Learning to map source code to software vulnerability using code-as-a-graph. arXiv 2020, arXiv:2006.08614.
- de Moor, O.; Verbaere, M.; Hajiyev, E.; Avgustinov, P.; Ekman, T.; Ongkingco, N.; Sereni, D.; Tibble, J.; et al. Keynote Address: .QL for Source Code Analysis.








| Approach | Input Format | Language | Model Architecture | Dataset | Metrics |
|---|---|---|---|---|---|
| Rust-IR-Bert | Rust → LLVM IR → Embeddings | Rust | GraphCodeBERT + CatBoost | RustSec + OSV (LLVM IR) | F1: 98.10%; Recall (vulnerable): 94.94%; Recall (non-vulnerable): 99.66% |
| HALURust | Rust code → LLM-generated vulnerability reports | Rust | Gemma-7B (7B-parameter LLM) | 81 real-world Rust CVEs | F1: 77.3% |
| AI4VA | C → Code Property Graph | C | Gated Graph Neural Network (GGNN) | Juliet, s-bAbI, Draper | F1: 87% (Juliet), 50% (Draper) |
| SySeVR | C/C++ → syntax/semantic vectors | C/C++ | Bidirectional GRU (BGRU) | Libav, Seamonkey, Thunderbird, Xen | Recall: 92.9% at 16.8% coverage |
| Unsafe’s Betrayal | Rust binary (assembly tokens) | Rust | RoBERTa on assembly tokens | CrateU + RustSec | AUPRC: 80% (unsafe-code detection) |
| VulBERTa | C/C++ source | C/C++ | RoBERTa-based Transformer | Draper, μVulDeePecker, CodeXGLUE, D2A | F1: 57.9% (Draper), 99.6% (μVulDeePecker) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).


