Submitted: 19 October 2025
Posted: 20 October 2025
Abstract
Keywords:
1. Introduction
- Proposal of CQLLM, an LLM-based framework for automatically generating executable QL code. Given a CWE identifier and a vulnerability description, it generates the corresponding QL query. The execution success rate of the generated QL code increased from 0.31% to 72.48%, and CWE detection coverage reached 57.4%. This reduces reliance on manual query writing while improving both the correctness of the generated QL code and the accuracy of vulnerability detection.
- Construction of a high-quality QL knowledge base and dataset. The knowledge base contains common vulnerability detection examples and QL dependency libraries. Using RAG to constrain the LLM's generation process reduces syntax errors and invalid module calls. The dataset was built by augmenting the officially provided QL queries. QL code generated with this knowledge base and dataset achieved an execution success rate of 65.1%, significantly outperforming code generated directly by the LLM.
- Domain-specific fine-tuning of the base model using LoRA. This strengthens the model's understanding of vulnerability semantics and QL syntax, improving its code generation performance in complex scenarios. With fine-tuning alone (without RAG), the LLM generated executable QL code at a rate of 22.11%.
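
The retrieval step behind the second contribution can be illustrated with a small sketch. It assumes a knowledge base whose entries pair a CWE identifier and description with an example QL query, ranks entries by bag-of-words cosine similarity to the request, and prepends the best matches to the prompt. The entry layout, the similarity measure, the prompt wording, and the names `KBEntry`, `retrieve`, and `build_prompt` are all hypothetical; the retriever CQLLM actually uses is not described in this section, and production RAG systems usually rely on dense embeddings instead.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class KBEntry:
    """Hypothetical knowledge-base record: a CWE plus an example QL query."""
    cwe_id: str
    description: str
    example_ql: str


def tokenize(text: str) -> list[str]:
    # Lower-case alphanumeric tokens ("CWE-20" -> ["cwe", "20"]).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, kb: list[KBEntry], k: int = 2) -> list[KBEntry]:
    # Score each entry by similarity between the request and its "CWE id + description".
    q = Counter(tokenize(query))

    def score(entry: KBEntry) -> float:
        return cosine(q, Counter(tokenize(f"{entry.cwe_id} {entry.description}")))

    return sorted(kb, key=score, reverse=True)[:k]


def build_prompt(cwe_id: str, description: str, kb: list[KBEntry], k: int = 2) -> str:
    # Prepend retrieved examples so the model can reuse existing imports and module names.
    examples = retrieve(f"{cwe_id} {description}", kb, k)
    context = "\n\n".join(f"// Example for {e.cwe_id}\n{e.example_ql}" for e in examples)
    return (
        f"Reference CodeQL queries:\n{context}\n\n"
        f"Write a CodeQL query that detects {cwe_id}: {description}\n"
    )
```

In this sketch, `build_prompt("CWE-89", "SQL injection", kb)` places the closest CWE-89 example first, so the model sees an existing query's imports and library usage before writing its own.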
2. Related Work
2.1. Automatic QL Code Generation
2.2. LLM-Assisted Code Generation
2.3. Retrieval-Augmented Generation
3. Our Methods
4. Method Implementation
4.1. Data Collection and Preprocessing Module
4.2. RAG Knowledge Base
4.3. Parameter-Efficient Fine-Tuning
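
For reference, the LoRA method named in the third contribution (Hu et al., ICLR 2022) keeps a pretrained weight matrix $W_0$ frozen and trains only a low-rank update. The standard forward pass of an adapted layer is shown below; the rank $r$, scaling factor $\alpha$, and target modules used for CQLLM are not given in this section.

$$
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

Because $B$ is initialized to zero, $\Delta W = 0$ at the start of training, and each adapted matrix adds only $r(d + k)$ trainable parameters instead of $dk$. With illustrative values $d = k = 4096$ and $r = 8$, that is $8 \times 8192 = 65{,}536$ parameters versus $16{,}777{,}216$, about 0.39% of the full matrix.
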
4.4. Inference and Generation
5. Experiments and Results
5.1. Experimental Setting
5.2. Comparative Experiment
5.3. Ablation Study
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References





| Tuple Field | Explanation | Example |
|---|---|---|
| CWE-id | Identifier of the weakness type in the Common Weakness Enumeration (CWE), used to uniquely classify the vulnerability category. | CWE-20 |
| Query_id | Unique query identifier, used to distinguish different vulnerability detection rules. | py/affinity_filter |
| Name | Query name, usually containing the CWE ID and a brief description. | CWE-20: Improper Input Validation |
| Vul-type | A short summary or classification of the vulnerability type, helping to quickly understand the category. | StackTraceExposureQuery |
| Description | A detailed explanation of the vulnerability behavior and potential impact, supporting the model’s semantic understanding of the vulnerability. | “The product receives input or data, but it does not validate or incorrectly validates that the input has the properties that are required to process the data safely and correctly.” |
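
Each row of the table above is one field of a dataset sample. The hypothetical record type below (the class name `QLSample`, the snake_case attribute names, and the CWE-format check are assumptions, not the authors' schema) shows how such a tuple could be validated and serialized back to the table's field names.

```python
import json
import re
from dataclasses import dataclass

CWE_PATTERN = re.compile(r"^CWE-\d+$")


@dataclass(frozen=True)
class QLSample:
    """One dataset tuple; attribute names mirror the table's fields in snake_case."""
    cwe_id: str       # e.g. "CWE-20"
    query_id: str     # e.g. "py/affinity_filter"
    name: str         # e.g. "CWE-20: Improper Input Validation"
    vul_type: str     # e.g. "StackTraceExposureQuery"
    description: str  # natural-language description of the weakness

    def __post_init__(self) -> None:
        # Reject malformed CWE identifiers before they reach the prompt.
        if not CWE_PATTERN.match(self.cwe_id):
            raise ValueError(f"invalid CWE identifier: {self.cwe_id!r}")

    def to_json(self) -> str:
        # Serialize using the original table headers as keys.
        return json.dumps({
            "CWE-id": self.cwe_id,
            "Query_id": self.query_id,
            "Name": self.name,
            "Vul-type": self.vul_type,
            "Description": self.description,
        })
```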

| Model | QL_dr | # CWEs | CWE_cov | Total_vul | Execution Success |
|---|---|---|---|---|---|
| Qwen2.5-coder-14B-Noft-NoRAG | 0.00% | 1 | 1.90% | 0 | 0.31% |
| CQLLM-Qwen2.5-coder-14B | 24.30% | 31 | 57.40% | 110686 | 58.38% |
| Qwen2.5-coder-7B-Noft-NoRAG | 0.00% | 0 | 0.00% | 0 | 0.00% |
| CQLLM-Qwen2.5-coder-7B | 17.70% | 29 | 53.70% | 3930 | 56.43% |
| Qwen3-8B-Noft-NoRAG | 0.00% | 0 | 0.00% | 0 | 0.00% |
| CQLLM-Qwen3-8B | 9.30% | 14 | 25.90% | 1401 | 47.42% |
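
The CWE_cov column appears to be the number of covered CWEs (# CWEs) divided by a fixed set of target CWEs. The denominator is not stated in this section, but every row is consistent with 54 target CWEs (for example, 31/54 ≈ 57.4%); the snippet below checks that inference and is not taken from the authors' code. `TARGET_CWES`, `coverage`, and `matches` are hypothetical names.

```python
TARGET_CWES = 54  # inferred from the table, not stated in the source

# (# CWEs, reported CWE_cov in percent) for each row of the table
ROWS = {
    "Qwen2.5-coder-14B-Noft-NoRAG": (1, 1.90),
    "CQLLM-Qwen2.5-coder-14B": (31, 57.40),
    "Qwen2.5-coder-7B-Noft-NoRAG": (0, 0.00),
    "CQLLM-Qwen2.5-coder-7B": (29, 53.70),
    "Qwen3-8B-Noft-NoRAG": (0, 0.00),
    "CQLLM-Qwen3-8B": (14, 25.90),
}


def coverage(covered: int, total: int = TARGET_CWES) -> float:
    """Percentage of target CWEs covered by generated queries (assumed definition)."""
    return 100.0 * covered / total


def matches(covered: int, reported: float) -> bool:
    # The table effectively reports one decimal place (1.90 == 1.9), so compare at that precision.
    return round(coverage(covered), 1) == round(reported, 1)
```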

| Metric | Qwen2.5-coder-14B | Qwen2.5-coder-7B | Qwen3-8B |
|---|---|---|---|
| predict_bleu-4 | 73.9625 | 72.6406 | 72.7459 |
| predict_model_preparation_time (s) | 0.0052 | 0.0052 | 0.0052 |
| predict_rouge-1 | 80.2009 | 79.2349 | 79.2776 |
| predict_rouge-2 | 72.3614 | 70.9991 | 71.2472 |
| predict_rouge-l | 78.3990 | 77.0226 | 77.2451 |
| predict_runtime (s) | 3096.25 | 3148.79 | 3178.544 |
| predict_samples_per_second | 0.613 | 0.603 | 0.597 |
| predict_steps_per_second | 0.077 | 0.076 | 0.075 |
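
The predict_rouge-1 row measures unigram overlap between generated and reference QL code. A minimal ROUGE-1 F1 sketch follows; it assumes whitespace tokenization and scales the score to 0 to 100 as in the table. The tokenizer and ROUGE variant used by the authors' evaluation harness are not stated, so this will not reproduce the table's numbers exactly, and the name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1 on whitespace tokens, scaled to 0-100."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 100.0 * 2 * precision * recall / (precision + recall)
```

For example, comparing `from Call c where c.isSafe() select c` against the reference `from Call c select c` gives precision 5/7 and recall 1, hence an F1 of about 83.33.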

| Model | Total QLs | Executable QLs | Execution Success | Total Vul |
|---|---|---|---|---|
| CQLLM | 321 | 181 | 56.38% | 3930 |
| Qwen2.5-coder-7B-ft-NoRAG | 321 | 71 | 22.11% | 1765 |
| Qwen2.5-coder-7B-Noft-RAG | 298 | 216 | 72.48% | 4755 |
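
Execution Success in this table is Executable QLs divided by Total QLs; for example, 216/298 ≈ 72.48%. The sketch below recomputes the column and adds a hypothetical pre-check that asks the CodeQL CLI (`codeql query compile`) to compile a generated query. Compilation is only a proxy, since a query can compile and still fail or return nothing when run against a database, and the authors' own executability criterion is not specified in this section. `ABLATION`, `execution_success`, and `compiles` are illustrative names.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional

# (Total QLs, Executable QLs, reported Execution Success %) from the table
ABLATION = {
    "CQLLM": (321, 181, 56.38),
    "Qwen2.5-coder-7B-ft-NoRAG": (321, 71, 22.11),
    "Qwen2.5-coder-7B-Noft-RAG": (298, 216, 72.48),
}


def execution_success(total: int, executable: int) -> float:
    """Share of generated queries that ran successfully, in percent."""
    return 100.0 * executable / total if total else 0.0


def compiles(query_file: Path) -> Optional[bool]:
    """Hypothetical pre-check: True/False from the CodeQL compiler, None if the CLI is absent."""
    if shutil.which("codeql") is None:
        return None
    # The .ql file must live in a QL pack whose dependencies (e.g. the Python library pack) are installed.
    result = subprocess.run(
        ["codeql", "query", "compile", str(query_file)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

The recomputed rates agree with the reported column to within 0.01 percentage points.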

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).