Submitted: 29 May 2025
Posted: 03 June 2025
Abstract
Keywords:
1. INTRODUCTION
2. DESIGN OF DEEP LEARNING BASED CODE GENERATION RECOMMENDATION PLATFORM
3. CODE GENERATION ALGORITHM FUSING WORD EMBEDDING MODELS AND NEURAL NETWORKS
3.1. Item2Vec-based code feature extraction
3.2. Neural network collaborative filtering code generation methods
3.3. Multimodal feature fusion strategy
4. EXPERIMENTAL RESULTS AND ANALYSIS
4.1. Experimental dataset and environment configuration
4.2. Algorithm performance evaluation index
4.3. Comparative experimental results of different models
5. CONCLUSION



| Parameter | Value | Description |
| --- | --- | --- |
| Total tokens | 134,680,000 | Total token count across the training samples |
| Vocabulary size | 87,592 | Number of unique tokens |
| Embedding dimension | 256 | Dimension of each token vector |
| Sliding window size | 5 | Context window setting |
| Negative sample size | 10 | Number of negative samples drawn per positive sample |
| Batch size | 1024 | Number of tokens processed per training step |
| Learning rate | 0.001 | Initial learning rate |
| Optimizer | Adam | Parameter update method |
| Training epochs | 12 | Full passes over the training data |
| Training platform | Tesla V100 × 4 | Distributed parallel environment |
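
The settings in the table above appear to describe a skip-gram embedding run with negative sampling, which is how Item2Vec is usually trained. The following is a minimal NumPy sketch of that setup, not the authors' implementation. The names `Item2VecConfig`, `context_pairs`, and `sgns_loss` are hypothetical, the window of 5 is assumed to be symmetric (5 tokens on each side), and negatives are drawn uniformly for brevity; the source states none of these details.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Item2VecConfig:
    """Hyperparameters copied from the table; field names are illustrative."""
    vocab_size: int = 87_592
    embed_dim: int = 256
    window: int = 5          # assumed symmetric: 5 tokens on each side
    negatives: int = 10
    batch_size: int = 1024
    learning_rate: float = 1e-3
    epochs: int = 12


def context_pairs(tokens, window):
    """Yield (center, context) token-id pairs inside a symmetric window."""
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


def sgns_loss(center_vecs, context_vecs, negative_vecs):
    """Mean skip-gram negative-sampling loss over a batch.

    Shapes: center_vecs (B, D), context_vecs (B, D), negative_vecs (B, K, D).
    """
    pos = np.einsum("bd,bd->b", center_vecs, context_vecs)
    neg = np.einsum("bd,bkd->bk", center_vecs, negative_vecs)
    # -log(sigmoid(x)) equals logaddexp(0, -x), which is numerically stable.
    pos_term = np.logaddexp(0.0, -pos)
    neg_term = np.logaddexp(0.0, neg).sum(axis=1)
    return float(np.mean(pos_term + neg_term))


if __name__ == "__main__":
    cfg = Item2VecConfig()
    rng = np.random.default_rng(0)
    toy_vocab = 50  # tiny stand-in vocabulary so the demo stays light
    emb_in = rng.normal(scale=0.1, size=(toy_vocab, cfg.embed_dim))
    emb_out = rng.normal(scale=0.1, size=(toy_vocab, cfg.embed_dim))
    tokens = rng.integers(0, toy_vocab, size=200).tolist()
    pairs = np.array(list(context_pairs(tokens, cfg.window)))
    centers = pairs[: cfg.batch_size, 0]
    contexts = pairs[: cfg.batch_size, 1]
    # Uniform negative sampling is an assumption made for brevity.
    negs = rng.integers(0, toy_vocab, size=(len(centers), cfg.negatives))
    print(sgns_loss(emb_in[centers], emb_out[contexts], emb_out[negs]))
```

In a real run the loss would be minimized with Adam at the listed learning rate for the listed number of epochs; the sketch only evaluates one batch to show how the window, negative-sample, and batch-size settings enter the objective.
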

| Model | BLEU-4 | CodeBLEU | Exact Match | Top-5 Accuracy | Average response time (ms) |
| --- | --- | --- | --- | --- | --- |
| Multimodal fusion + collaborative filtering | 47.82 | 53.67 | 28.94 | 71.53 | 87 |
| Unimodal Transformer | 38.45 | 42.18 | 19.64 | 60.21 | 213 |
| Fine-tuned CodeBERT | 40.71 | 45.96 | 21.88 | 63.74 | 186 |
| LSTM + Attention | 32.17 | 35.83 | 14.25 | 51.62 | 257 |

| Model variant | BLEU-4 | CodeBLEU | Exact Match | Top-5 Accuracy |
| --- | --- | --- | --- | --- |
| Full model (all modules) | 47.82 | 53.67 | 28.94 | 71.53 |
| Without Item2Vec pre-training | 43.39 | 48.25 | 24.70 | 66.28 |
| Without neural collaborative filtering subnetwork | 41.83 | 46.02 | 22.12 | 64.71 |
| Without multimodal fusion structure | 39.12 | 43.67 | 20.85 | 61.39 |
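
To make the ablation easier to read, the drop of each variant relative to the full model can be computed directly from the table above. The sketch below only restates the reported numbers; the dictionary keys (`full`, `no_item2vec`, `no_ncf`, `no_fusion`) and the helper `drop_vs_full` are illustrative names, not taken from the paper.

```python
METRICS = ("BLEU-4", "CodeBLEU", "Exact Match", "Top-5 Accuracy")

# Scores copied from the ablation table, in METRICS order.
ABLATION = {
    "full": (47.82, 53.67, 28.94, 71.53),         # all modules
    "no_item2vec": (43.39, 48.25, 24.70, 66.28),  # Item2Vec pre-training removed
    "no_ncf": (41.83, 46.02, 22.12, 64.71),       # collaborative filtering subnetwork removed
    "no_fusion": (39.12, 43.67, 20.85, 61.39),    # multimodal fusion structure removed
}


def drop_vs_full(variant):
    """Return the absolute score drop per metric relative to the full model."""
    full = ABLATION["full"]
    return {
        metric: round(f - v, 2)
        for metric, f, v in zip(METRICS, full, ABLATION[variant])
    }


if __name__ == "__main__":
    for variant in ("no_item2vec", "no_ncf", "no_fusion"):
        print(variant, drop_vs_full(variant))
```

On these figures, removing the multimodal fusion structure costs the most on every metric, followed by the collaborative filtering subnetwork and then Item2Vec pre-training.
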

| Model type | BLEU-4 | CodeBLEU | Exact Match | Average response time (ms) |
| --- | --- | --- | --- | --- |
| LSTM + Attention | 32.17 | 35.83 | 14.25 | 257 |
| Transformer (no pre-training) | 38.45 | 42.18 | 19.64 | 213 |
| Fine-tuned CodeBERT | 40.71 | 45.96 | 21.88 | 186 |
| Fusion model (this study) | 47.82 | 53.67 | 28.94 | 87 |
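
The comparison above can also be expressed as relative gains over a chosen baseline. The sketch below is a small worked calculation on the reported numbers only; the keys and the helpers `relative_gain` and `speedup` are hypothetical names, and picking the fine-tuned CodeBERT row as the default baseline is a choice made here because it is the strongest baseline in the table.

```python
# Scores and latencies copied from the comparison table.
RESULTS = {
    "lstm_attention": {"BLEU-4": 32.17, "CodeBLEU": 35.83, "Exact Match": 14.25, "latency_ms": 257},
    "transformer": {"BLEU-4": 38.45, "CodeBLEU": 42.18, "Exact Match": 19.64, "latency_ms": 213},
    "codebert": {"BLEU-4": 40.71, "CodeBLEU": 45.96, "Exact Match": 21.88, "latency_ms": 186},
    "fusion": {"BLEU-4": 47.82, "CodeBLEU": 53.67, "Exact Match": 28.94, "latency_ms": 87},
}


def relative_gain(metric, model="fusion", baseline="codebert"):
    """Percentage improvement of `model` over `baseline` on a quality metric."""
    base = RESULTS[baseline][metric]
    return 100.0 * (RESULTS[model][metric] - base) / base


def speedup(model="fusion", baseline="codebert"):
    """Ratio of baseline to model average response time (higher is faster)."""
    return RESULTS[baseline]["latency_ms"] / RESULTS[model]["latency_ms"]


if __name__ == "__main__":
    for metric in ("BLEU-4", "CodeBLEU", "Exact Match"):
        print(f"{metric}: {relative_gain(metric):.1f}% over CodeBERT")
    print(f"Response time ratio: {speedup():.2f}x")
```

Against the CodeBERT row, this works out to roughly 17% on BLEU-4 and CodeBLEU, about 32% on Exact Match, and an average response time ratio of roughly 2.1.
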
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).