Submitted:
10 November 2025
Posted:
13 November 2025
You are already at the latest version
Abstract
Code generation has emerged as a critical research area at the intersection of Software Engineering (SE) and Artificial Intelligence (AI), attracting significant attention from both academia and industry. Within this broader landscape, Verilog, as a representative hardware description language (HDL), plays a fundamental role in digital circuit design and verification, making its automated generation particularly significant for Electronic Design Automation (EDA). Consequently, recent research has increasingly focused on applying Large Language Models (LLMs) to Verilog code generation, particularly at the Register Transfer Level (RTL), exploring how these AI-driven techniques can be effectively integrated into hardware design workflows. Despite substantial research efforts have been invested to explore LLM applications in this domain, a comprehensive survey synthesizing these developments remains absent from the literature. This review fill addresses this gap by providing a systematic literature review of LLM-based methods for Verilog code generation, examining their effectiveness, limitations, and potential for advancing automated hardware design. The review encompasses research work from conferences and journals in the fields of SE, AI, and EDA, encompassing 70 published papers, along with 32 high-quality preprint papers, bringing the total to 102 papers. By answering four key research questions, we aim to (1) identify the LLMs used for Verilog generation, (2) examine the datasets and metrics employed in evaluation, (3) categorize the techniques proposed for Verilog generation, and (4) analyze LLM alignment approaches for Verilog generation. Based on our findings, we have identified a series of limitations of existing studies. Finally, we have outlined a roadmap highlighting potential opportunities for future research endeavors in LLM-assisted hardware design.
Keywords:
1. Introduction
- Concurrency: hardware systems are inherently parallel, requiring simultaneous consideration of multiple signal paths unlike sequential software execution;
- Timing constraints: designs must satisfy strict timing requirements including setup/hold times and propagation delays;
- Physical limitations: generated code must respect area, power, and routing constraints that directly impact manufacturing feasibility;
- Synthesizability: verification encompasses functional correctness and timing closure beyond traditional software testing;
- Domain expertise: effective generation demands deep understanding of digital logic and computer architecture.
- We present the first systematic literature review on LLM-based Verilog code generation, analyzing 102 papers (70 peer-reviewed, 32 high-quality preprints) spanning 2020-2025.
- We provide comprehensive taxonomy and trend analysis of LLMs for Verilog generation (RQ1), systematically analyze dataset construction and evaluation evolution across 27 benchmarks and 34 training datasets (RQ2), investigate adaptation and optimization techniques (RQ3), and examine alignment techniques addressing human-centric requirements including security, efficiency, copyright, and hallucinations (RQ4).
- We identify key research limitations and propose a comprehensive roadmap for future directions in LLM-assisted hardware design.
2. Background and Preliminaries
2.1. Large Language Models
2.2. Verilog Code Generation
3. Review Methodology
3.1. Research Question
3.2. Search Strategy
3.2.1. Search Items
- Keywords related to Verilog code generation (12 terms): Verilog, HDL, Hardware Description Language, RTL, Register Transfer Level, Digital Design, Hardware Design, Electronic Design Automation, EDA, Verilog Generation, FPGA, ASIC
- Keywords related to LLMs (13 terms): LLM, Large Language Model, Language Model, GPT, ChatGPT, Transformer, fine-tuning, prompt engineering, In-context learning, Natural Language Processing, NLP, Machine Learning, AI
3.2.2. Search Databases
3.3. Search Selection
3.3.1. Inclusion and Exclusion Criteria
3.3.2. Quality Assessment
3.3.3. Forward and Backward Snowballing
3.4. Data Extraction and Analysis
4. RQ1: LLMs Employed for Verilog Code Generation
4.1. Analysis of Base LLMs
4.1.1. Open-Source Base LLMs
(1) Llama Family Leadership
(2) DeepSeek Series Growth
(3) Qwen Series Growth
(4) Other Code-Oriented Models
(5) Other Models
4.1.2. Closed-Source Base LLMs
(1) GPT-series LLMs
(2) Other LLMs
Trends Analysis
(1) Open-Source vs. Closed-Source
-
Open-Source LLMs: Demonstrate accelerated growth (5 → 46 → 153 mentions) with 233% increase from 2024 to 2025. The evolution shows a clear shift from specialized code models to general-purpose foundation models:
- –
- 2023: Dominated by traditional code generation models (CodeGen: 3 mentions)
- –
- 2024: Transition to foundation families (Llama: 15, DeepSeek: 6, Qwen: 3 mentions)
- –
- 2025: Widespread adoption of latest foundation models (Llama: 49, DeepSeek: 48, Qwen: 25 mentions)
-
Closed-Source LLMs: Show steady expansion (7 → 51 → 121 mentions) with 137% growth rate, reflecting market diversification:
- –
- 2023: Exclusively OpenAI models (GPT-3.5, GPT-4: 7 mentions)
- –
- 2024: GPT dominance continues (42 mentions) while competitors emerge (Claude: 5 mentions, Gemini: 1 mention)
- –
- 2025: Market expansion with new reasoning models (GPT-o1/o3) and stronger competition from Claude 3.5 and Gemini
(2) Architectural Preferences
4.2. Analysis of Instruction Tuned LLMs
4.2.1. Foundation Model
- DeepSeek-Coder ecosystem: 32.4% market share (11 models) across variants (base, v2, R1-Distill), reflecting confidence in its code comprehension architecture
- Qwen coder series: 26.5% adoption (9 models) combining CodeQwen and Qwen2.5-Coder, demonstrating rapid integration of Alibaba’s latest developments
- CodeLlama variants: 23.5% usage (8 models), maintaining strong presence despite newer alternatives
4.2.2. Naming Patterns
4.2.3. Trends Analysis
4.3. Model Selection Insights

5. RQ2: Dataset and Evaluation Metrics for Verilog Code Generation
5.1. Dataset Construction
5.1.1. Overview
5.1.2. Benchmark Dataset Construction
(1) Data Sources
(2) Quality Assurance
5.1.3. Instruct-Tuning Dataset Construction
(1) Data Sources
(2) Quality Assurance
5.1.4. Trends Analysis
(1) Testbench availability
(2) Input modality
(3) Dataset scale
5.2. Evaluation Metrics
5.2.1. Similarity-Based Metrics
(1) General-Purpose Metrics
(2) Verilog-Specific Metrics
5.2.2. Execution-Based Metrics
(1) Correctness Assessment
- Syntax-pass@k: Quantifies the proportion of generated code samples that successfully compile without syntax errors across k attempts
- Functional-pass@k: Determines behavioral correctness by executing generated modules against standardized testbenches with predefined expected outputs
- Formal-Verification: Establishes equivalence between generated code and reference implementations using formal methods via "Yosys -equiv"
(2) Hardware Quality Metrics
- Power: Measures energy efficiency of synthesized designs
- Performance: Analyzes timing characteristics including maximum operating frequency, critical path delays, and latency metrics
- Area: Quantifies resource utilization efficiency in terms of logic gates, flip-flops, memory elements, and overall silicon footprint
5.2.3. LLM-Based Metrics
(1) Model Confidence Metrics
(2) LLM-as-a-Judge
- VCD-RNK: Evaluates semantic consistency between natural language specifications and generated code implementations, implementing knowledge distillation techniques to develop efficient lightweight models capable of assessing functional [119]
- MetRex: Utilizes language models as expert evaluators to predict the PPA of generated Verilog code [115]
5.2.4. Metrics Trends Analysis
(1) Execution-based metrics
(2) Similarity-based metrics
(3) LLM-based metrics

6. RQ3: Adaptation and Optimization Techniques for Verilog Code Generation
6.1. Training-Free Methods
6.1.1. EDA-Tool Feedback
(1) Single-Agent Systems
(2) Multi-Agent Systems
6.1.2. Prompt Engineering
6.1.3. Inference Optimization
6.2. Training-Based Methods
6.2.1. Pre-Training
(1) Training from Scratch:
(2) Continued Pre-training:
6.2.2. Supervised Fine-Tuning
(1) Data-Centric Tuning
(2) Strategy-Centric Tuning
(3) Multi-task Tuning
(4) Knowledge-enhanced Tuning.
6.2.3. Reinforcement Learning
(1) Structure-based Rewards.
(2) Tool-based Feedback.
(3) Multi-objective Optimization.

7. RQ4: Alignment Approaches for Verilog Code Generation
- Security: Ensuring generated Verilog code is free from vulnerabilities, backdoors, or other security issues that could affect system security when implemented in hardware.
- Efficiency: Optimizing generated code for resource usage (e.g., Performance, Power, Area), as poorly designed implementations can lead to excessive energy consumption, slower operation, or inefficient use of silicon resources.
- Copyright: Addressing ownership concerns by verifying that generated Verilog code does not infringe on existing patents, licenses, or copy other’s designs without permission.
- Hallucinations: Reducing instances where LLMs produce code that appears correct but contains functional errors or logical inconsistencies that only become apparent during testing or implementation.
7.1. Security
(1) Current Methods
(2) Ongoing Challenges
(3) Promising Solutions
7.2. Efficiency
(1) Current Methods
(2) Ongoing Challenges
(3) Promising Solutions
7.3. Copyright
(1) Current Methods
(2) Ongoing Challenges
(3) Promising Solutions
7.4. Hallucinations
(1) Current Methods
(2) Ongoing Challenges
(3) Promising Solutions

8. The Road Ahead
8.1. Limitations
8.2. Roadmap
8.2.1. Established Pathways in Current Studies
8.2.2. Path Forward
- Opportunity 1: Hardware-Aware Foundation Models. While existing studies predominantly use general-purpose LLMs (RQ1), there is substantial opportunity to develop billion-scale Verilog-specific foundation models pretrained on curated hardware design corpora. These models should integrate EDA tool feedback during pretraining to develop intrinsic understanding of physical constraints, incorporate specialized architectural components for modeling concurrency and timing, and employ hardware-aware tokenization strategies capturing domain-specific patterns. Initial efforts like ChipGPT [70] demonstrate feasibility, but systematic development of specialized foundation models with reasoning capabilities remains largely unexplored.
- Opportunity 2: Comprehensive Benchmark Construction. Current benchmarks suffer from limited scale and coverage (RQ2). A critical opportunity exists to construct large-scale benchmarks exceeding 1,000 samples with graduated complexity from basic modules to industrial SoC designs. These benchmarks should include complete testbench suites with comprehensive coverage metrics, verified PPA ground truth from actual synthesis runs, standardized formats enabling consistent evaluation, and contamination prevention mechanisms ensuring fair assessment. Community-driven efforts maintaining living benchmarks through continuous expert-verified additions would provide reliable progress measurement.
- Opportunity 3: System-Level and Hierarchical Generation. Current approaches focus predominantly on module-level generation. Substantial opportunities exist for system-level (SoC to RTL) hierarchical generation capable of managing cross-module dependencies, interface protocol compliance, and design partitioning. This includes automatic design decomposition identifying appropriate module boundaries, hierarchical planning generating top-level architectures before detailed implementation, interface synthesis ensuring protocol compatibility across modules, and IP integration reusing verified components. Recent work on hierarchical benchmarks [100,103] provides foundations for this direction.
- Opportunity 4: Multimodal Verilog Generation. While most studies process text-only inputs, practical hardware design involves diverse artifacts including circuit diagrams, timing waveforms, block schematics, and datasheet specifications. Vision-language models adapted for hardware design could enable designers to sketch conceptual architectures and automatically generate RTL implementations. Initial explorations like ChipGPTV [97] and VGV [92] demonstrate feasibility, but systematic development of multimodal generation frameworks integrating visual specifications with textual descriptions remains largely unexplored.
- Opportunity 5: Closed-Loop PPA Optimization. As identified in RQ4, current PPA optimization approaches rely on expensive iterative feedback. Opportunities exist for developing reinforcement learning algorithms with reward functions derived directly from synthesis results, implementing efficient gradient-free optimization suitable for discrete hardware design spaces, and establishing automated refinement frameworks where initial generations undergo systematic improvement based on EDA tool feedback. Integration of synthesis tools (Yosys, Design Compiler) into generation loops could enable PPA-aware generation without manual iteration.
- Opportunity 6: Comprehensive Security Alignment. Security analysis (RQ4) reveals limited coverage of hardware-specific vulnerabilities. Opportunities include systematic construction of security-focused datasets covering full CWE hardware taxonomy, development of specialized detection models identifying timing-based vulnerabilities, side-channel weaknesses, and trojan insertion points, robust backdoor defense mechanisms protecting against training data poisoning, and automated security verification integrating formal methods and simulation-based validation. Frameworks like SecFSM [126] and SecV [127] provide initial foundations.
- Opportunity 7: Deployment-Ready Features. No current studies integrate solutions into industrial design workflows (RQ4). Critical opportunities exist for developing interactive design tools supporting human-in-the-loop refinement, implementing explainable generation providing transparent rationales for design decisions, creating seamless IDE integration enabling designers to leverage AI assistance within existing workflows, and establishing intelligent feedback mechanisms learning from designer corrections to align with project-specific requirements. Such features are essential for transitioning from research prototypes to production-ready tools.
- Once hardware-aware foundation models are developed, researchers can apply established fine-tuning or prompting techniques to enhance system-level hierarchical generation, then extend these models to support interactive designer collaboration through explainable interfaces and iterative refinement.
- With comprehensive benchmarks established, researchers can re-evaluate existing approaches to assess true progress beyond potentially contaminated datasets. Combining new benchmarks with standardized evaluation frameworks integrating PPA assessment and security analysis would enable fair cross-method comparison.
- Researchers can leverage multimodal generation capabilities with closed-loop PPA optimization, enabling designers to sketch high-level architectures that are automatically refined through synthesis feedback to meet timing and power constraints while maintaining visual correspondence with original specifications.
- Developing hierarchical generation frameworks using hardware-aware foundation models with integrated security alignment could produce system-level designs that are both architecturally sound and free from hardware vulnerabilities, verified through automated formal checking integrated into the generation pipeline.
9. Threats to Validity
10. Conclusion
Acknowledgments
| 1 | For preprint papers released on arXiv, we performed manual quality assessments to determine their eligibility for inclusion based on methodological soundness and contribution relevance. |
| 2 | Due to page limitations, we place the full statistical results of other models on the project homepage. |
| 3 | |
| 4 |
References
- Mark Chen, Jerry Tworek, Heewoo Jun, et al. Evaluating large language models trained on code. arXiv preprint, 2021.
- Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, et al. Code llama: Open foundation models for code. arXiv preprint, 2023.
- Qiwei Peng, Yekun Chai, Xuhong Li. HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization. In: Proceedings of Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024. 8383–8394.
- Qianhui Zhao, Fang Liu, Xiao Long, et al. On the Applicability of Code Language Models to Scientific Computing Programs. IEEE Transactions on Software Engineering, 2025, 51: 1685-1701. [CrossRef]
- Yi Gui, Zhen Li, Yao Wan, et al. Webcode2m: A real-world dataset for code generation from webpage designs. In: Proceedings of Proceedings of the ACM on Web Conference 2025, 2025. 1834–1845.
- Sathvik Joel, Jie Wu, Fatemeh Fard. A survey on llm-based code generation for low-resource and domain-specific programming languages. ACM Transactions on Software Engineering and Methodology, 2025, Online. [CrossRef]
- Xiaodong Gu, Meng Chen, Yalan Lin, et al. On the effectiveness of large language models in domain-specific code generation. ACM Transactions on Software Engineering and Methodology, 2025, 34: 1–22. [CrossRef]
- Nan Wu, Yuan Xie, Cong Hao. Ai-assisted synthesis in next generation eda: Promises, challenges, and prospects. In: Proceedings of 2022 IEEE 40th International Conference on Computer Design (ICCD), 2022. 207-214.
- Hammond Pearce, Benjamin Tan, Ramesh Karri. DAVE: Deriving Automatically Verilog from English. In: Proceedings of MLCAD ’20: 2020 ACM/IEEE Workshop on Machine Learning for CAD, Virtual Event, Iceland, 2020. 27–32.
- Barbara A. Kitchenham, Lech Madeyski, David Budgen. SEGRESS: Software Engineering Guidelines for REporting Secondary Studies. IEEE Transactions on Software Engineering, 2022, 49: 1273–1298. [CrossRef]
- He Zhang, Muhammad Ali Babar, Paolo Tell. Identifying relevant studies in software engineering. Inf. Softw. Technol., 2011, 53: 625–637. [CrossRef]
- Juyong Jiang, Fan Wang, Jiasi Shen, et al. A survey on large language models for code generation. arXiv preprint, 2024.
- Xiangping Chen, Xing Hu, Yuan Huang, et al. Deep learning-based software engineering: progress, challenges, and opportunities. Science China Information Sciences, 2025, 68: 111102. [CrossRef]
- Wenji Fang, Jing Wang, Yao Lu, et al. A survey of circuit foundation model: Foundation ai models for vlsi circuit design and eda. arXiv preprint, 2025.
- Zhuolun He, Bei Yu. Large language models for eda: Future or mirage? In: Proceedings of Proceedings of the 2024 International Symposium on Physical Design, 2024. 65–66.
- Lei Chen, Yiqi Chen, Zhufei Chu, et al. Large circuit models: opportunities and challenges. Science China Information Sciences, 2024, 67: 200402. [CrossRef]
- Xinyi Hou, Yanjie Zhao, Yue Liu, et al. Large language models for software engineering: A systematic literature review. ACM Transactions on Software Engineering and Methodology, 2024, 33: 1–79. [CrossRef]
- Ashish Vaswani, Noam Shazeer, Niki Parmar, et al. Attention is all you need. In: Proceedings of Advances in Neural Information Processing Systems, 2017. 5998-6008.
- Tom Brown, Benjamin Mann, Nick Ryder, et al. Language models are few-shot learners. In: Proceedings of Advances in Neural Information Processing Systems, 2020. 1877–1901.
- Yuntao Bai, Andy Jones, Kamal Ndousse, et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint, 2022.
- Shang Liu, Yao Lu, Wenji Fang, et al. Openllm-rtl: Open dataset and benchmark for llm-aided design rtl generation. In: Proceedings of Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024. 1–9.
- Xinyi Hou, Yanjie Zhao, Yue Liu, et al. Large Language Models for Software Engineering: A Systematic Literature Review. ACM Transactions on Software Engineering and Methodology, 2024, 33: 1-79. [CrossRef]
- Xin Zhou, Sicong Cao, Xiaobing Sun, et al. Large Language Model for Vulnerability Detection and Repair: Literature Review and the Road Ahead. ACM Transactions on Software Engineering and Methodology, 2025, 34: 1-31. [CrossRef]
- Xing Hu, Feifei Niu, Junkai Chen, et al. Assessing and Advancing Benchmarks for Evaluating Large Language Models in Software Engineering Tasks. arXiv preprint, 2025.
- IEEE Xplore Database. https://ieeexplore.ieee.org.
- ACM Digital Library. https://dl.acm.org.
- ScienceDirect Database. https://www.sciencedirect.com.
- Web of Science Database. https://www.webofscience.com.
- SpringerLink Database. https://link.springer.com.
- arXiv Database. https://arxiv.org.
- Meisam Abdollahi, Seyedeh Faegheh Yeganli, Mohammad Baharloo, et al. Hardware design and verification with large language models: A scoping review, challenges, and open issues. Electronics, 2024, 14: 120. [CrossRef]
- Claudia Negri-Ribalta, Rémi Geraud-Stewart, Anastasia Sergeeva, et al. A systematic literature review on the impact of AI models on the security of code generation. Frontiers in Big Data, 2024, 7: 1386720. [CrossRef]
- Lanxin Yang, He Zhang, Haifeng Shen, et al. Quality Assessment in Systematic Literature Reviews: A Software Engineering Perspective. Information and Software Technology, 2021, 130: 106397. [CrossRef]
- Claes Wohlin. Guidelines for snowballing in systematic literature studies and a replication in software engineering. In: Proceedings of 18th International Conference on Evaluation and Assessment in Software Engineering, 2014. 1-10.
- Hugo Touvron, Thibaut Lavril, Gautier Izacard, et al. LLaMA: Open and Efficient Foundation Language Models. arXiv preprint, 2023.
- Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, et al. The llama 3 herd of models. arXiv preprint, 2024.
- Xiao Bi, Deli Chen, Guanting Chen, et al. Deepseek llm: Scaling open-source language models with longtermism. arXiv preprint, 2024.
- Daya Guo, Qihao Zhu, Dejian Yang, et al. DeepSeek-Coder: When the Large Language Model Meets Programming–The Rise of Code Intelligence. arXiv preprint, 2024.
- Daya Guo, Dejian Yang, Haowei Zhang, et al. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature, 2025, 645: 633–638. [CrossRef]
- Jinze Bai, Shuai Bai, Yunfei Chu, et al. Qwen technical report. arXiv preprint, 2023.
- Binyuan Hui, Jian Yang, Zeyu Cui, et al. Qwen2. 5-coder technical report. arXiv preprint, 2024.
- Erik Nijkamp, Bo Pang, Hiroaki Hayashi, et al. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. In: Proceedings of International Conference on Learning Representations, 2023.
- Erik Nijkamp, Hiroaki Hayashi, Caiming Xiong, et al. CodeGen2: Lessons for Training LLMs on Programming and Natural Languages. arXiv preprint, 2023.
- Raymond Li, Yangtian Zi, Niklas Muennighoff, et al. StarCoder: may the source be with you! Transactions on Machine Learning Research, 2023.
- Anton Lozhkov, Raymond Li, Loubna Ben Allal, et al. Starcoder 2 and the stack v2: The next generation. arXiv preprint, 2024.
- CodeGemma Team, Heri Zhao, Jeffrey Hui, et al. Codegemma: Open code models based on gemma. arXiv preprint, 2024.
- Yue Wang, Hung Le, Akhilesh Deepak Gotmare, et al. CodeT5+: Open Code Large Language Models for Code Understanding and Generation. In: Proceedings of EMNLP 2023-2023 Conference on Empirical Methods in Natural Language Processing, 2023. 1069–1088.
- Suriya Gunasekar, Yi Zhang, Jyoti Aneja, et al. Textbooks are all you need. arXiv preprint, 2023.
- Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, et al. Textbooks are all you need ii: phi-1.5 technical report. arXiv preprint, 2023.
- Bo Adler, Niket Agarwal, Ashwath Aithal, et al. Nemotron-4 340b technical report. arXiv preprint, 2024.
- OpenAI’s GPT series models. https://platform.openai.com/docs/models.
- Anthropic’s Claude series models. https://docs.claude.com/en/docs/about-claude/models/overview.
- Google’s Gemini series models. https://ai.google.dev/gemini-api/docs/models.
- Haotian Liu, Chunyuan Li, Qingyang Wu, et al. Visual instruction tuning. In: Proceedings of Advances in Neural Information Processing Systems, 2023. 34892–34916.
- Mingzhe Gao, Jieru Zhao, Zhe Lin, et al. Autovcoder: A systematic framework for automated verilog code generation using llms. In: Proceedings of 2024 IEEE 42nd International Conference on Computer Design (ICCD), 2024. 162–169.
- PEI Zehua, Huiling Zhen, Mingxuan Yuan, et al. Betterv: Controlled verilog generation with discriminative guidance. In: Proceedings of International Conference on Machine Learning, 2024.
- Mingjie Liu, Nathaniel Pinckney, Brucek Khailany, et al. Verilogeval: Evaluating large language models for verilog code generation. In: Proceedings of 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), 2023. 1–8.
- Mingjie Liu, Yun-Da Tsai, Wenfei Zhou, et al. CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair. In: Proceedings of International Conference on Learning Representations, 2025.
- Yi Liu, Changran XU, Yunhao Zhou, et al. DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model. In: Proceedings of International Conference on Learning Representations, 2025.
- Yi Liu, Hongji Zhang, Yunhao Zhou, et al. DeepRTL2: A Versatile Model for RTL-Related Tasks. In: Proceedings of Findings of the Association for Computational Linguistics, 2025. 6485–6500.
- Sam Bush, Matthew DeLorenzo, Phat Tieu, et al. Free and Fair Hardware: A Pathway to Copyright Infringement-Free Verilog Generation using LLMs. In: Proceedings of 2025 62nd ACM/IEEE Design Automation Conference (DAC), 2025. 1–7.
- Peiyang Wu, Nan Guo, Xiao Xiao, et al. Itertl: An iterative framework for fine-tuning llms for rtl code generation. In: Proceedings of 2025 IEEE International Symposium on Circuits and Systems (ISCAS), 2025. 1–5.
- Bardia Nadimi, Hao Zheng. A multi-expert large language model architecture for verilog code generation. In: Proceedings of 2024 IEEE LLM Aided Design Workshop (LAD), 2024. 1–5.
- Jinghua Wang, Lily Jiaxin Wan, Sanjana Pingali, et al. OpenRTLSet: A Fully Open-Source Dataset for Large Language Model-based Verilog Module Design. In: Proceedings of 2025 IEEE International Conference on LLM-Aided Design (ICLAD), 2025. 212–218.
- Bardia Nadimi, Ghali Omar Boutaib, Hao Zheng. Pyranet: A multi-layered hierarchical dataset for verilog. In: Proceedings of 2025 62nd ACM/IEEE Design Automation Conference (DAC), 2025. 1–7.
- Mohammad Akyash, Kimia Azar, Hadi Kamali. RTL++: Graph-enhanced LLM for RTL Code Generation. In: Proceedings of 2025 IEEE International Conference on LLM-Aided Design (ICLAD), 2025. 44–50.
- Peiyang Wu, Nan Guo, Junliang Lv, et al. RTLRepoCoder: Repository-Level RTL Code Completion through the Combination of Fine-Tuning and Retrieval Augmentation. arXiv preprint, 2025.
- Chenhui Deng, Yun-Da Tsai, Guan-Ting Liu, et al. ScaleRTL: Scaling LLMs with Reasoning Data and Test-Time Compute for Accurate RTL Code Generation. In: Proceedings of 2025 ACM/IEEE 7th Symposium on Machine Learning for CAD (MLCAD), 2025. 1–9.
- Prithwish Basu Roy, Akashdeep Saha, Manaar Alam, et al. Veritas: Deterministic Verilog Code Synthesis from LLM-Generated Conjunctive Normal Form. arXiv preprint, 2025.
- Kaiyan Chang, Ying Wang, Haimeng Ren, et al. Chipgpt: How far are we from natural language hardware design. arXiv preprint, 2023.
- Yang Zhao, Di Huang, Chongxiao Li, et al. CodeV: Empowering LLMs with HDL Generation through Multi-Level Summarization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2025. Early Access. [CrossRef]
- Yaoyu Zhu, Di Huang, Hanqi Lyu, et al. CodeV-R1: Reasoning-Enhanced Verilog Generation. In: Proceedings of Advances in Neural Information Processing Systems, 2025. Accept.
- Gaoche Zhang, Dingyang Zou, Kairui Sun, et al. GEMMV: An LLM-based Automated Performance-Aware Framework for GEMM Verilog Generation. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2025, 15: 325-336. [CrossRef]
- Weimin Fu, Shijie Li, Yifang Zhao, et al. Hardware phi-1.5 b: A large language model encodes hardware domain specific knowledge. In: Proceedings of 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), 2024. 349–354.
- Yiyao Yang, Fu Teng, Pengju Liu, et al. Haven: Hallucination-mitigated llm for verilog code generation aligned with hdl engineers. In: Proceedings of 2025 Design, Automation & Test in Europe Conference (DATE), 2025. 1–7.
- Yongan Zhang, Zhongzhi Yu, Yonggan Fu, et al. Mg-verilog: Multi-grained dataset towards enhanced llm-assisted verilog generation. In: Proceedings of 2024 IEEE LLM Aided Design Workshop (LAD), 2024. 1–5.
- Emil Goh, Maoyang Xiang, I Wey, et al. From english to asic: Hardware implementation with large language model. arXiv preprint, 2024.
- Fan Cui, Chenyang Yin, Kexing Zhou, et al. Origen: Enhancing rtl code generation with code-to-code augmentation and self-reflection. In: Proceedings of Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024. 1–9.
- Haiyan Qin, Zhiwei Xie, Jingjing Li, et al. ReasoningV: Efficient Verilog Code Generation with Adaptive Hybrid Reasoning Model. arXiv preprint, 2025.
- Shang Liu, Wenji Fang, Yao Lu, et al. Rtlcoder: Fully open-source and efficient llm-assisted rtl code generation technique. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024. [CrossRef]
- Anjiang Wei, Huanmi Tan, Tarun Suresh, et al. VeriCoder: Enhancing LLM-Based RTL Code Generation through Functional Correctness Validation. arXiv preprint, 2025.
- Shailja Thakur, Baleegh Ahmad, Hammond Pearce, et al. Verigen: A large language model for verilog code generation. ACM Transactions on Design Automation of Electronic Systems, 2024, 29: 1–31. [CrossRef]
- Kyungjun Min, Seonghyeon Park, Hyeonwoo Park, et al. Improving LLM-Based Verilog Code Generation with Data Augmentation and RL. In: Proceedings of 2025 Design, Automation & Test in Europe Conference (DATE), 2025. 1–7.
- Ning Wang, Bingkun Yao, Jie Zhou, et al. Insights from Verification: Training a Verilog Generation LLM with Reinforcement Learning with Testbench Feedback. arXiv preprint, 2025.
- Yiting Wang, Guoheng Sun, Wanghao Ye, et al. VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation. arXiv preprint, 2025.
- Ning Wang, Bingkun Yao, Jie Zhou, et al. Large language model for verilog generation with code-structure-guided reinforcement learning. In: Proceedings of 2025 IEEE International Conference on LLM-Aided Design (ICLAD), 2025. 164–170.
- Patrick Yubeaton, Andre Nakkab, Weihua Xiao, et al. Verithoughts: Enabling automated verilog code generation using reasoning and formal verification. In: Proceedings of Advances in Neural Information Processing Systems, 2025. Accept.
- Shailja Thakur, Baleegh Ahmad, Zhenxing Fan, et al. Benchmarking large language models for automated verilog rtl code generation. In: Proceedings of 2023 Design, Automation & Test in Europe Conference Exhibition (DATE), 2023. 1–6.
- Vyom Kumar Gupta, Abhishek Yadav, Masahiro Fujita, et al. LLM-aided Front-End Design Framework For Early Development of Verified RTLs. In: Proceedings of 2024 IEEE 33rd Asian Test Symposium (ATS), 2024. 1–6.
- Matthew DeLorenzo, Animesh Basak Chowdhury, Vasudev Gohil, et al. Make every move count: Llm-based high-quality rtl code generation using mcts. arXiv preprint, 2024.
- Paola Vitolo, George Psaltakis, Michael Tomlinson, et al. Natural Language to Verilog: Design of a Recurrent Spiking Neural Network using Large Language Models and ChatGPT. In: Proceedings of 2024 International Conference on Neuromorphic Systems (ICONS), 2024. 110–116.
- Sam-Zaak Wong, Gwok-Waa Wan, Dongping Liu, et al. VGV: Verilog generation using visual capabilities of multi-modal large language models. In: Proceedings of 2024 IEEE LLM Aided Design Workshop (LAD), 2024. 1–5.
- Cangyuan Li, Chujie Chen, Yudong Pan, et al. Autosilicon: Scaling up rtl design generation capability of large language models. ACM Transactions on Design Automation of Electronic Systems, 2025, 30: 1–26. [CrossRef]
- Jinwei Tang, Jiayin Qin, Kiran Thorat, et al. Hivegen–hierarchical llm-based verilog generation for scalable chip design. In: Proceedings of 2025 IEEE International Conference on LLM-Aided Design (ICLAD), 2025. 30–36.
- Jianmin Ye, Tianyang Liu, Qi Tian, et al. ChatModel: Automating Reference Model Design and Verification with LLMs. arXiv preprint, 2025.
- Shailja Thakur, Jason Blocklove, Hammond Pearce, et al. Autochip: Automating hdl generation using llm feedback. arXiv preprint, 2023.
- Kaiyan Chang, Zhirong Chen, Yunhao Zhou, et al. Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation. In: Proceedings of Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024. 1–9.
- Matthew DeLorenzo, Vasudev Gohil, Jeyavijayan Rajendran. Creativeval: Evaluating creativity of llm-based hardware code generation. In: Proceedings of 2024 IEEE LLM Aided Design Workshop (LAD), 2024. 1–5.
- Jason Blocklove, Siddharth Garg, Ramesh Karri, et al. Evaluating llms for hardware design and test. In: Proceedings of 2024 IEEE LLM Aided Design Workshop (LAD), 2024. 1–6.
- Andre Nakkab, Sai Qian Zhang, Ramesh Karri, et al. Rome was not built in a single step: Hierarchical prompting for llm-based chip design. In: Proceedings of Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD, 2024. 1–11.
- Yao Lu, Shang Liu, Qijun Zhang, et al. Rtllm: An open-source benchmark for design rtl generation with large language model. In: Proceedings of 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), 2024. 722–727.
- Ahmed Allam, Mohamed Shalan. Rtl-repo: A benchmark for evaluating llms on large-scale rtl design projects. In: Proceedings of 2024 IEEE LLM Aided Design Workshop (LAD), 2024. 1–5.
- Suresh Purini, Siddhant Garg, Mudit Gaur, et al. ArchXBench: A Complex Digital Systems Benchmark Suite for LLM Driven RTL Synthesis. In: Proceedings of 2025 ACM/IEEE 7th Symposium on Machine Learning for CAD (MLCAD), 2025. 1–10.
- Nathaniel Pinckney, Chenhui Deng, Chia-Tung Ho, et al. Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification. arXiv preprint, 2025.
- Gwok-Waa Wan, Yubo Wang, Sam-Zaak Wong, et al. GenBen: A Generative Benchmark for LLM-Aided Design. Arxiv, 2025.
- Pengwei Jin, Di Huang, Chongxiao Li, et al. RealBench: Benchmarking Verilog Generation Models with Real-World IP Designs. arXiv preprint, 2025.
- Ce Guo, Tong Zhao. ResBench: A Resource-Aware Benchmark for LLM-Generated FPGA Designs. In: Proceedings of Proceedings of the 15th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, 2025. 25–34.
- Nathaniel Pinckney, Christopher Batten, Mingjie Liu, et al. Revisiting verilogeval: A year of improvements in large-language models for hardware code generation. ACM Transactions on Design Automation of Electronic Systems, 2025, 30:1–20. [CrossRef]
- Paul E Calzada, Zahin Ibnat, Tanvir Rahman, et al. VerilogDB: The Largest, Highest-Quality Dataset with a Preprocessing Framework for LLM-based RTL Generation. arXiv preprint, 2025.
- Kaiyan Chang, Kun Wang, Nan Yang, et al. Data is all you need: Finetuning llms for chip design via an automated design-data augmentation framework. In: Proceedings of Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024. 1–6.
- Shang Liu, Wenji Fang, Yao Lu, et al. RTLCoder: Outperforming GPT-3.5 in Design RTL Generation with Our Open-Source Dataset and Lightweight Solution. In: Proceedings of 2024 IEEE LLM Aided Design Workshop (LAD), 2024. 1–5.
- Zeju Li, Changran Xu, Zhengyuan Shi, et al. DeepCircuitX: A Comprehensive Repository-Level Dataset for RTL Code Understanding, Generation, and PPA Analysis. In: Proceedings of 2025 IEEE International Conference on LLM-Aided Design (ICLAD), 2025. 204–211.
- Xiang Chen, Guang Yang, Zhan-Qi Cui, et al. Survey of State-of-the-art Automatic Code Comment Generation. Journal of Software, 2021, 32: 2118.
- HDLBits Website. https://hdlbits.01xz.net/wiki/Main_Page.
- Manar Abdelatty, Jingxiao Ma, Sherief Reda. MetRex: A Benchmark for Verilog Code Metric Reasoning Using LLMs. In: Proceedings of Proceedings of the 30th Asia and South Pacific Design Automation Conference, 2025. 995–1001.
- OpenCores Website. https://opencores.org.
- Enrique Dehaerne, Bappaditya Dey, Sandip Halder, et al. A deep learning framework for verilog autocompletion towards design and verification automation. arXiv preprint, 2023.
- Mohammad Akyash, Hadi Mardani Kamali. Simeval: Investigating the similarity obstacle in llm-based hardware code generation. In: Proceedings of Proceedings of the 30th Asia and South Pacific Design Automation Conference, 2025. 1002–1007.
- Guang Yang, Wei Zheng, Xiang Chen, et al. The Cream Rises to the Top: Efficient Reranking Method for Verilog Code Generation. arXiv preprint, 2025.
- Matthew DeLorenzo, Kevin Tieu, Prithwish Jana, et al. Abstraction-of-Thought: Intermediate Representations for LLM Reasoning in Hardware Design. arXiv preprint, 2025.
- Kangbo Bai, Peiran Yan, Lifeng Liu, et al. VeriRAG: Design AI-Specific CPU Co-processor with RAG-Enhanced LLMs. In: Proceedings of 2025 International Symposium of Electronics Design Automation (ISEDA), 2025. 76–82.
- Heng Ping, Shixuan Li, Peiyu Zhang, et al. Hdlcore: A training-free framework for mitigating hallucinations in llm-generated hdl. arXiv preprint, 2025.
- Mohammad Akyash, Kimia Azar, Hadi Kamali. DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs. arXiv preprint, 2025.
- Zhuorui Zhao, Ruidi Qiu, Chao Lin, et al. Vrank: Enhancing verilog code generation from large language models via self-consistency. In: Proceedings of 2025 26th International Symposium on Quality Electronic Design (ISQED), 2025. 1–7.
- Changran Xu, Yi Liu, Yunhao Zhou, et al. Speculative Decoding for Verilog: Speed and Quality, All in One. In: Proceedings of 62nd ACM/IEEE Design Automation Conference, DAC 2025, San Francisco, CA, USA, June 22-25, 2025, 2025. 1–7.
- Ziteng Hu, Yingjie Xia, Xiyuan Chen, et al. SecFSM: Knowledge Graph-Guided Verilog Code Generation for Secure Finite State Machines in Systems-on-Chip. arXiv preprint, 2025.
- Fanghao Fan, YingJie Xia, Li Kuang. SecV: LLM-based Secure Verilog Generation with Clue-Guided Exploration on Hardware-CWE Knowledge Graph. In: Proceedings of Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025. 8049–8057.
- Lakshmi Likhitha Mankali, Jitendra Bhandari, Manaar Alam, et al. Rtl-breaker: Assessing the security of llms against backdoor attacks on hdl code generation. In: Proceedings of 2025 Design, Automation & Test in Europe Conference (DATE), 2025. 1–7.
- Zeng Wang, Minghao Shao, Jitendra Bhandari, et al. VeriContaminated: Assessing LLM-driven verilog coding for data contamination. arXiv preprint, 2025.
- Zeng Wang, Minghao Shao, Rupesh Karn, et al. SALAD: Systematic Assessment of Machine Unlearing on LLM-Aided Hardware Design. arXiv preprint, 2025.
- Kimia Tasnia, Alexander Garcia, Tasnuva Farheen, et al. Veriopt: Ppa-aware high-quality verilog generation via multi-role llms. arXiv preprint, 2025.
- Kaiyan Chang, Wenlong Zhu, Kun Wang, et al. A data-centric chip design agent framework for Verilog code generation. ACM Transactions on Design Automation of Electronic Systems, 2025, 30: 1-27.
- Kiran Thorat, Jiahui Zhao, Yaotian Liu, et al. LLM-VeriPPA: Power, Performance, and Area Optimization aware Verilog Code Generation with Large Language Models. In: Proceedings of 2025 ACM/IEEE 7th Symposium on Machine Learning for CAD (MLCAD), 2025. 1–7.
- Bowei Wang, Qi Xiong, Zeqing Xiang, et al. Rtlsquad: Multi-agent based interpretable rtl design. arXiv preprint, 2025.
- Kun Wang, Kaiyan Chang, Mengdi Wang, et al. Rtlmarker: Protecting llm-generated rtl copyright via a hardware watermarking framework. In: Proceedings of Proceedings of the 30th Asia and South Pacific Design Automation Conference, 2025. 808–813.
- Jiazheng Zhang, Cheng Liu, Huawei Li. Understanding and Mitigating Errors of LLM-Generated RTL Code. arXiv preprint, 2025.
- Wenhao Sun, Bing Li, Grace Li Zhang, et al. Paradigm-based automatic hdl code generation using llms. In: Proceedings of 2025 26th International Symposium on Quality Electronic Design (ISQED), 2025. 1–8.
- Hanxian Huang, Zhenghan Lin, Zixuan Wang, et al. Towards llm-powered verilog rtl assistant: Self-verification and self-correction. arXiv preprint, 2024.
- Yunda Tsai, Mingjie Liu, Haoxing Ren. RTLFixer: Automatically Fixing RTL Syntax Errors with Large Language Model. In: Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024. 1–6.
- Ping Guo, Yiting Wang, Wanghao Ye, et al. EvoVerilog: Large Langugage Model Assisted Evolution of Verilog Code. arXiv preprint, 2025.
- Jason Blocklove, Shailja Thakur, Benjamin Tan, et al. Automatically Improving LLM-based Verilog Generation using EDA Tool Feedback. ACM Transactions on Design Automation of Electronic Systems, 2025, 30: 1-26. [CrossRef]
- Mubashir ul Islam, Humza Sami, Pierre-Emmanuel Gaillardon, et al. AIvril: AI-Driven RTL Generation With Verification In-The-Loop. arXiv preprint, 2024.
- Sriram Ranga, Rui Mao, Debjyoti Bhattacharjee, et al. RTL Agent: An Agent-Based Approach for Functionally Correct HDL Generation via LLMs. In: Proceedings of 2024 IEEE 33rd Asian Test Symposium (ATS), 2024. 1–6.
- Yujie Zhao, Hejia Zhang, Hanxian Huang, et al. MAGE: A Multi-Agent Engine for Automated RTL Code Generation. In: Proceedings of 2025 62nd ACM/IEEE Design Automation Conference (DAC), 2025. 1–7.
- Chia-Tung Ho, Haoxing Ren, Brucek Khailany. VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2025. 300-307. [CrossRef]
- Zhendong Mi, Renming Zheng, Haowen Zhong, et al. CoopetitiveV: Leveraging LLM-powered Coopetitive Multi-Agent Prompting for High-quality Verilog Generation. arXiv preprint, 2025.
- Humza Sami, Mubashir ul Islam, Samy Charas, et al. Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation. arXiv preprint, 2025.
- Yangbo Wei, Zhen Huang, Huang Li, et al. VFlow: Discovering Optimal Agentic Workflows for Verilog Generation. arXiv preprint, 2025.
- Bardia Nadimi, Ghali Omar Boutaib, Hao Zheng. VeriMind: Agentic LLM for Automated Verilog Generation with a Novel Evaluation Metric. arXiv preprint, 2025.
- Claudia Negri-Ribalta, Rémi Geraud-Stewart, Anastasia Sergeeva, et al. A systematic literature review on the impact of AI models on the security of code generation. Frontiers in Big Data, 2024, 7: 1386720. [CrossRef]
- Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, et al. BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions. In: Proceedings of International Conference on Learning Representations, 2025.
- Tianyang Liu, Canwen Xu, Julian McAuley. RepoBench: A Benchmark for Repo-Level Code Generation. In: Proceedings of International Conference on Learning Representations, 2023.
- Carlos E. Jimenez, John Yang, Alexander Wettig, et al. SWE-bench: Can Language Models Resolve Real-World GitHub Issues? In: Proceedings of International Conference on Learning Representations, 2024.







| Survey | Year | Papers | Topics | Verilog Focus |
|---|---|---|---|---|
| General Code Generation Surveys | ||||
| Jiang et al. [12] | 2024 | 235 | General Programming Languages | ✗ |
| Joel et al. [6] | 2024 | 111 | Low-Resource and Domain-Specific Programming Languages | Partial |
| Chen et al. [13] | 2025 | 601 | Deep Learning-based Software Engineering | ✗ |
| EDA-focused Surveys | ||||
| Fang et al. [14] | 2025 | 250 | Layout/Netlists/HDL/Assertion Generation | Partial |
| He et al. [15] | 2024 | 211 | EDA Generation, Verification, and Debugging | Partial |
| Chen et al. [16] | 2024 | A Paradigm Shift from AI4EDA towards AI-rooted EDA (full-workflow AI4EDA) | Partial | |
| Our Verilog-specific Survey | ||||
| Our work | 2025 | 102 | Verilog Code Generation | ✓ |
| Acronym | Venues |
|---|---|
| AAAI | AAAI Conference on Artificial Intelligence |
| ACL | Annual Meeting of the Association for Computational Linguistics |
| ICML | International Conference on Machine Learning |
| ICLR | International Conference on Learning Representations |
| NeurIPS | Conference on Neural Information Processing Systems |
| DAC | Design Automation Conference |
| TCAD | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems |
| Inclusion criteria | |
|---|---|
| 1) | The paper claims that an LLM is used. |
| 2) | The paper claims that the study involves to solve Verilog code generation. |
| 3) | The paper with accessible full text and must be written in English. |
| Exclusion criteria | |
| 1) | Short papers whose number of pages is less than 5. |
| 2) | Books, records, theses, tool demos papers, editorials, or venues not subject to a full peer-review process. |
| 3) | The paper is a literature review or survey. |
| 4) | The paper mentions LLMs only in future work or discussions rather than using LLMs in the approach. |
| 5) | The paper does not involve Verilog code generation. |
| 6) | Duplicate papers or similar studies authored by the same authors. |
| ID | Quality Assessment Criteria |
|---|---|
| QAC1 | Was the study published in a prestigious (CCF ranking) venue? |
| QAC2 | Does the study make a contribution to the academic or industrial community? |
| QAC3 | Does the study provide a clear description of the workflow and implementation of the proposed approach? |
| QAC4 | Are the experiment details, including datasets, baselines, and evaluation metrics, clearly outlined? |
| QAC5 | Do the findings from the experiments strongly support the main arguments presented in the study? |
| LLM Family | LLM Name | 2023 | 2024 | 2025 | Total |
|---|---|---|---|---|---|
| Llama Series | CodeLlama | 0 | 7 | 16 | 23 |
| Llama | 0 | 0 | 1 | 1 | |
| Llama2 | 0 | 2 | 5 | 7 | |
| Llama3 | 0 | 4 | 7 | 11 | |
| Llama3.1 | 0 | 0 | 18 | 18 | |
| Llama3.2 | 0 | 0 | 1 | 1 | |
| Llama3.3 | 0 | 0 | 1 | 1 | |
| LlaVa | 0 | 2 | 0 | 2 | |
| Subtotal | 0 | 15 | 49 | 64 | |
| DeepSeek Series | DeepSeek-Coder | 0 | 6 | 21 | 27 |
| DeepSeek-Coder-V2 | 0 | 0 | 5 | 5 | |
| DeepSeek-R1 | 0 | 0 | 8 | 8 | |
| DeepSeek-R1-Distill | 0 | 0 | 5 | 5 | |
| DeepSeek-V2 | 0 | 0 | 1 | 1 | |
| DeepSeek-V2.5 | 0 | 0 | 1 | 1 | |
| DeepSeek-V3 | 0 | 0 | 7 | 7 | |
| Subtotal | 0 | 6 | 48 | 54 | |
| Qwen Series | CodeQwen | 0 | 3 | 10 | 13 |
| Qwen2.5-Coder | 0 | 0 | 12 | 12 | |
| Qwen3 | 0 | 0 | 1 | 1 | |
| Qwen-vl | 0 | 0 | 1 | 1 | |
| QWQ | 0 | 0 | 1 | 1 | |
| Subtotal | 0 | 3 | 25 | 28 | |
| CodeGen Series | CodeGen | 3 | 2 | 0 | 5 |
| CodeGen2 | 0 | 4 | 1 | 5 | |
| CodeGen2.5 | 0 | 0 | 2 | 2 | |
| Subtotal | 3 | 6 | 3 | 12 | |
| StarCoder Series | StarCoder | 0 | 4 | 3 | 7 |
| StarCoder2 | 0 | 1 | 3 | 4 | |
| Subtotal | 0 | 5 | 6 | 11 | |
| Others | Various Models | 2 | 11 | 22 | 35 |
| Total Open-Source | 5 | 46 | 153 | 204 | |
| LLM Family | LLM Name | 2023 | 2024 | 2025 | Total |
|---|---|---|---|---|---|
| GPT Series | code-davinci-002 | 1 | 1 | 0 | 2 |
| GPT-3.5 | 3 | 16 | 24 | 43 | |
| GPT-4 | 3 | 20 | 29 | 52 | |
| GPT-4o | 0 | 4 | 31 | 35 | |
| GPT-4-turbo | 0 | 0 | 2 | 2 | |
| GPT-4V | 0 | 1 | 1 | 2 | |
| GPT-o1 | 0 | 0 | 6 | 6 | |
| GPT-o3 | 0 | 0 | 4 | 4 | |
| GPT-o4 | 0 | 0 | 3 | 3 | |
| Subtotal | 7 | 42 | 100 | 149 | |
| Claude Series | Claude 2 | 0 | 1 | 0 | 1 |
| Claude 3 | 0 | 2 | 2 | 4 | |
| Claude 3.5 | 0 | 2 | 6 | 8 | |
| Claude 3.7 | 0 | 0 | 5 | 5 | |
| Claude 4 | 0 | 0 | 1 | 1 | |
| Subtotal | 0 | 5 | 14 | 19 | |
| Gemini | Gemini | 0 | 1 | 4 | 5 |
| Others | Various Models | 0 | 3 | 3 | 6 |
| Total Closed-Source | 7 | 51 | 121 | 179 | |
| Verilog-Specific LLM | Foundation Model | Weight | URL |
|---|---|---|---|
| Closed-Source (Model Weights) | |||
| AutoVCoder [55] | CL/CQ/DS-Coder | Closed | - |
| BetterV [56] | CL/CQ/DS-Coder | Closed | - |
| CodeGen-Verilog [57] | CodeGen | Closed | - |
| CraftRTL [58] | CL/DS-Coder/StarCoder2 | Closed | - |
| DeepRTL [59] | CodeT5+ | Closed | - |
| DeepRTL2 [60] | Llama3.1/DS-Coder | Closed | - |
| FreeV [61] | Llama3.1 | Closed | - |
| ITERTL [62] | DS-Coder/DS-Coder-v2 | Closed | - |
| MEV-LLM [63] | CodeGen/Gemma | Closed | - |
| OpenRTLSet [64] | Qwen2.5 | Closed | - |
| PyraNet [65] | CL/DS-Coder | Closed | - |
| RTL++ [66] | CL | Closed | - |
| RTLRepoCoder [67] | DS-Coder | Closed | - |
| ScaleRTL [68] | DS-R1-Distill-Qwen | Closed | - |
| Veritas [69] | Llama-3.2 | Closed | - |
| Open-Source (Model Weights) | |||
| DAVE [9] | GPT-2 | Open | https://shorturl.asia/VWPTn |
| ChipGPT [70] | Llama2/Llama3 | Open | https://www.modelscope.cn/profile/changkaiyan |
| CodeV [71] | CQ/DS-Coder/QC | Open | https://huggingface.co/zhuyaoyu |
| CodeV-R1 [72] | QC | Open | https://huggingface.co/zhuyaoyu |
| GEMMV [73] | DS-R1-Distill-Qwen/Llama3.1 | Open | https://huggingface.co/bxsk2024 |
| Hardware Phi-1.5B [74] | Phi-1.5B | Open | https://huggingface.co/KSU-HW-SEC/Hardware_Phi_30k_version |
| HAVEN [75] | CL/DS-Coder/CQ | Open | https://huggingface.co/yangyiyao |
| MG-Verilog [76] | CL | Open | https://shorturl.asia/rDNSm |
| Mistral-Verilog [77] | Mistral | Open | https://huggingface.co/emilgoh/mistral-verilog |
| Origen [78] | DS-Coder | Open | https://huggingface.co/henryen/OriGen |
| ReasoningV [79] | QC | Open | https://modelscope.cn/models/GipsyAI/ReasoningV-7B |
| RTLCoder [80] | Mistral/DS-Coder | Open | https://github.com/hkust-zhiyao/RTL-Coder |
| VeriCoder [81] | QC | Open | https://huggingface.co/LLM4Code/VeriCoder_Qwen14B |
| VeriGen [82] | CodeGen | Open | https://huggingface.co/shailja |
| VeriLogos [83] | DS-Coder | Open | https://huggingface.co/97kjmin/VeriLogos |
| VeriPrefer [84] | Mistral/CL/DS-Coder/CQ/QC | Open | https://shorturl.asia/sMj3J |
| VeriReason [85] | QC/CL | Open | https://shorturl.asia/yaNkf |
| VeriSeek [86] | DS-Coder | Open | https://huggingface.co/LLM-EDA/VeriSeek |
| VeriThoughts [87] | QC | Open | https://shorturl.asia/81f2a |
| Dataset | Year | Input | Samples | Testbench | Available | URL |
|---|---|---|---|---|---|---|
| Closed-Source | ||||||
| DAVE_test [9] | 2020 | Text | 250 | No | Closed | - |
| LLM-aided [89] | 2024 | Text | 10 | Yes | Closed | - |
| MCTS [90] | 2024 | Text | 15 | Yes | Closed | - |
| NL2Verilog [91] | 2024 | Text | 8 | Yes | Closed | - |
| VGV [92] | 2024 | Text+Image | 20 | Yes | Closed | - |
| AutoSilicon_test [93] | 2025 | Text | 9 | Yes | Closed | - |
| GEMMV_test [73] | 2025 | Text | 10 | Yes | Closed | - |
| HiVeGen_test [94] | 2025 | Text | 4 | Yes | Closed | - |
| ModelEval_test [95] | 2025 | Text | 94 | Yes | Closed | - |
| Open-Source | ||||||
| ChipGPT [70] | 2023 | Text | 8 | Yes | Open | https://zenodo.org/records/7953725 |
| VeriGen_test [88] | 2023 | Text | 17 | Yes | Open | https://github.com/shailja-thakur/VGen |
| VerilogEval-v1 [57] | 2023 | Text | 156/143 | Yes | Open | https://github.com/NVlabs/verilog-eval/tree/release/1.0.0 |
| AutoChip [96] | 2024 | Text | 138 | Yes | Open | https://zenodo.org/records/10160723 |
| ChipGPTV [97] | 2024 | Text+Image | 30 | Yes | Open | https://github.com/aichipdesign/chipgptv |
| CreativEval [98] | 2024 | Text | 120 | Yes | Open | https://github.com/matthewdelorenzo/CreativEval |
| Evaluatie_LLMs [99] | 2024 | Text | 8 | Yes | Open | https://zenodo.org/records/10947127 |
| Hierarchical [100] | 2024 | Text | 10 | Yes | Open | https://github.com/ajn313/ROME-LLM |
| RTLLM-v1 [101] | 2024 | Text | 30 | Yes | Open | https://github.com/hkust-zhiyao/RTLLM/tree/v1.1 |
| RTLLM-v2 [21] | 2024 | Text | 50 | Yes | Open | https://github.com/hkust-zhiyao/RTLLM/tree/main |
| RTLRepo_test [102] | 2024 | Text+Code | 1.17k | No | Open | https://huggingface.co/datasets/ahmedallam/RTL-Repo |
| ArchXBench [103] | 2025 | Text | 51 | Yes | Open | https://github.com/sureshpurini/ArchXBench |
| CVDP [104] | 2025 | Text | 783 | Yes | Open | https://github.com/NVlabs/cvdp_benchmark |
| GenBen [105] | 2025 | Text+Image | 324 | Yes | Open | https://github.com/ChatDesignVerification/GenBen |
| RealBench [106] | 2025 | Text+Image | 60 | Yes | Open | https://github.com/IPRC-DIP/RealBench |
| ResBench [107] | 2025 | Text | 56 | Yes | Open | https://github.com/jultrishyyy/ResBench |
| VerilogEval-v2 [108] | 2025 | Text | 156 | Yes | Open | https://github.com/NVlabs/verilog-eval/tree/main/ |
| VeriThoughts [87] | 2025 | Text | 291 | No | Open | https://huggingface.co/datasets/wilyub/VeriThoughtsBenchmark |
| Dataset | Year | Input | Samples | Testbench | Available | URL |
|---|---|---|---|---|---|---|
| Closed-Source | ||||||
| DAVE_train [9] | 2020 | Text | 5k | No | Closed | - |
| VerilogEval_train [57] | 2023 | Text | 8.5k | No | Closed | - |
| BetterV [56] | 2024 | Text | Unknown | No | Closed | - |
| MEV-LLM [63] | 2024 | Text | 31k | No | Closed | - |
| CodeV [71] | 2025 | Text | 165k | No | Closed | - |
| CraftRTL [58] | 2025 | Text | 80K | No | Closed | - |
| DeepRTL [59] | 2025 | Text | 556k | No | Closed | - |
| DeepRTL2 [60] | 2025 | Text | 433k | No | Closed | - |
| GEMMV_train [73] | 2025 | Text | 12k | No | Closed | - |
| OpenRTLSet [64] | 2025 | Text | 98K | No | Closed | - |
| ScaleRTL [68] | 2025 | Text | 62k | No | Closed | - |
| VerilogDB [109] | 2025 | Text | 20k | No | Closed | - |
| Open-Source | ||||||
| VeriGen_train [88] | 2023 | Text | 109k | No | Open | https://huggingface.co/datasets/shailja/Verilog_GitHub |
| AutoVCoder-Data [55] | 2024 | Text | 1M | No | Open | https://github.com/sjtu-zhao-lab/AutoVCoder/tree/main/data |
| ChipGPT-FT-Data [110] | 2024 | Text | 124k | No | Open | https://modelscope.cn/datasets/changkaiyan/chipgptseries |
| MG-Verilog-Data [76] | 2024 | Text | 11k | No | Open | https://huggingface.co/datasets/GaTech-EIC/MG-Verilog |
| Origen-Data [78] | 2024 | Text | 222k | No | Open | https://huggingface.co/datasets/henryen/origen_dataset_instruction |
| RTLCoder-Data [80,111] | 2024 | Text | 27K | No | Open | https://github.com/hkust-zhiyao/RTL-Coder/tree/main/dataset |
| RTLRepo_train [102] | 2024 | Text+Code | 2.92k | No | Open | https://huggingface.co/datasets/ahmedallam/RTL-Repo |
| Verilog-dateset [77] | 2024 | Text | 68k | No | Open | https://huggingface.co/datasets/emilgoh/verilog-dataset-v2 |
| CodeV-R1-Data [72] | 2025 | Text | 3k | No | Open | https://huggingface.co/datasets/zhuyaoyu/CodeV-R1-dataset |
| DeepCircuitX [112] | 2025 | Text | 28k | No | Open | https://drive.google.com/file/d/1Y002eJQPMbrEX7IpmzXDpFRlGl0XgUFu |
| Haven-KL [75] | 2025 | Text | 62k | No | Open | https://huggingface.co/datasets/yangyiyao/HaVen-KL-Dataset |
| OpenCores [86] | 2025 | Text | 834 | No | Open | https://huggingface.co/datasets/LLM-EDA/opencores |
| PyraNet [65] | 2025 | Text | 692k | No | Open | https://huggingface.co/datasets/bnadimi/PyraNet-Verilog |
| ReasoningV-Data [79] | 2025 | Text | 5k | No | Open | https://huggingface.co/datasets/GipAI/ReaoningV |
| RTL++-Data [66] | 2025 | Text+Graph | 11.7k | No | Open | https://huggingface.co/datasets/makyash/RTL-PP |
| VeriCoder-Origen [81] | 2025 | Text | 126k | Yes | Open | https://huggingface.co/datasets/LLM4Code/expanded_origen_126k |
| VeriCoder-RTLCoder [81] | 2025 | Text | 12k | Yes | Open | https://huggingface.co/datasets/LLM4Code/expanded_rtlcoder_12k |
| VeriLogos-Data [83] | 2025 | Text | 10k | No | Open | https://huggingface.co/datasets/97kjmin/VeriLogos_Augmented_Dataset |
| VeriPrefer-Data [84] | 2025 | Text | 90k | No | Open | https://huggingface.co/datasets/LLM-EDA/pyra |
| VeriReason-Data [85] | 2025 | Text | 1k | Yes | Open | https://huggingface.co/Nellyw888/datasets |
| VeriThoughts-Data [87] | 2025 | Text | 20k | No | Open | https://huggingface.co/datasets/wilyub/VeriThoughtsTrainSet |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).