Preprint Article

This version is not peer-reviewed.

Grammar-Guided Incremental Method for Efficient LLM-Generated Code Execution

Submitted: 31 March 2026
Posted: 02 April 2026


Abstract
Rapid advancements in large language models with code generation abilities have enabled new paradigms in automated software development, positioning AI both as a coding assistant and as an active actor within complex software ecosystems. Traditional code generation pipelines, mostly relying on tool calling via the ReAct approach, require a complete code snippet to be generated before validation and correction, often leading to significant latency and resource overhead due to sequential inference and execution processes. This research introduces a novel asynchronous inference algorithm that integrates context-free grammar parsing with real-time REPL-based execution, enabling early detection of syntax, semantic, and runtime errors before entire code snippets are completed. We formally define the suitability criteria for LLMs in a target programming language, establish parse-tree-based identification of top-level statements, and present an incremental buffer-parsing mechanism that triggers execution upon recognition of complete statements. Implemented for Python 3 using the Lark parser and evaluated on a modified MBPP split ($N{=}113$ tasks; dataset and prompts in the Appendix) across six models---CodeAct--Mistral, GPT-OSS~20B, Gemma~3, Llama~3.2, Phi~4, and Qwen3-Coder~30B---our method is compared to a synchronous baseline using paired Wilcoxon tests with Bonferroni correction. Empirical results show significantly faster time-to-first-output for every model, large reductions in total latency where top-level script execution dominates (up to roughly an order of magnitude for CodeAct--Mistral), and no material change in pass or correctness rates, indicating that incremental execution improves responsiveness without altering task outcomes. With specialized prompting or fine-tuning, the method achieves up to a 4x reduction in latency for valid code generation.
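The parse-tree-based identification of top-level statements described in the abstract can be illustrated with a short sketch. The paper's implementation uses the Lark parser; here the standard-library `ast` module serves as a stand-in, since the children of a module node are exactly the top-level statements, each annotated with its source line span. The code snippet being parsed is a hypothetical example, not taken from the paper.

```python
import ast

# Hypothetical snippet standing in for streamed LLM-generated code.
source = """\
import math

def area(r):
    return math.pi * r ** 2

print(area(2.0))
"""

# Children of the module node are the top-level statements; each node
# records the first and last source line it covers.
tree = ast.parse(source)
spans = [(type(stmt).__name__, stmt.lineno, stmt.end_lineno)
         for stmt in tree.body]

# Each span marks one executable unit that incremental execution could
# dispatch as soon as its last line has been generated.
print(spans)  # → [('Import', 1, 1), ('FunctionDef', 3, 4), ('Expr', 6, 6)]
```

In the paper's setting the same span information would come from the Lark parse tree rather than `ast`, but the principle is identical: once the token stream covers a statement's full span, that statement is ready to run.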
The benchmark results confirm that synchronous inference constraints can be alleviated through grammar-guided incremental execution, allowing more efficient and responsive agent-driven code execution workflows. Future research will explore predictive parsing techniques, deeper integration with agentic system architectures, security constraints, and the formulation of runtime requirements for scalable deployment of LLM-generated code execution environments.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.


© 2026 MDPI (Basel, Switzerland) unless otherwise stated