Submitted: 20 December 2025
Posted: 22 December 2025
Abstract
Keywords:
1. Introduction
2. Related Work
2.1. LLM-Based Code Generation
2.2. Multi-Agent Systems for Software Development
2.3. Cloud-Native Development Automation
3. System Design
3.1. Orchestrator Agent
3.2. Architect Agent
- Service boundaries and responsibilities following domain-driven design principles.
- RESTful API specifications in OpenAPI format, including endpoints, request/response schemas, and authentication requirements.
- Data models with entity relationships and validation constraints.
- Inter-service communication patterns (synchronous REST, asynchronous messaging).
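As an illustration only, the four kinds of design artifact listed above could be carried in a single structured object that downstream agents consume. The sketch below is an assumption, not the paper's implementation: the names `ArchitectureSpec`, `ServiceSpec`, `Entity`, `EntityField`, `Link` and `CommPattern`, as well as the `validate` checks, are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class CommPattern(Enum):
    """Inter-service communication patterns named in the list above."""
    SYNC_REST = "synchronous REST"
    ASYNC_MESSAGING = "asynchronous messaging"


@dataclass
class EntityField:
    name: str
    type: str
    required: bool = True  # simplest possible validation constraint


@dataclass
class Entity:
    name: str
    fields: list[EntityField]
    # Names of related entities within the same service (hypothetical convention).
    relations: list[str] = field(default_factory=list)


@dataclass
class ServiceSpec:
    name: str
    responsibility: str  # one bounded context per service
    openapi: dict        # OpenAPI document held as a plain dict
    entities: list[Entity] = field(default_factory=list)


@dataclass
class Link:
    source: str
    target: str
    pattern: CommPattern


@dataclass
class ArchitectureSpec:
    services: list[ServiceSpec]
    links: list[Link] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means none were found."""
        problems = []
        service_names = {s.name for s in self.services}
        for s in self.services:
            if not s.openapi.get("paths"):
                problems.append(f"{s.name}: OpenAPI document has no paths")
            entity_names = {e.name for e in s.entities}
            for e in s.entities:
                for rel in e.relations:
                    if rel not in entity_names:
                        problems.append(f"{s.name}.{e.name}: unknown relation {rel}")
        for link in self.links:
            for end in (link.source, link.target):
                if end not in service_names:
                    problems.append(f"link references unknown service {end}")
        return problems


# Hypothetical example: one service plus a link to a service that was never defined.
orders = ServiceSpec(
    name="orders",
    responsibility="Owns the order lifecycle",
    openapi={
        "openapi": "3.0.3",
        "info": {"title": "Orders API", "version": "0.1.0"},
        "paths": {"/orders": {"post": {"responses": {"201": {"description": "Created"}}}}},
    },
    entities=[Entity("Order", [EntityField("id", "string"), EntityField("total", "number")])],
)
spec = ArchitectureSpec(
    services=[orders],
    links=[Link("orders", "billing", CommPattern.ASYNC_MESSAGING)],
)
problems = spec.validate()
```

A check of this kind would let inconsistencies in the design, such as a link to an undefined service, surface before any code is generated.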
3.3. Coder Agents
3.4. Tester Agent
3.5. Ops Agent
3.6. Inter-Agent Communication and Conflict Resolution
4. CloudDevBench: A Benchmark for End-to-End Development
4.1. Task Collection
4.2. Evaluation Metrics
4.3. Baseline Systems
5. Experimental Results
5.1. Overall Performance
5.2. Performance by Task Complexity
5.3. Ablation Study
5.4. Efficiency Analysis
5.5. Human Evaluation
6. Discussion
6.1. Key Findings
6.2. Limitations and Threats to Validity
6.3. Implications for Practice
7. Conclusion
References
- Richardson, C. Microservices Patterns: With Examples in Java; Manning Publications, 2018.
- OpenAI. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774.
- Anthropic. The Claude Model Family: Claude 3.5 Sonnet Model Card Addendum. Online. 2024.
- Chen, M.; et al. Evaluating Large Language Models Trained on Code. arXiv 2021, arXiv:2107.03374.
- Hong, S.; et al. MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework. In Proceedings of ICLR, 2024.
- Qian, C.; et al. ChatDev: Communicative Agents for Software Development. In Proceedings of ACL, 2024; pp. 15174–15186.
- Wu, Q.; et al. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv 2023, arXiv:2308.08155.
- Jimenez, C.E.; et al. SWE-bench: Can Language Models Resolve Real-World GitHub Issues? In Proceedings of ICLR, 2024.
- Austin, J.; et al. Program Synthesis with Large Language Models. arXiv 2021, arXiv:2108.07732.
- Zhuo, T.Y.; et al. BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions. arXiv 2024, arXiv:2406.15877.
- Jain, N.; et al. LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code. arXiv 2024, arXiv:2403.07974.
- Huang, D.; et al. AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation. arXiv 2024, arXiv:2312.13010.
- Ishibashi, Y.; Nishimura, Y. Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization. arXiv 2024, arXiv:2404.02183.
- Forsgren, N.; Humble, J.; Kim, G. Accelerate: The Science of Lean Software and DevOps; IT Revolution Press, 2018.
- Yao, S.; et al. ReAct: Synergizing Reasoning and Acting in Language Models. In Proceedings of ICLR, 2023.





| System | CSR | TPR | DSR | ACS |
|---|---|---|---|---|
| Single LLM | 0.62 | 0.41 | 0.28 | 0.55 |
| Single Agent | 0.78 | 0.59 | 0.52 | 0.71 |
| CloudMAS (Ours) | 0.92 | 0.81 | 0.84 | 0.89 |
| vs. Single Agent | +0.14 | +0.22 | +0.32 | +0.18 |
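
The "vs. Single Agent" row is the per-metric difference between CloudMAS and the Single Agent baseline. A minimal sketch that recomputes it from the values in the table above follows; the dictionary layout and the helper name `delta` are illustrative assumptions.

```python
# Scores copied from the results table above.
results = {
    "Single LLM":   {"CSR": 0.62, "TPR": 0.41, "DSR": 0.28, "ACS": 0.55},
    "Single Agent": {"CSR": 0.78, "TPR": 0.59, "DSR": 0.52, "ACS": 0.71},
    "CloudMAS":     {"CSR": 0.92, "TPR": 0.81, "DSR": 0.84, "ACS": 0.89},
}


def delta(system: str, baseline: str) -> dict[str, float]:
    """Absolute per-metric improvement of `system` over `baseline`."""
    return {
        metric: round(results[system][metric] - results[baseline][metric], 2)
        for metric in results[system]
    }


improvement = delta("CloudMAS", "Single Agent")
for metric, value in improvement.items():
    print(f"{metric}: {value:+.2f}")
```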

| System | Correctness | Readability | Practices | Documentation |
|---|---|---|---|---|
| Single LLM | 2.8 | 3.1 | 2.5 | 2.2 |
| Single Agent | 3.4 | 3.5 | 3.2 | 2.9 |
| CloudMAS | 4.2 | 4.0 | 3.9 | 3.7 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).