Submitted: 12 May 2026
Posted: 13 May 2026
Abstract
Keywords:
I. Introduction
II. Method
A. Hierarchical Agent Architecture
B. Semantic Communication and Decision Fusion
C. Dynamic Feedback Iteration
III. Experimental Setup
IV. Results and Analysis
A. Overall Comparison
B. Ablation and Error Source Analysis
C. Communication Budget Robustness
D. Discussion
E. Complexity and Scalability Analysis
V. Conclusion
References
- [1] J. Wei et al., “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,” in Proc. NeurIPS, 2022.
- [2] D. Zhou et al., “Least-to-Most Prompting Enables Complex Reasoning in Large Language Models,” in Proc. ICLR, 2023.
- [3] X. Wang et al., “Self-Consistency Improves Chain of Thought Reasoning in Language Models,” in Proc. ICLR, 2023.
- [4] S. Yao et al., “ReAct: Synergizing Reasoning and Acting in Language Models,” in Proc. ICLR, 2023.
- [5] S. Yao et al., “Tree of Thoughts: Deliberate Problem Solving with Large Language Models,” in Proc. NeurIPS, 2023.
- [6] T. Schick et al., “Toolformer: Language Models Can Teach Themselves to Use Tools,” in Proc. NeurIPS, 2023.
- [7] A. Madaan et al., “Self-Refine: Iterative Refinement with Self-Feedback,” arXiv preprint arXiv:2303.17651, 2023.
- [8] N. Shinn et al., “Reflexion: Language Agents with Verbal Reinforcement Learning,” in Proc. NeurIPS, 2023.
- [9] Y. Du et al., “Improving Factuality and Reasoning in Language Models through Multiagent Debate,” arXiv preprint arXiv:2305.14325, 2023.
- [10] M. Besta et al., “Graph of Thoughts: Solving Elaborate Problems with Large Language Models,” in Proc. AAAI, 2024.
- [11] G. Li et al., “CAMEL: Communicative Agents for ‘Mind’ Exploration of Large Language Model Society,” in Proc. NeurIPS, 2023.
- [12] S. Hong et al., “MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework,” in Proc. ICLR, 2024.
- [13] W. Chen et al., “AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors,” in Proc. ICLR, 2024.
- [14] Q. Wu et al., “AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation,” in Proc. COLM, 2024.
- [15] C. Qian et al., “ChatDev: Communicative Agents for Software Development,” in Proc. ACL, 2024.
- [16] K. Cobbe et al., “Training Verifiers to Solve Math Word Problems,” arXiv preprint arXiv:2110.14168, 2021.
- [17] K. Valmeekam et al., “PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change,” in Proc. NeurIPS Datasets and Benchmarks, 2023.
- [18] H. Trivedi et al., “MuSiQue: Multihop Questions via Single-hop Question Composition,” Trans. Assoc. Comput. Linguistics, vol. 10, pp. 539–554, 2022.




| Benchmark | Metric and challenge |
|---|---|
| GSM8K [16] | Completion; 2-8 arithmetic steps |
| PlanBench [17] | Valid-plan rate; symbolic state transition |
| MuSiQue [18] | Answer F1 / completion; 2-4 hop evidence composition |

| Method | Completion | Error | Consistency |
|---|---|---|---|
| Single LLM | 68.7 | 30.0 | - |
| CoT + SC | 73.4 | 27.6 | - |
| ReAct | 75.1 | 26.9 | - |
| Debate | 80.2 | 25.3 | 88.1 |
| Proposed | 83.3 | 24.4 | 92.6 |

| Variant | Completion | Error | Consistency |
|---|---|---|---|
| Full model | 83.3 | 24.4 | 92.6 |
| w/o feedback | 79.1 | 26.7 | 89.4 |
| w/o verifier | 77.8 | 28.2 | 87.9 |
| w/o fusion | 75.9 | 29.1 | 84.8 |

| Budget per subtask | Completion | Error |
|---|---|---|
| 2 messages | 76.8 | 28.9 |
| 4 messages | 81.0 | 25.8 |
| 6 messages | 83.3 | 24.4 |
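
The budget table above can be read as a marginal-gain curve. The following is a minimal sketch, not part of the original evaluation code, that computes the change in completion and error between consecutive message budgets. The values are copied from the table; the helper name `marginal_changes` and the dictionary layout are assumptions made for illustration.

```python
# Values copied from the budget table (messages per subtask -> reported scores).
budget_results = {
    2: {"completion": 76.8, "error": 28.9},
    4: {"completion": 81.0, "error": 25.8},
    6: {"completion": 83.3, "error": 24.4},
}


def marginal_changes(results):
    """Hypothetical helper: change in each score between consecutive budgets."""
    budgets = sorted(results)
    steps = []
    for low, high in zip(budgets, budgets[1:]):
        steps.append({
            "from": low,
            "to": high,
            # Rounded to one decimal place to match the table's precision.
            "completion_gain": round(
                results[high]["completion"] - results[low]["completion"], 1
            ),
            "error_drop": round(
                results[low]["error"] - results[high]["error"], 1
            ),
        })
    return steps


for step in marginal_changes(budget_results):
    print(step)
```

Under these numbers, moving from 2 to 4 messages raises completion by 4.2 and lowers error by 3.1, while moving from 4 to 6 messages adds 2.3 and removes 1.4. Across the three budgets tested, each additional pair of messages yields a smaller improvement than the previous one.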
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).