Preprint
Article

This version is not peer-reviewed.

Quantum-Enhanced LLM Cascade Routing: A QAOA Approach to Cost-Optimal Model Selection in Multi-Agent Systems

Submitted: 07 April 2026

Posted: 07 April 2026


Abstract
The rapid proliferation of multi-agent systems powered by large language models (LLMs) creates a non-trivial combinatorial optimization problem: routing heterogeneous tasks to the most cost-effective model tier while maintaining quality guarantees. Current production systems rely on static lookup tables, which over-provision expensive models and waste computational budget. We formalize the LLM Cascade Routing Problem (LCRP) as a Quadratic Unconstrained Binary Optimization (QUBO) problem and solve it using the Quantum Approximate Optimization Algorithm (QAOA). We benchmark QAOA against greedy heuristics and simulated annealing using both Google Cirq simulation and real IBM Quantum hardware (156-qubit Heron processors). Experiments across three IBM backends (ibm_fez, ibm_kingston, ibm_marrakesh) on problem instances from 6 to 18 qubits reveal three key findings: (i) shallow QAOA circuits (p=1, depth 52) achieve a 15.4% valid-assignment rate on real hardware versus 0.8% for deeper circuits (p=2, depth 101), demonstrating that NISQ noise favors shallow ansätze; (ii) hardware constraint satisfaction degrades steeply with problem size, dropping from 37-43% at 6 qubits to 0.2-0.3% at 18 qubits; and (iii) results are reproducible across all three backends, with valid-assignment rates consistent to within ±1.5%. To our knowledge, this is the first quantum computing formulation of the LLM model routing problem. We provide an open-source implementation and discuss the projected quantum advantage horizon.
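To make the QUBO framing concrete, the following is a minimal sketch of how a routing instance could be encoded. It is an illustration under stated assumptions, not the formulation used in this work. It assumes one binary variable per (task, tier) pair, a hypothetical cost matrix that already folds in any quality penalty, and a hypothetical penalty weight that enforces exactly one tier per task. All function names are illustrative. The tiny 6-variable instance is solved by exhaustive enumeration, standing in for QAOA or a classical heuristic.

```python
import itertools
import numpy as np


def build_routing_qubo(cost, penalty):
    """Build an upper-triangular QUBO matrix Q so that energy(x) = x^T Q x.

    cost[t, m] is a hypothetical cost of sending task t to tier m.
    The one-hot constraint penalty * (sum_m x[t, m] - 1)^2 is expanded
    using x^2 = x; its constant offset (penalty per task) is dropped.
    """
    n_tasks, n_tiers = cost.shape
    n = n_tasks * n_tiers
    Q = np.zeros((n, n))
    for t in range(n_tasks):
        for m in range(n_tiers):
            i = t * n_tiers + m
            Q[i, i] += cost[t, m] - penalty      # linear term
            for m2 in range(m + 1, n_tiers):
                j = t * n_tiers + m2
                Q[i, j] += 2.0 * penalty         # two tiers chosen for one task
    return Q


def brute_force_minimum(Q):
    """Enumerate all bitstrings; only feasible for very small instances."""
    n = Q.shape[0]
    best_x, best_e = None, float("inf")
    for bits in itertools.product((0, 1), repeat=n):
        x = np.array(bits)
        e = float(x @ Q @ x)
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e


def decode(x, n_tasks, n_tiers):
    """Return the chosen tier per task, or None if any task is not one-hot."""
    rows = np.asarray(x).reshape(n_tasks, n_tiers)
    if not np.all(rows.sum(axis=1) == 1):
        return None
    return [int(m) for m in rows.argmax(axis=1)]


# Hypothetical instance: 3 tasks x 2 tiers = 6 binary variables.
# Tier 0 is cheap; task 1 is assumed hard, so its tier-0 cost includes
# an illustrative quality penalty.
cost = np.array([[1.0, 3.0],
                 [4.0, 3.0],
                 [1.0, 3.0]])
Q = build_routing_qubo(cost, penalty=10.0)
best_x, best_energy = brute_force_minimum(Q)
assignment = decode(best_x, n_tasks=3, n_tiers=2)
print(assignment, best_energy)
```

A bitstring counts as a valid assignment here only when `decode` returns a list, that is, when every task selects exactly one tier. The penalty weight must exceed the largest cost difference it needs to dominate; the value 10.0 is an arbitrary choice for this toy instance.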
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated