Quantum-Enhanced LLM Cascade Routing: A QAOA Approach to Cost-Optimal Model Selection in Multi-Agent Systems

Amit Patole

doi:10.20944/preprints202604.0413.v1

Submitted: 07 April 2026

Posted: 07 April 2026


Abstract

The rapid proliferation of multi-agent systems powered by large language models (LLMs) creates a non-trivial combinatorial optimization problem: routing heterogeneous tasks to the most cost-effective model tier while maintaining quality guarantees. Current production systems rely on static lookup tables, which over-provision expensive models and waste computational budget. We formalize the LLM Cascade Routing Problem (LCRP) as a Quadratic Unconstrained Binary Optimization (QUBO) problem and solve it using the Quantum Approximate Optimization Algorithm (QAOA). We benchmark QAOA against greedy heuristics and simulated annealing using both Google Cirq simulation and real IBM Quantum hardware (156-qubit Heron processors). Experiments across three IBM backends (ibm_fez, ibm_kingston, ibm_marrakesh) on problem instances from 6 to 18 qubits reveal three key findings: (i) shallow QAOA circuits (p=1, depth 52) achieve a 15.4% valid-assignment rate on real hardware versus 0.8% for deeper circuits (p=2, depth 101), demonstrating that NISQ noise favors shallow ansätze; (ii) hardware constraint satisfaction degrades steeply with problem size, dropping from 37-43% at 6 qubits to 0.2-0.3% at 18 qubits; and (iii) results are reproducible across all three backends, with valid-assignment rates consistent to within ±1.5%. To our knowledge, this is the first quantum computing formulation of the LLM model routing problem. We provide an open-source implementation and discuss the projected quantum advantage horizon.
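For readers unfamiliar with QUBO encodings, the following is a minimal illustrative sketch of how a task-to-tier routing problem might be written as a QUBO. It is not the author's implementation. It assumes a one-hot encoding (one binary variable per task-tier pair), a hypothetical cost matrix in which any quality shortfall is already folded into the cost, and a hand-picked penalty weight. The names `build_qubo`, `brute_force_minimum`, and `is_valid` are introduced here for illustration only, and exhaustive search stands in for QAOA so that the example runs without quantum tooling.

```python
import itertools
import numpy as np


def build_qubo(cost, penalty):
    """Build an upper-triangular QUBO matrix for one-hot task routing.

    cost[t, m] is the (hypothetical) cost of sending task t to tier m.
    Variable index t * M + m is 1 when task t is assigned to tier m.
    Each task contributes penalty * (sum_m x[t, m] - 1)^2, which expands
    (using x^2 = x) to -penalty on the diagonal and +2 * penalty on each
    pair of variables belonging to the same task. The constant term is dropped.
    """
    num_tasks, num_tiers = cost.shape
    n = num_tasks * num_tiers
    Q = np.zeros((n, n))
    for t in range(num_tasks):
        for m in range(num_tiers):
            i = t * num_tiers + m
            Q[i, i] += cost[t, m] - penalty
            for m2 in range(m + 1, num_tiers):
                Q[i, t * num_tiers + m2] += 2 * penalty
    return Q


def is_valid(bits, num_tasks, num_tiers):
    """An assignment is valid when every task selects exactly one tier."""
    grid = np.array(bits).reshape(num_tasks, num_tiers)
    return bool(np.all(grid.sum(axis=1) == 1))


def brute_force_minimum(Q):
    """Exhaustively minimise x^T Q x (a classical stand-in for QAOA)."""
    n = Q.shape[0]
    best_bits, best_energy = None, float("inf")
    for bits in itertools.product((0, 1), repeat=n):
        x = np.array(bits)
        energy = float(x @ Q @ x)
        if energy < best_energy:
            best_bits, best_energy = bits, energy
    return best_bits, best_energy


if __name__ == "__main__":
    # Two tasks, three tiers -> 6 binary variables (illustrative numbers only).
    cost = np.array([[1.0, 3.0, 9.0],
                     [5.0, 2.0, 9.0]])
    Q = build_qubo(cost, penalty=20.0)
    bits, energy = brute_force_minimum(Q)
    print(bits, energy, is_valid(bits, *cost.shape))
```

In this encoding, a measured bitstring counts as a valid assignment only when each task has exactly one tier selected, which is one plausible reading of the valid-assignment rate reported above. The penalty weight must exceed the cost differences it is meant to dominate; otherwise the minimum can fall on an invalid bitstring.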

Keywords: quantum computing; QAOA; LLM routing; multi-agent systems; combinatorial optimization; QUBO; NISQ

Subject: Computer Science and Mathematics - Computer Science

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
