1. Introduction
Large language models and adjacent AI systems have created a new layer of intermediation. In many commercially relevant deployments, users do not choose directly among transparent menus of models. Instead, a platform, enterprise gateway, assistant, or API aggregator receives the query, evaluates the available systems, and decides which model answers. Engineering practice has made this problem explicit under the label of routing. A router may send simple requests to a cheap in-house model, difficult requests to a stronger frontier model, and niche requests to a specialized system. Recent computer-science contributions have shown that such routing can materially improve the cost–quality frontier of AI deployment. Chen et al. [1] study budget-aware cascades, Ong et al. [2] learn routers from preference data, and Panda et al. [3] cast the deployment problem as contextual-bandit routing under budget constraints. Those papers are important for system design, but they typically treat the router as an algorithmic object rather than as a strategic economic actor.
From an industrial-organization perspective, that abstraction is too strong. The router is often a vertically integrated intermediary with market power, a business model, and its own model on the menu. Once the routing layer is recognized as an intermediary, familiar questions from platform economics and biased intermediation immediately reappear. If the gateway owns one candidate model, will it self-preference that model? If outside experts must pay for access or accept unfavorable API terms, how are allocation, price, and investment distorted? If traffic to outside experts also generates data, learning-by-serving, or capability improvement, who internalizes the long-run value of referral? In AI markets, routing is therefore not only a prediction problem. It is also a market-design problem in which the intermediary allocates both current demand and future learning opportunities.
This paper develops a unified model of that problem. We formulate “follow the expert” as delegated allocation by a dual-role platform. A unit mass of users arrives in each of two periods, each user indexed by query difficulty. The platform owns an incumbent model. An outside expert offers superior gross answer quality, and its advantage widens with difficulty, but the platform must pay an access price to use it. The outside expert also chooses a quality level through costly investment. After period 1 routing, the outside expert’s period-2 quality increases with period-1 routed demand. This feedback captures a broad family of economically relevant mechanisms: data accumulation, learning-by-serving, domain adaptation, reputation building, or a larger installed base over which the expert can amortize improvement.
The model deliberately stays one-dimensional. We do not model the internal architecture of an LLM, user-side search over a menu, or horizontal differentiation across many experts. The value of the abstraction is that it lets us isolate the industrial-organization margins that matter most for routing governance. There are four of them. First, the platform allocates traffic across competing technologies. Second, the outside expert sets an access price. Third, the outside expert invests in quality in anticipation of traffic. Fourth, current traffic shapes future quality. Those four margins already generate a rich theory of market power at the routing layer.
Two features deserve emphasis at the outset. The first concerns the interpretation of comparative advantage. In our specification, the outside expert is not worse in gross answer quality on easy queries. Rather, it is absolutely superior, and its advantage expands with difficulty. The reason a cutoff nonetheless emerges is that the platform faces an access price for the outside expert and may additionally obtain a private benefit from using its own incumbent model. The cutoff therefore reflects routing costs and integrated-platform incentives, not a literal crossing of production frontiers in gross quality. This interpretation is natural for many contemporary deployments in which the stronger external model is technically better across the board but too expensive or strategically inconvenient to invoke for every query.
The second feature concerns dynamics. In many digital settings, current demand affects future capability. In AI, that channel is especially salient. More traffic can mean more labeled outcomes, more comparative feedback, more opportunities for fine-tuning, more observed failure cases, and stronger incentives to invest in domain-specific quality. Our dynamic extension does not replace the static model with a different one. It extends the same primitives. The same quality choice, the same wholesale access price, and the same platform bias govern both periods. Period-1 routing determines period-2 quality through a transparent law of motion. That unified structure is central because it allows us to study how pricing, self-preferencing, and learning interact inside one equilibrium rather than across loosely connected submodels.
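The cutoff logic and the law of motion described above can be illustrated with a minimal numerical sketch. It is not the paper's specification: the uniform difficulty distribution on [0, 1], the linear advantage `quality * (base_gap + slope * theta)`, the additive law of motion `q2 = q1 + feedback * d1`, and all names (`Params`, `cutoff`, `expert_demand`, `two_period_path`) are assumptions introduced only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Params:
    # All fields are hypothetical illustration parameters, not the paper's notation.
    base_gap: float   # expert's quality edge on the easiest query (> 0)
    slope: float      # how fast the edge grows with difficulty (> 0)
    price: float      # access price the platform pays per routed query
    bias: float       # platform's private benefit from using its own model
    feedback: float   # period-2 quality gain per unit of period-1 routed demand


def cutoff(quality: float, p: Params) -> float:
    """Difficulty above which the platform routes to the outside expert.

    Assumed routing condition: the expert's gross advantage,
    quality * (base_gap + slope * theta), must cover price + bias.
    The result is clipped to the assumed difficulty support [0, 1].
    """
    raw = ((p.price + p.bias) / quality - p.base_gap) / p.slope
    return min(1.0, max(0.0, raw))


def expert_demand(quality: float, p: Params) -> float:
    """Mass of queries routed outward when difficulty is uniform on [0, 1]."""
    return 1.0 - cutoff(quality, p)


def two_period_path(q1: float, p: Params) -> tuple[float, float, float]:
    """Return (period-1 demand, period-2 quality, period-2 demand)."""
    d1 = expert_demand(q1, p)
    q2 = q1 + p.feedback * d1  # assumed linear law of motion
    d2 = expert_demand(q2, p)
    return d1, q2, d2


if __name__ == "__main__":
    neutral = Params(base_gap=0.5, slope=1.0, price=1.0, bias=0.0, feedback=0.5)
    biased = replace(neutral, bias=0.25)
    print("neutral:", two_period_path(1.0, neutral))
    print("biased: ", two_period_path(1.0, biased))
```

In this sketch, the expert is better on every query (the advantage is positive even at zero difficulty), yet a cutoff still arises because of the access price and the bias term. Raising `bias` lowers period-1 demand, which in turn lowers period-2 quality and period-2 demand through the assumed law of motion.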
The paper relates to several literatures. The closest economic foundations come from platform economics and biased intermediation. Classic work on two-sided and multisided platforms emphasizes that intermediaries shape market outcomes not only through prices, but also through the access conditions and allocation rules they impose on participants [4, 5, 6]. The literature on intermediary bias and dual-role platforms shows that vertically integrated gatekeepers may divert traffic toward affiliated sellers or first-party offerings [7, 8, 9, 10]. Our setting fits squarely in that tradition, but with a distinct AI twist: routing does not only redirect current trade; it also governs the future quality path of specialized experts.
The paper is also related to the literature on search, ranking, and recommendation. Search engines, marketplaces, and recommender systems do not merely reveal information; they shape what users see and what suppliers can profitably offer. Athey and Ellison [11] and de Cornière [12] study search environments in which intermediary design affects market outcomes. Che and Hörner [13] show that recommender systems may need to distort current recommendations in order to facilitate socially valuable learning. We take a parallel idea into AI routing, but the learning object is not only user beliefs or platform information. It is the outside expert’s own quality trajectory. When routing to the expert raises future expert quality, the planner values referral more than a myopic or self-preferring platform does.
A third connection is to innovation and information. Arrow [14] made precise the idea that current production can increase future capability through learning-by-doing. Akcigit and Liu [15] show how informational frictions shape innovative effort and market structure. In our model, routed traffic is the analog of productive experience. If the platform restricts access to hard or high-value queries, it reduces the scale on which outside experts can recover current costs and improve future quality. The market structure of AI intermediation therefore affects the direction and level of capability investment.
Against that background, our contribution is fourfold.
First, we provide a tractable industrial-organization model of AI routing in which the routing rule, access pricing, quality investment, and learning-by-serving are jointly determined. For any given expert quality and access price, the platform routes by a cutoff rule in query difficulty. This structure turns “follow the expert” into a delegated screening problem: easy queries stay in-house, and difficult queries are escalated outward. Because the expert’s gross advantage grows with difficulty, the routing rule reduces to a single threshold that can be characterized in closed form.
Second, we show that self-preferencing acts as a tax on outside expertise. A larger platform bias toward the incumbent raises the routing threshold, reduces the outside expert’s demand in period 1, and thereby lowers future expert quality as well. The dynamic effect is not an add-on. It is the product of the same demand reduction that already distorts static routing. Once traffic is also a learning input, the harm from bias is amplified.
Third, we characterize the outside expert’s pricing and investment problem in closed form and show that data feedback makes the incidence of bias more severe. Stronger data feedback raises the return to outside traffic and therefore increases equilibrium investment under neutral governance. But the same feedback also magnifies the damage from self-preferencing because every lost query is simultaneously lost revenue and lost future capability. In that sense, the industrial-organization consequences of routing bias are larger in environments where traffic and learning are tightly linked.
Fourth, we derive a dynamic first-best benchmark and identify three distinct wedges between decentralized routing and efficient routing. The first is the access-markup wedge: the platform compares the access price to zero, whereas society compares real resource cost to zero. The second is the bias wedge: a self-preferring platform attaches extra private value to using the incumbent. The third is the data-feedback wedge: the platform does not internalize that routing a query outward today raises future expert quality. This decomposition yields a sharp governance implication. Neutrality rules reduce the bias wedge. Access-pricing remedies reduce the markup wedge. But even a neutral platform with marginal-cost access pricing still under-routes relative to the dynamic first best when it fails to internalize future outside learning. Hence neutrality, access pricing, and data-governance instruments are complements rather than substitutes.
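The three-wedge decomposition can be written in stylized form as follows. The notation is assumed here for illustration and need not match the formal statements in Section 4: $\Delta(\theta)$ is the expert's gross quality advantage on a query of difficulty $\theta$, $p$ is the access price, $c$ is the real resource cost of an expert answer, $b \ge 0$ is the platform's private bias toward the incumbent, and $\lambda \ge 0$ is the future value of the learning generated by one more routed query. The incumbent's cost and learning are normalized to zero.

```latex
% Stylized sketch under assumed notation; not the paper's formal result.
\begin{align*}
\text{platform routes outward} &\iff \Delta(\theta) \ge p + b, \\
\text{planner routes outward}  &\iff \Delta(\theta) \ge c - \lambda, \\
\underbrace{(p + b) - (c - \lambda)}_{\text{gap in required advantage}}
  &= \underbrace{(p - c)}_{\text{access markup}}
   + \underbrace{b}_{\text{bias}}
   + \underbrace{\lambda}_{\text{data feedback}}.
\end{align*}
```

Under these assumptions, setting $b = 0$ (neutrality) and $p = c$ (marginal-cost access pricing) still leaves a gap of $\lambda > 0$, which is the sense in which a neutral platform with cost-based access continues to under-route when it does not internalize outside learning.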
Our theory is intentionally parsimonious, but it speaks directly to current debates about AI gateways, enterprise model hubs, regulated escalation systems, and vertically integrated assistants. In those environments, the economically relevant question is often not which model is globally best in the abstract. It is who controls the router that decides which model is used for which query, under what commercial terms, and with what consequences for future competition. Once the router is modeled as a strategic intermediary, platform economics becomes central to AI governance.
The rest of the paper proceeds as follows.
Section 2 presents the model.
Section 3 solves for equilibrium routing, pricing, and investment and derives the comparative statics of self-preferencing and data feedback.
Section 4 studies the dynamic planner’s benchmark and the welfare-relevant wedges generated by decentralized routing.
Section 5 discusses policy implications, empirical predictions, and extensions.
Section 6 concludes.