Preprint
Article

This version is not peer-reviewed.

Expert Routing and Self-Preferencing on AI Platforms

Submitted: 10 March 2026

Posted: 13 March 2026


Abstract
This paper develops an industrial-organization theory of AI routing, modeling how a dual-role platform allocates user queries between an in-house model and an outside expert. We formulate this as a delegated allocation problem featuring endogenous quality investment and data feedback. The expert sets access prices and initial quality, while the platform routes queries by difficulty. Early traffic routed to the expert enhances its future quality through learning-by-serving. In equilibrium, routing follows a cutoff rule. The platform's self-preferencing acts as a tax on outside expertise, raising routing thresholds, reducing outside demand, and compressing both current quality investment and future learning gains. Decentralized routing introduces three inefficiency wedges compared to a dynamic first best: an access-markup wedge, a bias wedge, and a data-feedback wedge. The third is unique to AI routing because traffic allocation directly dictates learning opportunities. Consequently, neutrality and access-pricing remedies are complementary but insufficient together, as platforms fail to internalize the future value of outside learning. This model provides a tractable framework for analyzing AI gateways and router governance.

1. Introduction

Large language models and adjacent AI systems have created a new layer of intermediation. In many commercially relevant deployments, users do not choose directly among transparent menus of models. Instead, a platform, enterprise gateway, assistant, or API aggregator receives the query, evaluates the available systems, and decides which model answers. Engineering practice has made this problem explicit under the label of routing. A router may send simple requests to a cheap in-house model, difficult requests to a stronger frontier model, and niche requests to a specialized system. Recent computer-science contributions have shown that such routing can materially improve the cost–quality frontier of AI deployment. Chen et al. [1] study budget-aware cascades, Ong et al. [2] learn routers from preference data, and Panda et al. [3] cast the deployment problem as contextual-bandit routing under budget constraints. Those papers are important for system design, but they typically treat the router as an algorithmic object rather than as a strategic economic actor.
From an industrial-organization perspective, that abstraction is too strong. The router is often a vertically integrated intermediary with market power, a business model, and its own model on the menu. Once the routing layer is recognized as an intermediary, familiar questions from platform economics and biased intermediation immediately reappear. If the gateway owns one candidate model, will it self-preference that model? If outside experts must pay for access or accept unfavorable API terms, how are allocation, price, and investment distorted? If traffic to outside experts also generates data, learning-by-serving, or capability improvement, who internalizes the long-run value of referral? In AI markets, routing is therefore not only a prediction problem. It is also a market-design problem in which the intermediary allocates both current demand and future learning opportunities.
This paper develops a unified model of that problem. We formulate “follow the expert” as delegated allocation by a dual-role platform. A unit mass of users arrives in each of two periods, each user indexed by query difficulty. The platform owns an incumbent model. An outside expert offers superior gross answer quality, and its advantage widens with difficulty, but the platform must pay an access price to use it. The outside expert also chooses a quality level through costly investment. After period 1 routing, the outside expert’s period-2 quality increases with period-1 routed demand. This feedback captures a broad family of economically relevant mechanisms: data accumulation, learning-by-serving, domain adaptation, reputation building, or a larger installed base over which the expert can amortize improvement.
The model deliberately stays one-dimensional. We do not model the internal architecture of an LLM, user-side search over a menu, or horizontal differentiation across many experts. The value of the abstraction is that it lets us isolate the industrial-organization margins that matter most for routing governance. There are four of them. First, the platform allocates traffic across competing technologies. Second, the outside expert sets an access price. Third, the outside expert invests in quality in anticipation of traffic. Fourth, current traffic shapes future quality. Those four margins already generate a rich theory of market power at the routing layer.
Two features deserve emphasis at the outset. The first concerns the interpretation of comparative advantage. In our specification, the outside expert is not worse in gross answer quality on easy queries. Rather, it is absolutely superior, and its advantage expands with difficulty. The reason a cutoff nonetheless emerges is that the platform faces an access price for the outside expert and may additionally obtain a private benefit from using its own incumbent model. The cutoff therefore reflects routing costs and integrated-platform incentives, not a literal crossing of production frontiers in gross quality. This interpretation is natural for many contemporary deployments in which the stronger external model is technically better across the board but too expensive or strategically inconvenient to invoke for every query.
The second feature concerns dynamics. In many digital settings, current demand affects future capability. In AI, that channel is especially salient. More traffic can mean more labeled outcomes, more comparative feedback, more opportunities for fine-tuning, more observed failure cases, and stronger incentives to invest in domain-specific quality. Our dynamic extension does not replace the static model with a different one. It extends the same primitives. The same quality choice, the same wholesale access price, and the same platform bias govern both periods. Period-1 routing determines period-2 quality through a transparent law of motion. That unified structure is central because it allows us to study how pricing, self-preferencing, and learning interact inside one equilibrium rather than across loosely connected submodels.
The paper relates to several literatures. The closest economic foundations come from platform economics and biased intermediation. Classic work on two-sided and multisided platforms emphasizes that intermediaries shape market outcomes not only through prices, but also through the access conditions and allocation rules they impose on participants [4,5,6]. The literature on intermediary bias and dual-role platforms shows that vertically integrated gatekeepers may divert traffic toward affiliated sellers or first-party offerings [7,8,9,10]. Our setting fits squarely in that tradition, but with a distinct AI twist: routing does not only redirect current trade; it also governs the future quality path of specialized experts.
The paper is also related to the literature on search, ranking, and recommendation. Search engines, marketplaces, and recommender systems do not merely reveal information; they shape what users see and what suppliers can profitably offer. Athey and Ellison [11] and de Corniere [12] study search environments in which intermediary design affects market outcomes. Che and Hörner [13] show that recommender systems may need to distort current recommendations in order to facilitate socially valuable learning. We take a parallel idea into AI routing, but the learning object is not only user beliefs or platform information. It is the outside expert’s own quality trajectory. When routing to the expert raises future expert quality, the planner values referral more than a myopic or self-preferring platform does.
A third connection is to innovation and information. Arrow [14] made precise the idea that current production can increase future capability through learning-by-doing. Akcigit and Liu [15] show how informational frictions shape innovative effort and market structure. In our model, routed traffic is the analog of productive experience. If the platform restricts access to hard or high-value queries, it reduces the scale on which outside experts can recover current costs and improve future quality. The market structure of AI intermediation therefore affects the direction and level of capability investment.
Against that background, our contribution is fourfold.
First, we provide a tractable industrial-organization model of AI routing in which the routing rule, access pricing, quality investment, and learning-by-serving are jointly determined. For any given expert quality and access price, the platform routes by a cutoff rule in query difficulty. This structure turns “follow the expert” into a delegated screening problem: easy queries stay in-house, and difficult queries are escalated outward. Because the expert’s gross advantage grows with difficulty, the geometry is simple and closed form.
Second, we show that self-preferencing acts as a tax on outside expertise. A larger platform bias toward the incumbent raises the routing threshold, reduces the outside expert’s demand in period 1, and thereby lowers future expert quality as well. The dynamic effect is not an add-on. It is the product of the same demand reduction that already distorts static routing. Once traffic is also a learning input, the harm from bias is amplified.
Third, we characterize the outside expert’s pricing and investment problem in closed form and show that data feedback makes the incidence of bias more severe. Stronger data feedback raises the return to outside traffic and therefore increases equilibrium investment under neutral governance. But the same feedback also magnifies the damage from self-preferencing because every lost query is simultaneously lost revenue and lost future capability. In that sense, the industrial-organization consequences of routing bias are larger in environments where traffic and learning are tightly linked.
Fourth, we derive a dynamic first-best benchmark and identify three distinct wedges between decentralized routing and efficient routing. The first is the access-markup wedge: the platform weighs the expert’s gross advantage against the access price, whereas society weighs it against real resource cost. The second is the bias wedge: a self-preferring platform attaches extra private value to using the incumbent. The third is the data-feedback wedge: the platform does not internalize that routing a query outward today raises future expert quality. This decomposition yields a sharp governance implication. Neutrality rules reduce the bias wedge. Access-pricing remedies reduce the markup wedge. But even a neutral platform with marginal-cost access pricing still under-routes relative to the dynamic first best when it fails to internalize future outside learning. Hence neutrality, access pricing, and data-governance instruments are complements rather than substitutes.
Our theory is intentionally parsimonious, but it speaks directly to current debates about AI gateways, enterprise model hubs, regulated escalation systems, and vertically integrated assistants. In those environments, the economically relevant question is often not which model is globally best in the abstract. It is who controls the router that decides which model is used for which query, under what commercial terms, and with what consequences for future competition. Once the router is modeled as a strategic intermediary, platform economics becomes central to AI governance.
The rest of the paper proceeds as follows. Section 2 presents the model. Section 3 solves for equilibrium routing, pricing, and investment and derives the comparative statics of self-preferencing and data feedback. Section 4 studies the dynamic planner’s benchmark and the welfare-relevant wedges generated by decentralized routing. Section 5 discusses policy implications, empirical predictions, and extensions. Section 6 concludes.

2. Model

There are two periods, $t \in \{1, 2\}$, and in each period a unit mass of users arrives on the platform. A user is indexed by query difficulty $\theta \in [0, 1]$, distributed uniformly. Higher $\theta$ means that the query is more difficult. The platform can route each query to one of two AI systems.
The first system, denoted $I$ for incumbent, is owned by the platform. The second, denoted $E$ for expert, is supplied by an outside firm. The incumbent’s gross user utility is
$$u_I(\theta) = v - \theta,$$
where $v > 0$ is a common baseline. In period $t$, the outside expert’s gross user utility is
$$u_{E,t}(\theta) = v + q_t - \gamma\theta,$$
where $q_t \ge 0$ is expert quality in period $t$ and $\gamma \in (0, 1)$. Define
$$\delta \equiv 1 - \gamma \in (0, 1).$$
Then the expert’s gross advantage over the incumbent is
$$u_{E,t}(\theta) - u_I(\theta) = q_t + \delta\theta.$$
Thus the expert is grossly superior for every $\theta$ whenever $q_t > 0$, and that superiority grows with difficulty.
The expert has marginal inference cost $m > 0$ per routed query. Before routing begins, the expert chooses two variables. First, it chooses an initial quality level $q \equiv q_1 \ge 0$ at convex cost
$$\frac{\kappa}{2} q^2,$$
where $\kappa > 0$ is the investment-cost parameter. Second, it chooses a constant wholesale access price $w \ge m$ paid by the platform for each query routed to the expert in either period. We treat the constant wholesale price as a long-term access contract. Allowing period-specific prices would add algebra but not alter the basic logic that the platform faces an access wedge relative to real cost.
The platform is a dual-role intermediary. If it routes a query to its own incumbent model, it obtains a private benefit $b \ge 0$. This reduced-form term captures self-preferencing, internal accounting advantages, traffic-retention motives, data capture, brand control, or ecosystem complementarity. Hence, in period $t$, the platform compares
$$\Pi_I(\theta) = u_I(\theta) + b$$
with
$$\Pi_{E,t}(\theta) = u_{E,t}(\theta) - w.$$
The platform therefore routes to the expert in period $t$ whenever
$$q_t + \delta\theta \ge w + b.$$
The dynamic link is a data-feedback or learning-by-serving channel. If the expert receives demand $D_1$ in period 1, then period-2 expert quality becomes
$$q_2 = q + \lambda D_1,$$
where $\lambda \ge 0$ measures the strength of data feedback. The parameter $\lambda$ can be interpreted broadly. More routed traffic may generate more outcome labels, more comparative user feedback, more observations of failure modes, more opportunities to specialize, or a larger installed base over which improvement can be amortized. Our law of motion is deliberately simple: one more unit of period-1 traffic raises period-2 quality by $\lambda$.
The timing is as follows.
1. Parameters $(m, \delta, \kappa, \lambda, b, \beta)$ are given, where $\beta \in (0, 1]$ is the discount factor on period 2.
2. The outside expert chooses initial quality $q$ and wholesale access price $w$.
3. The platform observes each user’s difficulty $\theta$ and routes period-1 queries to $I$ or $E$.
4. Period-1 demand $D_1$ updates expert quality to $q_2 = q + \lambda D_1$.
5. The platform routes period-2 queries using the same access price $w$ and bias parameter $b$.
6. Payoffs are realized.
We maintain the following parameter restriction.
Assumption 1.
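Define
$$A(\lambda) \equiv 1 + \beta\left(1 + \frac{\lambda}{\delta}\right).$$
Parameters satisfy
$$2\kappa\delta > A(\lambda), \qquad \beta\lambda^2 < \delta^2,$$
and
$$0 < \delta - m - b < \frac{2\kappa\delta - A(\lambda)}{\kappa\left(1 + \lambda/\delta\right)}.$$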
The first inequality guarantees strict concavity of the expert’s optimization problem. The second guarantees strict concavity of the planner’s dynamic routing problem in Section 4. The third ensures an interior equilibrium: the expert serves a positive but not full measure of queries in both periods. These restrictions are transparent economically. The expert must be valuable enough on difficult queries to attract some traffic, but not so valuable that it serves the entire market; investment cannot be arbitrarily cheap; and data feedback cannot be so explosive that a small amount of initial traffic collapses the cutoff immediately.
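The three inequalities can be checked mechanically for any candidate parameter vector. The following is a minimal sketch, not part of the model itself: the function names and the numerical values are hypothetical and chosen only for illustration.

```python
def dynamic_multiplier(beta, lam, delta):
    """A(lambda) = 1 + beta * (1 + lambda / delta)."""
    return 1.0 + beta * (1.0 + lam / delta)


def check_assumption_1(m, delta, kappa, lam, b, beta):
    """Return one boolean per inequality in Assumption 1."""
    A = dynamic_multiplier(beta, lam, delta)
    gap = delta - m - b  # the term (delta - m - b) that scales all equilibrium objects
    upper = (2.0 * kappa * delta - A) / (kappa * (1.0 + lam / delta))
    return {
        "expert_concavity": 2.0 * kappa * delta > A,
        "planner_concavity": beta * lam**2 < delta**2,
        "interior_equilibrium": 0.0 < gap < upper,
    }


# Hypothetical parameter values (assumptions for illustration only).
params = dict(m=0.1, delta=0.5, kappa=4.0, lam=0.1, b=0.05, beta=0.9)
report = check_assumption_1(**params)
print(report)
```

With these illustrative values all three conditions hold. Lowering $\kappa$ far enough (cheap investment) violates the first inequality, which matches the remark that investment cannot be arbitrarily cheap.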
Because users do not directly choose the model and make no explicit monetary payment in this reduced-form environment, consumer surplus is simply expected user utility. Total welfare is expected user utility minus real inference cost and minus quality-investment cost. The wholesale price w is a transfer between the platform and the expert; it affects equilibrium behavior but not welfare directly.
Before solving the model, it is useful to state the interpretation clearly. The platform controls the allocation of queries across technologies. The expert chooses quality and access terms in anticipation of that control. Current routing determines future quality because routed traffic generates knowledge and incentives. The industrial-organization object of interest is therefore not merely the ranking of existing models, but the governance of the traffic-allocation layer itself.

3. Equilibrium Routing, Pricing, and Investment

We solve the model by backward induction. For any expert quality and access price, the platform chooses routing in each period. Anticipating those routing rules and the period-2 quality feedback they induce, the expert chooses $(q, w)$.

3.1. Cutoff Routing in Both Periods

For a given pair $(q, w)$, expert quality in period 1 is $q_1 = q$. Period-1 routing to the expert occurs whenever
$$q + \delta\theta \ge w + b.$$
Define the period-1 cutoff by
$$t_1(q, w, b) = \frac{w + b - q}{\delta}.$$
Let $D_1(q, w, b)$ denote the mass of period-1 queries routed to the expert.
Proposition 1 (Cutoff routing and dynamic propagation). For any $(q, w, b)$, the platform’s routing rule is a cutoff rule in both periods. If the implied cutoffs are interior, then in period 1,
$$D_1(q, w, b) = 1 - t_1(q, w, b) = \frac{q - w - b + \delta}{\delta}.$$
Period-2 quality is
$$q_2 = q + \lambda D_1(q, w, b),$$
and the period-2 cutoff is
$$t_2(q, w, b) = \frac{w + b - q_2}{\delta} = t_1(q, w, b) - \frac{\lambda}{\delta} D_1(q, w, b).$$
Hence period-2 expert demand is
$$D_2(q, w, b) = 1 - t_2(q, w, b) = \left(1 + \frac{\lambda}{\delta}\right) D_1(q, w, b).$$
Proof. The platform routes to the expert in period 1 whenever $q + \delta\theta \ge w + b$. Since $\delta > 0$, the set of routed types is the upper interval $[t_1, 1] \cap [0, 1]$, where
$$t_1(q, w, b) = \frac{w + b - q}{\delta}.$$
Whenever this cutoff is interior, period-1 expert demand equals the length of that upper interval:
$$D_1(q, w, b) = 1 - t_1(q, w, b) = \frac{q - w - b + \delta}{\delta}.$$
By the law of motion for expert quality,
$$q_2 = q + \lambda D_1(q, w, b).$$
In period 2, the platform routes to the expert whenever $q_2 + \delta\theta \ge w + b$, which implies the cutoff
$$t_2(q, w, b) = \frac{w + b - q_2}{\delta} = \frac{w + b - q}{\delta} - \frac{\lambda}{\delta} D_1(q, w, b) = t_1(q, w, b) - \frac{\lambda}{\delta} D_1(q, w, b).$$
Whenever this period-2 cutoff is interior, period-2 expert demand is
$$D_2(q, w, b) = 1 - t_2(q, w, b) = 1 - t_1(q, w, b) + \frac{\lambda}{\delta} D_1(q, w, b) = \left(1 + \frac{\lambda}{\delta}\right) D_1(q, w, b). \qquad \square$$
Proposition 1 gives the basic mechanism-design geometry. “Follow the expert” is implemented by partitioning the query space. Low-difficulty queries are kept in-house, while harder queries are escalated. The dynamic addition is equally transparent: a lower period-1 cutoff raises period-1 expert traffic, which increases period-2 expert quality and lowers the period-2 cutoff as well. Current routing therefore changes future routing through the quality law of motion.
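The propagation from period-1 routing to period-2 routing can be traced numerically. The sketch below is illustrative only: the function name and all input values are hypothetical, and the code simply evaluates the cutoff and demand formulas of Proposition 1 for a neutral and a biased platform.

```python
def route_two_periods(q, w, b, delta, lam):
    """Cutoffs and expert demand in both periods for given (q, w, b).

    Raises ValueError when either cutoff is not interior, because the
    demand formulas in Proposition 1 are stated for interior cutoffs.
    """
    t1 = (w + b - q) / delta   # period-1 cutoff
    d1 = 1.0 - t1              # mass of types above the cutoff (uniform types)
    q2 = q + lam * d1          # law of motion for expert quality
    t2 = (w + b - q2) / delta  # period-2 cutoff
    d2 = 1.0 - t2
    if not (0.0 < t1 < 1.0 and 0.0 < t2 < 1.0):
        raise ValueError("cutoffs are not interior for these inputs")
    return {"t1": t1, "D1": d1, "q2": q2, "t2": t2, "D2": d2}


# Hypothetical inputs (assumptions for illustration only).
neutral = route_two_periods(q=0.3, w=0.45, b=0.0, delta=0.5, lam=0.1)
biased = route_two_periods(q=0.3, w=0.45, b=0.05, delta=0.5, lam=0.1)
print(neutral)
print(biased)
```

In this example the biased platform has a higher period-1 cutoff, which lowers period-2 quality and therefore also raises the period-2 cutoff relative to the neutral case.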

3.2. The Outside Expert’s Optimization Problem

Given Proposition 1, the outside expert chooses $(q, w)$ to maximize discounted profit:
$$\pi_E(q, w; b, \lambda) = (w - m) D_1(q, w, b) + \beta (w - m) D_2(q, w, b) - \frac{\kappa}{2} q^2.$$
Using Proposition 1, this becomes
$$\pi_E(q, w; b, \lambda) = A(\lambda)(w - m) D_1(q, w, b) - \frac{\kappa}{2} q^2,$$
where $A(\lambda) = 1 + \beta(1 + \lambda/\delta)$. Since
$$D_1(q, w, b) = \frac{q - w - b + \delta}{\delta},$$
we may rewrite profit as
$$\pi_E(q, w; b, \lambda) = A(\lambda)(w - m)\,\frac{q - w - b + \delta}{\delta} - \frac{\kappa}{2} q^2.$$
The factor $A(\lambda)$ is a dynamic multiplier. When $\lambda = 0$, it equals $1 + \beta$: one unit of period-1 demand generates one contemporaneous margin and one discounted period-2 margin. When $\lambda > 0$, period-1 demand is more valuable because it also lifts period-2 quality and thus period-2 demand.
Proposition 2 (Unique interior equilibrium). Under Assumption 1, the expert’s problem has a unique interior solution. Equilibrium initial quality and wholesale price are
$$q^*(b, \lambda) = \frac{A(\lambda)(\delta - m - b)}{2\kappa\delta - A(\lambda)},$$
$$w^*(b, \lambda) = m + \frac{\kappa\delta(\delta - m - b)}{2\kappa\delta - A(\lambda)}.$$
Equilibrium expert demand in periods 1 and 2 is
$$D_1^*(b, \lambda) = \frac{\kappa(\delta - m - b)}{2\kappa\delta - A(\lambda)},$$
$$D_2^*(b, \lambda) = \left(1 + \frac{\lambda}{\delta}\right)\frac{\kappa(\delta - m - b)}{2\kappa\delta - A(\lambda)}.$$
The associated cutoffs are $t_1^* = 1 - D_1^*$ and $t_2^* = 1 - D_2^*$.
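Proof. Differentiate the profit function with respect to $w$ and $q$:
$$\frac{\partial \pi_E}{\partial w} = A(\lambda)\,\frac{q - 2w + m - b + \delta}{\delta},$$
$$\frac{\partial \pi_E}{\partial q} = A(\lambda)\,\frac{w - m}{\delta} - \kappa q.$$
Setting the first-order conditions equal to zero gives
$$q - 2w + m - b + \delta = 0,$$
$$A(\lambda)(w - m) = \kappa\delta q.$$
The second condition implies
$$w = m + \frac{\kappa\delta}{A(\lambda)}\, q.$$
Substituting into the first condition yields
$$q - 2\left(m + \frac{\kappa\delta}{A(\lambda)}\, q\right) + m - b + \delta = 0,$$
which simplifies to
$$\left(2\kappa\delta - A(\lambda)\right) q = A(\lambda)(\delta - m - b).$$
Therefore
$$q^*(b, \lambda) = \frac{A(\lambda)(\delta - m - b)}{2\kappa\delta - A(\lambda)}.$$
Substituting back gives
$$w^*(b, \lambda) = m + \frac{\kappa\delta(\delta - m - b)}{2\kappa\delta - A(\lambda)}.$$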
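Using Proposition 1,
$$D_1^*(b, \lambda) = \frac{q^* - w^* - b + \delta}{\delta}.$$
Substitute the formula for $w^*$:
$$D_1^*(b, \lambda) = \frac{q^*\left(1 - \kappa\delta/A(\lambda)\right) + \delta - m - b}{\delta}.$$
Since
$$\delta - m - b = \frac{2\kappa\delta - A(\lambda)}{A(\lambda)}\, q^*,$$
we obtain
$$D_1^*(b, \lambda) = \frac{\kappa\delta q^*/A(\lambda)}{\delta} = \frac{\kappa q^*}{A(\lambda)} = \frac{\kappa(\delta - m - b)}{2\kappa\delta - A(\lambda)}.$$
Proposition 1 then implies
$$D_2^*(b, \lambda) = \left(1 + \frac{\lambda}{\delta}\right) D_1^*(b, \lambda).$$
By Assumption 1,
$$0 < D_1^*(b, \lambda) = \frac{\kappa(\delta - m - b)}{2\kappa\delta - A(\lambda)} < \frac{1}{1 + \lambda/\delta} \le 1,$$
so
$$0 < D_2^*(b, \lambda) = \left(1 + \frac{\lambda}{\delta}\right) D_1^*(b, \lambda) < 1.$$
Hence the candidate indeed lies in the interior region characterized in Proposition 1. It remains to verify uniqueness of the interior solution. With the variables ordered as $(w, q)$, the Hessian of $\pi_E$ is
$$H = \begin{pmatrix} -2A(\lambda)/\delta & A(\lambda)/\delta \\ A(\lambda)/\delta & -\kappa \end{pmatrix}.$$
The first leading principal minor, $-2A(\lambda)/\delta$, is negative. The determinant is
$$\det(H) = \frac{A(\lambda)}{\delta^2}\left(2\kappa\delta - A(\lambda)\right) > 0$$
by Assumption 1. Hence $H$ is negative definite, so the objective is strictly concave on the interior branch and the first-order conditions characterize its unique maximizer there. Under Assumption 1, this interior solution is the equilibrium stated in the proposition. □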
Proposition 2 shows that the equilibrium remains closed form despite the dynamic channel. The expert chooses a strictly positive markup,
$$w^*(b, \lambda) - m = \frac{\kappa\delta(\delta - m - b)}{2\kappa\delta - A(\lambda)},$$
so the platform compares the expert to a transfer price above real inference cost even when it is neutral. That is the familiar markup distortion. The novel dynamic element is that $A(\lambda)$ increases the return to expert demand, so stronger data feedback raises equilibrium quality and routed volume.
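The closed forms in Proposition 2 can be sanity-checked numerically. The sketch below is illustrative, under hypothetical parameter values that satisfy Assumption 1: it evaluates the equilibrium formulas and confirms that small deviations in $(q, w)$ lower the expert’s profit, as strict concavity implies. Function names are assumptions introduced here.

```python
def dynamic_multiplier(beta, lam, delta):
    """A(lambda) = 1 + beta * (1 + lambda / delta)."""
    return 1.0 + beta * (1.0 + lam / delta)


def equilibrium(m, delta, kappa, lam, b, beta):
    """Closed-form equilibrium quality, price, and demands (Proposition 2)."""
    A = dynamic_multiplier(beta, lam, delta)
    denom = 2.0 * kappa * delta - A
    gap = delta - m - b
    d1 = kappa * gap / denom
    return {
        "q": A * gap / denom,
        "w": m + kappa * delta * gap / denom,
        "D1": d1,
        "D2": (1.0 + lam / delta) * d1,
    }


def expert_profit(q, w, m, delta, kappa, lam, b, beta):
    """Discounted profit A(lambda) * (w - m) * D1 - (kappa / 2) * q^2."""
    A = dynamic_multiplier(beta, lam, delta)
    d1 = (q - w - b + delta) / delta
    return A * (w - m) * d1 - 0.5 * kappa * q**2


# Hypothetical parameter values (assumptions for illustration only).
params = dict(m=0.1, delta=0.5, kappa=4.0, lam=0.1, b=0.05, beta=0.9)
eq = equilibrium(**params)
best = expert_profit(eq["q"], eq["w"], **params)

# Profit at small deviations from the candidate optimum.
steps = (-0.01, 0.0, 0.01)
deviations = [
    expert_profit(eq["q"] + dq, eq["w"] + dw, **params)
    for dq in steps
    for dw in steps
    if (dq, dw) != (0.0, 0.0)
]
print(eq, best, max(deviations))
```

Every deviation on this small grid yields lower profit than the closed-form candidate, and the implied demand matches the cutoff formula of Proposition 1.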
The next result formalizes the comparative statics of self-preferencing and data feedback.
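Proposition 3 (Self-preferencing as a tax on outside expertise). Under Assumption 1, equilibrium quality, routed demand, and expert profit are all strictly decreasing in the platform bias parameter $b$. Specifically,
$$\frac{\partial q^*}{\partial b} = -\frac{A(\lambda)}{2\kappa\delta - A(\lambda)} < 0,$$
$$\frac{\partial D_1^*}{\partial b} = -\frac{\kappa}{2\kappa\delta - A(\lambda)} < 0,$$
$$\frac{\partial D_2^*}{\partial b} = -\left(1 + \frac{\lambda}{\delta}\right)\frac{\kappa}{2\kappa\delta - A(\lambda)} < 0.$$
Equilibrium expert profit is
$$\pi_E^*(b, \lambda) = \frac{A(\lambda)\,\kappa\,(\delta - m - b)^2}{2\left(2\kappa\delta - A(\lambda)\right)},$$
which is strictly positive and strictly decreasing in $b$.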
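Proof. Differentiate the closed-form expressions in Proposition 2. Since $A(\lambda)$ does not depend on $b$,
$$\frac{\partial q^*}{\partial b} = -\frac{A(\lambda)}{2\kappa\delta - A(\lambda)} < 0,$$
which implies
$$\frac{\partial D_1^*}{\partial b} = -\frac{\kappa}{2\kappa\delta - A(\lambda)} < 0,$$
and, because $D_2^* = (1 + \lambda/\delta) D_1^*$,
$$\frac{\partial D_2^*}{\partial b} = -\left(1 + \frac{\lambda}{\delta}\right)\frac{\kappa}{2\kappa\delta - A(\lambda)} < 0.$$
To compute equilibrium profit, substitute Proposition 2 into
$$\pi_E = A(\lambda)(w - m) D_1 - \frac{\kappa}{2} q^2.$$
Using
$$w^* - m = \frac{\kappa\delta(\delta - m - b)}{2\kappa\delta - A(\lambda)}, \qquad D_1^* = \frac{\kappa(\delta - m - b)}{2\kappa\delta - A(\lambda)},$$
and
$$q^* = \frac{A(\lambda)(\delta - m - b)}{2\kappa\delta - A(\lambda)},$$
we obtain
$$\pi_E^*(b, \lambda) = \frac{A(\lambda)\,\kappa^2\delta\,(\delta - m - b)^2}{\left(2\kappa\delta - A(\lambda)\right)^2} - \frac{\kappa}{2}\,\frac{A(\lambda)^2(\delta - m - b)^2}{\left(2\kappa\delta - A(\lambda)\right)^2} = \frac{A(\lambda)\,\kappa\,(\delta - m - b)^2}{2\left(2\kappa\delta - A(\lambda)\right)} > 0,$$
where positivity follows from Assumption 1. Since the right-hand side is increasing in $(\delta - m - b)^2$ and $\delta - m - b > 0$, it is strictly decreasing in $b$. □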
Proposition 3 is the central incidence result. A larger bias does not merely misroute a given stock of quality. It lowers the revenue base over which the outside expert can recover access costs and quality investment. Because routed traffic also improves future quality, the effect propagates intertemporally. In economic terms, self-preferencing taxes the outside expert’s scale and thereby taxes outside capability accumulation.
The dynamic environment also lets us identify an amplification effect. Stronger data feedback increases the social and private return to expert demand, but it also makes bias more consequential because the same lost query now matters twice: once for current profit and again for future quality.
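Corollary 1 (Data feedback amplifies the harm from bias). Under Assumption 1,
$$\frac{\partial q^*}{\partial \lambda} > 0, \qquad \frac{\partial D_1^*}{\partial \lambda} > 0, \qquad \frac{\partial D_2^*}{\partial \lambda} > 0.$$
Moreover, the absolute sensitivity of expert quality and period-1 demand to platform bias is increasing in $\lambda$:
$$\frac{\partial}{\partial \lambda}\left|\frac{\partial q^*}{\partial b}\right| > 0, \qquad \frac{\partial}{\partial \lambda}\left|\frac{\partial D_1^*}{\partial b}\right| > 0.$$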
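Proof. Since $A'(\lambda) = \beta/\delta > 0$, differentiating Proposition 2 with respect to $A$ gives
$$\frac{\partial q^*}{\partial A} = \frac{2\kappa\delta(\delta - m - b)}{\left(2\kappa\delta - A\right)^2} > 0,$$
so $\partial q^*/\partial \lambda = (\partial q^*/\partial A)\,A'(\lambda) > 0$. Next,
$$D_1^* = \frac{\kappa(\delta - m - b)}{2\kappa\delta - A(\lambda)},$$
so
$$\frac{\partial D_1^*}{\partial \lambda} = \frac{\kappa(\delta - m - b)\,A'(\lambda)}{\left(2\kappa\delta - A(\lambda)\right)^2} > 0.$$
Since $D_2^* = (1 + \lambda/\delta) D_1^*$ and both factors are increasing in $\lambda$, we also have $\partial D_2^*/\partial \lambda > 0$.
For the amplification claim,
$$\left|\frac{\partial q^*}{\partial b}\right| = \frac{A(\lambda)}{2\kappa\delta - A(\lambda)}.$$
Differentiating with respect to $\lambda$ yields
$$\frac{\partial}{\partial \lambda}\left|\frac{\partial q^*}{\partial b}\right| = \frac{2\kappa\delta\,A'(\lambda)}{\left(2\kappa\delta - A(\lambda)\right)^2} > 0.$$
Similarly,
$$\left|\frac{\partial D_1^*}{\partial b}\right| = \frac{\kappa}{2\kappa\delta - A(\lambda)},$$
so
$$\frac{\partial}{\partial \lambda}\left|\frac{\partial D_1^*}{\partial b}\right| = \frac{\kappa\,A'(\lambda)}{\left(2\kappa\delta - A(\lambda)\right)^2} > 0. \qquad \square$$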
The corollary clarifies why AI routing is a particularly consequential setting for self-preferencing. In a static product-market model, bias changes current market share. In our environment, where traffic also creates future quality, stronger feedback magnifies the damage from the same amount of bias. This mechanism is exactly what one would expect in markets where access to queries, outcomes, and user interactions is a core input into capability development.
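A short numerical illustration of Proposition 3 and Corollary 1 follows. It is a sketch under hypothetical parameter values, with function names introduced here for illustration: it tabulates the absolute bias sensitivities $A(\lambda)/(2\kappa\delta - A(\lambda))$ and $\kappa/(2\kappa\delta - A(\lambda))$ for several feedback strengths and evaluates the closed-form equilibrium profit for several bias levels.

```python
def dynamic_multiplier(beta, lam, delta):
    """A(lambda) = 1 + beta * (1 + lambda / delta)."""
    return 1.0 + beta * (1.0 + lam / delta)


def bias_sensitivity(delta, kappa, lam, beta):
    """Return (|dq*/db|, |dD1*/db|) from Proposition 3."""
    A = dynamic_multiplier(beta, lam, delta)
    denom = 2.0 * kappa * delta - A
    return A / denom, kappa / denom


def equilibrium_profit(m, delta, kappa, lam, b, beta):
    """Closed-form equilibrium expert profit from Proposition 3."""
    A = dynamic_multiplier(beta, lam, delta)
    return A * kappa * (delta - m - b) ** 2 / (2.0 * (2.0 * kappa * delta - A))


# Hypothetical parameter values (assumptions for illustration only).
base = dict(m=0.1, delta=0.5, kappa=4.0, beta=0.9)

# Sensitivities depend only on (delta, kappa, beta, lambda).
sensitivities = [
    bias_sensitivity(base["delta"], base["kappa"], lam, base["beta"])
    for lam in (0.0, 0.1, 0.2)
]

# Profit at lambda = 0.1 for increasing bias levels.
profits = [equilibrium_profit(lam=0.1, b=b, **base) for b in (0.05, 0.10, 0.15)]

print(sensitivities)
print(profits)
```

Both sensitivities rise with $\lambda$ in this example, and profit falls as $b$ increases, consistent with the signs derived above.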

4. Dynamic First Best and the Wedges in Decentralized Routing

We now compare decentralized routing to a planner benchmark. The goal is not to solve a full regulatory game, but to identify precisely which margins decentralized routing fails to internalize.

4.1. The Planner’s Dynamic Benchmark

Fix the expert’s initial quality level $q$. The planner chooses period-1 routing to maximize discounted total welfare, taking as given that period-2 expert quality is $q_2 = q + \lambda D_1$. Because user types are uniformly distributed and the expert’s gross advantage is increasing in $\theta$, it is sufficient for the planner to choose the mass $D_1 \in [0, 1]$ of hardest period-1 queries routed to the expert.
If the planner routes the hardest $D_1$ mass of period-1 queries to the expert, the period-1 incremental welfare relative to using the incumbent for all queries is
$$S_1(D_1; q) = \int_{1 - D_1}^{1} (q + \delta\theta - m)\, d\theta = D_1 (q - m + \delta) - \frac{\delta}{2} D_1^2.$$
Given $D_1$, period-2 expert quality is $q + \lambda D_1$. In period 2, the planner again routes all and only those queries for which the expert’s incremental welfare is nonnegative. On the interior branch where period-2 expert demand lies in $(0, 1)$, optimal period-2 demand is
$$D_2^{FB}(q + \lambda D_1) = \frac{q + \lambda D_1 - m + \delta}{\delta}.$$
The corresponding period-2 incremental welfare on this branch is
$$S_2(q + \lambda D_1) = \frac{(q + \lambda D_1 - m + \delta)^2}{2\delta}.$$
Therefore the planner solves
$$\max_{D_1 \in [0, 1]} \; S_1(D_1; q) + \beta S_2(q + \lambda D_1) - \frac{\kappa}{2} q^2.$$
Since $q$ is fixed in this benchmark, the investment cost term does not affect the choice of $D_1$.
Proposition 4 (Conditional dynamic first best). Fix the expert’s initial quality $q$. Under Assumption 1, suppose the planner’s optimum is interior in period 1 and induces interior period-2 demand. Then the planner’s unique interior optimum is
$$D_1^{FB}(q) = \frac{(\delta + \beta\lambda)(q - m + \delta)}{\delta^2 - \beta\lambda^2}.$$
Equivalently, the planner’s period-1 cutoff is
$$t_1^{FB}(q) = 1 - D_1^{FB}(q).$$
The planner’s marginal condition is
$$q + \delta\left(1 - D_1^{FB}(q)\right) - m + \beta\lambda\, D_2^{FB}\!\left(q + \lambda D_1^{FB}(q)\right) = 0.$$
Proof. The planner’s objective as a function of $D_1$ on this interior branch is
$$\Omega(D_1; q) = D_1 (q - m + \delta) - \frac{\delta}{2} D_1^2 + \beta\,\frac{(q + \lambda D_1 - m + \delta)^2}{2\delta} - \frac{\kappa}{2} q^2.$$
Differentiate with respect to $D_1$:
$$\Omega_{D_1}(D_1; q) = q - m + \delta - \delta D_1 + \beta\lambda\,\frac{q + \lambda D_1 - m + \delta}{\delta}.$$
The second derivative is
$$\Omega_{D_1 D_1}(D_1; q) = -\delta + \frac{\beta\lambda^2}{\delta} = -\frac{\delta^2 - \beta\lambda^2}{\delta} < 0$$
by Assumption 1, so the objective is strictly concave on the interior branch and the interior optimum is unique. Setting the first derivative equal to zero yields
$$(q - m + \delta)\left(1 + \frac{\beta\lambda}{\delta}\right) + D_1\left(\frac{\beta\lambda^2}{\delta} - \delta\right) = 0.$$
Solving for $D_1$ gives
$$D_1^{FB}(q) = \frac{(\delta + \beta\lambda)(q - m + \delta)}{\delta^2 - \beta\lambda^2}.$$
The cutoff form follows from the monotonicity of incremental welfare in $\theta$. Finally, the first-order condition may be rewritten as
$$q + \delta(1 - D_1) - m + \beta\lambda\,\frac{q + \lambda D_1 - m + \delta}{\delta} = 0.$$
Since the fraction on the right is exactly $D_2^{FB}(q + \lambda D_1)$ on the interior branch, the marginal condition becomes
$$q + \delta\left(1 - D_1^{FB}\right) - m + \beta\lambda\, D_2^{FB}\!\left(q + \lambda D_1^{FB}\right) = 0. \qquad \square$$
The planner’s condition has an intuitive interpretation. At the period-1 margin, routing one more borderline query to the expert has a current payoff equal to its current incremental welfare, $q + \delta\theta - m$. Unlike the platform, however, the planner also values the effect of that extra query on future expert quality. The continuation value is $\beta\lambda D_2^{FB}$ because one more period-1 expert query raises period-2 quality by $\lambda$, and a one-unit increase in expert quality raises the payoff on each period-2 expert-routed query by one. The planner therefore sets a lower period-1 cutoff than a static planner would on this interior branch.
Indeed, when $\lambda = 0$, Proposition 4 collapses to the static benchmark demand $(q - m + \delta)/\delta$. When $\lambda > 0$, the planner sends more traffic to the expert in period 1 because current routing also creates future capability. The next proposition shows how decentralized routing falls short.
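The planner’s closed form can be compared against a brute-force grid search. The sketch below is illustrative only: the function names are introduced here, and the parameter values are hypothetical, chosen so that both the period-1 optimum and the induced period-2 demand are interior. The constant investment cost term is omitted because it does not affect the choice of $D_1$.

```python
def planner_demand(q, m, delta, lam, beta):
    """Closed-form (D1_FB, D2_FB) on the interior branch (Proposition 4)."""
    d1 = (delta + beta * lam) * (q - m + delta) / (delta**2 - beta * lam**2)
    d2 = (q + lam * d1 - m + delta) / delta
    return d1, d2


def planner_objective(d1, q, m, delta, lam, beta):
    """S1(D1; q) + beta * S2(q + lam * D1), dropping the constant kappa term."""
    s1 = d1 * (q - m + delta) - 0.5 * delta * d1**2
    s2 = (q + lam * d1 - m + delta) ** 2 / (2.0 * delta)
    return s1 + beta * s2


# Hypothetical values chosen so that the interior branch applies.
p = dict(q=0.05, m=0.3, delta=0.5, lam=0.1, beta=0.9)
d1_fb, d2_fb = planner_demand(**p)

# Brute-force maximization over a fine grid of D1 in [0, 1].
grid = [i / 10000 for i in range(10001)]
d1_grid = max(grid, key=lambda d: planner_objective(d, **p))

# Static benchmark (lambda = 0) and the planner's marginal condition.
d1_static = (p["q"] - p["m"] + p["delta"]) / p["delta"]
foc = p["q"] + p["delta"] * (1.0 - d1_fb) - p["m"] + p["beta"] * p["lam"] * d2_fb

print(d1_fb, d1_grid, d1_static, foc)
```

In this example the grid maximizer agrees with the closed form up to the grid resolution, the marginal condition holds up to floating-point error, and the dynamic planner routes more period-1 traffic to the expert than the static benchmark.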

4.2. Markup, Bias, and Data-Feedback Wedges

For any fixed $(q, w, b)$, the platform’s period-1 expert demand is
$$D_1^{P}(q, w, b) = \frac{q - w - b + \delta}{\delta}.$$
Comparing this expression to Proposition 4 yields the wedges between decentralized routing and efficient routing.
Proposition 5 (Three wedges in decentralized routing). Fix the expert’s initial quality $q$ and suppose the interior characterization in Proposition 4 applies and that platform demand is interior. Then the gap between the planner’s period-1 expert demand and the platform’s period-1 expert demand is
$$D_1^{FB}(q) - D_1^{P}(q, w, b) = \frac{w + b - m}{\delta} + \frac{\beta\lambda(\delta + \lambda)(q - m + \delta)}{\delta(\delta^2 - \beta\lambda^2)}.$$
Hence decentralized routing under-routes to the expert for three distinct reasons:
(i) an access-markup wedge, $(w - m)/\delta$;
(ii) a bias wedge, $b/\delta$;
(iii) a data-feedback wedge, $\dfrac{\beta\lambda(\delta + \lambda)(q - m + \delta)}{\delta(\delta^2 - \beta\lambda^2)}$.
All three wedges are nonnegative, and the data-feedback wedge is strictly positive whenever $\lambda > 0$.
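Proof. Using Proposition 4,
$$D_1^{FB}(q) = \frac{(\delta + \beta\lambda)(q - m + \delta)}{\delta^2 - \beta\lambda^2}.$$
Write this as
$$D_1^{FB}(q) = \frac{q - m + \delta}{\delta} + \left[\frac{\delta + \beta\lambda}{\delta^2 - \beta\lambda^2} - \frac{1}{\delta}\right](q - m + \delta).$$
The bracketed term simplifies to
$$\frac{\beta\lambda(\delta + \lambda)}{\delta(\delta^2 - \beta\lambda^2)}.$$
Hence
$$D_1^{FB}(q) = \frac{q - m + \delta}{\delta} + \frac{\beta\lambda(\delta + \lambda)(q - m + \delta)}{\delta(\delta^2 - \beta\lambda^2)}.$$
Now subtract the platform’s demand,
$$D_1^{P}(q, w, b) = \frac{q - w - b + \delta}{\delta}.$$
We obtain
$$D_1^{FB}(q) - D_1^{P}(q, w, b) = \frac{q - m + \delta}{\delta} - \frac{q - w - b + \delta}{\delta} + \frac{\beta\lambda(\delta + \lambda)(q - m + \delta)}{\delta(\delta^2 - \beta\lambda^2)} = \frac{w + b - m}{\delta} + \frac{\beta\lambda(\delta + \lambda)(q - m + \delta)}{\delta(\delta^2 - \beta\lambda^2)}.$$
The first term may be decomposed into $(w - m)/\delta + b/\delta$. Under the maintained parameter restrictions and interior conditions, each term is nonnegative, and the second fraction is strictly positive whenever $\lambda > 0$. □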
Proposition 5 is the paper’s main welfare decomposition on the interior branch. The platform under-routes to the outside expert not only because it faces an access price above real inference cost and not only because it may self-preference the incumbent, but also because it fails to internalize the future quality gains that current expert traffic creates. The third wedge is specific to an AI-routing environment in which traffic allocation doubles as data allocation.
The decomposition immediately implies that conduct and pricing remedies are not sufficient on their own.
Corollary 2
(Neutrality and access pricing are complements but not enough). Fix the expert’s initial quality q and suppose the interior benchmark in Proposition 5 applies. If regulation enforces neutrality ($b = 0$) and marginal-cost access pricing ($w = m$), then the platform’s period-1 expert demand becomes
$$D_1^{MC}(q) = \frac{q - m + \delta}{\delta},$$
which is still strictly below the dynamic first best whenever $\lambda > 0$:
$$D_1^{FB}(q) - D_1^{MC}(q) = \frac{\beta\lambda(\delta + \lambda)(q - m + \delta)}{\delta(\delta^{2} - \beta\lambda^{2})} > 0.$$
Proof. 
Set $b = 0$ and $w = m$ in Proposition 5. The first two wedges vanish, leaving only the data-feedback wedge. Because $\lambda > 0$, $q - m + \delta > 0$ on the interior branch, and $\delta^{2} - \beta\lambda^{2} > 0$ under Assumption 1, the remaining wedge is strictly positive. □
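A short worked illustration may help gauge the size of the residual wedge. Dividing the wedge by the remedied demand $D_1^{MC}(q)$ cancels the common factor $(q - m + \delta)/\delta$, so the proportional shortfall depends only on $\beta$, $\lambda$, and $\delta$. The parameter values below are assumptions chosen for arithmetic convenience, not values from the paper.

```latex
\frac{D_1^{FB}(q) - D_1^{MC}(q)}{D_1^{MC}(q)}
  = \frac{\beta\lambda(\delta + \lambda)}{\delta^{2} - \beta\lambda^{2}},
\qquad
\delta = 1,\ \beta = 0.9,\ \lambda = 0.2
\;\Longrightarrow\;
\frac{0.9 \times 0.2 \times 1.2}{1 - 0.9 \times 0.04}
  = \frac{0.216}{0.964} \approx 0.224 .
```

Under these illustrative values, and provided both demands remain interior, the planner would route roughly 22 percent more period-1 traffic to the expert than a platform subject to both remedies.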
This corollary provides a sharp reason why AI-routing governance cannot be reduced to a single nondiscrimination mandate within the interior benchmark. Neutrality corrects the platform’s direct incentive to favor the incumbent. Marginal-cost access corrects the static markup distortion. But neither instrument makes the platform internalize how current expert traffic raises future expert capability. If the regulator also cares about dynamic efficiency, some instrument aimed at data sharing, learning access, or mandated experimentation remains relevant.

5. Policy Discussion, Empirical Predictions, and Extensions

The theoretical results carry several implications for the governance of AI gateways and other routing intermediaries.

5.1. Routing Neutrality and Dual-Role Platforms

The first implication is the most immediate. A platform that owns one of the candidate models has a structural incentive to overuse it. In the model, the distortion appears as the reduced-form bias parameter b. The interpretation is broader than overt ranking favoritism. The platform may make the incumbent look artificially cheap through internal transfer pricing, grant it better latency or context length, privilege it in hidden prompts, or favor it because routing inward preserves user data and ecosystem control. All of these mechanisms raise the effective platform-side value of keeping traffic in-house.
The model shows that the effects of such bias are broader in AI routing than in many standard search settings. Bias raises the period-1 threshold, but that is only the first-round distortion. It also lowers the scale on which the outside expert earns margins and, through that channel, reduces initial quality investment. Because period-1 traffic improves period-2 expert quality, the same initial diversion weakens future expert performance as well. In other words, self-preferencing reallocates demand and simultaneously depresses the rival’s future capability frontier.
This result matters for competition policy. In a conventional product-market setting, a dominant intermediary may harm a rival by reducing current sales. In AI markets, reducing routed traffic may also deny the rival the data and learning opportunities embedded in those interactions. The harm from self-preferencing is therefore not exhausted by static market-share diversion. It may operate through capability accumulation, which is exactly the margin on which specialized entrants attempt to compete.
The natural policy counterpart is an auditable neutrality rule for routing. Such a rule need not require identical traffic shares across models. It need only require that routing decisions be justified by observable cost, latency, quality, or safety criteria rather than by ownership or hidden strategic preferences. In the language of the model, the objective is to reduce b, not to eliminate all asymmetry.

5.2. Access Pricing and Interoperability

The second implication is that neutrality alone cannot solve the allocation problem. Even when $b = 0$, the expert sets a strictly positive markup over real inference cost. The platform therefore compares the expert to a transfer price that exceeds the real resource cost of using the expert. Proposition 5 shows that this markup produces an under-routing wedge even in the absence of self-preferencing.
In actual AI markets, the analogs are numerous: wholesale API prices, platform commissions, revenue-sharing obligations, technical restrictions on external calls, minimum-spend commitments, and the inability of outside providers to interoperate on equal terms with the platform’s in-house model. These are all ways in which the gateway can make third-party expertise costly to invoke.
The model therefore points to a familiar but important conclusion from platform economics: conduct remedies and price-structure remedies are complements. A policy that eliminates discrimination but leaves outside access overpriced corrects only part of the problem. A policy that forces better access terms but allows the platform to tilt routing toward its own model also corrects only part of the problem. The relevant object is the joint design of the routing rule and the access contract.
One can see this directly from Proposition 5. The platform’s period-1 under-routing equals the sum of three nonnegative wedges. A neutrality rule removes only $b/\delta$. A cost-based access rule removes only $(w - m)/\delta$. The remaining distortions survive unless each wedge is addressed on its own terms.
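The remedy-by-remedy accounting can be sketched numerically. The snippet below is a minimal illustration under assumed parameter values. The function name `under_routing`, the scenario labels, and all numbers are hypothetical. It holds the expert's initial quality and access price fixed across scenarios, so it abstracts from any equilibrium response of the expert.

```python
def under_routing(q, w, b, m, delta, beta, lam):
    """Total period-1 gap D_1^FB - D_1^P: the sum of the three wedges in Proposition 5."""
    markup = (w - m) / delta          # access-markup wedge
    bias = b / delta                  # bias wedge
    feedback = (beta * lam * (delta + lam) * (q - m + delta)
                / (delta * (delta**2 - beta * lam**2)))  # data-feedback wedge
    return markup + bias + feedback


# Hypothetical baseline parameters, chosen only for illustration.
base = dict(q=0.2, w=0.7, b=0.1, m=0.6, delta=1.0, beta=0.9, lam=0.2)

scenarios = {
    "no remedy": base,
    "neutrality only (b = 0)": {**base, "b": 0.0},
    "cost-based access only (w = m)": {**base, "w": base["m"]},
    "both remedies": {**base, "b": 0.0, "w": base["m"]},
}

gaps = {name: under_routing(**params) for name, params in scenarios.items()}
for name, gap in gaps.items():
    print(f"{name:32s} under-routing gap = {gap:.4f}")
```

In this sketch each remedy lowers the gap by exactly its own wedge, and the gap that remains under both remedies is the data-feedback wedge alone.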

5.3. Data Governance and the Dynamic Wedge

The third implication is specific to AI routing. Even if a regulator perfectly eliminated self-preferencing and forced marginal-cost access, decentralized routing would still remain inefficient because the platform does not internalize the future value of expert learning. This is the data-feedback wedge isolated in Corollary 2.
That result suggests that AI governance should pay attention not only to current referral neutrality but also to the allocation of learning opportunities. In practice, those opportunities may depend on access to labeled outcomes, failure cases, side-by-side comparisons, or user-feedback logs. If the integrated platform controls those assets, it can preserve a dynamic advantage for its own model even under formally neutral routing.
Three types of policy instrument emerge naturally.
First, the regulator may require minimum auditability of routing logs and outcomes. If outside experts can document where they were bypassed and how they would have performed, the platform’s control over learning opportunities is reduced.
Second, the regulator may support data portability or data-sharing obligations in narrow, contestable forms. The objective is not unconditional diffusion of all data. It is to prevent the routing intermediary from using traffic control to monopolize the information generated by user interactions. The broader literature on data governance argues that data-sharing institutions can matter materially for innovation and competition in digital markets [16]. Our model gives a simple AI-routing rationale for such instruments.
Third, the regulator may in some contexts mandate limited outward referral or comparative testing on borderline queries. In our framework, the planner values additional expert referral because it raises future expert quality. A practical analog is a rule that requires periodic benchmarking, contested escalation rights, or a minimum external referral share for hard or high-stakes tasks.
The key point is conceptual. In many AI deployments, routing is not only a procurement decision. It is also a learning-allocation decision. Once that is recognized, the design of the router acquires a dynamic industrial-organization dimension that static neutrality rules alone cannot capture.

5.4. High-Stakes Domains and Escalation Design

The results are especially relevant in regulated or high-stakes environments. Consider legal, medical, financial, or safety-critical AI systems in which complex cases are exactly the ones that should be escalated to specialized experts or human review. If the platform self-preferences a cheap in-house model, or if it faces distorted access terms for the stronger outside system, it will set the escalation threshold too high. The resulting errors are concentrated among difficult cases, which are often precisely the cases with the largest downside risk.
The model suggests that such harms may be understated by average-performance metrics. A platform can satisfy broad performance targets while still routing too many borderline or difficult queries to the weaker in-house system. Sectoral governance may therefore need to focus on triage architecture: which cases are escalated, under what conditions, and with what audit trail. In the language of the model, regulation should attend to the cutoff itself, not only to unconditional model quality.

5.5. Empirical Predictions

Although the paper is theoretical, it yields several empirical predictions.
First, holding observable cost and quality constant, ownership links between the router and a candidate model should increase the probability that the owned model is selected. This ownership effect should be largest on intermediate-difficulty queries, where the platform is closest to indifferent and therefore has the greatest room to steer traffic.
Second, platform decisions that worsen outside access terms—higher commissions, weaker interoperability, slower latency guarantees, or more restrictive API conditions—should lower not only current third-party routing shares, but also subsequent third-party quality investment or performance improvements. The dynamic margin is a core prediction of the model and distinguishes it from a purely static diversion story.
Third, the damage from self-preferencing should be greatest in market segments where traffic is particularly important for capability accumulation. In our notation, those are environments with a larger $\lambda$. Empirically, one would expect stronger adverse effects in domains where user feedback is rich, labels arrive quickly, or domain adaptation depends heavily on active deployment.
Fourth, if regulators improve data portability, auditability, or contestability at the routing layer, one should observe not only more outward referrals but also greater responsiveness of referrals to measured relative performance. In the model, better governance reduces the non-performance wedges and increases the alignment between routing and comparative quality.
These predictions suggest a natural empirical agenda. The relevant evidence is not confined to cross-sectional traffic shares. It also includes how routing reforms affect subsequent investment, performance growth, and entry by specialized experts. For AI gateways, the dynamic response of outside capability may be the most informative outcome variable.

5.6. Extensions

The model can be extended in several directions without changing its core logic.
A first extension is many-expert routing. Suppose the platform can route across multiple outside experts with different $(q_j, \gamma_j, w_j)$ tuples. Under single crossing, the query space would be partitioned into intervals. Self-preferencing would then shift several routing boundaries rather than one. The central force would remain unchanged: a dual-role platform would distort the allocation of both demand and learning opportunities across specialists.
A second extension is user-side screening. Some users may be able to pay for escalation, wait longer for better models, or strategically rephrase queries to obtain stronger service. Embedding such behavior would connect AI routing to classical screening and referral design. The platform could then distort not only invisible backend routing, but also the user-facing menu that governs access to outside experts.
A third extension is endogenous entry. In the current model, the presence of an outside expert is taken as given. Yet Proposition 3 already implies the basic entry logic. A larger self-preferencing wedge compresses the revenue base and the learning base available to outsiders. In a richer environment, that would reduce entry into specialized capabilities. The router would then affect not only how demand is split among existing models, but also whether such specialists appear at all.
A fourth extension is richer data governance. Instead of the simple law of motion $q_2 = q + \lambda D_1$, one could allow some fraction of all queries to generate portable data, or allow the platform to choose whether outcomes are disclosed to the expert. Those choices would make data governance an explicit control variable. The present framework already indicates what such an extension would deliver: withholding portable information would become another way for the integrated intermediary to enlarge the dynamic wedge.
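The baseline law of motion $q_2 = q + \lambda D_1$ can be traced numerically to show how routing bias carries over into period-2 expert quality. The sketch below combines it with the interior platform demand from Proposition 5. The function names and parameter values are assumptions made for illustration. Initial quality and the access price are held fixed, so the calculation deliberately ignores the investment channel.

```python
def platform_demand(q, w, b, delta):
    """Platform's period-1 expert demand on the interior branch."""
    return (q - w - b + delta) / delta


def period2_quality(q, d1, lam):
    """Baseline law of motion: q_2 = q + lambda * D_1."""
    return q + lam * d1


def quality_loss_from_bias(q, w, b, delta, lam):
    """Period-2 quality lost relative to neutral routing (b = 0), holding q and w fixed."""
    neutral = period2_quality(q, platform_demand(q, w, 0.0, delta), lam)
    biased = period2_quality(q, platform_demand(q, w, b, delta), lam)
    return neutral - biased


# Hypothetical values; only lam varies across the comparison.
losses = {
    lam: quality_loss_from_bias(q=0.2, w=0.7, b=0.1, delta=1.0, lam=lam)
    for lam in (0.1, 0.2, 0.4)
}
for lam, loss in losses.items():
    print(f"lambda = {lam:.1f}: period-2 quality loss from bias = {loss:.3f}")
```

With quality and price held fixed, the loss equals $\lambda b/\delta$, so it scales linearly with the strength of data feedback. This is only the mechanical channel. The paper's full result also includes the investment response.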
These extensions all point in the same direction. Once the routing layer is treated as a strategic intermediary, questions commonly described as AI-governance problems can often be restated as industrial-organization problems about platform conduct, access, and investment incentives.

6. Conclusion

AI deployment increasingly depends on routers, gateways, and assistants that decide which model answers which query. This paper argues that such routing should be analyzed as an industrial-organization problem rather than as a purely algorithmic choice rule. A routing intermediary allocates traffic across models. If it owns one of those models, routing becomes endogenous to self-preferencing. If outside expertise is priced through access contracts and improves through routed traffic, then the platform simultaneously allocates current demand and future capability.
We build a tractable model of that environment. The platform routes queries between an in-house incumbent and an outside expert. The expert chooses an access price and a quality investment. Period-1 expert demand raises period-2 expert quality through data feedback. In equilibrium, routing is a cutoff rule in each period. Self-preferencing raises the cutoff, lowers routed demand, and depresses quality investment. Stronger data feedback increases the value of outside traffic, but it also amplifies the harm from bias because diverted demand now carries a dynamic cost.
Relative to a conditional dynamic first best, decentralized routing features three wedges: an access-markup wedge, a bias wedge, and a data-feedback wedge. The first two are familiar from platform economics. The third is distinctive to AI routing because traffic allocation is also learning allocation. For that reason, neutrality rules and access-pricing rules are complements, but even both together are not enough when the platform does not internalize the future value of outside learning.
The broader lesson is simple. In AI markets, the question is often not merely which model is best. It is who controls the router that decides which model is used, under what commercial terms, and with what consequences for future competition. Once that control layer is made explicit, the economics of platforms, biased intermediation, innovation incentives, and data governance become central to the analysis of AI systems.

References

  1. Chen, L.; Zaharia, M.; Zou, J. FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance. Transactions on Machine Learning Research 2024.
  2. Ong, I.; Almahairi, A.; Wu, V.; Chiang, W.L.; Wu, T.; Gonzalez, J.E.; Kadous, M.W.; Stoica, I. RouteLLM: Learning to Route LLMs with Preference Data. In Proceedings of the International Conference on Learning Representations; 2025.
  3. Panda, P.; Magazine, R.; Devaguptapu, C.; Takemori, S.; Sharma, V. Adaptive LLM Routing under Budget Constraints. In Findings of the Association for Computational Linguistics: EMNLP 2025; 2025; pp. 25297–25313.
  4. Rochet, J.C.; Tirole, J. Platform Competition in Two-Sided Markets. Journal of the European Economic Association 2003, 1, 990–1029.
  5. Armstrong, M. Competition in Two-Sided Markets. The RAND Journal of Economics 2006, 37, 668–691.
  6. Jullien, B.; Sand-Zantman, W. The Economics of Platforms: A Theory Guide for Competition Policy. Information Economics and Policy 2021, 54, 100880.
  7. Hagiu, A.; Jullien, B. Why Do Intermediaries Divert Search? The RAND Journal of Economics 2011, 42, 337–362.
  8. de Cornière, A.; Taylor, G. A Model of Biased Intermediation. The RAND Journal of Economics 2019, 50, 854–882.
  9. Hagiu, A.; Teh, T.H.; Wright, J. Should Platforms Be Allowed to Sell on Their Own Marketplaces? The RAND Journal of Economics 2022, 53, 297–327.
  10. Etro, F. e-Commerce Platforms and Self-Preferencing. Journal of Economic Surveys 2024, 38, 1516–1543.
  11. Athey, S.; Ellison, G. Position Auctions with Consumer Search. The Quarterly Journal of Economics 2011, 126, 1213–1270.
  12. de Cornière, A. Search Advertising. American Economic Journal: Microeconomics 2016, 8, 156–188.
  13. Che, Y.K.; Hörner, J. Recommender Systems as Mechanisms for Social Learning. The Quarterly Journal of Economics 2018, 133, 871–925.
  14. Arrow, K.J. The Economic Implications of Learning by Doing. The Review of Economic Studies 1962, 29, 155–173.
  15. Akcigit, U.; Liu, Q. The Role of Information in Innovation and Competition. Journal of the European Economic Association 2016, 14, 828–870.
  16. Graef, I.; Prüfer, J. Governance of Data Sharing: A Law & Economics Proposal. Research Policy 2021, 50, 104330.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.