6. Discussion
The results support the central argument of the paper: ethical infeasibility in AI-enabled infrastructure systems is not a single phenomenon with a single cause. The dominant algorithmic-fairness literature characterizes one such cause, namely the internal inconsistency of fairness criteria within a classifier, and proves that several natural fairness objectives cannot be satisfied simultaneously. The Structural Infeasibility Theorem identifies a categorically different cause. Its lower and upper disparity bounds, Eqs. (9)–(), depend only on the travel-time matrix, the spatial distribution of demand, and the group-composition profile. These are properties of the physical and demographic environment, not of any algorithmic design choice. When the interval between the two bounds excludes zero and the implied parity floor exceeds the regulator’s tolerance, no admissible allocation policy can satisfy demographic parity, and no algorithmic adjustment can recover it. The source of the failure lies in the infrastructure, and it is invisible to any audit that examines only the algorithm.
The case study makes this distinction operationally concrete in two complementary ways. The first concerns attribution. The IIS certificate for the HC-MS infeasibility contains only travel-time entries from the depot–zone topology; it contains no algorithmic constraint, no fairness rule, and no rule-set inconsistency. Algorithm-only auditing of the dispatch system under the three-depot configuration would correctly detect that the system violates the NFPA 1710 standard, but it would have no way to identify the cause as infrastructural and no basis for a constructive remedy. The IIS procedure supplies both: a formal attribution of cause and a quantified specification of the physical investment required to restore compliance.
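Deletion filtering, the IIS extraction step the framework relies on, can be illustrated on a deliberately tiny model. In the minimal Python sketch below, each constraint is an interval on a single scalar response time and carries a hypothetical source tag. The interval oracle stands in for a real solver's feasibility check, and the rule in `attribute()` is an illustrative assumption about how tags might map to the three attributions, not the paper's procedure. The filter drops a constraint whenever the remaining set is still infeasible, so whatever survives is irreducible, and the oracle is called once per constraint plus once.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Constraint:
    name: str
    kind: str   # hypothetical tag: "infrastructure", "algorithm", or "rule"
    lo: float   # toy constraint lo <= x <= hi on a single scalar x
    hi: float


def interval_feasible(cs: Sequence[Constraint]) -> bool:
    """Toy feasibility oracle: do all the intervals share at least one point?"""
    lo = max((c.lo for c in cs), default=-math.inf)
    hi = min((c.hi for c in cs), default=math.inf)
    return lo <= hi


def deletion_filter(cs: Sequence[Constraint],
                    is_feasible: Callable[[Sequence[Constraint]], bool]) -> List[Constraint]:
    """Shrink an infeasible set to an irreducible infeasible subset (IIS).

    Makes len(cs) + 1 oracle calls, i.e. linear in the number of constraints.
    """
    core = list(cs)
    if is_feasible(core):
        raise ValueError("constraint set is feasible; there is no IIS")
    i = 0
    while i < len(core):
        trial = core[:i] + core[i + 1:]
        if is_feasible(trial):
            i += 1        # core[i] is needed for the infeasibility: keep it
        else:
            core = trial  # still infeasible without core[i]: drop it for good
    return core


def attribute(iis: Sequence[Constraint]) -> str:
    """Hypothetical attribution rule based on the tags present in the IIS."""
    kinds = {c.kind for c in iis}
    if kinds == {"rule"}:
        return "rule conflict"
    if "infrastructure" in kinds and "algorithm" not in kinds:
        return "infrastructure"
    if "algorithm" in kinds and "infrastructure" not in kinds:
        return "algorithm"
    return "mixed"


# Illustrative numbers (minutes), not the case-study parameters.
constraints = [
    Constraint("fastest depot-to-zone travel time", "infrastructure", 9.0, math.inf),
    Constraint("service standard", "rule", -math.inf, 8.0),
    Constraint("dispatch time budget", "algorithm", -math.inf, 20.0),
    Constraint("non-negative response time", "algorithm", 0.0, math.inf),
]
iis = deletion_filter(constraints, interval_feasible)
print([c.name for c in iis])  # ['fastest depot-to-zone travel time', 'service standard']
print(attribute(iis))         # infrastructure
```

With a real model, `is_feasible` would wrap a solver call on the subset of hard constraints; the filtering loop itself is unchanged.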
The second concerns welfare. The harm-redistribution finding of Table 10 shows that equity attained within fixed infrastructure differs in kind from equity attained through capital investment. Under the pre-investment Pareto incumbent, every group experiences a worse absolute outcome than under the operationally optimal baseline; under the post-investment corner of Table 11, every group experiences a better one. Two policies can therefore report identical disparity values while embodying opposite welfare structures, and an equity audit that reports only the disparity statistic cannot tell them apart. As the case study establishes, this is not incidental to the present instance: within a fixed infrastructure whose binding floor is already attained, disparity can be reduced only by leveling the advantaged group down, never by lifting the disadvantaged group up. The welfare consequence is that an apparent equity gain and a genuine one are indistinguishable at the level of the summary statistic, which is why the distinction must be drawn at the level of the population-impact vector.
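The distinction between the two welfare structures can be made mechanical by reading the sign pattern of the population-impact vector instead of the disparity statistic alone. The Python sketch below assumes that group outcomes are mean response times (higher is worse) and that disparity is the largest gap between group means. The group labels and numbers are hypothetical, chosen so that two policies report the same disparity while one levels down and the other lifts up.

```python
from typing import Dict

Times = Dict[str, float]  # group -> mean response time in minutes (higher is worse)


def disparity(times: Times) -> float:
    """Maximum inter-group gap in mean response time."""
    return max(times.values()) - min(times.values())


def impact_vector(baseline: Times, policy: Times) -> Times:
    """Per-group change relative to the baseline; positive means the group was slowed."""
    return {g: policy[g] - baseline[g] for g in baseline}


def welfare_structure(delta: Times) -> str:
    """Classify a population-impact vector by the sign pattern of its entries."""
    if all(d == 0 for d in delta.values()):
        return "unchanged"
    if all(d >= 0 for d in delta.values()):
        return "leveling down"   # no group sped up, at least one slowed
    if all(d <= 0 for d in delta.values()):
        return "lifting up"      # no group slowed, at least one sped up
    return "redistribution"      # some groups gain at others' expense


# Hypothetical numbers chosen so that both policies report the same disparity.
baseline = {"A": 6.0, "B": 10.0}
fixed_infra = {"A": 8.5, "B": 10.5}      # equity within fixed infrastructure
post_investment = {"A": 5.0, "B": 7.0}   # equity after capital investment

for name, policy in [("fixed", fixed_infra), ("invest", post_investment)]:
    print(name, disparity(policy), welfare_structure(impact_vector(baseline, policy)))
# fixed 2.0 leveling down
# invest 2.0 lifting up
```

Both policies halve the disparity from 4.0 to 2.0 minutes, so only the impact vector separates them.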
These findings carry three implications for the ethical governance of AI-enabled infrastructure systems. First, the scope of an audit must extend beyond the allocation algorithm to the physical system on which it operates. The IIS certificate is the formal object that determines whether observed inequity is attributable to the algorithm, to the infrastructure, or to a rule conflict, and the appropriate intervention follows from that attribution. Second, equity statistics evaluated on a fixed infrastructure can conceal harm-redistribution patterns that are ethically consequential. The population-impact decomposition of Section 5.5 should therefore be treated as a minimum standard for distributional analysis: it is not enough to report that disparity fell; one must also report which groups were slowed and which were sped up, in order to characterize the welfare structure of the change. Third, when the certificate attributes infeasibility to physical infrastructure, the framework returns not a refused deployment but a quantified capital-investment specification. This shifts the locus of decision from the algorithmic-design layer to the capital-planning layer of the governing institution and supplies the technical basis for that shift.
The relationship between the framework and existing fairness-optimization methods is best assessed along three axes: solution quality, diagnostic capability, and computational efficiency. With regard to solution quality, the framework’s optimization layer is a standard ε-constraint procedure that sweeps a tolerance on one objective, here inter-group disparity, while minimizing the other, and records the non-dominated points to trace the Pareto frontier. On a feasible instance, this frontier is a property not of the procedure that computes it but of the problem itself: the set of non-dominated efficiency–equity trade-offs is fixed once the objectives and the feasible region are fixed. Any method that optimizes the same objectives over the same region must therefore reach the same frontier, whether it imposes equity as an explicit constraint to be tightened or aggregates outcomes through an order-weighted welfare function that penalizes inequality. The methods differ only in how they scalarize or traverse the frontier.
Table 12 confirms this on the case-study instance: the optima of an efficiency-only, an equity-constrained, a weighted-sum, and a lexicographic min-disparity formulation each coincide with a point on the Pareto frontier of Table 9. Solution quality is therefore identical by construction, so a head-to-head quality comparison on feasible instances would measure the choice of scalarization rather than the merit of the approach. The single informative exception is a method that minimizes a different equity functional, such as a Gini or order-weighted objective, whose optimum is Pareto-optimal in its own objective space but dominated in the maximum-disparity projection in which the regulator’s tolerance is expressed. The framework reproduces any of these by selecting the corresponding soft objective and is in this sense a superset of the individual formulations.
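The claim that different scalarizations share one frontier can be checked on a toy instance. The Python sketch below enumerates every zone-to-depot assignment of a hypothetical four-zone, two-depot problem (exhaustive enumeration stands in for the mixed-integer solver and is viable only at this size), runs an ε-constraint sweep over disparity, and confirms that the weighted-sum optimum lands on the resulting frontier. Efficiency is taken as the demand-weighted mean response time and disparity as the gap between group means; both definitions, and all the numbers, are assumptions for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical toy instance: TRAVEL[z][d] = minutes from depot d to zone z.
TRAVEL = [[4.0, 9.0], [7.0, 5.0], [12.0, 6.0], [10.0, 8.0]]
DEMAND = [3.0, 2.0, 2.0, 1.0]
GROUP = ["A", "A", "B", "B"]   # one group per zone, for simplicity

Point = Tuple[float, float]    # (efficiency, disparity); lower is better for both


def evaluate(assign: Sequence[int]) -> Point:
    """Demand-weighted mean response time and max gap between group means."""
    times = [TRAVEL[z][d] for z, d in enumerate(assign)]
    eff = sum(w * t for w, t in zip(DEMAND, times)) / sum(DEMAND)
    means = []
    for g in sorted(set(GROUP)):
        zs = [z for z, gz in enumerate(GROUP) if gz == g]
        means.append(sum(DEMAND[z] * times[z] for z in zs) / sum(DEMAND[z] for z in zs))
    return eff, max(means) - min(means)


def all_points() -> List[Point]:
    """Exhaustive enumeration: a stand-in for a MIP solver on a tiny instance."""
    n_depots = len(TRAVEL[0])
    return [evaluate(a) for a in product(range(n_depots), repeat=len(TRAVEL))]


def epsilon_constraint_frontier(points: List[Point]) -> List[Point]:
    """Sweep a disparity tolerance eps; minimize efficiency subject to disparity <= eps."""
    frontier: List[Point] = []
    for eps in sorted({d for _, d in points}):
        best = min(e for e, d in points if d <= eps)
        # Break ties on disparity so that weakly dominated points are not recorded.
        disp = min(d for e, d in points if d <= eps and e == best)
        if not frontier or best < frontier[-1][0]:
            frontier.append((best, disp))
    return frontier


def weighted_sum_optimum(points: List[Point], w: float) -> Point:
    """Minimizer of w * efficiency + (1 - w) * disparity, for 0 < w < 1."""
    return min(points, key=lambda p: w * p[0] + (1 - w) * p[1])


pts = all_points()
front = epsilon_constraint_frontier(pts)
print(front)
print(weighted_sum_optimum(pts, 0.5) in front)  # True
```

With strictly positive weights a weighted-sum minimizer cannot be dominated, and the sweep records every non-dominated point, which is why the two procedures agree.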
The distinctive capability of the framework appears where these methods fall silent, namely at infeasibility. When the feasible region is empty, a conventional equity-aware optimizer can do one of two things, and neither is diagnostic. It can halt with an “infeasible” status, a single bit that reports the failure but says nothing about its source. Alternatively, if the binding requirement is softened, it can return a least-violating point; depending on the violation metric, that point may localize the deficit to the unreachable zones or spread it across the population, but in neither case does it certify that the violation is irreducible across all policies, attribute its source to the infrastructure rather than the dispatch policy, or indicate how to remove it. Impossibility results from the fairness literature reach only rule–rule contradiction: they establish when fairness criteria are mutually inconsistent as logical statements, but they are formulated independently of the physical system and so cannot detect a conflict that arises from the interaction between a rule and the infrastructure on which it operates.
Table 13 contrasts what each approach can report on the infeasible minimum-service instance (HC-MS at the case-study service standard, where no admissible assignment complies). The IIS procedure localizes the infeasibility to the travel-time entries for the two zones that no depot can reach within the standard, certifies the conflict as infrastructural rather than algorithmic or rule-internal, and converts that certificate into a quantified depot-siting specification that states the physical change required to restore feasibility. The comparison is therefore not one of competing solvers on the same problem, but of what each approach can say when the problem has no solution at all.
A third baseline family deserves separate consideration, because it is the most widely deployed algorithmic fairness remedy: post-processing and reweighting methods, which adjust the decision rule after training through group-specific calibration, threshold shifting, or instance reweighting. These are the canonical algorithmic remedy for outcome disparity, yet on the case-study instance none of them can restore minimum-service compliance. The set of achievable response times for a zone is fixed by depot geography; for the two uncovered zones, the smallest element of this set is 9 and 11 minutes, respectively, both exceeding the service standard. Post-processing changes which attainable outcome is selected, not the attainable set itself, so reweighting, threshold adjustment, and group-conditional calibration are all powerless against an infeasibility whose source is that the compliant outcome is not attainable under any rule. This is essentially the content of the IIS certificate, and it sharpens the paper’s central claim: the dominant algorithmic-fairness intervention operates on the decision rule, but here the binding constraint is not in the rule. The contrast is not that the framework reweights better; it is that reweighting cannot reach the failure at all.
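The argument that post-processing acts only on selection, never on the attainable set, can be illustrated directly. In the Python sketch below, an arbitrary reweighting rule picks one depot per zone; whatever the weights, the chosen response time is an element of the zone's attainable set and therefore no smaller than its minimum. Only the minima of 9 and 11 minutes come from the text; the other travel times, the zone labels, and the 8-minute standard are illustrative assumptions (any standard below 9 minutes gives the same result).

```python
import random
from typing import Sequence

# Hypothetical travel times (minutes) from three depots to the two uncovered zones.
# Only the minima, 9 and 11 minutes, come from the text; the rest are illustrative.
UNCOVERED = {"zone_1": [9.0, 14.0, 17.0], "zone_2": [13.0, 11.0, 15.0]}
STANDARD = 8.0  # illustrative service standard below both minima (assumption)


def post_processed_time(row: Sequence[float], weights: Sequence[float]) -> float:
    """Stand-in for a reweighting/threshold rule: it can only pick an attainable time."""
    scores = [w * t for w, t in zip(weights, row)]
    return row[scores.index(min(scores))]


rng = random.Random(0)
compliant = 0
for _ in range(10_000):
    for row in UNCOVERED.values():
        weights = [rng.uniform(0.01, 100.0) for _ in row]  # arbitrary rule parameters
        if post_processed_time(row, weights) <= STANDARD:
            compliant += 1
print(compliant)  # 0: no rule selects an outcome outside the attainable set
```

Randomizing the weights is only a demonstration; the guarantee follows from the fact that every rule returns some element of the row.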
The capacity of the framework to emit an informative infeasibility certificate also bears on a practical concern about constraint-based formulations: that placing fairness requirements in the hard tier may render a problem infeasible. It may, and in a well-posed model it should, because a hard constraint encodes a non-negotiable requirement, and when no policy satisfies it the correct output is an infeasibility verdict rather than a silently degraded solution. The genuine risk is not infeasibility as such but over-specification: placing in the hard tier a requirement that properly admits trade-offs, or setting its tolerance tighter than any infrastructure can support, so that the feasible region is empty for reasons that should have been weighed on the Pareto frontier. The hard/soft tier distinction is the structural guard against this. Only truly non-negotiable requirements belong in the hard tier; requirements that admit legitimate trade-offs belong in the soft tier and are resolved on the frontier rather than as feasibility cuts. When a hard constraint does bind, the IIS identifies exactly which constraint and parameter are responsible, so the governing institution can choose, on the record, to relax it, re-tier it, or invest in the physical system, rather than discovering the infeasibility as an unexplained solver failure.
On the third axis, computational efficiency, the comparison can focus on feasible instances: every method solves the same class of problem, so the optimization cost is shared, and the framework adds only the diagnostic step, deletion-filtering IIS extraction, at a cost linear in the number of hard constraints. This diagnostic also remains computable at deployment scale, because the procedure does not enumerate the decision space. The exhaustive evaluation of all admissible assignments, both in the baseline and in the post-investment analysis, was a verification device used to confirm the analytical results exactly on a small instance; it is not a step the method requires. The diagnostic cost is instead governed by three operations, none of which scales combinatorially in the size of the instance. First, minimum-service feasibility (HC-MS) reduces to an independent per-zone test, in which zone z is infeasible if and only if its shortest travel time from any depot exceeds the service standard; the test is decided without optimization, in time linear in the number of zone–depot pairs. Second, the disparity bounds of the Structural Infeasibility Theorem are closed-form functions of the travel-time matrix, the demand intensities, and the group-composition profile, and are evaluated at the same linear cost, so demographic-parity feasibility is settled analytically rather than by search. Third, the only solver-dependent step is the Stage 2 joint-feasibility test, a single mixed-integer feasibility problem, after which the IIS is recovered by deletion filtering in a number of feasibility solves linear in the number of hard constraints rather than combinatorial in problem size. The entire computational burden therefore concentrates in one standard mixed-integer feasibility problem, precisely the class for which modern branch-and-cut solvers are routinely effective on instances far larger than the case study. We accordingly expect the procedure to remain tractable for metropolitan systems with hundreds of demand zones and tens of facilities. To verify scalability, we ran the diagnostic on synthetic instances of increasing size in both the number of demand zones and the number of facilities.
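The HC-MS test needs nothing more than one pass over the travel-time matrix. The Python sketch below, with a hypothetical matrix and an illustrative 8-minute standard, flags each zone whose fastest depot exceeds the standard; a zone exactly at the standard counts as compliant, which is an assumption about how the inequality is stated. The last lines run the same check on a larger random matrix to show that the cost grows with the number of zone–depot pairs rather than with the number of assignments.

```python
import random
from typing import List, Sequence


def uncovered_zones(travel: Sequence[Sequence[float]], standard: float) -> List[int]:
    """Indices of zones that no depot reaches within the service standard.

    A single pass over the matrix: O(zones * depots) work and no optimization.
    A zone exactly at the standard is treated as compliant (assumption).
    """
    return [z for z, row in enumerate(travel) if min(row) > standard]


# Hypothetical 4-zone x 3-depot travel-time matrix in minutes; rows are zones.
travel = [
    [4.0, 7.0, 12.0],
    [9.0, 10.0, 13.0],
    [6.0, 5.0, 8.0],
    [11.0, 14.0, 12.0],
]
print(uncovered_zones(travel, standard=8.0))  # [1, 3]

# Same check on a random 2,000-zone x 100-depot matrix: still one linear pass.
rng = random.Random(1)
big = [[rng.uniform(1.0, 30.0) for _ in range(100)] for _ in range(2000)]
print(len(uncovered_zones(big, standard=8.0)))
```

Each zone is decided independently, so the check can also be parallelized or streamed row by row.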
Table 14 reports the runtime of the three diagnostic operations. The per-zone minimum-service check and the closed-form disparity bounds complete in under a millisecond at every size, and the single mixed-integer feasibility test with IIS extraction stays short even for the largest instance, whose assignment space is far too large to enumerate. Because the cost is dominated by one feasibility solve rather than by enumeration, the procedure remains tractable well beyond metropolitan scale.
A related question is how the stakeholder-selected thresholds, namely the parity tolerance, the service standard, and the activation states for dynamic rules, would be determined in practice. The framework treats these as normative inputs rather than quantities it computes, but it is not silent on how they are set; it supplies the information against which an informed choice can be made. In many domains the threshold is already fixed by a legal or regulatory standard. The case study’s service standard is the NFPA 1710 Advanced Life Support requirement, not a value chosen by the analyst, and analogous service standards, civil-rights disparate-impact criteria, and access mandates supply thresholds elsewhere. Where no external standard applies, the threshold is the responsibility of a normatively legitimate governance body, such as a regulator, an oversight board, or a public commission, ideally acting through a process that represents the affected communities. For that body, the framework provides three forms of decision support. The Structural Infeasibility Theorem reports, prior to deployment, the range of disparity achievable on the given infrastructure, so a tolerance can be anchored to what is attainable rather than set blind. The Pareto frontier exhibits the operational cost of each candidate tolerance, so a soft threshold can be chosen with its efficiency–equity trade-off visible. The IIS certificate, when a chosen threshold proves infeasible, identifies the physical change that would be required to meet it. What the framework deliberately does not do is adjudicate which threshold is normatively correct. That judgment belongs to institutional processes; the role of the framework is to make those processes better informed, not to replace them.
This paper has three main limitations, each pointing to ongoing work. First, the case study uses a synthetic instance rather than data from a specific city. This was deliberate: a synthetic instance allows every result to be verified exactly and every parameter to be reported, ensuring reproducibility that access-restricted municipal data would not permit. The study is therefore a proof of concept, and its qualitative conclusions depend on structural features (a southern coverage gap, multi-group zones, a binding service standard) rather than on exact values. Validation on real-world EMS data is a direct next step, as the framework’s inputs map onto quantities available in deployed systems. Second, the case study is deterministic; extending the diagnostic to stochastic and robust formulations, in which the bounds of Theorem 1 would need to hold in expectation, is a further step. Third, the framework takes the ethical rule set as given: it certifies whether the specified rules are feasible but does not judge whether they are normatively correct, a judgment left to the institutions that set them.