Appendix B
B-1 Why Accuracy Improvements Do Not Eliminate the Need for Flow Limitation
The previous work responded to the objection that, if AI model capabilities improve, the error probability p will decline and expected loss will therefore decrease. The response was based on two points: first, that p = 0 is in principle unattainable in probabilistic systems (Xu, Jain & Kankanhalli, 2024), and second, that V continues to expand faster than humanity’s cognitive capacity improves. In addition, probabilistic systems cannot internally determine the correctness or incorrectness of their own outputs (Consistent Reasoning Paradox, Colbrook et al., 2024).
The V × L framework further strengthens this response. The claim that accuracy improvements make flow limitation unnecessary presupposes that, once accuracy improves sufficiently, L becomes sufficiently small and V × L remains within C_max. However, as discussed above, only the verification component of L can decline through accuracy improvements. The response component is independent of AI accuracy.
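The structure of this claim can be written out explicitly. As a minimal sketch, assume that L separates additively into a verification component and a response component; the symbols L_v and L_r below are introduced here for illustration only and do not appear in the main text.

```latex
% Illustrative decomposition, assuming L separates additively into a
% verification component L_v (which may fall as the error probability p
% falls) and a response component L_r (which does not depend on p):
L = L_v(p) + L_r,
\qquad \frac{\partial L_v}{\partial p} \ge 0,
\qquad \frac{\partial L_r}{\partial p} = 0

% The oversight condition discussed above then reads
V \times \bigl( L_v(p) + L_r \bigr) \le C_{\max},

% and even in the limit p \to 0 it still requires
V \times L_r \le C_{\max},
% which a sufficiently large V violates regardless of accuracy.
```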
The objection that “if AI accuracy improves sufficiently, response work itself can be delegated to AI, and therefore human cognitive load will continue to decrease” is unlikely to hold for the following reason. Even if response work is delegated to AI, in high-loss domains the legal responsibility for the result of that response is borne by a natural or legal person (R = 1). Therefore, at the moment the response is delegated to AI, the cognitive load of verifying the response result arises. If that verification is also delegated to AI, then verification of the verification result arises. At the end of this recursion, there is always a human, and C_max applies to that human.
Automating response does not eliminate cognitive load. It only transfers it recursively.
A simpler example is a real-name-registration social network. Even if 99.9% of users correctly register under their real names, once an incident occurs, it is still necessary to independently verify each time whether the account that made the problematic post is truly registered under a real name. The burden of verification is not determined by the incidence rate of fraud. It is determined by the possibility that fraud is not zero. No matter how much accuracy improves, as long as the system is probabilistic, p > 0. And as long as p > 0, the verification burden remains.
One reason this mathematical fact has not become a premise of AI governance debates may be a bias widely demonstrated in cognitive science. Weinstein (1980, 1982) identified unrealistic optimism as a systematic human tendency to underestimate future risks, and showed that this bias is driven by mistaken extrapolation from past experience: “because no problem occurred before, no problem will occur in the future.” Since this bias has been observed universally across age, gender, educational level, and occupation, and has been replicated in more than 1,000 studies (Shepperd, Klein, Waters & Weinstein, 2013), it is natural to assume that it also operates in judgments about AI risk.
In the context of AI, the problem is that this bias operates across domains. At present, the majority of everyday AI use occurs in low-risk domains, such as drafting business emails. Even when errors are present, they are unlikely to be discovered, and even if discovered, the damage is minor. Experiences accumulated in this environment — “even if there were errors, things were fine” — distort risk estimation in high-loss domains through the mechanism identified by Weinstein.
This distortion is also reinforced by the market-formation strategy of the AI industry. Commercial diffusion of AI began in low-risk domains such as summarization, translation, idea generation, and copywriting. In these domains, errors can be offset by gains in total productivity, and the logic that “errors are tolerable as long as the overall result pays off” has been practically viable. The market perception formed under this logic — that AI will become sufficiently usable as long as accuracy improves — may be directly extrapolated to high-loss domains.
However, this extrapolation is precisely what existing institutional designs of professional licensing have explicitly rejected. Professions such as lawyers, physicians, tax accountants, and pilots require legal qualifications in every jurisdiction. Summarization and idea generation do not require national licenses. This difference is not accidental. Professional licensing systems are the institutional expression of a social judgment that even a small number of errors is unacceptable in the relevant domain. In other words, the training process required for qualification embeds the premise that “judgment in this domain is qualitatively different from ordinary work that requires no license.”
In domains where errors are unacceptable, legal constraints are imposed on business designs that may induce errors in the first place. Clinical-trial requirements in medicine, operating rules in aviation, and internal-control requirements in finance are all based on the idea of regulating in advance the designs under which errors may occur, rather than responding to errors only after they occur.
There is a clear asymmetry across domains in how AI output is restricted. With respect to legal advice, because regulations corresponding to unauthorized-practice-of-law rules exist in many countries, LLM providers can plausibly justify restricting outputs that constitute legal advice. Although such restrictions are fragile and contain many practical loopholes, they can at least function as a form of physical friction similar to flow-rate limitation.
By contrast, no LLM safety system restricts tasks such as drafting patent claims, writing academic papers, or designing clinical trials. An AI system that refuses all scientific questions or technical inquiries in the name of safety would not be commercially viable. This asymmetry means that voluntary brakes on the LLM side against the explosion of V exist only in a subset of high-loss domains. Surges in patent applications or academic submissions cannot be stopped by LLM tuning. Since there is no restriction on the output side, flow limitation on the receiving side becomes unavoidable.
This problem can be restated through the pharmaceutical-approval example used in the previous work. Suppose that a review agency processes 100,000 applications per year with AI assistance and that accuracy improvements reduce the error rate to 0.01%. Verification costs decline. However, even for correctly approved drugs, response work occurs item by item, including post-market safety monitoring, evaluation of adverse-event reports, and revisions to package inserts. Once the number of cases expands to 100,000, even if verification cost is zero, the sum of response costs may exceed human processing capacity.
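A minimal numerical sketch of this example follows. The hours of response work per approved item and the available reviewer capacity are hypothetical placeholders chosen for illustration, not figures from the previous work.

```python
# Hypothetical illustration: even with a near-zero error rate and zero
# verification cost, per-item response work scales with case volume V.

applications_per_year = 100_000        # V, from the example above
error_rate = 0.0001                    # 0.01% after accuracy improvements
verification_hours_per_item = 0.0      # assume verification becomes free

# Placeholder assumption: each correctly approved item still generates
# post-market monitoring, adverse-event review, and labeling revisions.
response_hours_per_item = 2.0          # hypothetical figure

correct_items = applications_per_year * (1 - error_rate)
total_hours = (applications_per_year * verification_hours_per_item
               + correct_items * response_hours_per_item)

# Placeholder capacity: 100 reviewers x 1,600 working hours per year.
capacity_hours = 100 * 1_600

print(f"required: {total_hours:,.0f} h, available: {capacity_hours:,} h")
# required: 199,980 h, available: 160,000 h -> the response component
# alone exceeds C_max even though verification cost is zero.
```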
Error-rate improvements postpone the collapse of oversight, but they cannot prevent it. The previous work described error-rate improvement as a treadmill: the system must run faster merely to remain in the same place, and even a slight decrease in running speed immediately turns into an increase in absolute harm. This characteristic becomes even clearer under the framework of the present paper.
Bastani and Cachon (2025) independently derive the economic dimension of this problem from a contract-theoretic framework. In their model, as AI accuracy improves, errors become rare, and opportunities for supervisors to actually detect errors and receive rewards decline. As a result, the compensation required to economically motivate supervisory effort diverges. In other words, accuracy improvement collapses the incentive design of supervision.
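The direction of this result can be conveyed by a stylized sketch that is far simpler than Bastani and Cachon’s actual contract-theoretic model; the reward w and effort cost c below are illustrative symbols introduced here, not their notation.

```latex
% Stylized illustration, not Bastani and Cachon's model: a supervisor who
% checks diligently incurs effort cost c and receives reward w only when an
% error is actually detected; errors occur with probability p.
\mathbb{E}[\text{reward}] = p \cdot w \;\ge\; c
\quad\Longrightarrow\quad
w \;\ge\; \frac{c}{p} \;\longrightarrow\; \infty \ \text{as } p \to 0
% As accuracy improves and p falls, the payment needed to keep diligent
% checking worthwhile grows without bound.
```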
The argument in this paper and the argument by Bastani and Cachon are complementary. Bastani and Cachon show that even when a supervisory regime is fully in place, supervision cannot be economically motivated. We show that even if supervision were economically motivated, the supervisor’s cognitive processing capacity would be exceeded. When both hold simultaneously, supervision enhancement has no solution either in terms of incentives or in terms of cognitive capacity. This more strongly supports the conclusion of the previous work that institutional limits on output volume itself are necessary.
Verification cost also depends strongly on the time required to detect the first breakdown contained in an output. Low-quality outputs that contain obvious factual errors or outdated information can be rejected at an early stage of verification, consuming relatively little cognitive cost. As AI capabilities improve, however, such easily detectable breakdowns decrease. Outputs that appear coherent on the surface and formally appropriate, but contain errors that only an expert can detect after reading the entire text carefully, maximize verification cost. In other words, AI capability improvements may increase L by raising the difficulty of detecting breakdowns, rather than lowering the verification component of L.
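The effect can be stated as a simple reading-cost sketch. The full-read cost T and the position t of the first detectable breakdown are illustrative quantities introduced here, not variables defined in the main text.

```latex
% Illustrative assumption: verifying one output requires reading until the
% first detectable breakdown at position t, or reading the whole output
% (cost proportional to T) when no breakdown is present.
\mathbb{E}[\text{verification cost}] \;\propto\;
(1 - p)\, T \;+\; p \cdot \mathbb{E}[\, t \mid \text{error present} \,]
% As capability improves, obvious early breakdowns disappear and
% E[t | error present] approaches T, so per-item verification cost tends
% toward the full-read maximum even while the error rate p falls.
```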
It is important to distinguish the task of confirming whether an output is correct from the response work that occurs after confirmation, because AI capability improvement acts asymmetrically on these two components. Intuitively, one might think that improved AI model accuracy reduces the burden of verification. In practice, however, accuracy improvement operates less by making verification easier than by inducing its omission.
For outputs from a model that is 99% accurate, it is extremely difficult, both cognitively and economically, for supervisors to maintain independent verification item by item. Moreover, because accuracy improvement increases the absolute number of correct outputs, the response burden increases. If a model with 30% accuracy produces 100 outputs, 30 outputs are correct and proceed to downstream processing. If a model with 99% accuracy produces 100 outputs, 99 outputs are correct, and 99 outputs proceed to downstream processing. Verification is required for all 100 outputs in both cases. Therefore, accuracy improvement increases the number of downstream cases from 30 to 99 while not eliminating the need to verify all items. Furthermore, because errors from a 99% accurate model appear only rarely, if the human checking regime has become nominal, the condition discussed in the previous work — E_detected ≪ E_actual — becomes more severe.
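The arithmetic of this paragraph can be restated as a minimal sketch; the unit costs are hypothetical placeholders, and only the direction of the effect matters.

```python
# Minimal sketch of the asymmetry described above: verification is needed
# for every output, while response work scales with the number of correct
# outputs that proceed downstream. Unit costs are hypothetical.

def total_load(n_outputs, accuracy, verify_cost=1.0, response_cost=1.0):
    correct = round(n_outputs * accuracy)     # outputs that proceed downstream
    verification = n_outputs * verify_cost    # every output must still be checked
    response = correct * response_cost        # per-item downstream response work
    return verification + response

print(total_load(100, 0.30))   # 130.0: 100 verifications + 30 downstream responses
print(total_load(100, 0.99))   # 199.0: 100 verifications + 99 downstream responses
# Accuracy improvement raises total load here because the response component
# grows with the number of correct outputs, while the verification component
# does not shrink.
```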
B-2 On AI-Use Disclosure Regimes
Regimes requiring disclosure of whether AI was used are being introduced in many domains. However, if there is no means of verifying the truth of the disclosure, supervisors cannot exclude the possibility that AI was involved. Even for a work product declared to have been produced without AI, supervisors must read it on the assumption that AI may have been involved.
This uncertainty creates an additional triage burden before the content of the output can be evaluated: the supervisor must triage the origin of the output and the evaluation criteria to be applied. In an environment where AI use has become widespread, even work products created without AI must be verified on the assumption that they may be low-quality AI-generated artifacts. The diffusion of AI therefore increases supervisory cost even when AI was not used.
These are examples of a vulnerability common to self-disclosure regimes. For penalties against false disclosure to function effectively, there must be an independent means of verifying the content of the disclosure. However, no reliable method has been established for distinguishing, after the fact, between generative AI output and human-created work.
Liang et al. (2023) evaluated seven widely used GPT detectors and showed that more than half of essays written by non-native English speakers were misclassified as AI-generated, while simple prompt manipulation reduced detection rates from 100% to 13%. OpenAI’s own AI text classifier, released in 2023, correctly identified AI-generated text only 26% of the time and was discontinued because of insufficient accuracy.
Thus, approaches that use AI to automatically determine the origin of content are unreliable with respect to both false positives and false negatives, and cannot serve as the foundation of a human supervisory regime.
B-3 Why Triage Is Costly
Why triage consumes substantial human cognitive resources can be explained by findings in cognitive science that have accumulated independently of AI research. The observation that conscious processing concentrates resources on selection among multiple candidates was systematized in Baars’s (1988) global workspace theory. Among the countless processes running in parallel in the brain, only those that compete with one another and require integrated judgment are elevated into the conscious workspace, where they compete for limited bandwidth.
Dehaene and Naccache (2001) extended this theory neuroscientifically and showed that conscious access involves global activation of distributed cortical networks. The basic structure — that conscious processing is a scarce resource and is intensively consumed when conflicts among multiple candidates must be resolved — has been reproduced repeatedly in subsequent research (Dehaene et al., 2017).
When parallel processes are resolved automatically, this scarce resource is not consumed. Schneider and Shiffrin (1977) demonstrated that repeatedly trained information processing becomes automatic processing that does not require consciousness, while processing in novel and ambiguous situations becomes controlled processing that requires conscious resources. Kahneman’s (2011) distinction between System 1 and System 2 restates this distinction for a general audience. The important question is what activates System 2, and the answer is ambiguity that cannot be resolved by automatic response.
Within decision-making contexts, depletion of finite decision-making resources has also been observed. Baumeister et al. (1998) showed that the quality of judgment declines in subjects who are repeatedly required to make decisions or exercise self-control, and named this phenomenon ego depletion. Replications of this effect vary, and debate continues regarding effect size (Hagger et al., 2016). Nevertheless, the general direction — that there is an upper bound on the number of high-quality judgments that can be made in a day — is independently supported by research on decision fatigue (Vohs et al., 2008; Danziger et al., 2011). Judgment resources are not infinite. Once consumed, they require time to recover.
What these findings show is the following proposition: conscious judgment is a scarce resource, and what triggers its consumption is not the amount of information, but the presence of ambiguity. Even if the amount of information increases, resources are not consumed if it can be processed by automatic response. Conversely, even if the amount of information is small, conscious resources are drawn upon when multiple interpretations coexist and cannot be resolved automatically.
When an AI-generated text may be a fact, an inference, an opinion, or a fictional narrative, and none of these interpretations can be rejected automatically, the reader must place multiple interpretations in the conscious workspace and choose among them. This satisfies the classical condition for triggering conscious processing. The fact that generative AI outputs consume substantial cognitive resources is not an AI-specific phenomenon. Rather, AI outputs systematically generate the conditions described by consciousness research since Baars.
That automatic responding is difficult to establish also follows logically from the design of generative AI, because generality and human-likeness are central to its commercial value. The ambiguity of triage is therefore not a transient problem caused by the immaturity of current AI. It is a property inherent in the design of general-purpose AI. As AI capabilities improve and output style becomes closer to human speech, triage ambiguity increases rather than decreases.
The information sources that human cognitive systems have processed over tens of thousands of years can be roughly divided into the natural environment, other animals, and other humans. Allocation of conscious resources toward these sources has been adjusted by selection pressures on an evolutionary time scale. Generative AI supplies outputs that are statistically very similar to speech from other humans, yet are not produced by other humans, at a marginal cost approaching zero. Put differently, it continuously fires triggers that ignite conscious processing at a rate exceeding human processing capacity. The finitude of conscious resources is an evolutionarily fixed constraint. Since that constraint is not removed by improvements in AI capability, the accumulation of triage cost inevitably collides with finite resources.
B-4 Human Supervisors as Discriminators
The development process of commercial interactive AI can be characterized as adversarial training between a generator and a discriminator (Goodfellow et al., 2014). The generator produces outputs. The discriminator evaluates whether those outputs are plausible as human speech, and the evaluation result is fed back into updates of the generator. RLHF (Ouyang et al., 2022) is a large-scale implementation of this structure through human comparison evaluation and reward models.
The role of discriminator is played, at the training stage, by human evaluators who provide preference data, and at the deployment stage, by human users who use the service. The generator is updated with each model generation and moves closer to more human-like output by reflecting the history of this adversarial process. Human cognition, which plays the role of discriminator, is updated only on an evolutionary time scale and is nearly fixed on the time scale of product cycles.
As the adversarial process repeats across generations, the cues that allow the discriminator to distinguish the generator’s output as “something other than human speech” diminish. This is an explanation from the generator side of the pathway, discussed in the previous section, through which triage cost rises over time. The important point is that this asymmetric adversarial process is not an accidental technical side effect. It is part of the very definition of capability improvement in current AI.
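The asymmetry can be rendered schematically. The loop below is a deliberately abstract sketch of the structure described in this section, not a description of any provider’s actual training pipeline, and the cue-strength variable and its decay factor are hypothetical.

```python
# Schematic illustration, not an actual training pipeline: the generator is
# updated every product cycle using the discriminator's feedback, while the
# discriminator (human cognition) stays fixed.

def fixed_discriminator(cue_strength):
    """Probability that a fixed human reader flags the output as
    'not human speech'; the criterion never changes, only the input does."""
    return min(1.0, max(0.0, cue_strength))

cue_strength = 0.8   # hypothetical strength of non-human cues in generation 0
for generation in range(8):
    rejection_rate = fixed_discriminator(cue_strength)
    print(f"generation {generation}: rejection rate {rejection_rate:.2f}")
    # Generator update: feedback removes the cues that caused rejection,
    # so the next generation presents weaker cues to the same discriminator.
    cue_strength *= 0.5
# Only one side of the adversarial pair improves; rejection approaches zero
# while the per-attempt triage cost discussed in B-3 rises.
```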
The return of triage and criterion invocation from the device side to the user side is already observed in AI evaluation. In generational comparisons of physical products, agreement forms when evaluation can be reduced to a hierarchy of physical quantities. The relative merits of an old television and the latest 8K model can be judged by external criteria such as resolution, contrast ratio, and response time. Even in comparisons among video games, once evaluation is decomposed into factors such as map size, number of missions, and resolution, opinions converge within each factor, allowing discussion to move on to what individuals personally value.
By contrast, in generational comparisons of AI models, the release of a new model is repeatedly accompanied by the coexistence of polar evaluations such as “AGI has arrived” and “it is worse than the previous generation.” Benchmarks have lost reliability for comparison because of contamination in training data and leakage of evaluation sets (Sainz et al., 2023; Balloccu et al., 2024). Different use cases require different axes for evaluating intelligence, and users themselves must determine which axis should be invoked.
Indistinguishability does not eliminate triage. It increases triage cost. Even when discrimination is impossible, empirical evidence shows that supervisors continue to switch evaluation criteria depending on whether they believe an output to be AI-derived (Longoni et al., 2022; Altay & Gilardi, 2024). Therefore, the attempt at triage itself continues. And as discrimination accuracy approaches chance level, the cognitive resource consumption per attempt increases.
If supervisors abandon triage and process all outputs under a uniform criterion, they must choose either to treat all outputs as AI-generated and apply maximum verification cost to all items, or to treat all outputs as human-generated and omit verification. The former maximizes the verification component of L for all items. The latter means abandoning supervision in high-loss domains.
B-5 The Cognitive-Load Externalization Function of Information Infrastructure
Many information infrastructures used by humans have functioned by moving individual cognitive load outside the individual. Writing moved memory retention from the brain to material objects (Goody, 1977; Ong, 1982). Money moved comparison of the value of goods from individual negotiation to a common unit. Double-entry bookkeeping moved the consistency check of transactions from memory and mental arithmetic to mechanical verification in ledgers. Legal systems moved the judgment of dispute resolution from the parties to procedures and precedents. Scientific publication and peer review moved the evaluation of the truth of claims from individual persuasiveness to reproducibility and mutual examination. Internet search indexes moved memory of where information is located from individuals to search engines.
What these systems share is a design direction in which cognitive load, instead of being completed within the individual, is distributed through external materials, procedures, and institutions. Hutchins (1995) organized this phenomenon as distributed cognition, and Clark and Chalmers (1998), under the concept of the extended mind, presented a framework for treating cognitive systems that include external devices as the unit of analysis.
A common property of these information infrastructures is that they are designed to suppress users’ triage cost. When looking up a dictionary entry, the order of entries is fixed by an external criterion, alphabetical order. The user does not have to re-determine “this is the headword,” “this is the definition,” or “this is an example.” Because such designs process triage in advance through devices or institutions, conscious resources can be concentrated on judgment and response.
Generative AI output runs counter to this externalization. As discussed above, because generative AI output does not fix boundaries among semantic types on the device side, triage returns to the user side. This is not a design feature of individual applications. It is a design consequence that cannot be avoided as long as generality and human-likeness are placed at the center of commercial value.
Triage costs that had historically been externalized must therefore be processed again by individual cognitive resources when the new information source is generative AI output. The diffusion of generative AI also penetrates the externalized information infrastructures themselves. For information sources such as search-engine results, encyclopedia entries, preprints, and news articles — sources for which triage had previously been processed by the device or institution side — contamination by generative-AI-derived outputs forces users to determine item by item whether a given item originates from generative AI. Triage that had been omitted under the premise that the source itself was reliable returns at the level of each item.
Multiple experiments have repeatedly observed that this triage is processed before other evaluation axes. Longoni, Fradkin, Cian, and Pennycook (2022) showed that the credibility evaluation of a news article declines when the same content is labeled as “AI-generated.” Altay and Gilardi (2024), in a preregistered experiment (N = 4,976), reported that attaching an “AI-generated” label to a headline reduces both perceived accuracy and intention to share, regardless of whether the content is true or false and regardless of whether it was actually written by a human or by AI. These results show that determining whether content is AI-derived is processed independently of, and often prior to, the evaluation axis of the content itself.
B-6 Consequences of the Hollowing-Out of Human Oversight
When V × L exceeds C_max, the institution does not necessarily stop immediately. Rather, in many cases, supervisors can be expected to come to work as before, approve items as before, and report processing counts as before. From the outside, the institution appears to be operating normally. However, what is maintained in this state is not substantive oversight, but a flow of formal approval.
When supervisors can no longer sufficiently verify individual outputs, it becomes rational for them to rely on external signals rather than the content itself in order to minimize their own responsibility risk. In academic peer review, proxy indicators include institutional affiliation, existing reputation, the author’s country, institutional email addresses, and past citation records. In law, administration, and pharmaceutical review, similarly, the applicant’s institutional status, representative’s qualifications, and relationships with existing organizations become substitutes for content evaluation.
This occurs because supervisors are more likely to face clear responsibility if they allow AI slop to pass, or if they reject a valid application from a famous institution. By contrast, if they reject a legitimate low-signal application without reading it, the loss is likely to remain invisible.
B-7 Limits of Content Judgment
In current AI governance debates, human oversight almost always means content judgment. Humans judge whether the output text is accurate, whether generated code contains vulnerabilities, or whether a drafted document contains legal defects.
However, as this appendix has shown, as long as content judgment remains the foundation of governance and AI capability continues to grow faster than the sum of human capability improvement and institutional reinforcement, many high-loss domains cannot avoid the collapse of oversight. Delegating content judgment to AI itself does not solve this problem.
Nor is this problem limited to hallucination. For example, even when an AI-generated summary consists only of facts contained in the original document, it may reverse the intention of the original document through the choice of what to retain and what to omit. Moreover, measures including source disclosure, retrieval-augmented generation, cross-checking by multiple LLMs, output constraints, and all forms of prompt engineering may reduce errors through improvements in models or operating methods, but they cannot reduce errors to zero.
If AI is asked to judge whether an output is correct, the question of who verifies the correctness of that AI judgment arises recursively. Therefore, what is needed is not to improve content-judgment capability, but to bypass it.