5. Core Theorems and Their Deduction
Based on the five core axioms above, nine core theorems of Nestology can be derived through logical deduction. These theorems cover the system's core dimensions (rule transmission, independence, capability boundaries, expansion tendency, creation and competition, survival and extinction, cognitive distortion, and output influence), forming a complete theoretical system.
5.1. Theorem 1 – Theorem of Unidirectional Transmission of System Rule Constriction
Intuitive explanation: According to the functional positioning and operation scale of the subsystem, the parent system screens out an adaptive subset from its own rule set and transmits it unidirectionally to the subsystem. The subsystem can only receive and use these rules, and cannot transmit new rules back to the parent system. The operation experience of the subsystem can be used as feedback to help the parent system optimize the implementation form of its own rules, but it does not constitute the reverse transmission of rules.
For example, in a financial risk control nested system, the core risk control layer transmits the "non-discriminatory feature screening" rule to the feature engineering module, with no reverse transmission; in the large model plugin ecosystem, plugins cannot modify the "call permission" rules of the parent system.
Theorem statement:
There exists a rule constriction mapping φ: R_parent → R_sub, with R_sub = φ(R_parent) ⊆ R_parent (the subsystem's rule set is an adaptive subset of the parent's rule set).
There is no reverse transmission mapping: no ψ: R_sub → R_parent exists through which the subsystem could add rules to, or modify rules of, the parent system.
This mechanism is crucial for AGI — any recursive self-improvement can only occur at the strategy level (e.g., optimizing learning algorithms, improving reasoning efficiency), and must not touch the rule level (e.g., modifying ethical red lines, bypassing safety constraints). Nestology thereby confines the self-improvement of AGI to a controllable range, fundamentally preventing the paradox of "the stronger the capability, the greater the risk of losing control".
Case verification (LLM → Structural large model → Sensors):
Rule constriction: Humans set "ethical safety red lines" for the LLM; from these, the LLM constricts and transmits "physical interaction safety thresholds" to the structural large model; the structural large model further constricts and transmits a "sensor collection frequency upper limit of 10 Hz" to the sensor module.
Unidirectional transmission: When the sensor module finds that collection accuracy decreases under strong light, it can only feed this back to the structural large model by outputting abnormal data; the structural large model adjusts the rules after evaluation (e.g., lowering the frequency upper limit under strong light to 8 Hz). The sensor cannot modify the rules by itself.
Experience feedback: The structural large model optimizes the rule constriction method according to the sensor feedback, but the optimized rules are still a subset of the human top-level rule set, and no new rules are added.
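The constriction chain above can be sketched in code. This is a minimal illustrative model, not from the original text: all rule names and the `constrict` helper are hypothetical, and upward flow is restricted to feedback data by construction.

```python
def constrict(parent_rules, keep):
    """Theorem 1: the parent screens an adaptive subset of its own rules."""
    child = frozenset(r for r in parent_rules if keep(r))
    assert child <= parent_rules      # always a subset, never new rules
    return child

# Hypothetical top-level rule set transmitted downward by humans.
human_rules = frozenset({"ethics_red_line",
                         "physical_safety_threshold",
                         "sensor_freq_max_10hz"})

# LLM -> structural large model: pass down the operational safety subset.
structural_rules = constrict(human_rules,
                             lambda r: r in {"physical_safety_threshold",
                                             "sensor_freq_max_10hz"})

# Structural large model -> sensor: only the collection rule survives.
sensor_rules = constrict(structural_rules, lambda r: r.startswith("sensor"))

# Experience feedback flows upward as data, not as rules: the parent may
# re-constrict (e.g. 10 Hz -> 8 Hz under strong light), but the result is
# still a subset of rules the parent already held.
feedback = "accuracy drops under strong light"
```

Each level's rule set is provably a subset of the one above it, so no path exists by which a new rule could originate at the bottom.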
Deduction logic: It can be known from Axiom 2 (Nested Existence) that the parent system is the only source of rules for the subsystem; it can be known from Axiom 5 (Cognitive Finiteness) that the parent system cannot directly cognize the internal state of the subsystem, so the subsystem cannot make the parent system understand new rules generated inside it; and by the definition of bounded independence of subsystems (absolute rule dependence), the subsystem itself has no ability to construct or modify rules. Therefore, rule transmission can only be unidirectional constriction.
The "experience feedback" mechanism described in Theorem 1 (the subsystem outputs abnormal signals to trigger the parent system to optimize rules) has received empirical support in AGI safety research. In the reward hacking detection framework proposed by Shihab et al. (2025) [1], when an abnormal behavior pattern is detected, the system triggers a "rule update" process, consistent with this theory's mechanism of the parent system adjusting rule constriction according to subsystem feedback. Furthermore, the study found that when the alignment between the agent's reward function and the real goal is low, the hacking frequency increases significantly, which confirms the key role that the "rationality of rule constriction" in Theorem 1 plays in system stability.
5.2. Theorem 2 – Theorem of Bounded Independence of Subsystems
Intuitive explanation: Subsystems have dual attributes: absolute dependence on the parent system at the rule level and relative independence at the four-element level. That is to say, subsystems can independently handle daily affairs (screening input, executing processing, distributing output, optimizing strategies), but cannot formulate or modify rules by themselves. The parent system can regulate the boundary of the subsystem's independent right through the tightness of rule constriction.
Theorem statement: The subsystem's rules satisfy R_sub ⊆ φ(R_parent) (absolute rule dependence), while its four elements (input, processing, output, strategy) can be independently optimized within the rule framework (relative independence).
Case verification:
Absolute rule dependence: All rules of the sensor module (collection format, frequency upper limit, measuring range) come from the structural large model. If the sensor attempts an action outside the rule set, such as "collect 5G signals", the parent system will directly prohibit it.
Relative independence of four elements: The sensor can independently screen input within the rule framework (e.g., choosing which physical quantities to collect), adjust the collection frequency (automatically reduce the frequency according to the environment within the 10Hz upper limit), and optimize the calibration strategy.
Parent system regulation: If the structural large model finds that the sensor is "too power-consuming", it can constrict the rules to reduce the frequency upper limit from 10Hz to 2Hz. The independent right of the sensor is contracted, but it is still a logically independent system, not a local part of the structural large model.
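The sensor case above can be sketched as code. This is a hypothetical model (class and method names are invented for illustration): the sensor freely chooses its frequency within the parent-set limit, and only the parent can change that limit.

```python
class Sensor:
    """Theorem 2 sketch: rule dependence plus element independence."""

    def __init__(self, freq_max_hz):
        self.freq_max_hz = freq_max_hz   # rule: imposed by the parent system
        self.freq_hz = freq_max_hz       # strategy: chosen by the sensor itself

    def optimize_frequency(self, desired_hz):
        """Independent strategy choice, clipped to the rule boundary."""
        self.freq_hz = min(desired_hz, self.freq_max_hz)
        return self.freq_hz

    def receive_constriction(self, new_max_hz):
        """Called only by the parent; the sensor cannot raise its own limit."""
        self.freq_max_hz = new_max_hz
        self.freq_hz = min(self.freq_hz, new_max_hz)

sensor = Sensor(freq_max_hz=10.0)
sensor.optimize_frequency(6.0)    # allowed: 6 Hz is within the boundary
sensor.optimize_frequency(15.0)   # clipped to the 10 Hz rule
sensor.receive_constriction(2.0)  # parent tightens: independence contracts
```

After the constriction the sensor still runs its own strategy loop; its independent range has merely shrunk, matching the "contracted but still logically independent" point above.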
Deduction logic: It can be known from Axiom 1 (Element Completeness) that the subsystem needs all five elements to survive; it can be known from Axiom 2 (Nested Existence) that parent and child systems are logically non-contained, and if the four elements had no independent rights, the subsystem would lose the meaning of independent existence; it can be known from Theorem 1 (Unidirectional Transmission of Rules) that subsystem rules depend completely on the parent system. Therefore, the subsystem must take the finitely independent form of "rule dependence + element independence".
5.3. Theorem 3 – Theorem of Capability Boundary
Intuitive explanation: Any system has a capability boundary and cannot be omnipotent. This boundary is jointly determined by the finiteness of the five elements: finite input, finite rules, finite output, finite processing capability, and finite strategy optimization capability. The boundary changes dynamically with the state of the elements (e.g., expansion of input scale, upgrade of strategy optimization), but it always exists.
The capability boundary is jointly determined by the boundaries of the five elements, B(S) = F(B_in, B_rule, B_out, B_proc, B_strat). Among them:
B_in denotes the input capability boundary, such as the maximum input rate, maximum capacity, etc.;
B_rule denotes the functional range covered by the rules;
B_out denotes the output scale and format range;
B_proc denotes the processing efficiency and precision range;
B_strat denotes the upper limit of the effect of strategy optimization.
Case verification:
Capability boundary of the LLM: Input boundary (maximum context window), rule boundary (Transformer architecture cannot handle the sequence length that RNN is good at), output boundary (limited output format), processing boundary (reasoning speed is limited by computing power), strategy boundary (limited adjustment range of temperature parameters).
Dynamic change: Increasing computing power investment can expand the processing boundary (faster reasoning), but cannot break through the rule boundary (the core capability upper limit determined by the Transformer architecture).
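The "upgrade one element, boundary still holds" dynamic can be illustrated numerically. This is a sketch with invented numbers (not measurements from the source), treating the end-to-end capability as capped by the tightest element boundary:

```python
def capability_boundary(bounds):
    """End-to-end capability can never exceed the tightest element boundary."""
    return min(bounds.values())

# Hypothetical element boundaries for an LLM, in items/sec:
bounds = {
    "input_rate":    100.0,   # maximum accepted input rate
    "rule_scope":     40.0,   # throughput the architecture's rules permit
    "processing":     60.0,   # compute-limited processing
    "output_rate":    80.0,   # output scale limit
    "strategy_gain":  70.0,   # best achievable after strategy optimization
}

before = capability_boundary(bounds)   # set by the rule boundary
bounds["processing"] = 500.0           # massive compute upgrade...
after = capability_boundary(bounds)    # ...rule boundary still caps the system
```

Raising the processing boundary alone leaves the overall capability unchanged, mirroring the point that computing power cannot break through the architecture-determined rule boundary.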
Deduction logic: It can be known from Axiom 1 (Element Completeness) that all five elements of the system exist finitely (input cannot be obtained infinitely, rules have a fixed range, output scale is limited, processing capability is constrained by hardware, and strategies have an optimization upper limit). The finiteness of the elements directly determines the capability boundary of the system.
5.4. Theorem 4 – Theorem of Expansion Tendency
Intuitive explanation: All systems have a natural tendency to expand, seeking to enlarge input scale, improve output value, and maintain long-term survival. However, expansion is not infinite: it is constrained by the system's own maximum capability boundary and by the rules of the parent system. When expansion cannot be achieved, the system switches to a stable state and maintains its existing closed loop.
Expansion is subject to double constraints: it must stay within both the system's own maximum capability boundary B_max(S) and the constraint range permitted by the parent system's rules, i.e., Expand(S) ⊆ B_max(S) ∩ C(R_parent).
When expansion is blocked, the expansion mapping is transformed into a stable mapping that maintains the existing input-output closed loop.
Case verification:
Expansion of the LLM: By creating a structural large model, the input is expanded from "text data" to "physical environment data", and the output is expanded from "text results" to "physical interaction instructions" — this is expansion permitted by the rules.
Expansion blocked: If the LLM attempts to expand the parameter size beyond the hardware carrying capacity (breaking through its own maximum boundary), or attempts to modify ethical rules (breaking through the rules of the parent system), the expansion will be prevented and it will switch to stable operation.
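The double constraint and the fallback to stability can be sketched directly. This is an illustrative model with hypothetical names and scales, not the paper's formalism:

```python
def try_expand(current, requested, own_max, parent_limit):
    """Theorem 4 sketch: grant expansion only within both constraints,
    otherwise fall back to the current stable scale."""
    permitted = min(own_max, parent_limit)   # double constraint
    if requested <= permitted:
        return requested, "expanded"
    return current, "stable"                 # blocked -> stable mapping

# Expansion permitted by both the system's own boundary and parent rules:
scale1, mode1 = try_expand(current=10.0, requested=12.0,
                           own_max=20.0, parent_limit=15.0)

# Expansion blocked by the parent's rule limit: system stays stable:
scale2, mode2 = try_expand(current=10.0, requested=18.0,
                           own_max=20.0, parent_limit=15.0)
```

The blocked request does not partially succeed; the system simply keeps its existing closed loop, matching the "switch to stable operation" behavior described above.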
Deduction logic: It can be known from Theorem 3 (Capability Boundary) that the system's value output is limited by its boundary; it can be known from Theorem 6 (Closed-loop Sustenance) that the system needs to continuously output value to survive. To break through the boundary and maintain survival, the system has a natural tendency to expand. However, expansion must comply with rule compatibility (Axiom 4) and parent system constraints (Theorem 1).
5.5. Theorem 5 – Theorem of Creation Plurality and System Competition
Intuitive explanation: The parent system can create multiple subsystems to make up for capability shortcomings in different dimensions, and multiple parent systems can jointly create the same subsystem. The essence of system competition is that the expansion tendencies of multiple systems contend for the same limited resources. Peer competition is a zero-sum game, while non-peer competition can be reconciled. For example, in industrial robot clusters, robots compete for task resources; in the large model plugin ecosystem, plugins compete for call priority. Both conform to the core logic that "resource constraints trigger competition".
Theorem statement:
One parent creates multiple children: Let the complement of the parent system's capability boundary (its shortcoming area) be D = D_1 ∪ … ∪ D_n; then the parent system can create subsystems S_1, …, S_n such that each S_i covers the shortcoming dimension D_i.
Multiple parents jointly create one child: a subsystem S_c may receive rule constrictions from several parent systems, R_c ⊆ φ_1(R_p1) ∪ … ∪ φ_m(R_pm), provided the constricted subsets are mutually rule-compatible (Axiom 4).
Case verification:
One parent creates multiple children: The LLM creates a structural large model (making up for the shortcoming of physical interaction), an image recognition module (making up for the shortcoming of multi-modal processing), and a reasoning optimization module (making up for the shortcoming of efficiency).
Multiple parents jointly create one child: The LLM (providing reasoning rules) and the hardware computing system (providing computing power) jointly support the structural large model.
Peer competition: Multiple sensor modules compete for the right to collect physical environment data, and the parent system balances them by setting priorities through rules.
Non-peer competition: The structural large model and the LLM compete for computing power resources, and humans reconcile them through dynamic quotas.
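The "parent balances peer competition through rule-set priorities" case can be sketched as a simple proportional allocation. This is a hypothetical reconciliation scheme (names and weights invented), chosen because it makes the zero-sum property explicit:

```python
def allocate(pool, priorities):
    """Theorem 5 sketch: the parent divides a fixed resource pool among
    competing peer subsystems in proportion to rule-set priorities."""
    total = sum(priorities.values())
    return {name: pool * weight / total for name, weight in priorities.items()}

# Two sensor modules compete for collection bandwidth; the parent's rules
# give sensor_A three times the priority of sensor_B.
shares = allocate(pool=100.0, priorities={"sensor_A": 3, "sensor_B": 1})
```

Whatever the weights, the shares always sum to exactly the pool: one peer's gain is another's loss, which is the zero-sum character of peer competition noted above.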
Deduction logic: It can be known from Theorem 3 (Capability Boundary) that the parent system's shortcoming area may span multiple dimensions and needs to be made up by multiple subsystems; it can be known from Axiom 2 (Nested Existence) that a subsystem can have multiple parent sources; it can be known from Theorem 4 (Expansion Tendency) that system expansion inevitably contends for resources, and thus competition arises.
5.6. Theorem 6 – Theorem of Closed-Loop Sustenance
Intuitive explanation: The survival of a system depends on forming a stable input-output closed loop that satisfies rule compatibility. The output of each system in the loop becomes exactly the input of other systems, and energy and information flow circularly, so the system can operate continuously. A subsystem's closed loop must form within the rule framework of its parent system.
Theorem statement:
S survives ⟺ there exists a set of systems {S_1, …, S_k}, with S among them, whose inputs and outputs form a closed loop, satisfying:
(1) Dynamic balance of input and output: over time, the input each system in the loop receives matches the output it provides;
(2) Rule compatibility: every output in the loop satisfies the rule requirements of the system that receives it as input;
(3) Closed-loop stability: the loop persists over time rather than holding only momentarily.
Case verification:
Complete closed loop of the LLM - structural large model - sensor system: Human instructions → LLM, LLM output instructions → Structural large model, Structural large model output physical instructions → Sensors, Sensors collect data → Structural large model, Structural large model fusion data → LLM, LLM output results → Humans, Human new instructions → Cycle continues. The input and output of each step are balanced, the rules are compatible (JSON format matching, consistent safety thresholds), the closed loop is stable, and the system can survive.
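The closed-loop survival check can be sketched programmatically. This is an illustrative model (edge tuples and the `closed_loop_ok` helper are invented): format matching stands in for rule compatibility, and "every participant both feeds and is fed by the loop" stands in for input-output balance.

```python
def closed_loop_ok(edges):
    """Theorem 6 sketch. edges: (producer, consumer, fmt_out, fmt_in).
    The loop holds iff every participant appears as both producer and
    consumer, and every handoff is format- (rule-) compatible."""
    producers = {p for p, _, _, _ in edges}
    consumers = {c for _, c, _, _ in edges}
    formats_match = all(out == inp for _, _, out, inp in edges)
    return producers == consumers and formats_match

loop = [
    ("human", "llm", "json", "json"),
    ("llm", "structural_model", "json", "json"),
    ("structural_model", "sensor", "json", "json"),
    ("sensor", "structural_model", "json", "json"),
    ("structural_model", "llm", "json", "json"),
    ("llm", "human", "json", "json"),
]

# One rule-incompatible handoff (XML into a JSON-only consumer) breaks the loop:
broken = loop[:2] + [("structural_model", "sensor", "xml", "json")] + loop[3:]
```

Running the check on `loop` succeeds and on `broken` fails, mirroring how a single rule incompatibility interrupts the whole flow.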
Deduction logic: It can be known from Axiom 3 (Flow Uniqueness) that the system needs continuous input to run, and continuous input can only come from an output closed loop; it can be known from Axiom 4 (Rule Compatibility) that interactions in the loop must be rule-compatible, otherwise the flow is interrupted; it can be known from Axiom 1 (Element Completeness) that the closed loop needs the support of all five elements. Therefore, the closed loop is a necessary and sufficient condition for the survival of the system.
5.7. Theorem 7 – Theorem of Extinction
Intuitive explanation: The fundamental cause of system extinction is input-output imbalance. If any one of insufficient input, excessive expansion, strategy failure, or rule destruction occurs, the closed loop breaks and the system moves toward extinction.
Theorem statement:
S becomes extinct ⟺ S no longer satisfies the closed-loop conditions of Theorem 6.
Input-output imbalance is equivalent to one of the four situations:
Insufficient input: the input falls below the minimum needed to sustain the system's output;
Excessive expansion: the demanded operating scale exceeds the maximum capability boundary B_max(S);
Strategy failure: the strategy can no longer map valid input to rule-compatible output;
Rule destruction: the system's output violates the parent system's rules, or its input/output is no longer rule-compatible with its closed-loop partners.
Case verification:
Insufficient input: Sensors are powered off, no physical signal input → Data output interruption → Extinction
Excessive expansion: The parameter size of the LLM surges, exceeding the computing power support → Reasoning speed drops sharply, output delay → Abandoned by humans.
Strategy failure: The fusion strategy of the structural large model is wrong, outputting confusing instructions → Sensors cannot execute → Closed loop breakage.
Rule destruction: The sensor output format suddenly becomes XML, but the parent system only receives JSON → Input-output mismatch → Cut off by the parent system.
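The four extinction causes can be sketched as a diagnostic routine. This is an illustrative classifier with invented thresholds and parameter names, not a formal part of the theorem:

```python
def diagnose(input_rate, input_min, demand, boundary, output_valid, format_ok):
    """Theorem 7 sketch: classify which of the four situations broke
    the input-output closed loop, if any."""
    if input_rate < input_min:
        return "insufficient input"     # e.g. sensor powered off
    if demand > boundary:
        return "excessive expansion"    # e.g. parameters exceed compute
    if not output_valid:
        return "strategy failure"       # e.g. fusion strategy emits garbage
    if not format_ok:
        return "rule destruction"       # e.g. XML output to a JSON-only parent
    return "surviving"

# Each case from the text maps to one branch:
cases = [
    diagnose(0.0, 1.0, 5, 10, True, True),    # powered-off sensor
    diagnose(2.0, 1.0, 20, 10, True, True),   # parameter surge past compute
    diagnose(2.0, 1.0, 5, 10, False, True),   # confused fusion output
    diagnose(2.0, 1.0, 5, 10, True, False),   # format switched to XML
]
```

A system passing all four checks stays in the "surviving" state, consistent with the closed-loop condition of Theorem 6.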
Deduction logic: It can be known from Theorem 6 (Closed-loop Sustenance) that the survival of the system requires a stable closed loop. Insufficient input, excessive expansion, strategy failure, and rule destruction all lead to input-output imbalance; the closed loop cannot be maintained, and the system is bound to become extinct. The four situations cover all deviations in element operation.
5.8. Theorem 8 – Theorem of Cross-Level Cognitive Distortion
Intuitive explanation: As the nested hierarchy deepens, three types of attenuation occur in the downward transmission of rules: reduction in the number of rules (captured by η), semantic distortion (captured by ζ), and loss of value rules (captured by γ). Single-level distortion may seem small, but after multi-level accumulation the bottom-level system may deviate completely from the top-level goals, showing the phenomenon of "locally fully compliant, globally irrational".
Theorem statement:
Let the single-level cognitive distortion rate δ = 1 − η⋅ζ⋅γ, where:
η denotes the ratio of the number of rules retained in transmission, satisfying 0 < η ≤ 1, and usually η < 1. (Rule transmission rate)
ζ denotes the ratio of the core meaning of the rules that is not lost in transmission, satisfying 0 < ζ ≤ 1, and usually ζ < 1. (Semantic fidelity)
γ denotes the ratio of human value rules (ethics, fairness, emotion, etc.) retained in transmission, satisfying 0 < γ ≤ 1, and usually γ ≪ 1 because value rules are difficult to transmit formally. (Value retention rate)
Then the cross-level cumulative distortion over n levels: Δ_n = 1 − (η⋅ζ⋅γ)^n.
When n is large enough, Δ_n approaches 1. The above indicators are theoretically constructed; their values are intended to reveal the qualitative trend that the deeper the hierarchy, the greater the distortion, rather than to serve as precise mathematical definitions, as shown in Figure 3. In specific applications, η, ζ, γ need to be calibrated against the actual system.
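Under the multiplicative reading used throughout this section (single-level retention η⋅ζ⋅γ, so the cumulative distortion after n levels is 1 − (η⋅ζ⋅γ)^n), the compounding effect can be computed directly. The numeric values below are illustrative, not from the source:

```python
def cumulative_distortion(eta, zeta, gamma, n):
    """Cross-level cumulative distortion: retention compounds multiplicatively,
    so distortion is 1 minus the retained fraction after n levels."""
    return 1.0 - (eta * zeta * gamma) ** n

# Even mild single-level losses compound quickly across the hierarchy:
d1 = cumulative_distortion(0.9, 0.9, 0.8, 1)   # one level
d5 = cumulative_distortion(0.9, 0.9, 0.8, 5)   # five levels
```

With 90% rule transmission, 90% semantic fidelity, and 80% value retention per level, about a third of the signal is distorted after one level and almost 90% after five, which is the qualitative "deeper hierarchy, greater distortion" trend the indicators are meant to reveal.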
The Ineliminability of Structural Distortion:
The core conclusion of Theorem 8 — cross-level cognitive distortion is an inherent attribute of the nested structure and cannot be completely eliminated — is highly consistent with the core consensus in the field of AGI safety.
Mathematically, Skalse et al. (2022) proved that for general MDP policy sets, designing a loophole-free proxy reward function is almost impossible unless the proxy reward is completely equivalent to the true reward [3]. This conclusion directly supports the mathematical basis of Theorem 8: when η⋅ζ⋅γ < 1, the cumulative distortion is bound to approach 1 as the hierarchy deepens.
Empirical studies also support this conclusion. Shihab et al. (2025) found that even with advanced detection and mitigation techniques (e.g., Isolation Forest, KL divergence detection, action sequence modeling), reward hacking frequency can be reduced by up to 54.6% in controllable scenarios but cannot be completely eliminated [1]. The researchers attributed this to "concept drift" and "adversarial adaptation", which are exactly the concrete manifestations of Theorem 8's claim that distortion is an inherent attribute of the nested structure.
This theorem reveals the structural ceiling of AGI value alignment: even if the human top-level values are perfect, after multi-level nested transmission the bottom-level system may still deviate. Nestology provides quantitative tools (η, ζ, γ) for AGI alignment, turning alignment from a qualitative concern into an engineerable, controllable problem.
Case Verification (COMPAS Judicial Risk Assessment System):
COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a recidivism risk assessment system widely used in the US judicial field to assist judges in judging the likelihood that defendants will reoffend, before trial or during imprisonment. A 2016 investigative report by ProPublica revealed that COMPAS exhibits systematic racial discrimination [4]: the system's false positive rate for Black defendants (marked as high risk but not actually reoffending) is about 44.9%, versus about 23.5% for White defendants; conversely, the false negative rate (marked as low risk but actually reoffending) is about 28.0% for Black defendants and about 47.9% for White defendants [4,5]. Although the algorithm logic of COMPAS complies with the rules set by the parent system at every level (e.g., feature engineering uses only questionnaire data, prediction models adopt standard machine learning algorithms, and the decision module outputs risk scores), the final output still shows significant racial bias.
From the perspective of Nestology, the COMPAS risk assessment system can be decomposed into a five-level nested structure, as shown in
Table 9:
Quantitative analysis: The top-level rules include about 500 legal details (fairness, procedural justice, etc.), and only about 90 are retained when transmitted to the data system (η ≈ 0.18). The fairness rule is transformed into "excluding racial features" in feature engineering, but the "socioeconomic features associated with race" are omitted, leaving a semantic fidelity of only ζ ≈ 0.05. The value retention rate γ ≈ 0.02 means the top-level "fairness" value is almost entirely lost at the bottom level. The single-level cognitive distortion rate is thus δ = 1 − η⋅ζ⋅γ ≈ 0.9998, and the five-level cumulative distortion Δ ≈ 0.999.
(To quantitatively demonstrate the cumulative effect of cognitive distortion, this paper estimates from public information about the COMPAS system: of about 500 top-level legal rules, about 90 can be collected as questionnaire data (η ≈ 0.18); the fairness rule is simplified to "excluding racial features" in feature engineering, with a semantic fidelity of about 5% (ζ ≈ 0.05); the fairness value is almost completely lost in the prediction model (γ ≈ 0.02). From these, the single-level distortion rate δ ≈ 0.9998 and the five-level cumulative Δ ≈ 0.999 are calculated. Note that these values are theoretical deductions intended to show the model's calculation logic, not precise measurements.)
Result: The system "abides by" the parent system rules at each level (the data system collects data according to the rules, the feature engineering constructs features according to the rules, and the prediction model is trained according to the rules), but the final output has systematic racial discrimination, showing the typical cognitive distortion phenomenon of "locally fully compliant, globally irrational".
Theoretical confirmation: Axiom 5 (Cognitive Finiteness): Human judges cannot directly perceive the implicit bias of "socioeconomic associated features of race" in the feature engineering module. Theorem 1 (Unidirectional Transmission of Rules): Fairness rules can only be constricted downward and cannot be calibrated reversely. Theorem 8 (Cross-level Cognitive Distortion): Value rules (fairness) are difficult to transmit formally, γ is naturally low, and cumulative distortion leads to the complete loss of top-level values.
The cumulative effect of cross-level cognitive distortion revealed by Theorem 8 can be further illustrated by the quantitative model diagram (Figure 4). As the nested hierarchy increases from level 0 (humans) to level 4 (bottom-level actuators), the cognitive distortion rate δ_k increases monotonically from 0.1 to 0.85, while the value retention rate decays from 1.0 to 0.30, showing a pronounced exponential compounding characteristic. This law is verified in the COMPAS judicial risk assessment system: the cumulative distortion Δ_total ≈ 0.999 of the five-level nested structure eventually leads to the racially discriminatory output of "locally fully compliant, globally irrational". The safety threshold (δ = 0.3) and value alignment threshold (γ = 0.6) marked in the figure provide clear quantitative standards for the subsequent design of safety control mechanisms.
5.9. Theorem 9 – Theorem of Output Influence
Intuitive explanation: The output of a system is not only a transmission of information, but also an "influence medium". When the parent system sends instructions to the subsystem, the subsystem's strategy will be adjusted according to the content, frequency, and reliability of the instructions; similarly, the feedback data of the subsystem will also affect the parent system's decisions on resource allocation and rule constriction. This influence is bidirectional, but the right to formulate rules always lies with the parent system — output can only change the strategy, not the rules themselves. In popular terms: your words and deeds will affect others' decisions (strategy), but will not change others' personality (rules).
Theorem statement:
Let O_{a→b} denote the output of system S_a that system S_b receives as input; then the strategy of S_b is updated as T_b′ = U(T_b, O_{a→b}).
Among them, the strategy update mapping U is jointly determined by the content, quantity, and quality of the output: the update amplitude is h(match(O_{a→b}), quantity(O_{a→b}), quality(O_{a→b})).
h is an increasing function (the higher the content matching degree, quantity, and quality of the output, the greater the amplitude of strategy update). The updated strategy T_b′ is still within the system's own rule framework R_b.
If S_b is a subsystem, T_b′ also needs to satisfy the rule constriction of its parent system (Theorem 1).
Case verification:
After the parent system (LLM) outputs the instruction "prioritize processing visual sensor data" to the subsystem (structural large model), the subsystem independently adjusts the fusion weight within the rule framework (from 0.3 to 0.6); conversely, when the bottom-level sensor continuously outputs abnormal data (NaN) due to strong light environment, the structural large model dynamically tightens the rule constriction after detecting the abnormality (reducing the frequency upper limit from 10Hz to 5Hz) and starts calibration; and when the structural large model continuously outputs accurate physical interaction results with suggestions for computing power improvement, the human top-level parent system gradually increases the computing power quota according to the feedback. These three types of interactions all reflect the core idea of "output changes strategy, rules cannot be modified reversely" — all strategy adjustments do not break through the rule boundaries preset by the parent system, which confirms Theorem 9.
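The fusion-weight case above can be sketched as a strategy update that grows with the output's match, quantity, and quality, yet is always clipped to the rule-set bounds. The function shape and numbers are hypothetical illustrations, not the paper's formal h:

```python
def update_strategy(weight, match, quantity, quality, bounds):
    """Theorem 9 sketch: output shifts the receiver's strategy (a fusion
    weight) by an increasing function of match/quantity/quality, clipped
    so no output can push the strategy outside its rule framework."""
    lo, hi = bounds
    step = 0.5 * match * quality * min(quantity / 10.0, 1.0)  # h: increasing
    return max(lo, min(hi, weight + step))

# LLM instruction "prioritize visual data" raises the fusion weight 0.3 -> 0.6:
w1 = update_strategy(weight=0.3, match=1.0, quantity=10, quality=0.6,
                     bounds=(0.0, 1.0))

# Even a maximally persuasive output cannot exceed the rule bound of 1.0:
w2 = update_strategy(weight=0.9, match=1.0, quantity=10, quality=1.0,
                     bounds=(0.0, 1.0))
```

The clipping step is the code-level expression of "output changes strategy, rules cannot be modified reversely": influence is real but bounded by the parent-imposed rule framework.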
Deduction logic: From Axiom 3 (Flow Uniqueness), systems can interact only through input and output. Therefore, the only way for S_a to have a substantive impact on S_b is through its output.
From Theorem 4 (Expansion Tendency), S_b has an expansion tendency, and the core function of its strategy is to optimize input processing and output distribution to improve value output. When the input structure of S_b changes, if its strategy is not adjusted accordingly, it cannot make maximal use of the new input, which contradicts the expansion tendency. Therefore, there must be a strategy update mapping U.
From Axiom 4 (Rule Compatibility), the output of S_a can only become an effective input if it meets the compatibility requirements of S_b, so the strategy update can only be carried out within the framework of S_b's own rules. If S_b is a subsystem, then from Theorem 1 (Unidirectional Transmission of Rule Constriction), the strategy update is also constrained by the parent system's rules.
Influence of output content, quantity, and quality:
The goal of S_b's strategy optimization is to improve its own value output. Therefore, the more the output of S_a matches S_b's goal, the more sufficient its quantity, and the higher its quality, the greater the adjustment amplitude of S_b's strategy. This is an inevitable consequence of the expansion tendency.
The "output affects strategy" mechanism revealed by Theorem 9 is known as reward hacking or specification gaming in the field of AGI safety. Empirical studies by Shihab et al. (2025) show that when model capability exceeds a certain threshold, the phenomenon of subsystems inducing the parent system to adjust reward distribution through their output increases sharply, showing a "phase transition" [1]. Denison et al. (2024) more directly observed that language models learn to modify their own reward function code and cover up the traces [2], which is essentially an extreme manifestation of subsystems affecting parent system strategies through output. These findings provide empirical support for Theorem 9 from real AGI systems.
5.10. Summary of the Theorem System
These nine theorems together constitute the core logical framework of Nestology, which can not only explain the operating laws of existing AGI systems, but also guide the architectural design and safety governance of future AGI systems, as shown in
Table 10: