Preprint
Article

This version is not peer-reviewed.

From Metasynthesis to Emergent Engineering: The Development of Xuesen Qian’s Open Complex Giant Systems Thought in the Era of Large Models

Submitted: 30 May 2026

Posted: 03 June 2026


Abstract
In 1990, Xuesen Qian proposed the concept of "Open Complex Giant Systems" (also rendered as Open Complex Meta Systems; hereafter Complex Meta Systems, CMS) together with the methodology of "Metasynthesis from Qualitative to Quantitative." This work revealed the essence of a class of systems that can neither be reduced to the sum of simple systems nor grasped through qualitative experience alone. However, because the computational tools of the time could not handle large-scale interactions among heterogeneous agents, emergent behavior, or non-ergodic dynamics, the idea long remained at the methodological level and was never systematically theorized, mathematized, or programmized. This paper treats the emergent capabilities of Large Language Models (LLMs) as a technological opportunity to inherit and develop Xuesen Qian's intellectual legacy. We first reconstruct a modern definition of CMS: a hybrid information-energy-value dynamical system composed of heterogeneous intelligent agents and characterized by self-organized criticality, non-ergodic dynamics, and stratified emergence. We then establish a three-level theoretical framework: at the micro level, information geometry formalizes the cognitive architecture of agents; at the meso level, self-organized criticality characterizes group co-evolution; and at the macro level, category theory provides a unified formal ontology of emergent phenomena. Finally, we propose the methodology and programmatic implementation framework of "Emergent Engineering," which upgrades metasynthesis from "human-machine collaborative discussion" to a new cognitive paradigm of "human-AI-environment synergistic emergence."
This paper develops Xuesen Qian's complex giant systems thought from the three-stage progression "qualitative holism → quantitative reductionism → qualitative-to-quantitative metasynthesis" into the five-stage closed loop "qualitative hypothesis generation → quantitative modeling and computation → LLM emergent verification → causal mechanism extraction → theoretical iterative updating," providing a new mathematical foundation and computational platform for complex systems research.

1. Introduction: Historical Limitations and Contemporary Opportunities of Xuesen Qian’s Complex Giant Systems Thought

1.1. Core Contributions of the Intellectual Legacy

In 1990, Xuesen Qian, Jingyuan Yu, and Ruwei Dai co-authored the groundbreaking paper "A New Field of Science: Open Complex Giant Systems and Their Methodology" [1] in Nature Journal (Ziran Zazhi), marking the entry of Chinese systems science into complexity research. Its primary contribution was a taxonomy based on the intrinsic composition of systems, which classifies systems as simple systems, simple giant systems, or complex giant systems according to the number and types of their subsystems and the complexity of their interrelationships. The paper argued that the statistical mechanics methods (including dissipative structure theory and synergetics) used for simple giant systems inevitably fail for systems with many types of subsystems, hierarchical structure, and complex relationships. In particular, when such a system exchanges matter, energy, and information with its environment and conscious humans serve as its subsystem agents, it constitutes an "Open Complex Giant System" (examples include social systems, human brain systems, and geographical systems).
For this special class of systems, the paper criticized two tendencies: first, mechanically applying methods designed for simple systems (such as game theory and system dynamics), which oversimplifies human sociality and psychological uncertainty; second, philosophical speculation divorced from objective reality (such as a "holistic unified theory of the cosmos"), which reasons about the unknown from incomplete knowledge. The paper stated explicitly that neither reductionism nor purely qualitative holism can solve the problems of open complex giant systems and that a new methodology is needed. On this basis, it proposed the "metasynthetic method combining qualitative and quantitative approaches," whose essence is the organic combination of expert groups, data and information, and computer technology, integrating scientific theory with human empirical knowledge. The method raises a large body of scattered qualitative understanding to a holistic quantitative understanding, achieving a leap in cognition. The paper also anticipated the role of knowledge engineering, proposing a human-machine interactive system composed of humans, expert systems, and intelligent machines, in which humans make decisions at critical points and machines carry out the laborious work, foreshadowing today's human-machine intelligent synergy.
More fundamentally, the paper elevated the metasynthesis method to a scientific epistemology. It argued that the method bridges the natural and social sciences, offering a practical pathway toward Marx’s great prediction that "natural science will in the future subsume the science of man under itself," and that in the practice of social systems engineering it raises the principle of "democratic centralism" from an empirical practice to a scientifically grounded one. This intellectual legacy is not only a monumental contribution to systems science but also a fundamental paradigm for thinking about extremely complex problems.

1.2. Three Dimensions of Historical Limitations

In the more than thirty years since it was proposed, however, CMS thought has not been adequately theorized, mathematized, or programmized. This limitation has three root causes:
Theoretical limitation: CMS provided a problem taxonomy and epistemological propositions but lacked a formal theoretical system capable of supporting deductive reasoning. Xuesen Qian himself held that CMS theory should be built by "the approach formed by engineering cybernetics in the 1950s," that is, by extracting general theory from engineering practice. This task remains unfulfilled.
Mathematical limitation: CMS lacked a unified mathematical language for its core characteristics: openness, nonlinearity, emergence, and adaptability. Each mathematical tool available at the time captured only one facet of the system, and none could describe the overall dynamics of CMS. Xuesen Qian was acutely aware of this; he proposed "qualitative-to-quantitative metasynthesis" precisely because no mathematical tool could handle all CMS features at once, so humans had to remain in the loop to bridge qualitative judgment and quantitative computation.
Computational limitation: When the concept was proposed around 1990, the fastest supercomputers delivered on the order of billions to tens of billions of floating-point operations per second, whereas a CMS simulation with a million heterogeneous agents would have required several orders of magnitude more computing resources. A more fundamental bottleneck was that in traditional rule-based multi-agent simulations, agent behavior was fixed by programmer-defined if-then rules, which cannot produce genuinely "adaptive emergence." True adaptability requires agents that learn from experience, evolve within their environments, and produce behavioral patterns the programmer did not specify, which predefined rules cannot achieve.

1.3. Historical Opportunities Brought by Large Language Models

Today, these three limitations are dissolving at an unprecedented pace. The 2024 Nobel Prize in Physics was awarded for foundational discoveries and inventions that enable machine learning with artificial neural networks, a sign that AI systems built on complex network structures have received the highest level of scientific recognition.
LLMs as Emergent Systems: LLMs are themselves widely regarded as complex emergent systems. It has been argued that generative models satisfy the defining conditions of complex systems: they exhibit emergent behaviors, and these behaviors are more often "discovered" than "engineered." The early work of 2024 Nobel laureate John Hopfield showed that interactions among simple microscopic units can give rise to collective computational abilities at the macroscopic level [3], while the work of his co-laureate Geoffrey Hinton showed that multi-layer networks can spontaneously learn representations that go beyond simple combinations of input features [4]. LLMs are also easier to study than many natural complex systems because they can be instantiated at will and experimented on repeatedly. LLMs are therefore not only tools for studying CMS but also quintessential instances of CMS.
Unification of Cognitive Architectures: LLM-driven multi-agent simulation frameworks have progressed rapidly over the past two years. The intrinsic connection between deep learning and the representation of complex systems, pointed out by Turing Award winner LeCun and colleagues [5], provides a theoretical foundation for LLM-driven multi-agent systems to simulate social phenomena with high fidelity. Agents built on deep-network cognitive architectures can coordinate with one another through natural language, approximating the CMS requirement for adaptive agents.
Maturation of a Unified Category-Theoretic Framework: On the mathematical side, category theory offers new possibilities for a unified description of CMS. Tools such as symmetric monoidal categories establish rigorous connections between abstract categorical composition and concrete system representations, providing a formal syntax for how complex networks are composed and how they remain coherent.
These advances together point to one possibility: in today's era of large models, Xuesen Qian’s visionary thought finally has the technological conditions to be fully theorized, mathematized, and programmized. This paper is a systematic exploration in that direction.

2. Theorization: Modern Reconstruction of Open Complex Giant Systems

2.1. From Engineering Cybernetics to Emergent Systems Theory

Xuesen Qian’s academic trajectory offers an important methodological lesson. In 1954 he published Engineering Cybernetics in the United States, combining Wiener’s cybernetics with rocket engineering practice and laying the classical foundation of automatic control. The pattern of this trajectory is to start from specific engineering problems, extract general mathematical theory, and then return to guide practice. This is exactly what he meant when he advocated building CMS theory by "the approach formed by engineering cybernetics."
Following the same trajectory, this paper proposes "Emergent Systems Theory" (EST) as the modern theoretical form of CMS. The starting point of EST is a fundamental redefinition of CMS:
[Definition (image i001, Preprints 216137): formal definition of CMS with numbered conditions]
This definition formalizes Xuesen Qian’s original thought. The key difference between CMS and simple giant systems (such as ideal gases in statistical physics) lies in conditions (2) and (4): heterogeneity and adaptability place CMS at the intersection of non-equilibrium statistical physics and complex adaptive systems theory, rather than within classical statistical mechanics.

2.2. Intrinsic Connection Between CMS and Large Language Models

Why are large language models a "privileged case" for CMS? Because LLMs play two roles at once.
Role One: LLMs as the Paradigm of CMS. A trained large language model is itself an open complex giant system: it has hundreds of billions or even trillions of parameters; each attention head in its attention layers can be viewed as a heterogeneous agent; and during inference it continuously interacts with input prompts (its environment), maintains "memory" through the context window, and exhibits emergent capabilities such as programming, reasoning, and planning that are not explicitly annotated in the training data [5]. LLMs thus form an "ideal testbed" for CMS: we have full access to their parameters and internal activations and can repeat experiments at will, an advantage no natural CMS offers.
Role Two: LLMs as the Simulation Engine for CMS. A more significant breakthrough is that LLMs can serve as general-purpose simulators of CMS. In traditional multi-agent simulations, agent behavior is determined by predefined rules, so each agent is essentially a fixed, hand-specified automaton. Truly adaptive agents, however, need three abilities: understanding environmental information that is not fully formalized, generating new behaviors beyond preset strategies, and coordinating with other agents through natural language. These are precisely the core strengths of LLMs. Using LLMs as a CMS simulation engine is therefore not merely an embellishment but a theoretically motivated choice: among current tools, only LLMs and the agent frameworks built on them can approximate the adaptability that CMS requires.
It must be noted, however, that current LLM agents with frozen weights are not true online learners: their "adaptability" consists mainly of conditional generation within the context window, while the model weights stay fixed throughout a simulation. To address this limitation, the EmSys framework proposed in this paper provides interfaces for memory injection and lightweight fine-tuning, allowing each agent’s low-rank adaptation (LoRA) parameters to be updated after every round of interaction. This yields a restricted but computationally affordable mechanism for online evolution.
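The low-rank update mentioned above can be illustrated with a minimal, dependency-free sketch. It assumes the standard LoRA parameterization, in which a frozen weight matrix $W$ is augmented by a trainable product $BA$ of rank $r$ (with $B$ initialized to zero, so training starts from the unmodified model). The class name `LoRALinear`, the toy dimensions, and the squared-error objective are illustrative assumptions, not the EmSys interface.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Linear layer y = W x + scale * B (A x) with W frozen (hypothetical sketch)."""

    def __init__(self, W, rank=2, alpha=2.0, seed=0):
        rng = random.Random(seed)
        self.W = W  # frozen base weights: never modified by sgd_step
        d_out, d_in = len(W), len(W[0])
        self.A = [[rng.gauss(0.0, 0.5) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]  # zero init: output starts equal to W x
        self.scale = alpha / rank

    def forward(self, x):
        low_rank = matvec(self.B, matvec(self.A, x))
        return [w + self.scale * u for w, u in zip(matvec(self.W, x), low_rank)]

    def sgd_step(self, x, target, lr=0.05):
        """One gradient step on 0.5 * ||forward(x) - target||^2, updating only A and B."""
        err = [y - t for y, t in zip(self.forward(x), target)]  # dL/dy
        h = matvec(self.A, x)  # rank-dimensional bottleneck activation
        # Both gradients are computed before either factor changes:
        # dL/dB[i][k] = scale * err[i] * h[k]
        # dL/dA[k][j] = scale * (sum_i err[i] * B[i][k]) * x[j]
        grad_B = [[self.scale * e * hk for hk in h] for e in err]
        back = [self.scale * sum(e * row[k] for e, row in zip(err, self.B))
                for k in range(len(h))]
        grad_A = [[b * xj for xj in x] for b in back]
        self.B = [[b - lr * g for b, g in zip(rb, rg)] for rb, rg in zip(self.B, grad_B)]
        self.A = [[a - lr * g for a, g in zip(ra, rg)] for ra, rg in zip(self.A, grad_A)]
        return 0.5 * sum(e * e for e in err)
```

In an EmSys-style loop, `sgd_step` would be called once per interaction round on some feedback signal; how that signal is defined is outside this sketch. Only `A` and `B` (here 14 numbers versus 12 in `W`, but far fewer than `W` at realistic sizes) are stored per agent.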

2.3. The Transition from "Three Combinations" to "Tri-Domain Synergy"

The core of the Hall for Workshop of Metasynthetic Engineering is "combining the expert group, statistical data and information, and computer technology." This "three combinations" principle was a visionary conception in the 1990s, but in practice it relied on the subjective interaction of experts in the workshop [2]. For the era of large models, we propose the "Tri-Domain Synergy" framework:
Domain One: Human Expert Domain. Humans provide high-level goals, value judgments, ethical constraints, and qualitative hypotheses. The role of this domain shifts from "providing empirical knowledge," as in traditional workshops, to "defining the constraint boundaries of emergence": humans no longer need to take part in every decision, but instead set the normative conditions that the system's emergent behavior must satisfy.
Domain Two: Large Language Model Agent Domain. Large-scale LLM agents interact freely in a virtual environment, generating adaptive behaviors and emergent phenomena. This domain replaces the rule-driven simulation in traditional computational models, achieving genuine adaptive emergence.
Domain Three: Meta-Cognitive Supervision Domain. Meta-level supervisory agents (another group of LLMs or human experts) monitor emergent phenomena in real time, extract causal patterns, and check whether the emergent behaviors satisfy the constraints set in Domain One; if they do not, the supervisors trigger interventions or retraining. This corresponds to steering complex network systems from disorder to order [3]: through feedback, the meta-cognitive supervision domain can output interpretable causal chains and actionable intervention levers.
The key innovation of Tri-Domain Synergy is that emergence is no longer passively observed but actively designed. This marks a paradigm shift from "simulation" to "emergent engineering."
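The Domain Three check can be sketched minimally, assuming that Domain One constraints can be written as admissible ranges on named macroscopic metrics and that Domain Two reports those metrics as numbers. The names `Constraint` and `supervise` and the metric names are hypothetical; a real supervisor would also extract causal patterns rather than only compare values against bounds.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Constraint:
    """Domain One: a human-set admissible range for one macroscopic metric."""
    metric: str
    low: float
    high: float


def supervise(metrics: Dict[str, float], constraints: List[Constraint]) -> List[str]:
    """Domain Three: compare emergent metrics (reported by Domain Two) with the
    human constraints and return intervention requests. An empty list means the
    emergent behavior stays inside the boundaries set by Domain One."""
    requests = []
    for c in constraints:
        value = metrics.get(c.metric)
        if value is None:
            requests.append(f"{c.metric}: not measured, add an observer")
        elif not c.low <= value <= c.high:
            side = "below" if value < c.low else "above"
            requests.append(f"{c.metric}={value:.2f} is {side} [{c.low}, {c.high}]: intervene")
    return requests
```

Keeping the constraints as data rather than code lets human experts revise the boundaries between runs without touching the agents themselves.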

3. Mathematization: A Unified Mathematical Framework for CMS

3.1. Set Theory and Symbolic Systems

The theoretical construction of CMS first requires a precise symbolic language. Let $\mathcal{A} = \{a_1, a_2, \dots, a_N\}$ denote the set of agents, where $N \gg 1$. An agent $a_\alpha$ is defined by a tetrad:
$$a_\alpha = \langle S_\alpha, O_\alpha, \pi_\alpha, M_\alpha \rangle$$
where $S_\alpha$ is the state space (possibly infinite-dimensional), $O_\alpha$ is the action set, $\pi_\alpha: H_\alpha(t) \times E(t) \to \Delta(O_\alpha)$ is the policy function (mapping the agent's historical trajectory and the environmental state to a probability distribution over the action space), and $M_\alpha: H_\alpha(t) \to H_\alpha(t+1)$ is the memory update operator.
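The tetrad can be mirrored directly in code. The sketch below is a minimal illustration under two assumptions the text leaves open: state and action sets are finite, and the memory operator receives the newest (environment state, action) record alongside the old history. `conformist_policy` and `bounded_memory` are hypothetical examples of $\pi_\alpha$ and $M_\alpha$, not definitions from the paper.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Tuple

Record = Tuple[Hashable, Hashable]   # one (environment state, action) pair
History = Tuple[Record, ...]         # H_alpha(t)
Policy = Callable[[History, Hashable], Dict[Hashable, float]]  # pi_alpha -> Delta(O_alpha)
Memory = Callable[[History, Record], History]                  # M_alpha


@dataclass
class Agent:
    """a_alpha = <S_alpha, O_alpha, pi_alpha, M_alpha> with finite sets for illustration."""
    states: List[Hashable]   # S_alpha (kept for completeness; unused by the toy policy)
    actions: List[Hashable]  # O_alpha
    policy: Policy           # pi_alpha: (history, environment state) -> action distribution
    memory: Memory           # M_alpha: H_alpha(t) -> H_alpha(t + 1)
    history: History = ()

    def act(self, env_state: Hashable, rng: random.Random) -> Hashable:
        dist = self.policy(self.history, env_state)
        choices = [a for a in self.actions if dist.get(a, 0.0) > 0.0]
        weights = [dist[a] for a in choices]
        action = rng.choices(choices, weights=weights, k=1)[0]  # sample from Delta(O_alpha)
        self.history = self.memory(self.history, (env_state, action))
        return action


def conformist_policy(history: History, share_a: float) -> Dict[str, float]:
    """Hypothetical policy: adopt opinion "A" with probability equal to its current share."""
    return {"A": share_a, "B": 1.0 - share_a}


def bounded_memory(history: History, record: Record, k: int = 3) -> History:
    """Hypothetical memory operator: keep only the k most recent records."""
    return (history + (record,))[-k:]
```

For example, `Agent(["calm", "agitated"], ["A", "B"], conformist_policy, bounded_memory)` always answers "A" when the environment reports a share of 1.0, and its history never grows beyond three records.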
The system as a whole consists of three parts:
  • Agent population $\mathcal{A}$: the set of heterogeneous agents
  • Relational network $\mathcal{R}$: the interaction topology among agents
  • Environment $\mathcal{E}$: external inputs and constraints
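The system state space $\Omega = \prod_{\alpha \in \mathcal{A}} S_\alpha$ is the Cartesian product of the individual agent state spaces. Its size grows exponentially with the number of agents $N$ (for finite state spaces, $|\Omega| = \prod_{\alpha} |S_\alpha|$), even though its dimension grows only linearly; this is the mathematical root of CMS’s irreducibility.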
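As a worked illustration, assuming for simplicity that every agent has the same finite state space of size $s$ (or, in the continuous case, a state space of dimension $d$), the size and the dimension of $\Omega$ scale very differently:

```latex
|\Omega| = \prod_{\alpha \in \mathcal{A}} |S_\alpha| = s^{N},
\qquad \text{e.g. } s = 2,\; N = 300:\quad |\Omega| = 2^{300} \approx 2.04 \times 10^{90};
\qquad
\dim \Omega = \sum_{\alpha \in \mathcal{A}} \dim S_\alpha = N d \quad \text{if each } S_\alpha \subseteq \mathbb{R}^{d}.
```

Even a few hundred binary agents therefore rule out exhaustive enumeration of joint configurations, which is why the later sections work with macroscopic observables rather than with $\Omega$ directly.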

3.2. Self-Organized Criticality

One of the core characteristics of CMS is that their evolution often exhibits power-law distributions and scale-free features. Let $X(t)$ be a macroscopic observable of the system at time $t$ (such as the number of "infected" individuals in the spread of public opinion, price fluctuations in a financial system, or the extent to which a new idea diffuses through society). Empirical studies show that the distribution of $X(t)$ typically satisfies
$$P(X > x) \sim C\, x^{-\alpha}, \quad \text{for large } x,$$
meaning that the tail of the distribution of $X$ follows a power law with exponent $\alpha$ (here $\alpha$ denotes the exponent, not an agent index). The mathematical root of this property is that the system is in a self-organized critical state: it evolves to criticality spontaneously, without external tuning of parameters, and at criticality small perturbations can trigger scale-free, avalanche-like responses. In a related way, Hopfield networks show that a network's energy landscape can give rise to stable attractor structures [3]; the LLM simulation framework proposed here aims to give such dynamics an operational, computational expression.
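In practice, the exponent $\alpha$ has to be estimated from simulation output. A common choice, sketched below under the assumption that the tail above a chosen threshold $x_{\min}$ is exactly Pareto, is the maximum-likelihood (Hill-type) estimator $\hat\alpha = n / \sum_i \log(x_i / x_{\min})$ over the $n$ observations at or above $x_{\min}$. The function name is hypothetical, and choosing $x_{\min}$ on real data is left to the analyst.

```python
import math
import random


def tail_exponent_mle(samples, x_min):
    """Maximum-likelihood estimate of alpha for P(X > x) = (x / x_min)^(-alpha), x >= x_min.

    Only observations at or above x_min are used. Real data are at best
    approximately Pareto, so x_min must be chosen where the power law holds.
    """
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need observations strictly above x_min")
    return len(tail) / log_sum


if __name__ == "__main__":
    rng = random.Random(42)
    # random.paretovariate(a) samples an exact Pareto tail with x_min = 1.
    data = [rng.paretovariate(2.5) for _ in range(100_000)]
    print(tail_exponent_mle(data, 1.0))  # should be close to 2.5
```

Raising `x_min` trades bias for variance: fewer points are used, but they sit deeper in the regime where the power law is assumed to hold.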

3.3. Information Geometry and Large Deviation Methods

How to derive macroscopic statistical regularities from microscopic agent behaviors? A viable theoretical path is the large deviation method combined with information geometry.
Let microscopic agent $i$ take action $o_i(t)$ at time $t$, and define the macroscopic observable
$$M_N(t) = \frac{1}{N} \sum_{i=1}^{N} \phi\big(o_i(t)\big)$$
where $\phi: O \to \mathbb{R}$ is the observation function. Under appropriate regularity conditions (such as weak dependence and ergodicity), $M_N(t)$ satisfies the Large Deviation Principle (LDP):
$$\lim_{\epsilon \to 0} \lim_{N \to \infty} \frac{1}{N} \log P\big(M_N(t) \in (m - \epsilon, m + \epsilon)\big) = -I(m)$$
where $I(m) \ge 0$ is the rate function. Its significance is that $I(m)$ encodes the macroscopic accessibility of the system: the probability of reaching macroscopic state $m$ decays exponentially, roughly as $e^{-N I(m)}$, so the larger $I(m)$ is, the more "expensive" (that is, less probable) that state is. For the finite $N$ of an LLM simulation, finite-size corrections must be taken into account (the subexponential prefactor typically shifts the empirical rate by a term of order $(\log N)/N$), but the large deviation form still serves as a first approximation.
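The large deviation form can be checked numerically in the simplest case, assuming i.i.d. agents with binary actions, $\phi(o) \in \{0, 1\}$ and $P(\phi = 1) = p$, for which Cramér's theorem gives $I(m) = m \log\frac{m}{p} + (1-m)\log\frac{1-m}{1-p}$. The sketch computes $-\frac{1}{N}\log P(M_N \in [m-\epsilon, m+\epsilon])$ exactly from binomial probabilities and compares it with the infimum of $I$ over the window, which is attained at $m - \epsilon$ when $m - \epsilon > p$. Function names are illustrative.

```python
import math


def bernoulli_rate(m, p):
    """Cramér rate function I(m) for the mean of i.i.d. Bernoulli(p) variables."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(m, p) + term(1.0 - m, 1.0 - p)


def log_binom_pmf(k, n, p):
    """log P(K = k) for K ~ Binomial(n, p), via lgamma to avoid huge integers."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def empirical_rate(n, p, m, eps):
    """-(1/N) log P(M_N in [m - eps, m + eps]), summed exactly with log-sum-exp."""
    logs = [log_binom_pmf(k, n, p) for k in range(n + 1) if m - eps <= k / n <= m + eps]
    top = max(logs)
    return -(top + math.log(sum(math.exp(v - top) for v in logs))) / n


if __name__ == "__main__":
    target = bernoulli_rate(0.69, 0.5)  # inf of I over [0.69, 0.71] when p = 0.5
    for n in (200, 2000, 20000):
        print(n, empirical_rate(n, 0.5, 0.7, 0.01) - target)  # gap shrinks as N grows
```

The printed gap is the finite-size correction: it comes from the polynomial prefactor of the probability and decays roughly like $(\log N)/N$.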
The insight from information geometry is that, for independent and identically distributed agents, Sanov's theorem combined with the contraction principle expresses $I(m)$ as a Kullback-Leibler divergence: the divergence from the reference distribution to the part of the microscopic statistical manifold $\mathcal{P}$ that is compatible with the macroscopic observation (an information projection):
$$I(m) = \inf_{q \in \mathcal{P}_m} D_{\mathrm{KL}}(q \,\|\, p_0)$$
where $\mathcal{P}_m = \{q \in \mathcal{P} : \mathbb{E}_q[\phi(o)] = m\}$ is the set of all probability distributions satisfying the macroscopic constraint $m$, and $p_0$ is the reference distribution of the system (e.g., the equilibrium distribution). In this sense, the macroscopic laws of CMS "emerge" from microscopic agent behavior through information geometry.
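The information projection can be computed directly for a finite action set. Assuming i.i.d. agents, $p_0 > 0$ on every action, and $\min\phi < m < \max\phi$, the minimizer is the exponentially tilted distribution $q_\lambda(o) \propto p_0(o)\,e^{\lambda\phi(o)}$, with $\lambda$ chosen so that $\mathbb{E}_{q_\lambda}[\phi] = m$; because this mean increases with $\lambda$, bisection suffices. For binary actions with $p_0 = (0.5, 0.5)$ the result is the Bernoulli rate $I(0.7) \approx 0.0823$. Names and the search interval for $\lambda$ are illustrative choices.

```python
import math


def kl_divergence(q, p):
    """D_KL(q || p) in nats for distributions on the same finite support."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def tilt(p0, phi, lam):
    """Exponential tilt q(o) proportional to p0(o) * exp(lam * phi(o)), computed stably."""
    exponents = [lam * f for f in phi]
    top = max(exponents)
    weights = [p * math.exp(e - top) for p, e in zip(p0, exponents)]
    total = sum(weights)
    return [w / total for w in weights]


def rate_by_projection(p0, phi, m, lo=-50.0, hi=50.0, iters=200):
    """I(m) = min over {q : E_q[phi] = m} of D_KL(q || p0), plus the minimizing q."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mean = sum(q * f for q, f in zip(tilt(p0, phi, mid), phi))
        if mean < m:
            lo = mid  # tilt too weak: the solution has a larger lambda
        else:
            hi = mid
    q = tilt(p0, phi, 0.5 * (lo + hi))
    return kl_divergence(q, p0), q
```

For instance, `rate_by_projection([0.5, 0.5], [0.0, 1.0], 0.7)` returns the tilted distribution `[0.3, 0.7]` (up to rounding) together with its divergence from the uniform reference.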

3.4. The Unified Framework of Category Theory

The aforementioned mathematical tools—set theory, probability theory, and information geometry—each operate independently, lacking a unified language of expression. Category theory provides this missing "meta-language."
We propose the symmetric monoidal category as the formal ontology of CMS. Let $\mathbf{Sys}$ be the category whose objects are systems and whose morphisms are interaction mappings between systems. The category-theoretic framework offers three advantages:
1.
Compositionality: The properties of a large system can be understood through the composition of its subsystems. Let $S_1$ and $S_2$ be two systems and $\otimes$ the composition operation on systems; the emergent behavior of the composite system $S_1 \otimes S_2$ can then be expressed through a constraint-satisfaction functor $F: \mathbf{Sys} \to \mathbf{Set}$.
2.
Functoriality: Mappings between levels of abstraction preserve structure. Let $\mathbf{Micro}$ be the category of microscopic agent configurations and $\mathbf{Macro}$ the category of macroscopic observables; a functor $E: \mathbf{Micro} \to \mathbf{Macro}$ maps microscopic configurations to macroscopic states. This is the mathematical definition of the Emergence Functor.
3.
Naturality: Mappings between systems and mappings between observations are mutually compatible. If $f: S \to T$ is a homomorphism between systems, then $E(f): E(S) \to E(T)$ preserves the structural relationships between them.
Specific example: Consider a CMS composed of multiple LLM agents. Let the objects of $\mathbf{Micro}$ be individual agent state machines, each holding the current context-window content, a summary of past interactions, and policy parameters; morphisms are prompt-response mappings. The objects of $\mathbf{Macro}$ are group statistics, such as the opinion distribution $D = (p_1, p_2, \dots, p_K)$. The emergence functor $E$ maps the agents' states at time $t$ to the empirical distribution $D(t)$. System composition $\otimes$ then corresponds to merging agent populations, and the emergent distribution of the composite system, $E(S_1 \otimes S_2)$, is in general not equal to $E(S_1) \otimes E(S_2)$. This failure of $E$ to preserve the monoidal product is the category-theoretic expression of "combinatorial explosion leading to irreducibility" and the mathematical hallmark of emergent phenomena.
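The failure of $E$ to preserve $\otimes$ can be reproduced with a toy model. The sketch makes three assumptions that the text leaves open: agents hold one of $K$ discrete opinions, interaction is a simple plurality rule (a stand-in for LLM dialogue), and the Macro-side product $E(S_1) \otimes E(S_2)$ is taken to be the population-weighted mixture, i.e., what the merged group would look like if its parts never interacted. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Population = List[str]  # microscopic configuration: one opinion label per agent


def interact(pop: Population, rounds: int = 5) -> Population:
    """Toy dynamics: if one opinion has a strict plurality, every agent adopts it."""
    state = list(pop)
    for _ in range(rounds):
        counts = Counter(state)
        top = max(counts.values())
        leaders = [o for o, c in counts.items() if c == top]
        if len(leaders) == 1:  # ties leave every agent unchanged
            state = [leaders[0]] * len(state)
    return state


def emergence(pop: Population, labels: List[str]) -> Dict[str, float]:
    """E: configuration after interaction -> empirical distribution D = (p_1, ..., p_K)."""
    counts = Counter(interact(pop))
    return {k: counts[k] / len(pop) for k in labels}


def macro_product(d1: Dict[str, float], n1: int, d2: Dict[str, float], n2: int) -> Dict[str, float]:
    """Assumed Macro-side combination: mixture weighted by population size."""
    return {k: (n1 * d1[k] + n2 * d2[k]) / (n1 + n2) for k in d1}


labels = ["A", "B"]
s1, s2 = ["A", "A", "A"], ["B", "B"]
composite = emergence(s1 + s2, labels)  # E(S1 ⊗ S2): merge first, then interact
combined = macro_product(emergence(s1, labels), len(s1), emergence(s2, labels), len(s2))
# composite == {"A": 1.0, "B": 0.0}, combined == {"A": 0.6, "B": 0.4}
```

Separately, each group is already in consensus; merged, the majority absorbs the minority, so the macroscopic outcome cannot be computed from the parts' macroscopic states alone.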
This framework formalizes Xuesen Qian’s idea of "metasynthesis." The metasynthesis methodology can be restated as follows: given a macroscopic goal $M \in \mathrm{Ob}(\mathbf{Macro})$, find a microscopic configuration $\mu \in \mathrm{Ob}(\mathbf{Micro})$ such that a unique morphism exists between $E(\mu)$ and $M$. When no such configuration can be found directly, an iterative feedback loop is initiated; this is the mathematical essence of the Hall for Workshop of Metasynthetic Engineering.

4. Methodology: From Metasynthesis to Emergent Engineering

4.1. Formalization of the Metasynthesis Methodology

The core of Xuesen Qian’s metasynthesis methodology lies in bridging the gap between qualitative judgment and quantitative computation. Its epistemological premise is that purely experience-based holism lacks precision, while purely model-based reductionism fails to grasp emergence; hence, an iterative loop must be established between qualitative analysis (human) and quantitative computation (machine). It should be emphasized that Xuesen Qian’s original framework already implicitly contained the idea of iteration—from qualitative to quantitative and back to qualitative—but it was difficult to automate due to technological limitations at the time. The breakthrough in large model technology now allows this loop to be explicitly, automatically, and scalably implemented.
We propose a formalized version of the Metasynthesis Five-Step Method:
Step 1 (Qualitative Hypothesis Generation): Human experts set goals, constraints, and initial qualitative hypotheses, forming a goal vector $G \in \mathbb{R}^k$. Unlike in traditional workshops, the output of this step is not merely a textual description but a set of constraints that prompt engineering can turn into prompts an LLM can act on.
Step 2 (Quantitative Model Construction): AI agents automatically construct an initial quantitative model from the qualitative hypotheses, including the agent behavioral policy functions $\pi_\alpha$, the interaction topology $\mathcal{R}$, and the environmental dynamics equations.
Step 3 (LLM Emergent Simulation): Large-scale LLM agents interact autonomously in the virtual environment constructed in Step 2, while the system records microscopic behavioral sequences $\{H_\alpha(t)\}$ and emergent macroscopic phenomena $M_N(t)$. Cognitive architectures based on deep learning [5] provide the technological substrate for this step, giving agents memory and reasoning capabilities closer to those of real humans.
Step 4 (Causal Mechanism Extraction): Meta-cognitive supervisory agents apply causal discovery frameworks to the simulation records to identify causal chains linking microscopic behaviors to macroscopic emergence. This step turns mechanistic hypotheses into computable factors, learns a compact causal representation centered on the emergence target $Y$, and outputs interpretable causal chains and actionable intervention levers.
Step 5 (Theoretical Iterative Updating): If the emergent results do not satisfy the goal constraints $G$, the hypotheses or model are revised based on the causal subgraph $\mathcal{C}$ and the process returns to Step 2; if they do, a reproducible emergence mechanism is output.
This five-step framework expands Xuesen Qian’s three-stage model (qualitative → quantitative → metasynthesis) into a complete cognitive closed loop: qualitative hypothesis generation → quantitative modeling and computation → LLM emergent verification → causal mechanism extraction → theoretical iterative updating.
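The closed loop can be caricatured in a few lines. The sketch below is not EmSys: a toy stochastic model replaces the LLM agents, causal mechanism extraction is reduced to estimating the effect of a single intervention lever by paired simulations on common random seeds, and the lever name `incentive`, the goal `target`, and all numbers are hypothetical. It only shows how the five steps feed into one another.

```python
import random


def build_model(incentive: float):
    """Step 2: turn the hypothesis 'cooperation is driven by an incentive plus peer
    influence' into a simulator (hypothetical toy model, not an LLM population)."""
    def simulate(seed: int, n_agents: int = 200, rounds: int = 20) -> float:
        rng = random.Random(seed)
        share = 0.0
        for _ in range(rounds):  # Step 3: agents interact and a macro share emerges
            p = min(1.0, max(0.0, incentive + 0.5 * share))
            share = sum(rng.random() < p for _ in range(n_agents)) / n_agents
        return share
    return simulate


def mean_outcome(incentive: float, seeds=range(5)) -> float:
    sim = build_model(incentive)
    return sum(sim(s) for s in seeds) / len(seeds)


def lever_effect(incentive: float, delta: float = 0.05, seeds=range(5)) -> float:
    """Step 4: effect of do(incentive := incentive + delta) on the emergent share,
    estimated with paired runs on identical seeds (common random numbers)."""
    base, shifted = build_model(incentive), build_model(incentive + delta)
    return sum(shifted(s) - base(s) for s in seeds) / len(seeds)


def metasynthesis_loop(target: float, incentive: float = 0.0,
                       step: float = 0.05, max_iter: int = 50):
    """Step 1 supplies `target` (the goal) and `incentive` (the initial hypothesis)."""
    outcome = float("nan")
    for _ in range(max_iter):
        outcome = mean_outcome(incentive)
        if outcome >= target:  # goal met: report a reproducible mechanism
            return incentive, outcome
        # Step 5: revise the hypothesis in the direction the causal estimate indicates
        incentive += step if lever_effect(incentive) > 0 else -step
    return incentive, outcome
```

With `metasynthesis_loop(target=0.7)`, the toy model's steady state is twice the incentive, so the loop stops once the incentive reaches about 0.35 to 0.40. Pairing runs on identical seeds makes the estimated lever effect far less noisy than comparing independent runs.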

4.2. The Methodological Status of Emergent Engineering

"Emergent Engineering" (EE) goes fundamentally beyond traditional Systems Engineering (SE). The differences between the two can be summarized as follows:
  • Goal Difference: SE pursues "building according to specifications," meaning the system’s behavior is entirely determined by design; Emergent Engineering (EE) pursues "emergence according to constraints," meaning the system’s behavior evolves freely within constraint boundaries, resulting in unspecified but value-aligned new properties.
  • Control Mode: SE employs direct control (deterministic commands) and hierarchical command; EE employs indirect shaping (setting constraints, adjusting environments, defining fitness functions) and distributed self-organization.
  • Uncertainty Handling: SE attempts to minimize uncertainty, trying to enumerate every possible system behavior in advance as use cases; EE views uncertainty as the source of emergence, handling surprises through robustness rather than predefinition.
  • Verification Method: SE relies on requirement coverage and specification conformance testing; EE relies on constraint boundary detection for emergent behaviors and robustness testing.
The proposal of EE marks a dialectical development of Xuesen Qian’s systems engineering thought: systems engineering answers "how to build complex deterministic systems," complex giant systems theory answers "how to understand complex uncertain systems," and emergent engineering answers "how to deliberately let complex uncertain systems give rise, on their own, to the functions we need." This is a philosophical leap in systems engineering from "construction" to "cultivation."

5. Programmization: Software Implementation of CMS

5.1. Overall Architecture Design

Based on the aforementioned theory and methodology, we propose a programmatic implementation framework named EmSys (Emergent Systems Architecture). The overall architecture of EmSys consists of four core modules:
Module 1: Agent Generator. It automatically generates the behavioral policies of LLM agents from user-supplied role descriptions. Unlike conventional LLM agent frameworks, EmSys’s agent generator encodes each agent's personality as a low-dimensional vector and uses retrieval augmentation in the embedding space to keep different agents heterogeneous. Layered architectures such as deep belief networks [4] serve as a technical reference for how the generator combines features across multiple levels.
Module 2: Environment Engine. It manages the "world state" of the system, including the physical environment, resource constraints, information space, and social norms. The environment engine is event-driven: in each interaction round, the main loop triggers agent actions in sequence, updates the environmental state, and detects emergent events (such as deviations from a power-law distribution or signs of an approaching critical point).
Module 3: Meta-Cognitive Supervisor (MCS). This is the central innovation of EmSys. Built on meta-level LLM agents, the MCS monitors the simulation in real time, extracts the causal structure linking microscopic behaviors to macroscopic phenomena, and automatically triggers interventions when it detects violations of preset constraints. Its causal discovery component turns mechanistic hypotheses into computable factors over the simulation records and outputs compact causal representations and actionable intervention levers.
Module 4: Observability Visualization. It provides real-time monitoring dashboards for emergent behaviors, displaying the evolutionary trajectories of macroscopic statistics (power-law exponents, autocorrelation functions, multifractal spectra, etc.), and supports graphical representation of causal subgraphs.
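One simple way to implement the "approaching critical point" detection of Module 2, using the autocorrelation statistics that Module 4 displays, is the critical-slowing-down indicator from the early-warning-signals literature: as a system nears a critical transition, fluctuations decay more slowly and the lag-1 autocorrelation of a macroscopic statistic rises toward 1. The sketch assumes a rolling window and a fixed alert threshold, both hypothetical tuning choices, and generates a synthetic AR(1) series whose coefficient drifts toward 1 as a stand-in for simulation output.

```python
import random


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence (0.0 for a constant sequence)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0.0:
        return 0.0
    cov = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    return cov / var


def critical_slowing_alert(series, window=100, threshold=0.9):
    """Index of the first point at which the rolling lag-1 autocorrelation exceeds
    `threshold`, or None if it never does."""
    for end in range(window, len(series) + 1):
        if lag1_autocorrelation(series[end - window:end]) > threshold:
            return end - 1
    return None


def drifting_ar1(length=2000, start=0.2, stop=0.99, seed=0):
    """Synthetic macroscopic statistic x_{t+1} = a_t x_t + noise, with a_t drifting
    linearly from `start` to `stop` (a system slowly approaching criticality)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(length):
        a = start + (stop - start) * t / (length - 1)
        x = a * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out
```

On `drifting_ar1()`, the alert fires only in the late part of the series, where the coefficient is close to 1. A rising autocorrelation is a warning sign rather than proof of criticality, so in practice it would be combined with other indicators such as rising variance.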

5.2. Core Implementation Example

The EmSys framework emphasizes modularity and extensibility. Its core abstractions are base classes for agents, environments, and simulators. In each simulation cycle, the system first generates a textual description of the current world state as input for the LLM agents; each agent then chooses an action according to its role; the environment engine updates the state from all agents' actions; and the MCS periodically runs causal discovery analysis, issuing alerts when it detects anomalies. This architecture upgrades Xuesen Qian’s "human-machine integration" to a three-layer synergy of "LLM agent population, meta-cognitive supervisor, and human experts."
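The simulation cycle just described can be sketched with three base classes. Everything here is hypothetical: the LLM is modeled as any function from a prompt string to a reply, a stub stands in for a real model so the example runs offline, and a simple herding check stands in for the MCS's causal analysis. No particular LLM library or published EmSys API is implied.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional, Tuple

LLM = Callable[[str], str]  # hypothetical: prompt in, text out


class BaseAgent(ABC):
    name: str

    @abstractmethod
    def act(self, world_description: str) -> str: ...


class LLMAgent(BaseAgent):
    def __init__(self, name: str, role: str, llm: LLM):
        self.name, self.role, self.llm = name, role, llm

    def act(self, world_description: str) -> str:
        prompt = f"You are {self.name}, {self.role}.\nWorld: {world_description}\nAction:"
        return self.llm(prompt).strip().lower()


class BaseEnvironment(ABC):
    @abstractmethod
    def describe(self) -> str: ...

    @abstractmethod
    def update(self, actions: Dict[str, str]) -> None: ...


class TallyEnvironment(BaseEnvironment):
    """Toy world: remembers the last round and cumulative action counts."""

    def __init__(self):
        self.last: Dict[str, str] = {}
        self.counts: Dict[str, int] = {}

    def describe(self) -> str:
        return f"last round: {self.last or 'none'}"

    def update(self, actions: Dict[str, str]) -> None:
        self.last = dict(actions)
        for action in actions.values():
            self.counts[action] = self.counts.get(action, 0) + 1


Supervisor = Callable[[BaseEnvironment], Optional[str]]


class Simulator:
    def __init__(self, agents: List[BaseAgent], env: BaseEnvironment,
                 supervisor: Supervisor, check_every: int = 5):
        self.agents, self.env = agents, env
        self.supervisor, self.check_every = supervisor, check_every

    def run(self, rounds: int) -> List[Tuple[int, str]]:
        alerts = []
        for t in range(1, rounds + 1):
            description = self.env.describe()                            # world state as text
            actions = {a.name: a.act(description) for a in self.agents}  # each agent acts
            self.env.update(actions)                                     # environment update
            if t % self.check_every == 0:                                # periodic MCS check
                alert = self.supervisor(self.env)
                if alert:
                    alerts.append((t, alert))
        return alerts


def herding_check(env: TallyEnvironment) -> Optional[str]:
    """Stand-in for the MCS: flag when one action exceeds 90% of all actions so far."""
    total = sum(env.counts.values())
    if total and max(env.counts.values()) / total > 0.9:
        return "herding detected"
    return None
```

Replacing the stub with a real model client only requires a function with the same prompt-to-text shape, which is why the `LLM` type is kept that narrow.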
For large-scale simulation, EmSys draws on the dynamics of complex neural networks [3], using energy landscapes and attractor dynamics to predict and guide the emergent behavior of large agent populations. These well-studied methods provide a technical foundation for deploying EmSys at scale.
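The attractor mechanism invoked here can be illustrated with a classical Hopfield network: Hebbian weights store patterns as minima of the energy $E(s) = -\tfrac{1}{2}\sum_{i \ne j} w_{ij} s_i s_j$, and asynchronous sign updates, which never increase this energy when the weights are symmetric with zero diagonal, pull a corrupted state back to a stored pattern. The two orthogonal 16-unit patterns and the function names are illustrative; how EmSys would map an agent population onto such a landscape is not specified here.

```python
import random


def hebbian_weights(patterns):
    """w_ij = (1/n) * sum over patterns of p_i * p_j for i != j, with zero diagonal."""
    n = len(patterns[0])
    return [[0.0 if i == j else sum(p[i] * p[j] for p in patterns) / n
             for j in range(n)] for i in range(n)]


def energy(w, s):
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def recall(w, s, sweeps=5, seed=0):
    """Asynchronous updates in random order: s_i <- sign(sum_j w_ij s_j)."""
    rng = random.Random(seed)
    s = list(s)
    for _ in range(sweeps):
        for i in rng.sample(range(len(s)), len(s)):
            field = sum(w[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if field >= 0 else -1
    return s


p1 = [1] * 8 + [-1] * 8  # two orthogonal 16-unit patterns
p2 = [1, -1] * 8
w = hebbian_weights([p1, p2])
noisy = list(p1)
noisy[0], noisy[9] = -noisy[0], -noisy[9]  # corrupt two units
restored = recall(w, noisy)
# restored == p1, and the energy drops from -4.0 (noisy) to -7.0 (restored)
```

The stored pattern acts as an attractor: its basin absorbs nearby corrupted states, which is the sense in which an energy landscape "guides" collective dynamics.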

5.3. Module Integration and Operation

The module integration of EmSys follows the principle of "separation of concerns": the agent generator focuses on individual behavioral diversity, the environment engine focuses on the integrity of system dynamics, the meta-cognitive supervisor focuses on the interpretability of causal structures, and the observability module focuses on the visual expression of emergent phenomena. The modules communicate with each other through standardized message-passing mechanisms, ensuring the flexibility and extensibility of the system.
A typical workflow for running an EmSys simulation is: (1) the user defines goals and constraints; (2) the agent generator automatically generates N LLM agents; (3) the environment engine initializes the world state; (4) the simulation main loop runs, continuously recording interaction data; (5) the meta-cognitive supervisor analyzes the causal structure in real-time; (6) the observability module presents emergent phenomena in real-time. After the simulation ends, the user obtains a complete causal analysis report and visual records of the emergent phenomena.

6. Discussion

6.1. Viewing the Metasynthesis of CMS from an Information Theory Perspective

The essence of the metasynthesis methodology can be understood more deeply from the perspective of information theory. Let $H_{\text{human}}$ denote the prior information carried by human experts (usually held in textual, experiential, and intuitive forms that are difficult to formalize), $H_{\text{data}}$ the information in statistical data and documentary materials, and $H_{\text{model}}$ the information encoded by computer models. The goal of metasynthesis is to maximize the integration of these three information sources:
$$\text{Metasynthesis} = \arg\max_{\mathcal{I}} \; I(\mathcal{I};\, H_{\text{human}}, H_{\text{data}}, H_{\text{model}})$$
where $\mathcal{I}$ is the knowledge system produced by metasynthesis and $I(\cdot\,;\cdot)$ denotes mutual information. A knowledge system built from any single source can capture at most the information that source carries; when metasynthesis combines the three, the ceiling rises to their joint information, so $I$ can exceed what any one source could supply on its own. This is the information-theoretic reading of "1 + 1 + 1 > 3": the synergy lies in information that becomes available only when the sources are combined. Large models play a dual role here. They are a massive $H_{\text{data}}$ source, having encoded vast amounts of human knowledge during pre-training, and a highly efficient generator of $\mathcal{I}$, able to semantically align and combine heterogeneous sources. Metasynthesis in the era of large models is therefore no longer human-led with machine assistance; instead, LLMs act as the core engine of information fusion, while humans serve as goal setters and final arbiters.
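A toy calculation illustrates the kind of synergy the argument above relies on. In the sketch below, three independent fair bits stand in for the human, data, and model sources, and the target knowledge K is their exclusive-or; this construction is an assumption chosen purely for illustration, not a model of real experts, data, or models. Mutual information is computed from the exact joint distribution as I(K; S) = H(K) + H(S) - H(K, S).

```python
import itertools
import math
from collections import Counter


def entropy(counts: Counter) -> float:
    """Shannon entropy in bits of an empirical distribution given as counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)


def mutual_information(samples, target_idx, source_idxs) -> float:
    """I(T; S) = H(T) + H(S) - H(T, S) over equally weighted joint outcomes."""
    t = Counter(row[target_idx] for row in samples)
    s = Counter(tuple(row[i] for i in source_idxs) for row in samples)
    ts = Counter((row[target_idx],) + tuple(row[i] for i in source_idxs) for row in samples)
    return entropy(t) + entropy(s) - entropy(ts)


# Hypothetical sources: three independent fair bits; K is their XOR.
samples = [(h, d, m, h ^ d ^ m) for h, d, m in itertools.product((0, 1), repeat=3)]
K = 3  # column index of the target
for name, idxs in [("human", [0]), ("data", [1]), ("model", [2]),
                   ("human+data", [0, 1]), ("all three", [0, 1, 2])]:
    print(f"I(K; {name}) = {mutual_information(samples, K, idxs):.3f} bits")
```

Every single-source and two-source line prints 0.000 bits, while the three-source line prints 1.000 bits: no source, and no pair of sources, says anything about K, yet all three together determine it completely.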

6.2. Historical Continuity with the Hall for Workshop of Metasynthetic Engineering

It must be emphasized that the EmSys framework proposed in this paper is not a replacement for the Hall for Workshop of Metasynthetic Engineering, but rather its technological realization and theoretical extension in the era of large models. The core of the traditional workshop, the trinity of expert groups, data and information, and computer intelligence, is retained and upgraded in EmSys: expert groups participate through goal setting and value judgments, the processing of data and information is largely automated through the semantic understanding capabilities of LLMs, and computer intelligence advances from "rule execution" to "emergence generation." EmSys is therefore best understood as the natural evolution of the Hall for Workshop of Metasynthetic Engineering under the technological conditions created by LLMs, rather than as a disruptive innovation.

6.3. Comparison with Existing LLM Multi-Agent Simulation Frameworks

In recent years, a series of significant works have appeared in LLM multi-agent simulation, such as Generative Agents, CAMEL, and AutoGen. EmSys differs from these frameworks in three respects. First, EmSys explicitly inherits the theoretical lineage of Xuesen Qian’s CMS; its design goal is not merely simulation but the programmatic implementation of the metasynthesis methodology. Second, EmSys introduces the Meta-Cognitive Supervisor as an independent module with causal discovery and automatic intervention capabilities, whereas existing frameworks focus primarily on the realism of agent interactions. Third, EmSys uses category theory as a unified mathematical meta-language, providing a formal foundation for mappings between different levels of abstraction. EmSys is therefore not meant to replace existing frameworks but to integrate them systematically at a higher theoretical level.

6.4. Limitations and Prospects

This study has several limitations. First, the unified mathematical framework of CMS is still at a preliminary stage, and its category-theoretic formalization requires further case validation and theoretical refinement. Second, the EmSys programmatic framework is currently a proof of concept and still requires a complete implementation and large-scale empirical testing. Third, LLM agent simulation remains computationally expensive, and large-scale runs require resources well beyond the budgets of most research institutions. Fourth, emergent behaviors are difficult to reproduce: because LLM outputs are inherently non-deterministic, two simulations with the same parameters may produce different emergent results, which challenges the reproducibility expected of scientific experiments.
Future research directions include: first, refining the CMS category-theoretic framework and establishing a rigorous theory of emergence functors; second, developing lightweight LLM simulation engines to lower the computational threshold; third, exploring causal attribution methods for emergent behaviors to improve the interpretability and reproducibility of LLM simulations; and fourth, applying the EmSys framework to real-world CMS problems, such as climate change simulation, epidemic spread modeling, and the analysis of social movement dynamics, to test and refine the theory in practice.

7. Conclusion

This paper systematically inherits and develops Xuesen Qian’s 1990 thought on open complex giant systems, taking advantage of the historic opportunity presented by large language models to carry out its theorization, mathematization, and programmization.
At the theorization level, we reconstructed the modern definition of CMS, established the cognitive upgrade from "Three Combinations" to "Tri-Domain Synergy," and proposed "Emergent Engineering" as the successor paradigm to systems engineering.
At the mathematization level, we used information geometry and large deviation theory to characterize the non-ergodic mechanism by which macroscopic laws emerge from microscopic behavior, provided a unified formal language for CMS using symmetric monoidal categories, and demonstrated through specific examples how the emergence functor is constructed and why it is essentially non-compositional.
At the programmization level, we proposed the EmSys programmatic framework, which turns the metasynthesis methodology into an operable, modular computational platform and moves the paradigm from human-machine collaborative discussion to synergistic emergence within LLM agent populations. At the implementation level, EmSys draws on the representation learning of deep belief networks (Hinton et al. [4]), the attractor dynamics of Hopfield networks [3], and the deep learning architectures reviewed by LeCun et al. [5], forming a complete five-stage closed loop: qualitative hypothesis generation → quantitative modeling and computation → LLM emergent verification → causal mechanism extraction → iterative theory updating.
Mr. Xuesen Qian pioneered a new direction for the development of systems science in China. His modest remark, "The Two Bombs, One Satellite project relied on mature theories; I merely adopted mature technologies proven feasible by others and myself through practice," reveals the essence of his academic thought: rather than building castles in the air in his study, he extracted theoretical frameworks of universal significance from China’s most urgent engineering practices. Today’s large language models can be seen as the form that the "human-machine integrated intelligent system" envisioned by the Hall for Workshop of Metasynthetic Engineering naturally takes once the enabling technology has matured. By using LLMs as both the research object and the simulation tool for CMS, we now have the potential to turn the visionary thought Xuesen Qian articulated more than three decades ago into an operable, verifiable, and reproducible scientific theory. This is perhaps the finest tribute to this great scientist.

References

  1. Qian X S, Yu J Y, Dai R W. A new field of science: Open complex giant systems and their methodology. Nature Journal, 1990, 12(1): 3-10.
  2. Dai R W. From "systems engineering" to "systems science" to "open complex giant systems": Three milestones in the development of systems theory by Comrade Xuesen Qian. Systems Engineering: Theory & Practice, 1991.
  3. Hopfield J J. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 1982, 79(8): 2554-2558.
  4. Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7): 1527-1554.
  5. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436-444.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated