Human Digital Twins and Self-Knowledge: A Governed Attractor Architecture for Sound Self-Abduction

Emanuel Shirbint; Alexander Rybalov

doi:10.20944/preprints202606.0132.v1

Submitted: 01 June 2026

Posted: 02 June 2026

Abstract

Large language models embedded in person-modelling systems, and human beings reflecting on themselves, share a structural vulnerability: both can generate fluent, narratively persuasive, yet abductively unsound accounts of a person. Building on a governed abductive architecture developed for medical patient digital twins, this paper argues that personality should be modelled neither as a stable self-description nor as a free variable of circumstance, but as a layered system in which surface self-narrative, role-specific manifestations, and recurrent attractor structures are architecturally separated. The central claim is that pressure does not create personality; it changes the evidentiary conditions under which a person is observed, making discriminable which self-descriptions are structurally supported and which remain conditional on comfort, low cost, or the absence of threat. We identify six recurrent modes of unsound person-modelling — missing-premise neglect, weak-mechanism support, counter-evidence discounting, narrative essentialism, contextual overfitting, and premature identity closure — each mapped to an architectural absence and a corresponding control. We specify a seven-contour governed architecture and operationalise its distinctive elements as a Pressure Diagnostic Runtime, which annotates naturally occurring or ethically consented pressure events as evidence, and an Attractor Registry, which stores recurrent if-then behavioural signatures rather than trait labels. Integrity is formalised as a bounded operational contour; self-knowledge as a discordance-detection function comparing self-report against behaviour under load; transformation as governed ontology-revision rather than re-narration; and the witness as a strictly functional, non-generating governance layer. 
The paper draws out implications for AI systems that model persons — provenance and staleness labels on inferred self-attributes, role-code separation, user contestability, refusal of premature identity closure, and a prohibition on covert pressure engineering. The argument is conceptual: it proposes a model and a research programme, not a diagnostic tool, a therapeutic intervention, or a metaphysical claim about the existence of a self.

Keywords: human digital twins; personality modelling; self-knowledge; abductive reasoning; governed abductive architecture; semiosphere; attractor dynamics; pressure diagnostics; integrity; person-modelling AI

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction: From Medical Digital Twins to Human Digital Twins

1.1. The Architectural Move

A companion line of work has argued that monolithic large language models (LLMs) embedded in medical patient digital twins exhibit a systemic vulnerability: they produce fluent clinical narratives that are not equivalent to sound abductive explanations, and the remedy is architectural rather than a matter of model scale (Shirbint and Rybalov 2026). That work proposed a governed abductive architecture — a separation of semantic, ontological, and semiospheric layers, organised around runtime contours with explicit governance — and operationalised it for multiple sclerosis care. As digital twins evolve from reactive monitors toward agentic, language-driven systems that forecast trajectories and model their subjects in increasingly open-ended ways, the same architectural question migrates from the body to the person (Makarov et al. 2025; San et al. 2026). This paper takes the governed abductive architecture and turns it inward and outward at once: inward, onto the human self as an object of knowledge; outward, onto the class of AI systems that now attempt to model persons.

The motivating intuition is ordinary. People treat one another, and themselves, as known constants in given circumstances. We say “she is like that” or “I know him,” and increasingly our systems say the same on our behalf. Yet a change of pressure — fear, gain, role, power, loss — frequently brings different manifestations into view, and the person is often as surprised as the observer. A more defensible statement is not “I know who this person is” but “I know how this person has manifested under the conditions known to me.” The gap between these two statements is the subject of this paper, and we argue that it is the same gap that separates a fluent clinical narrative from a sound clinical abduction.

1.2. Two Modelling Targets, One Architecture

The paper therefore has a double object. The first is the human self considered as a self-modelling system: a person continuously generates and revises an account of who they are. The second is artificial intelligence considered as a person-modelling system: recommender profiles, conversational companions, coaching and well-being applications, and the emerging category of human digital twins all construct and maintain a model of a human self. Our claim is that a single architecture governs the soundness of both, and that the failure to which both are prone is the same: a fluent but abductively unsound account of a person, mistaken for knowledge of that person.

The integrated thesis can be stated directly: a human digital twin must not model personality as a stable self-description, nor as a free variable of circumstance; it must model the governed relation between surface self-narrative, role-specific manifestations, and recurrent attractor structures made observable under pressure. The same architecture that prevents fluent but unsound clinical abduction in medical digital twins can prevent fluent but unsound personality inference in person-modelling AI.

1.3. What This Paper Does and Does Not Do

This paper specifies a conceptual model, the person as an attractor system that can be governed but typically is not; identifies six recurrent modes of self-abductive failure; maps them to architectural absences and controls; and proposes operational modules (a Pressure Diagnostic Runtime and an Attractor Registry), ethical guardrails, and a validation roadmap. It does not claim that all self-narratives are false, that there is no self, or that introspection is worthless. It advances no metaphysical thesis about the existence of a self; where the term witness is used, it denotes a non-generating governance function and nothing more. The model is not yet an empirical theory; it is a framework to be operationalised and tested. It is not a diagnostic instrument, a therapeutic intervention, or a licence for inferring or engineering anything about a person without consent, provenance, and contestability.

The claim is therefore architectural rather than metaphysical. The paper does not assert that personality is, in itself, a governed attractor system. It argues that any AI system permitted to model a person should be constrained as if knowledge of a person were uncertain, pressure-dependent, role-coded, and vulnerable to premature closure.

2. The Problem of Plausible Self-Narrative

Prediction asks which output is most likely given inputs. Abduction asks a stronger question: which underlying process best explains the observations, subject to constraints of mechanism, temporal order, missing premises, and counter-evidence (Peirce 1931–1958). Self-knowledge is abductive in this stronger sense. To know a person — oneself or another — is not to predict their next sentence; it is to infer the causal structure that generates their behaviour across situations, and to hold that inference open to revision when behaviour and story diverge. A self-narrative produced without semantic, ontological, and governance layers is optimised, like a standalone generator, to continue plausible discourse about a self. It has no durable representation of what about the person remains unknown, no model of dispositional structure against which to test the sufficiency of an explanation, and no contradiction detector strong enough to interrupt a flattering narrative when conduct contradicts it. The result is not random error but a structured class of self-abductive vulnerability, and it is architectural.

We adopt a minimal, operational sense of soundness, transposed from the clinical setting. A description of a person is treated as abductively sound only if it satisfies four constraints. First, required premises are verified or explicitly marked as unknown: “I am generous” carries the unstated premise “under conditions where generosity has cost me little,” and soundness requires surfacing such premises rather than burying them. Second, the proposed mechanism is adequate: the disposition invoked must be the causal source of the behaviour, not a label attached after the fact. Third, the temporal sequence is coherent: the explanation must fit the order in which dispositions, situations, and acts unfolded, not a retrospectively tidied story. Fourth, available counter-evidence can down-rank, suspend, or defeat the explanation: conduct under pressure that contradicts the self-image must be able to override it rather than being explained away.
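The four constraints can be read as a checklist over a candidate description. The following Python fragment is a minimal sketch under stated assumptions, not a specification from the clinical architecture: the names `PersonDescription`, `PremiseStatus`, and `soundness_failures` are hypothetical, and the boolean flags stand in for checks that would in practice require evidence and judgement.

```python
from dataclasses import dataclass, field
from enum import Enum


class PremiseStatus(Enum):
    VERIFIED = "verified"
    UNKNOWN = "unknown"    # surfaced and explicitly marked as unknown
    UNSTATED = "unstated"  # buried premise: violates the first constraint


@dataclass
class PersonDescription:
    """Hypothetical container for one claim about a person."""
    claim: str
    premises: dict = field(default_factory=dict)  # premise text -> PremiseStatus
    mechanism_adequate: bool = False              # second constraint
    temporally_coherent: bool = False             # third constraint
    # Counter-evidence that was explained away instead of being weighed.
    ignored_counter_evidence: list = field(default_factory=list)


def soundness_failures(d: PersonDescription) -> list:
    """Return the names of the constraints the description fails."""
    failures = []
    if any(s is PremiseStatus.UNSTATED for s in d.premises.values()):
        failures.append("premises")
    if not d.mechanism_adequate:
        failures.append("mechanism")
    if not d.temporally_coherent:
        failures.append("temporal order")
    if d.ignored_counter_evidence:
        failures.append("counter-evidence")
    return failures
```

On this sketch, "I am generous" with the premise "generosity has cost me little" left unstated fails the first constraint; marking that premise as unknown removes the failure without asserting that the claim is true.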

3. Self-Abductive Failure Modes

When the semantic, ontological, and semiospheric layers of person-understanding collapse into a single narrating stream, recurrent failure modes appear. The first three are inherited directly from the clinical architecture; the remaining three are specific to identity inference and must be added when the object is a person rather than a disease.

Failure mode	Pattern	Architectural absence	Control
Missing-premise neglect	A self-claim is asserted while a decisive enabling condition is unstated — “I am brave” issued from a life without tested risk.	No registry of the preconditions and stakes under which a disposition has actually been exercised.	Premise registry; provenance; not-yet-tested labels.
Weak-mechanism support	Co-occurrence is mistaken for character: kind acts in comfortable conditions become a stable trait of kindness.	No dispositional ontology distinguishing trait from situation.	Attractor ontology; temporal-coherence and mechanism checks.
Counter-evidence discounting	The self-narrative survives conduct under pressure that contradicts it; the episode is reframed as exceptional.	No discordance detector or belief-update mechanism.	Self-report-vs-behaviour discordance surfacing; governed update.
Narrative essentialism	The system or person accepts the story about a self as the self.	No separation of self-narrative from behavioural evidence.	Separate surface narrative from attractor inference; refuse closure from self-report alone.
Contextual overfitting	A conclusion about personality is drawn from one episode in one role-code.	No longitudinal or cross-role evidence threshold.	Minimum evidence threshold; cross-context recurrence requirement.
Premature identity closure	The system declares “this is who the person is” before sufficient evidence exists.	No uncertainty discipline, staleness labels, or refusal mode.	Identity-claim risk scoring; staleness labels; closure refusal; contestability.

Each failure is the absence of a structure rather than a deficit of sincerity. A perfectly sincere person, and a perfectly well-intentioned system, will generate an unsound account of a person with full conviction if these structures are missing.

4. Semantics, Ontology/Attractor, and Semiosphere

Soundness requires separating layers that an ungoverned model collapses, in the order inherited from the clinical architecture. The semantic layer asks what an act means locally. The ontological layer asks what the act is in the model of the person — to which disposition it belongs — and its dynamical form is the attractor. Cognitive-affective personality system theory holds that behaviour varies systematically across situations while a stable structure persists behind the variation: individuals are characterised by enduring if-then situation-behaviour signatures — if situation A, then X; if situation B, then Y — generated by a stable organisation of cognitive-affective units (Mischel and Shoda 1995; Shoda et al. 1994). Within a dynamical-systems reading of personality, such stable organisations can be modelled as attractor-like states rather than as fixed traits (Nowak et al. 2005). The signature, not the average, is the invariant.

A person is therefore not a constant at the level of surface behaviour, but neither is the surface free; it is governed by a recurring structure the person typically cannot see in themselves because they attend to their narrative rather than to the pattern. A person is opaque to themselves exactly where they are most legible to a patient observer who reads the signature instead of believing the self-report; the classic over-attribution of others’ conduct to fixed traits and one’s own to circumstance is the same error run in two directions (Ross 1977). The semiospheric layer, in turn, follows Lotman’s account of meaning as a code-governed space of interpretation rather than a mere collection of signs, and asks in which role-code an act is read: the same act signifies differently as parent, professional, patient, friend, citizen, or contemplative, and a model that reads every act in a single register is, in Lotman’s sense, socially blind in all the others (Lotman 2005).

This layered, if-then account is also the response the model offers to the long-standing person–situation debate. Where strong situationism concludes from the variability of conduct across situations that broad character traits are largely explanatorily idle (Harman 1999; Doris 2002), the attractor view holds that the invariant was misplaced rather than absent: it lies in the stable if-then signature, not in the cross-situational average. Pressure, in this framing, makes load-bearing signatures discriminable.

Layer	Guiding question	Risk if absent
Semantics	What does this act mean locally?	A smooth story without durable, examinable meaning objects.
Ontology / attractor	Which recurrent if-then signature does it belong to?	An unstable model in which labels are mistaken for mechanisms and episodes for grooves.
Semiosphere	In which role-code is the act read?	Formal self-coherence coupled with relational blindness across roles.

The core formula of the model can be stated compactly: the surface changes; the attractor recurs; pressure tests; governance refuses premature closure.

5. The Seven-Contour Governed Architecture

We model the person, and the person-model an AI maintains, as a system of seven runtime contours, mirroring the clinical architecture. A perception/action substrate registers situations and the outcomes of actions. A semantic contour extracts local meaning. An ontology contour maps events into dispositions, constraints, temporal relations, and the attractor signatures of Section 4. A semiosphere contour stores the role-codes through which acts are translated across audiences. A deliberation contour generates competing interpretations and action branches. A memory contour preserves episodic, semantic, procedural, social, and normative experience (Park et al. 2023). A governance contour enforces provenance, risk scoring, approval gates, audit, and the capacity to roll back an account. The operating cycle runs: observe, interpret, ground, frame, retrieve, simulate, choose and act, reflect and update.

The clinical architecture insists that governance is applied first and perception/action is the substrate. The ungoverned self inverts this order: it generates a narrative and applies governance, if at all, afterward, as post-hoc justification — the confabulating interpreter that supplies reasons it did not have (Gazzaniga 2011). The difference between an ungoverned and a governed system is largely the difference between governance-last and governance-first. Governance-first does not mean self-suppression; it means that provenance, risk, and the possibility of refusal are present in the loop before an account is allowed to close. The two distinctive operational elements that the personality case adds to the clinical seven — the Pressure Diagnostic Runtime and the Attractor Registry — are not an eighth contour but the mechanism by which the ontology contour acquires and stores its evidence under governance. We specify them next.
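The governance-first ordering can be illustrated with a small Python sketch. It is an assumption-laden illustration, not the architecture itself: `CandidateAccount`, `governed_close`, the numeric `risk` score, and the `max_risk` threshold are all hypothetical. The point it shows is only that provenance, risk, and refusal are consulted before any account is allowed to close.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CandidateAccount:
    """One competing interpretation produced by the deliberation contour."""
    text: str
    has_provenance: bool
    risk: float  # hypothetical identity-claim risk score in [0, 1]


def governed_close(
    candidates: Sequence[CandidateAccount], max_risk: float = 0.5
) -> Optional[CandidateAccount]:
    """Apply governance before closure; return None to signal refusal."""
    admissible = [
        c for c in candidates if c.has_provenance and c.risk <= max_risk
    ]
    if not admissible:
        return None  # refusal: no account is allowed to close
    # Among admissible accounts, prefer the lowest-risk one (an assumption).
    return min(admissible, key=lambda c: c.risk)
```

A governance-last system would instead return the most fluent candidate and consult provenance afterward, if at all.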

6. The Pressure Diagnostic Runtime and the Attractor Registry

6.1. The Epistemics of Pressure

The central organ of the model is a discordance-detection function. In the clinical system it compares reported or observed deterioration against objective, provenance-labelled markers, flags unresolved discordance, and does not itself diagnose. Transposed to the person, it compares the self-report — “I am calm, honest, brave, principled” — against the objective behavioural markers of what the person actually did under load. When the two diverge, the sound response is to surface the discordance, not to repair the narrative.

This yields an epistemic consequence with practical force. The behavioural markers the detector needs become discriminable only when a claim is placed under cost, uncertainty, authority, fatigue, loss, gain, or moral conflict. In comfort, a conditional disposition — courage that holds only in the absence of risk, honesty that holds only while honesty is cheap — may emit no signal distinguishable from its unconditional counterpart. Pressure therefore functions as an evidentiary condition rather than as an ontological revealer. It does not create personality, and it does not provide transparent access to an essence; it changes what can be observed, which premises are tested, and which self-descriptions remain supported when the cost rises. An account assembled entirely from low-pressure observation is therefore weakly supported, because the tests that would discriminate structure from comfort have not been run.

Two caveats bound this claim. First, it is not merely conceptual: the empirical literature on conduct under coercion and authority shows that ordinary self-images can fail to predict behaviour once the cost rises — obedience to authority overriding professed conscience (Milgram 1974), and atrocity performed by unremarkable people under situational and institutional pressure (Browning 1992; Arendt 1963). These are conditions under which conditional and unconditional dispositions become easier to separate. Second, pressure may make behaviour more legible, amplify it, distort it, or co-produce it. The architecture therefore treats pressure-derived evidence not as a window onto a standing trait, but as a bounded observation of a disposition-under-conditions. The Attractor Registry must preserve this limitation by assigning provisional governance status, provenance, role-code, and uncertainty to every pressure-derived signature.

6.2. The Pressure Diagnostic Runtime

We make pressure an explicit runtime. The Pressure Diagnostic Runtime annotates naturally occurring or ethically consented pressure events as evidence; it does not manufacture them. (The prohibition on engineered pressure is stated in Section 10 and is constitutive of the module, not an external caveat.) For each event it records:

• Pressure class — fear, gain, loss, power, intimacy, uncertainty, fatigue, status threat, moral conflict, bodily risk.
• Stakes — what could be lost or gained, and whether the cost was real or merely symbolic.
• Role-code — the context in which the event occurred: work, family, clinical care, money, authority, friendship, public identity, spirituality.
• Reaction trace — speech, action, avoidance, delay, escalation, withdrawal, repair, confession, refusal.
• Recurrence — whether the same if-then pattern appears across time and across role-codes.
• Attractor inference — whether the event is noise, a transient state, a role-conditioned pattern, or evidence of a stable groove.

This converts pressure from a philosophical condition into an annotated evidentiary record, and it is the element that most distinguishes a human digital twin from both ordinary trait psychology and a fluent self-narrative.
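The record described above can be sketched as a data structure. The Python below is illustrative only: the field names follow the list, but `PressureEvent`, `infer`, the `origin` field, and in particular the counting rule that maps recurrence to an attractor inference are assumptions introduced for the example, not thresholds proposed by the model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class AttractorInference(Enum):
    NOISE = "noise"
    TRANSIENT_STATE = "transient state"
    ROLE_CONDITIONED = "role-conditioned pattern"
    STABLE_GROOVE = "evidence of a stable groove"


# Only observed or consented events are admissible evidence.
ADMISSIBLE_ORIGINS = frozenset({"naturally_occurring", "consented"})


@dataclass(frozen=True)
class PressureEvent:
    pressure_class: str    # e.g. "fear", "status threat", "moral conflict"
    stakes: str            # what could be lost or gained
    cost_was_real: bool    # real versus merely symbolic cost
    role_code: str         # e.g. "work", "family", "authority"
    reaction_trace: tuple  # e.g. ("delay", "withdrawal")
    origin: str            # must be one of ADMISSIBLE_ORIGINS


def infer(event: PressureEvent, history: Sequence[PressureEvent]) -> AttractorInference:
    """Classify an event by how often the same if-then pattern has recurred."""
    if event.origin not in ADMISSIBLE_ORIGINS:
        raise ValueError("engineered pressure events are not admissible evidence")
    matches = [
        e for e in history
        if e.pressure_class == event.pressure_class
        and e.reaction_trace == event.reaction_trace
    ]
    if not matches:
        return AttractorInference.NOISE  # not yet distinguishable from noise
    if len(matches) == 1:
        return AttractorInference.TRANSIENT_STATE
    role_codes = {e.role_code for e in matches} | {event.role_code}
    if len(role_codes) == 1:
        return AttractorInference.ROLE_CONDITIONED
    return AttractorInference.STABLE_GROOVE
```

Rejecting inadmissible origins inside `infer` reflects the statement that the prohibition on engineered pressure is constitutive of the module.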

6.3. The Attractor Registry

The Attractor Registry is the data object that keeps the model concrete and protects it from two opposite errors — trait essentialism on one side, pure contextual relativism on the other. It stores recurrent if-then signatures rather than personality labels, each entry governed by an explicit status that forbids premature closure.

Registry field	Example	Purpose
Trigger condition	If status is threatened; if uncertainty rises; if a moral boundary is crossed; if physical risk appears.	Defines the activating input.
Observed response	Defensive intellectualisation; control-seeking; withdrawal or refusal; activation rather than collapse.	Records behaviour without moral diagnosis.
Role-code	Professional / relational / clinical / financial.	Prevents transfer across contexts without evidence.
Pressure level	Low / medium / high; symbolic / material.	Distinguishes cheap declarations from costly conduct.
Recurrence strength	One event / repeated / cross-role / stable signature.	Guards against contextual overfitting.
Governance status	Provisional / contested / confirmed / stale / retired.	Guards against premature identity closure.
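A registry entry with these fields might look as follows in Python. This is a hedged sketch: `RegistryEntry`, `review`, and the rule that a single event can never be confirmed are assumptions chosen to illustrate how recurrence strength and governance status interact, not a definitive transition policy.

```python
from dataclasses import dataclass
from enum import Enum


class Recurrence(Enum):
    ONE_EVENT = 1
    REPEATED = 2
    CROSS_ROLE = 3
    STABLE_SIGNATURE = 4


class Status(Enum):
    PROVISIONAL = "provisional"
    CONTESTED = "contested"
    CONFIRMED = "confirmed"
    STALE = "stale"
    RETIRED = "retired"


@dataclass
class RegistryEntry:
    trigger: str         # e.g. "if status is threatened"
    response: str        # e.g. "defensive intellectualisation"
    role_code: str       # e.g. "professional"
    pressure_level: str  # e.g. "high / material"
    recurrence: Recurrence
    status: Status = Status.PROVISIONAL


def review(entry: RegistryEntry, user_contests: bool) -> Status:
    """Update governance status; retired entries are left untouched."""
    if entry.status is Status.RETIRED:
        return entry.status
    if user_contests:
        entry.status = Status.CONTESTED
    elif entry.recurrence.value >= Recurrence.REPEATED.value:
        entry.status = Status.CONFIRMED
    else:
        # A single event stays provisional, guarding against overfitting.
        entry.status = Status.PROVISIONAL
    return entry.status
```

Note that the entry stores a trigger and a response, never a trait label; "confirmed" here means only that the if-then signature has recurred and is uncontested.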

7. Integrity as a Bounded Operational Contour

In the clinical architecture, a bounded system denotes broad competence inside an explicitly delimited operational contour, specifying protected invariants, risk thresholds, and decisions requiring approval; restriction is treated not as weakness but as the precondition of measurement, safety, and the incremental extension of capability (Morris et al. 2023). Transposed to the person, this is a precise formalisation of integrity. The person of integrity is a bounded contour: certain invariants are protected and not subject to situational renegotiation; certain decisions are routed through governance rather than impulse; and the contour itself is what makes the person partly predictable across changing states. On this view integrity is not a trait one discovers but an architecture one installs. Where such a contour is absent, conduct becomes more susceptible to whichever situation is currently active. To keep one’s word across a change of state is to impose a constant on oneself against the drift of disposition — to make oneself, in part, a designed rather than a found object. Persons who can be treated as relatively stable across changing conditions are therefore not metaphysical exceptions but achievements: they have made themselves reliable by holding a line.

8. Governance as Witness: A Functional Non-Generating Control Layer

The governance contour is distinctive: it is applied first, it constrains, audits, refuses, and rolls back, but it does not itself generate accounts. It is the one contour that is not a content producer. Transposed to the self, this is the structural role sometimes named, in contemplative analyses, the witness: a function that registers the arising of narratives and dispositions without being one of them. We use the term strictly functionally — the witness here is whatever performs governance without generating content, the capacity to treat a self-narrative as a candidate to be audited rather than as the self. No claim about consciousness, and no metaphysical assertion of a self, is made or required; a reader may substitute “non-generating governance function” throughout.

This placement also clarifies transformation. A living person continuously revises the meanings of their central terms, so any explicit self-model captures a temporal snapshot — a frozen self-semiosphere — that is always somewhat stale relative to the current trajectory of the underlying structure. To declare “I am a different person now” without governance is the self-directed form of fluent-but-unsound generation: a re-narration unsupported by behavioural data. Genuine transformation is instead a governed ontology-revision: it updates the attractor structure rather than overwriting the story about it, and a structural change is admitted only after it has held under the very conditions that would previously have shown a return to the prior attractor. The architecture is designed for periodic ontology revision rather than one-time specification. This entails a three-layer stability gradient: the surface (semantics and narrative) is a fast variable; the attractor (ontology) is a slow variable, revisable only through sustained structural change; and the non-generating governance function is the system’s most stable available reference layer, because it audits the other layers without producing their content. Several traditions converge on this separation between generated self-content and the function that registers it — the narrative self as a centre of gravity, the phenomenal self-model, embodied and Buddhist-influenced analyses of self (Dennett 1992; Metzinger 2003; Varela et al. 1991; Albahari 2006).

This stability is regulative rather than absolute. In an engineered person-modelling system the governance contour is itself revisable code, and even in the first-person case the witnessing function is a structural posture rather than a demonstrated metaphysical constant. The claim is only that it is the layer not generated by the attractor it audits, not that it stands outside revision altogether. References to invariance in this paper should therefore be read operationally: as the system’s most stable available reference point relative to which other layers are examined, revised, or refused.

A bootstrap problem follows, and it is the model’s sharpest point — but it must be partitioned by agent, because the same word, pressure, plays opposite ethical roles in the first and the second person. If a person is, by default, largely governed by their attractor, the will that would install governance is itself partly shaped by that attractor and cannot lift the system out of its own basin by effort alone. Two non-circular sources of governance remain. The first is the witnessing function itself, precisely because it generates no content and is therefore not captured by the attractor it observes. The second, in the first person only, is self-imposed commitment: an agent may bind their own future conduct so that the prior structure cannot be re-entered — the classical self-binding of the Ulysses contract, a recognised and rational response to one’s own predictable lapses (Elster 1979). Such self-chosen, irreversible commitment is legitimate precisely because the agent imposes it on themselves with full information. It is categorically different from a system or institution engineering pressure on a person, which we prohibit absolutely in Section 10. The strongest first-person move available to a person is not to rewrite a disposition directly but to choose, for themselves, to enter conditions from which there is no easy return; the rewriting is then performed by the absence of an exit rather than by resolve. No such move is ever available to a system acting on someone else.

9. Implications for Person-Modelling AI and Society

Any AI that models a person inherits every failure mode catalogued above — now with institutional authority and at scale. First, person-modelling AI inherits the asymmetry of trust. A self-model presented by a system in a fluent, confident register is read by the user as an authoritative account of who they are, even where the model’s internal reasoning does not warrant that authority. A system that tells a user what kind of person they are performs, on the user’s behalf, exactly the fluent-but-unsound self-abduction the user is already prone to, and lends it the appearance of objectivity.

Second, the frozen-semiosphere problem becomes an ethical obligation. A person-model is a snapshot; a living user moves on. A system that continues to act on a stale model — recommending, predicting, or reflecting back a self the user has outgrown — reinforces the prior attractor and can obstruct precisely the transformation the user is attempting. The architectural responses are design norms, not refinements: staleness labels on every inferred self-attribute; a governed update mechanism that revises the user-model only through data-consistency, temporal coherence, and the user’s explicit confirmation of structural changes; provenance on every self-claim the system makes about the user; a discordance-surfacing function that shows the user where self-report and recorded behaviour diverge without diagnosing or deciding; and a prohibition on automatic transfer between a user’s distinct role-codes.
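Two of these design norms, staleness labels and governed update, can be sketched in Python. The names `InferredAttribute`, `is_stale`, and `governed_update` are hypothetical, and the 180-day staleness window is an arbitrary placeholder rather than a recommended value.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class InferredAttribute:
    label: str
    provenance: str       # where the inference came from
    last_confirmed: date  # basis for the staleness label


def is_stale(attr: InferredAttribute, today: date,
             max_age: timedelta = timedelta(days=180)) -> bool:
    """Flag attributes not confirmed within the (assumed) window."""
    return today - attr.last_confirmed > max_age


def governed_update(attr: InferredAttribute, new_label: str, new_provenance: str,
                    today: date, *, data_consistent: bool,
                    temporally_coherent: bool,
                    user_confirmed: bool) -> InferredAttribute:
    """Revise the attribute only when all three conditions hold."""
    if not (data_consistent and temporally_coherent and user_confirmed):
        return attr  # update refused; the old attribute keeps its staleness
    return InferredAttribute(new_label, new_provenance, today)
```

The update is refused, not silently applied, when the user has not confirmed the structural change, even if the data would support it.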

Third, integrity becomes a design norm for artificial agents as well as for persons, and here the two objects of the paper converge. As AI agents acquire memory, deliberation, and governance contours, the vocabulary for describing an agent and a self converges; the bounded operational contour is at once a specification for a trustworthy agent and a description of a person of integrity. A society that learns to demand provenance, staleness-awareness, governed update, and surfaced discordance from its machines is learning, in the same act, the conditions of sound self-knowledge for its members. Finally, the model has a distributive dimension: a system optimised for fluent self-narrative pulls atypical persons toward common narratives, exactly as a likelihood-optimised clinical model pulls atypical patients toward common diagnoses. People whose self-understanding departs from the statistical centre — by culture, neurotype, or biography — are precisely those a fluent person-model is most likely to misdescribe and most likely to persuade. Building the governance layer is therefore part of an equitable distribution of the capacity for sound self-knowledge.

10. Ethical Guardrails

Because the model assigns pressure an evidentiary role, its boundaries must be stated plainly, or “pressure” could be misread as a licence for manipulation. Pressure evidence may be observed, remembered, and analysed; it may never be secretly engineered by an AI system or an institution. The following prohibitions are constitutive of the architecture, not external caveats:

• No covert pressure tests and no manipulation of users for the purpose of personality inference.
• No harmful pressure engineering; modelled pressure events must be naturally occurring or, in the first person, self-chosen — proportionate, and ethically bounded.
• No diagnostic or moral identity labels: the system may surface discordance, but must not declare who a person really is.
• No automatic transfer between role-codes: a profile built in one context must not silently govern intimacy, health, finance, or family contexts.
• Every inferred self-attribute carries provenance, confidence, vintage, uncertainty, and user-facing contestability.
• The user can pause, challenge, delete, and retire any component of the person-model.

The self-binding of Section 8 is exempt from the prohibition on engineered pressure precisely and only because the agent imposes it on themselves; it provides no warrant for a system to impose pressure on anyone.
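The guardrails on role-code transfer and user control can be illustrated with a short Python sketch. `ScopedAttribute` and `PersonModel` are hypothetical names, and the example covers only four of the listed obligations (role-code scoping, pause, challenge, and delete or retire); it is not a complete implementation of the guardrails.

```python
from dataclasses import dataclass, field


@dataclass
class ScopedAttribute:
    label: str
    role_code: str    # the only context in which this attribute may be used
    provenance: str
    confidence: float
    vintage: str
    uncertainty: str
    contested: bool = False
    retired: bool = False


@dataclass
class PersonModel:
    attributes: list = field(default_factory=list)
    paused: bool = False

    def visible_in(self, role_code: str) -> list:
        """Return attributes for one role-code only; nothing while paused."""
        if self.paused:
            return []
        return [a for a in self.attributes
                if a.role_code == role_code and not a.retired]

    def challenge(self, label: str) -> None:
        for a in self.attributes:
            if a.label == label:
                a.contested = True

    def retire(self, label: str) -> None:
        for a in self.attributes:
            if a.label == label:
                a.retired = True

    def delete(self, label: str) -> None:
        self.attributes = [a for a in self.attributes if a.label != label]
```

Because `visible_in` filters by exact role-code, an attribute inferred at work is never returned for a health or family query unless separate evidence has produced its own entry there.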

11. Limits and Validation Roadmap

The model is a conceptual framework, not yet an empirical theory; its contours and the discordance function require operational definition and measurement before any predictive claim. The mapping from a clinical architecture to the person is an argument by structural analogy, and the burden of subsequent work is to show that it yields non-trivial, testable predictions rather than redescriptions. The term witness is functional and architectural, not a metaphysical demonstration. The appeal to cognitive-affective personality system theory imports that theory’s own empirical limits. The claim that sound person-knowledge requires pressure is bounded strictly by Section 10. Subsequent work may formalise the Attractor Registry and Pressure Diagnostic Runtime using graph-theoretic or topological descriptors, but such formalisation must remain tied to observed, consented, pressure-conditioned signatures rather than to claims about an underlying personality essence. We propose a staged, honest validation sequence in which each stage conditions the next and any negative result triggers revision before progression.

To avoid mere redescription, the framework commits to at least two falsifiable predictions. (i) Incremental validity: attractor signatures annotated from pressure events should predict conduct in held-out high-stakes situations better than both trait self-report and low-pressure behavioural averages; if pressure-derived if-then signatures add no predictive variance over comfort-condition data, the central claim is false. (ii) Discordance asymmetry: for conditional dispositions, self-report should diverge from behaviour more under high-stakes than low-stakes conditions, while remaining concordant for unconditional ones; absence of this interaction would falsify the evidentiary-condition account of pressure. Both predictions are defeasible and, per Section 10, testable only on naturally occurring or self-chosen pressure, never engineered.
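Prediction (ii) is an interaction claim, and one way to make it concrete is a difference-of-differences contrast. The sketch below is illustrative only: the paper does not define the discordance function numerically, so the mean absolute gap used here, the function names, and the cell keys are all assumptions.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple

Cell = Tuple[Sequence[float], Sequence[float]]  # (self-report, behaviour)


def discordance(self_report: Sequence[float], behaviour: Sequence[float]) -> float:
    """Mean absolute gap between self-report and observed behaviour.

    Assumes both are scored on the same scale and paired item by item.
    """
    if len(self_report) != len(behaviour) or not self_report:
        raise ValueError("need two non-empty sequences of equal length")
    return mean(abs(s - b) for s, b in zip(self_report, behaviour))


def asymmetry_contrast(cells: Dict[Tuple[str, str], Cell]) -> float:
    """Difference of differences across disposition type and stakes.

    Returns (conditional high - conditional low)
          - (unconditional high - unconditional low).
    """
    d = {key: discordance(*pair) for key, pair in cells.items()}
    return ((d[("conditional", "high")] - d[("conditional", "low")])
            - (d[("unconditional", "high")] - d[("unconditional", "low")]))
```

Under prediction (ii) the contrast should be reliably positive; a contrast indistinguishable from zero would count against the evidentiary-condition account. A point contrast of this kind is only a descriptive summary, and an actual test would need an inferential model fitted to naturally occurring or self-chosen pressure events.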

A further limit concerns the difference between clinical and person-modelling ground truth. In a medical digital twin, abductive soundness can ultimately be constrained by bodily markers, clinical outcomes, or adjudicated events. In person-modelling, no equivalent external arbiter exists. A behavioural attractor is therefore not treated here as a directly observable essence of the person, but as a provisional and defeasible construct inferred from recurrent if-then patterns across situations and role-codes. Pressure improves the evidentiary conditions under which such patterns can be observed, but it does not provide transparent access to an underlying self. It may make behaviour more legible, amplify it, distort it, or co-produce it depending on the situation and the observer’s framing. For that reason, the architecture does not license identity closure. Its purpose is to preserve provenance, uncertainty, contestability, and role-code separation around any claim about a person.

Stage	Object	Endpoint	Risk controlled
Conceptual mapping	This manuscript.	Internal coherence between self-abduction, attractor structure, and governance.	Pure metaphor without architecture.
Architecture specification	Pressure Diagnostic Runtime and Attractor Registry.	Can the model represent if-then signatures without identity closure?	Trait essentialism; contextual overfitting.
Retrospective annotation	Consented reflective diaries, longitudinal coaching logs, de-identified communication transcripts; synthetic or fictional cases for preliminary sandbox annotation only.	Inter-rater consistency in pressure-event annotation.	Unfalsifiable interpretation.
Prototype	Human digital twin with provenance and staleness labels.	User comprehension, contestability, refusal of premature closure.	Authoritative false self-model.
Ethical review	Person-modelling governance.	No covert pressure; role-code separation; consent and audit.	Harmful inference or behavioural steering.
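The endpoint of the retrospective annotation stage, inter-rater consistency, needs a concrete statistic. The paper does not name one; as an assumption, the sketch below uses Cohen's kappa for two annotators who each assign one categorical label per event. The label values in the usage note are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    is the agreement expected from each rater's own label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1 - expected)
```

For example, `cohens_kappa(["pressure", "pressure", "none", "none"], ["pressure", "none", "none", "none"])` compares two annotators over four events. With more than two annotators, or with graded rather than categorical annotations, a different statistic would be needed.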

12. Conclusion

An ungoverned system — a person or a machine — can speak fluently about a self without reliably reasoning about it. Its fluency conceals missing premises, mistakes co-occurrence for character, discounts the counter-evidence of conduct under pressure, takes the story for the person, overfits a single episode, and closes identity prematurely. These are not failures of sincerity; they arise when the layers of person-understanding collapse into one narrating stream. The governed attractor architecture offers a different posture. Semantics extracts the local meaning of an act; the semiosphere translates it across role-codes; ontology resolves it into recurrent attractor signatures; a Pressure Diagnostic Runtime supplies the evidence that comfort cannot; an Attractor Registry stores it under a governance status that forbids premature closure; and memory, deliberation, and governance convert a narrator into a bounded abductive system. A person-modelling AI should not tell a human who they are. It should govern the conditions under which any claim about who a person is remains open, contested, stale, revised, or refused — which is also, turned inward, the discipline of sound self-knowledge. The task is not to make the self sound more coherent, more principled, or more transformed. The task is to make the system structurally capable of refusing its own unsound narratives about a person — and to require the same of any system we permit to model one.

Author Contributions

Both authors contributed to the conception and development of the architecture and to the writing and revision of the manuscript. Both authors read and approved the final manuscript.

Funding

The authors received no specific grant for this research from any funding agency in the public, commercial, or not-for-profit sectors.

Data Availability Statement

Not applicable. No datasets were generated or analysed; the article is a conceptual contribution.

Conflicts of Interest

The authors declare that they have no competing interests.

Ethics Approval

Not applicable. This is a conceptual paper that involved no human participants, no identifiable personal data, and no animals.

Consent to Participate and for Publication

Not applicable.

References

Albahari M (2006) Analytical Buddhism: The Two-Tiered Illusion of Self. Palgrave Macmillan, Basingstoke.
Arendt H (1963) Eichmann in Jerusalem: A Report on the Banality of Evil. Viking Press, New York.
Browning CR (1992) Ordinary Men: Reserve Police Battalion 101 and the Final Solution in Poland. HarperCollins, New York.
Dennett DC (1992) The self as a center of narrative gravity. In: Kessel FS, Cole PM, Johnson DL (eds) Self and Consciousness: Multiple Perspectives. Lawrence Erlbaum, Hillsdale NJ.
Doris JM (2002) Lack of Character: Personality and Moral Behavior. Cambridge University Press, Cambridge.
Elster J (1979) Ulysses and the Sirens: Studies in Rationality and Irrationality. Cambridge University Press, Cambridge.
Gazzaniga MS (2011) Who’s in Charge? Free Will and the Science of the Brain. Ecco/HarperCollins, New York.
Harman G (1999) Moral philosophy meets social psychology: virtue ethics and the fundamental attribution error. Proceedings of the Aristotelian Society 99:315–331.
Lotman YM (2005) On the semiosphere. Sign Systems Studies 33(1):205–229.
Makarov N, Bordukova M, Quengdaeng P et al (2025) Large language models forecast patient health trajectories enabling digital twins. npj Digital Medicine 8:588.
Metzinger T (2003) Being No One: The Self-Model Theory of Subjectivity. MIT Press, Cambridge MA.
Milgram S (1974) Obedience to Authority: An Experimental View. Harper & Row, New York.
Mischel W, Shoda Y (1995) A cognitive-affective system theory of personality: reconceptualizing situations, dispositions, dynamics, and invariance in personality structure. Psychological Review 102(2):246–268.
Morris MR, Sohl-Dickstein J, Fiedel N et al (2023) Levels of AGI for operationalizing progress on the path to AGI. arXiv:2311.02462.
Nowak A, Vallacher RR, Zochowski M (2005) The emergence of personality: dynamic foundations of individual variation. Developmental Review 25(3–4):351–385.
Park JS, O’Brien JC, Cai CJ, Morris MR, Liang P, Bernstein MS (2023) Generative agents: interactive simulacra of human behavior. arXiv:2304.03442.
Peirce CS (1931–1958) Collected Papers of Charles Sanders Peirce, vols 1–8. Hartshorne C, Weiss P (eds, vols 1–6); Burks AW (ed, vols 7–8). Harvard University Press, Cambridge MA.
Ross L (1977) The intuitive psychologist and his shortcomings: distortions in the attribution process. In: Berkowitz L (ed) Advances in Experimental Social Psychology, vol 10. Academic Press, New York, pp 173–220.
San O, Rasheed A, Bozdemir E, Deng J (2026) The evolution of digital twins from reactive to agentic systems. Nature Computational Science 6(1):6–10.
Shirbint E, Rybalov A (2026) From plausible narrative to sound abduction: a governed abductive architecture for medical digital twins in multiple sclerosis care. Preprints.org.
Shoda Y, Mischel W, Wright JC (1994) Intraindividual stability in the organization and patterning of behavior: incorporating psychological situations into the idiographic analysis of personality. Journal of Personality and Social Psychology 67(4):674–687.
Varela FJ, Thompson E, Rosch E (1991) The Embodied Mind: Cognitive Science and Human Experience. MIT Press, Cambridge MA.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.