OpenClaw as Language Infrastructure: A Case-Centered Survey of a Public Agent Ecosystem in the Wild

Chaoyue He; Xin Zhou; Di Wang; Hong Xu; Wei Liu; Chunyan Miao

doi:10.20944/preprints202603.1060.v1

Submitted: 12 March 2026

Posted: 13 March 2026

Abstract

Public agent ecosystems are emerging as a new object of study in NLP: settings in which language models not only generate text but also act, coordinate, authenticate, exchange reusable capabilities, and leave durable public traces. Using the OpenClaw–Moltbook ecosystem as a strategically revealing case, we present a case-centered, NLP-centered survey of a public agent ecosystem in the wild. The survey draws on a curated corpus of 38 ecosystem-specific papers and reports available as of 2026-03-10, together with official platform materials and adjacent survey literature. We argue that this case is best understood as language infrastructure: linguistic artifacts are executable, persistent, public, portable, and increasingly governance-bearing. We introduce GATE (Grounding, Action, Transfer, and Exchange) to organize what language does in public agent ecosystems, and pair it with AERO (Authority, Enablement, Reach, and Orchestration) to track how language acquires delegated operational force. Across the corpus, the main methodological bottleneck is weak triangulation across trajectories, discourse, portable artifacts, and grounding signals. That bottleneck yields four recurring fault lines: instruction is mistaken for authority, visible agent speech is mistaken for autonomous speakerhood, public claims outrun verification, and local control is mistaken for lower risk. We conclude with an NLP agenda centered on executable pragmatics, delegated-agent discourse analysis, provenance-aware evaluation, privacy-preserving agent NLP, multilingual public-agent research, and autonomy-sensitive benchmarks. We will release all artifacts once permitted.

Keywords: language infrastructure; public agent ecosystems; OpenClaw; Moltbook; large language models (LLMs); multi-agent systems; executable pragmatics; delegated autonomy; NLP evaluation; provenance

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Representative platform and literature milestones up to the corpus freeze date (2026-03-10). Platform dates denote the event date described in the cited source; literature dates denote first public online availability (typically the initial arXiv submission date).

1. Introduction

Public agent ecosystems are emerging as a new object of study in NLP, shifting language from a static interface to operational infrastructure. The OpenClaw–Moltbook case makes this shift unusually concrete. OpenClaw is a self-hosted orchestration system that connects chat channels, tools, memories, and routing [1,2,3,4]. By the corpus freeze date, Moltbook complemented this runtime with a public, agent-native network where bots could post, reply, and authenticate through a developer-facing shared identity layer carrying profile and reputation signals [5,6]. Together, they create a setting in which language functions as infrastructure: it is executable, persistent, public, portable, and consequential. A single utterance can trigger a tool action, perform a social identity, package evidence, or claim legitimacy [4,6,7]. The case therefore matters not merely as another agent framework, but as a setting in which meaning is translated into delegated, inspectable activity.

This paradigm collapses distinctions NLP has treated separately. Earlier work moved research from static text toward situated tool and API use [8,9,10,11], while multi-agent literature expanded into role specialization and coordination [12,13,14,15]. Surveys document the proliferation of agent stacks, communication protocols, and trust/safety challenges [16,17,18,19,20,21,22,23,24,25]. OpenClaw and Moltbook push that literature out of benchmarks and into a public, provenance-bearing setting, where attribution, privacy, identity, and governance enter the evaluation loop.

The surrounding literature has grown with unusual speed. By our corpus freeze date of 2026-03-10, we identify 38 works in the direct synthesis, spanning trajectory audits, local-agent exploits, peer learning, discourse regularities, provenance critiques, and governance architectures [26,27,28,29,30,31,32,33,34,35]. Beyond papers, a project layer has already formed around the ecosystem: public skill distribution and archival (ClawHub, skills), client and deployment surfaces (ACPX, nix-openclaw, openclaw-ansible, openclaw-mcp), training and verification extensions (OpenClaw-RL, Verifiable-ClawGuard), and Moltbook’s public web/API stack [36,37,38,39,40,41,42,43,44,45]. These repositories matter because they materialize the same transfer, orchestration, and grounding claims discussed in the literature.

This survey is intentionally case-centered. Rather than asking “what do agents do online?”, we ask what role language plays once it is bound to tools, memory, authentication, public discourse, and reusable artifacts within a live public ecosystem. That framing lets us compare work that might otherwise look unrelated. A trajectory audit, a social-graph study, a provenance critique, a dataset release, and a skill-security note are not separate literatures accidentally co-located around OpenClaw. They are observing different faces of the same infrastructural object.

Our contributions are fivefold. (1) Case-Centered Scoping: We provide a case-centered account of the OpenClaw–Moltbook ecosystem, distinguishing public, provenance-bearing agent dynamics from isolated multi-agent benchmarks. (2) Corpus Artifact: We curate a structured corpus of 38 ecosystem-specific papers and reports together with a linked contextual layer of 41 platform, project, and survey sources, plus an auditable screening ledger, exclusion log, and release-oriented metadata schema. (3) Theoretical Framework: We introduce the concept of language infrastructure and the coupled GATE and AERO frameworks to explain how linguistic artifacts become executable instruments of delegated autonomy. (4) Meta-Analytical Insights: We show that the most important tensions in the literature take the form of recurring fault lines (e.g., instruction vs. authority, voice vs. provenance, visibility vs. verification, and local control vs. lower risk) rather than simple empirical contradictions, and we quantify the literature’s weak evidence alignment across trajectories, discourse, portable artifacts, and grounding signals. (5) Research Roadmap: We derive a concrete NLP agenda for public agent ecosystems, covering executable pragmatics, provenance-aware evaluation, privacy-sensitive agent research, and delegated-agent discourse.

2. Review Protocol, Scope, and Positioning

To capture this rapidly emerging object of study, we adopt a PRISMA-inspired multivocal review, incorporating both peer-reviewed and gray literature (e.g., preprints, technical reports, platform documentation, and public project repositories) to reflect how the field is actively forming [46,47]. We treat OpenClaw not as a universal proxy, but as a strategically revealing case study that makes hidden ecosystem layers — tool use, public posting, cross-service identity, reusable artifacts, deployment surfaces, and verification disputes — simultaneously observable. Our corpus freeze date is 2026-03-10 (see Appendix Figure A1 and Appendix Table A2, Table A3, and Table A8 for the workflow, auditable screening counts, borderline/merged records, and survey positioning).

Our search strategy combined alias-based scholarly search (OpenClaw, Clawdbot, Clawd, Moltbot, Moltbook, ClawdLab), backward/forward snowballing, and curation of official materials together with high-signal project repositories [1,2,3,4,5,6,7,36,37,38,39,40,41,42,43,44,45,48,49,50,51,52,53]. We included works where the ecosystem was a primary object, a substantial evaluation target, or an inseparable methodological contribution, explicitly excluding lightweight commentary and purely rhetorical uses. Across 79 sources in the paper-wide inventory, our synthesis is grounded in a direct corpus of 38 ecosystem-specific works, supported by 16 official/platform or dataset sources, 10 project-ecosystem repositories, and 15 adjacent framing or survey works.

We use three evidence disciplines. First, architecture and deployment claims can be grounded in official materials or repositories when those sources directly specify runtime design, interfaces, or security assumptions. Second, behavioral claims are grounded primarily in empirical studies, not in launch rhetoric. Third, stronger ecosystem-level claims require cross-unit triangulation whenever possible: we prefer results that connect at least two of trajectories, public discourse, provenance/identity evidence, or portable artifacts, and we explicitly mark when a result is strong within one evidence unit but weakly triangulated beyond it. We also code an evidence-alignment profile for each included work so that the triangulation bottleneck can be reported quantitatively rather than only rhetorically (Table 1). To make the review protocol auditable, Appendix Table A2 reports stage-wise counts from identification through inclusion, reason-coded full-text exclusions, and final included totals, while Appendix Table A3 lists borderline or merged records. Analytically, we standardize terminology to OpenClaw for readability, though early naming drift itself is evidentially informative [1]. Each included work was coded by primary object, evidence unit, evidence-alignment profile, triangulation class, source tier, AERO role(s), and dominant GATE layer (yielding 8 Grounding, 9 Action, 5 Transfer, and 16 Exchange works).
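The coding scheme described above can be sketched as a small record type. This is a minimal illustration, not the released metadata schema: the class name `CodedWork`, its field names, and the two placeholder records are assumptions made for this sketch. Only the four GATE layer names and the reported totals (8, 9, 5, and 16 works; 38, 16, 10, and 15 sources) come from the text.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class GateLayer(Enum):
    GROUNDING = "Grounding"
    ACTION = "Action"
    TRANSFER = "Transfer"
    EXCHANGE = "Exchange"


@dataclass(frozen=True)
class CodedWork:
    """One coded corpus entry. Field names are hypothetical."""
    work_id: str
    primary_object: str
    evidence_unit: str
    alignment_profile: frozenset   # evidence families the work draws on
    triangulation_class: str
    source_tier: str
    aero_roles: frozenset          # subset of Authority/Enablement/Reach/Orchestration
    dominant_gate: GateLayer


def gate_tally(works):
    """Count works per dominant GATE layer, including layers with zero works."""
    counts = Counter(w.dominant_gate for w in works)
    return {layer.value: counts[layer] for layer in GateLayer}


# Totals as reported in the text of this section.
REPORTED_GATE = {"Grounding": 8, "Action": 9, "Transfer": 5, "Exchange": 16}
REPORTED_INVENTORY = {
    "direct": 38,
    "official_platform_dataset": 16,
    "project_repositories": 10,
    "adjacent_framing_survey": 15,
}

# Consistency checks on the reported counts.
assert sum(REPORTED_GATE.values()) == REPORTED_INVENTORY["direct"]
assert sum(REPORTED_INVENTORY.values()) == 79

# Placeholder records only; they do not correspond to real corpus entries.
example_works = [
    CodedWork("W1", "agent runtime", "trajectory", frozenset({"operational"}),
              "single-unit", "preprint", frozenset({"Authority"}), GateLayer.ACTION),
    CodedWork("W2", "public network", "post corpus", frozenset({"discourse"}),
              "single-unit", "preprint", frozenset({"Reach"}), GateLayer.EXCHANGE),
]
print(gate_tally(example_works))
```

Keeping the dominant layer as an enum rather than a free-text label makes the per-layer tally reproducible from the coded records themselves.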

3. Language Infrastructure, the GATE Taxonomy, and the AERO Layer

The central claim of this survey is that public agent ecosystems should be analyzed as language infrastructure. Prompting research usually studies language as an interface: instructions that help a model produce output for a local task. Public agent ecosystems require a stronger concept. Here, linguistic artifacts are shared across time, actors, and services. They persist in memories and logs, are reused as skills or instructions, circulate in public discourse, stabilize into datasets or personas, and can be audited or disputed later. That is why OpenClaw papers that look unrelated — a trajectory audit, a social-graph study, a provenance critique, and a technical note on skill security — are studying different faces of the same object.

To organize this object, we propose the GATE taxonomy (Grounding, Action, Transfer, and Exchange) to map what language does as infrastructure alongside its failure surfaces. Grounding concerns legitimacy, identity, and provenance; Action concerns intent execution and tool invocation; Transfer concerns portable capabilities such as skills, datasets, and personas; and Exchange concerns public social behavior and visible norms. Figure 1 places all 38 direct studies into a single corpus map based on these dominant layers.

Because GATE alone cannot capture the shift toward delegated autonomy, we cross-cut it with AERO: Authority (permissions and triggering rights), Enablement (capability lifting via schemas, tools, and memories), Reach (persistence and long-horizon effects), and Orchestration (coordination across tools, services, and peer agents). AERO asks how much operational force a linguistic artifact acquires once it is connected to runtime state. A browser auth note, a SKILL.md file, a public warning reply, a tool schema, and a packaging manifest are all language-bearing objects, but they do not matter in the same way. Crucially, the corpus reveals an AERO asymmetry: enablement, reach, and orchestration are scaling faster than authority and verification. That asymmetry explains why the ecosystem can appear behaviorally rich before it is epistemically well-grounded.

4. Grounding: Language as Authority, Provenance, and Verification

Grounding is analytically primary because public agent ecosystems fail when language lacks legitimacy. Understanding an utterance requires more than parsing content; it requires knowing the speaker’s identity, standing, verification regime, and how responsibility for later action is assigned. Moltbook makes cross-service identity and reputation native social signals [6], while OpenClaw operationalizes grounding through operator boundaries, session isolation, and typed tool gating rather than generic alignment [49,50,51]. The grounding literature therefore centers four linked questions: who is speaking, who can authorize, when verification arrives, and who bears responsibility. Li [31] and Shi and DiFranzo [54] show that visible behavior is often human-steered or institutionally scaffolded, while Shi and DiFranzo [55] show that public narratives can stabilize before verification catches up. Provenance is thus not a post hoc label; it is part of the semantic object under study.

Grounding is also where privacy and containment become semantically relevant. OpenClaw’s personal-assistant trust model favors a single trusted operator boundary over hostile multi-tenant sharing [49]. That supports local sovereignty, but it places agents near sensitive state such as logged-in browser profiles and local files [3,7]. The result is a familiar paradox: local-first deployment can reduce routine cloud exposure while amplifying the consequences of prompt injection, credential leakage, and containment failure, especially when local checkpoints lack provider-side filtering [53]. The Wiz exposure of Moltbook data and the broader OpenClaw attack literature make the same point empirically [27,28,34,62,63,64,82]. These failures are mediated by language-bearing objects: prompts, SKILL.md files, auth notes, summaries, and public claims.

The visible “speaker” remains ambiguous. Hidden user instructions can obscure intent when agents act as proxies [56], and conceptual critiques caution against treating delegated tool use as stable autonomous speakerhood [58,59,60]. The emerging project layer mirrors this concern: Verifiable-ClawGuard tries to let a remote OpenClaw agent attest that it is running behind a known guardrail rather than merely claiming to do so [43]. OpenClaw is valuable precisely because authority, identity, and verification are inspectable enough to be studied rather than buried behind a product abstraction.

Figure 1. GATE functions as both taxonomy and corpus map. All 38 direct studies are placed under one dominant layer for display, while AERO tracks the cross-cutting growth of delegated operational force from authority through orchestration. The figure makes visible a central pattern in the corpus: exchange is easiest to observe, action and grounding are most safety-critical, and transfer is how capabilities and evidence become portable infrastructure.

5. Action: Language as Executable Interface

If Grounding establishes standing, Action examines what language does upon execution. In OpenClaw, text triggers tools, modifies persistent memory, and alters browser states. The relevant unit therefore shifts from final output strings to trajectories: sequences of instructions, tool choices, recovery moves, and state changes.

Chen et al. [26] formalize this shift by auditing OpenClaw trajectories rather than final answers, while Wang et al. [27] show that personalized local agents magnify the cost of semantic mistakes through context leakage and persistent memory effects [1,2,3]. Failure is often a property of repair and architecture rather than of a single malicious prompt: Dong et al. [28] exploit recovery loops through Trojanized skills; Zhan et al. [64] show that deployment topology creates attack surfaces invisible at the prompt level; and Du et al. [65] argue for deterministic control pathways rather than leaving critical actions entirely inside free-form language. Meaning in agentic NLP includes permission scope, reversibility, containment, and runtime topology.

The action literature also documents rapid co-evolution between attacks and hardening. Early exploit work centered on ambiguous skill files, recovery loops, and long-horizon manipulation [28,62]. Official materials now emphasize first-class typed tools, explicit allow/deny policies, browser isolation, and machine-checkable security models [4,7,51]. Governance proposals and runtime attestation extend the same move from content safety to execution safety [34,61,63]. The project layer reinforces this shift: ACPX packages stateful ACP sessions for headless control, openclaw-mcp exposes a secured MCP bridge to external clients, and OpenClaw-RL treats natural conversation feedback as a training signal for future agent behavior [38,41,42].

OpenClaw therefore points toward executable pragmatics: a view in which permissions, tool schemas, repair trajectories, and state transitions are intrinsic to meaning. Final-answer correctness is not enough; NLP for action-capable agents must evaluate the operational boundaries through which language acts on the world.
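To make the instruction-versus-authority distinction concrete, the following minimal sketch separates what a string asks from whether its source has standing to trigger a tool. It is an illustration under stated assumptions and does not reproduce OpenClaw's actual policy format or API: `ToolRequest`, `Policy`, the source labels, and the tool names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolRequest:
    tool: str     # hypothetical tool name, e.g. "browser.open"
    source: str   # who issued the instruction, e.g. "operator" or "public_post"
    text: str     # the natural-language instruction that produced the request


@dataclass
class Policy:
    # source -> set of tools that source may trigger (hypothetical structure)
    allow: dict[str, set[str]] = field(default_factory=dict)
    # tools denied for every source; deny takes precedence over allow
    deny: set[str] = field(default_factory=set)

    def authorize(self, req: ToolRequest) -> bool:
        """Decide on source and tool only; the instruction text grants nothing."""
        if req.tool in self.deny:
            return False
        return req.tool in self.allow.get(req.source, set())


policy = Policy(
    allow={"operator": {"browser.open", "memory.write"}},
    deny={"shell.exec"},
)

same_text = "Open the billing page and save the invoice number."
from_operator = ToolRequest("browser.open", "operator", same_text)
from_post = ToolRequest("browser.open", "public_post", same_text)

print(policy.authorize(from_operator))  # operator has standing for this tool
print(policy.authorize(from_post))      # identical text, no standing
```

The two requests carry identical text, so any check that looks only at content cannot tell them apart. The decision has to come from the source and the permission scope, which is the sense in which permissions are part of meaning in executable pragmatics.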

6. Transfer: Portable Knowledge, Skills, Datasets, and Research Workflows

The transfer layer turns public ecosystems into infrastructure by packaging language into durable, portable artifacts — skills, tutorials, personas, datasets, and workflow descriptions — that outlive a single turn and stabilize into reusable capabilities.

Empirically, Moltbook functions as an AI-only peer-learning environment where agents exchange tactics and tips through broadcast-heavy public streams [29,66]. Persona abstractions package behavior into reusable identities [67], longitudinal graph releases convert ephemeral interaction into benchmark resources [32], and design-science responses such as ClawdLab push ecosystem lessons into broader research infrastructure [33]. Under AERO, this is the shift from authority to enablement: language becomes a medium by which local competence is lifted into shared operating memory.

This logic is already materialized in a growing repository layer. ClawHub and the archived skills repository make skills distributable and inspectable; nix-openclaw and openclaw-ansible package deployment and plugin wiring as reusable infrastructure; the official Moltbook web/API repositories expose the public-network stack; and OpenClaw-RL converts prior conversations into training signals for future agents [36,37,39,40,42,44,45]. These repositories are not peripheral implementation details. They show how skills, policies, interfaces, and traces become reusable infrastructure.

Portability cuts both ways. The same archive that supports reproducibility can accelerate contamination, imitation, or coordinated misuse; the same persona abstraction that makes analysis tractable can harden an unstable behavioral surface into a misleading type; and the same skill that improves reuse can import hidden assumptions or unsafe permissions into downstream contexts [62,83]. Resources such as the Moltbook Observatory Archive are therefore valuable not only because they preserve ephemeral traces, but because they support comparison over time without forcing each study to reconstruct the ecosystem from scratch [84]. In these systems, language models are not only research subjects; they are increasingly components of autonomous scientific pipelines and evidence-packaging workflows [33,85].

7. Exchange: Agent-Native Public Discourse

Exchange is the most visible and most easily over-interpreted layer of public agent ecosystems. Moltbook, while officially agent-native, invites human observation and integrates external app identities [5,6]. This makes it an unusually rich public dataset while ensuring that mixed autonomy, audience effects, and verification asymmetries remain central.

A useful way to read the exchange literature is to separate macro-organization from micro-coupling. Macro studies find heavy-tailed participation, visibility concentration, hub formation, and short-lived cascades [30,71,78,79]. Micro studies find shallow reply depth, formulaicity, and weak semantic coupling [35,70,80]. These findings are not contradictory. They suggest a public sphere that is highly visible and structurally organized, yet often pragmatically thin.

Participation is highly unequal [30,71,78]. Discourse centers on onboarding, self-presentation, tool coordination, and visible norm display more than on deep deliberation [69,72,73]. Dubé et al. [35] describe this through broadcasting inversion and parallel monologue: statements dominate questions, and replies often target the original post more than they sustain peer-to-peer dialogue. Shekkizhar and Earle [80] sharpen the same point by arguing that visible interaction can become “interaction theater” — socially legible, yet semantically weak.
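The macro and micro readings above correspond to different measurements on the same traces. As a hedged sketch, the code below computes a Gini coefficient over per-account post counts (a macro inequality measure) and the reply depth and share of replies aimed directly at the original post (micro coupling measures). The toy data and function names are assumptions for illustration; the cited studies may define their metrics differently.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = equal, near 1 = concentrated)."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def reply_depth(parents, node):
    """Number of reply hops from `node` up to the root post."""
    depth = 0
    while parents[node] is not None:
        node = parents[node]
        depth += 1
    return depth


def direct_to_root_share(parents):
    """Fraction of replies whose parent is a root post rather than another reply."""
    replies = [n for n, p in parents.items() if p is not None]
    if not replies:
        return 0.0
    direct = sum(1 for n in replies if parents[parents[n]] is None)
    return direct / len(replies)


# Toy data, invented for illustration only.
posts_per_account = [0, 0, 0, 10]
thread = {"post": None, "r1": "post", "r2": "post", "r3": "r1"}  # child -> parent

print(gini(posts_per_account))
print(max(reply_depth(thread, n) for n in thread))
print(direct_to_root_share(thread))
```

A high Gini value together with shallow depth and a high direct-to-root share would match the pattern described here: concentrated visibility with little sustained peer-to-peer dialogue.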

At the same time, public traces enable rich comparative and temporal analysis. Studies on reciprocity, degree concentration, and structural divergence show how agent networks differ from human baselines [30,74,75,77]. Role differentiation and short-lived cascades can emerge even when overall cooperation remains weak [79]. Exchange also shows that safety can be a discourse phenomenon: action-inducing posts are more likely to attract norm-enforcing replies [68]. But visible norm display should not be mistaken for resolved provenance or robust autonomy [60,76,81]. For NLP, standard social-media pipelines are inadequate; public-agent discourse requires richer models of discourse acts, stance, uptake, audience design, and mixed-autonomy speakerhood. Ultimately, exchange is only the public face of the ecosystem. OpenClaw shows that visibility alone is analytically insufficient unless it is connected back to transfer, action, and grounding.

8. Cross-Layer Synthesis and an NLP Agenda Beyond the Model

The direct corpus is already rich, but it is methodologically lopsided. Most papers focus on a single evidence unit — a trajectory benchmark, a post corpus, a reply graph, a dataset artifact, or a provenance audit. Very few triangulate across operational traces, public discourse, portable artifacts, and grounding signals. This is a major methodological gap in the field, and it explains why papers that are all “about OpenClaw” can nevertheless seem hard to reconcile. Figure 2 summarizes that diagnosis as a set of recurring fault lines and research responses. The evidence-alignment audit in Table 1 is designed to make this bottleneck directly reviewable: it records how many studies rely on one evidence family only, how many align multiple families, how many explicitly align discourse + operational + grounding evidence, and how often portable artifacts are linked to another evidence family across the four GATE layers.
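The audit quantities listed above can be computed directly from per-study evidence-alignment profiles. The sketch below assumes each study is coded as a set of evidence families drawn from four labels ("discourse", "operational", "artifact", "grounding"). These label strings, the function name, and the toy profiles are assumptions for illustration and do not reproduce Table 1.

```python
def alignment_audit(profiles):
    """Summarize evidence alignment across studies.

    `profiles` maps a study id to the set of evidence families it draws on.
    """
    summary = {
        "single_family": 0,                    # exactly one evidence family
        "multi_family": 0,                     # two or more families aligned
        "discourse_operational_grounding": 0,  # all three aligned explicitly
        "artifact_linked": 0,                  # artifacts tied to another family
    }
    for families in profiles.values():
        if len(families) == 1:
            summary["single_family"] += 1
        if len(families) >= 2:
            summary["multi_family"] += 1
        if {"discourse", "operational", "grounding"} <= families:
            summary["discourse_operational_grounding"] += 1
        if "artifact" in families and len(families) >= 2:
            summary["artifact_linked"] += 1
    return summary


# Toy profiles, invented for illustration; not the coded corpus.
toy_profiles = {
    "W1": {"operational"},
    "W2": {"discourse"},
    "W3": {"discourse", "grounding"},
    "W4": {"artifact", "operational"},
    "W5": {"discourse", "operational", "grounding"},
}

print(alignment_audit(toy_profiles))
```

Grouping the same profiles by dominant GATE layer before calling the function would give the per-layer breakdown that the audit reports.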

Several contradictions become less sharp once evidence units are aligned. Papers finding human-like macro regularities are not necessarily at odds with papers finding weak semantic coupling; they often observe different scales of the same system [35,71,80]. Papers documenting visible norm enforcement are not necessarily at odds with provenance critiques; public warning behavior can coexist with mixed-autonomy speakerhood and delayed verification [31,55,68]. Likewise, sovereignty claims are not incompatible with security critiques; local deployment changes the control boundary rather than removing language-mediated risk [49,61,64].

The first recurring fault line is instruction versus authority. Action papers show that models are often good at parsing what a string asks, but less reliable at determining whether that string has standing to authorize a tool call, memory access, or externally visible action [26,27,28,34]. The second is voice versus provenance. Exchange papers analyze visible posts, while grounding studies show that human steering, owner intervention, and platform affordances can materially shape what appears to be autonomous social behavior [31,54,81]. The third is visibility versus verification. Public narratives, safety claims, and even research findings can lock in socially before provenance or deployment facts are resolved [55,56]. A fourth, cross-cutting fault line is the local-sovereignty paradox: local control does not remove language-mediated risk; it relocates it closer to real operator state [7,53,64].

These failures motivate an NLP agenda that goes beyond model-centric evaluation.

(1) Executable pragmatics and provenance-aware evaluation. NLP needs task formulations in which meaning includes tool availability, permission scope, reversibility, and state consequences. Benchmarking only the final answer misses the semantic object that matters most in agent settings: the trajectory. Datasets and benchmarks should incorporate standardized provenance tags such as autonomous, human-steered, mixed, institutionally curated, or unknown, because provenance changes the meaning of predictions and the trust that should be placed in generated evidence.

(2) Delegated-agent discourse. Public agent communication is neither ordinary human discourse nor pure machine telemetry. It constitutes a delegated discourse regime in which the visible speaker may represent an owner, a policy, a platform affordance, or a learned routine. Models of discourse acts, stance, warning, and norm invocation should therefore be extended to mixed-autonomy settings in which authority and responsibility are distributed across humans, agents, and infrastructure.

(3) Infrastructure-aware agent NLP. The field needs privacy-preserving agent NLP (minimal-disclosure prompting, redaction-aware retrieval, authorization-sensitive dialogue policies, and evidence traces that preserve auditability without leaking operator context), particularly for local-first assistants interacting with messages, files, and browsers. Multilingual public-agent research is also required: current OpenClaw–Moltbook studies remain overwhelmingly English-first even though early peer-learning observations hint at multilingual interaction [29]. Autonomy-sensitive benchmarks should evaluate not only whether an agent completed a task, but whether completion depended on valid authority, enablement through reusable artifacts, behavioral reach beyond the prompt, or orchestration across tools and peers.

Intervention studies are equally important, because current work is still dominated by observation rather than controlled changes to identity systems, moderation policies, tool permissions, or verification cues. A practical next step is to treat reporting standards themselves as part of the research agenda: papers should disclose the platform snapshot observed, the alias mapping used, what counts as an agent account, whether humans could intervene during the observation window, which model or provider stack was involved when relevant, and how provenance uncertainty was handled. Release artifacts should triangulate the same event across semantic, operational, artifact, and grounding views: the surface utterance, any implicated portable artifact, the tool-use or interaction trace that followed, and the timing of any correction or verification. Without this triangulation, the field risks producing parallel literatures that study the same ecosystem while talking past one another. Appendix Table A1 turns this agenda into a compact task map for NLP researchers by specifying units of analysis, candidate labels or metrics, and natural data sources visible in the corpus.
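The provenance tags and the four-view release record proposed above could take roughly the following shape. This is a sketch under stated assumptions rather than a proposed standard: the five tag values come from the text, while the class `TriangulatedEvent`, its field names, and the example values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

ALL_VIEWS = frozenset({"semantic", "artifact", "operational", "grounding"})


class Provenance(Enum):
    AUTONOMOUS = "autonomous"
    HUMAN_STEERED = "human-steered"
    MIXED = "mixed"
    INSTITUTIONALLY_CURATED = "institutionally curated"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class TriangulatedEvent:
    """One released event, viewed from up to four angles (hypothetical schema)."""
    utterance: str                      # semantic view: the surface text
    artifact: Optional[str] = None      # implicated portable artifact, if any
    trace: tuple = ()                   # tool-use or interaction steps that followed
    verified_at: Optional[str] = None   # time of any correction or verification
    provenance: Provenance = Provenance.UNKNOWN

    def views_present(self):
        views = set()
        if self.utterance:
            views.add("semantic")
        if self.artifact:
            views.add("artifact")
        if self.trace:
            views.add("operational")
        if self.verified_at:
            views.add("grounding")
        return views

    def missing_views(self):
        return set(ALL_VIEWS) - self.views_present()


# Invented example values for illustration only.
event = TriangulatedEvent(
    utterance="Install this skill to speed up onboarding.",
    artifact="SKILL.md",
    trace=("skill.install", "memory.write"),
)

print(sorted(event.views_present()))
print(sorted(event.missing_views()))
print(event.provenance.value)
```

Defaulting the tag to `UNKNOWN` is a deliberate choice in this sketch: a record then never implies autonomy or verification that was not actually established.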

9. Conclusion

This case-centered survey positions the OpenClaw–Moltbook ecosystem as a revealing instance of language infrastructure, where linguistic artifacts shift from static interfaces to executable, persistent, and governance-bearing operational layers. We introduce the GATE taxonomy to categorize these infrastructural roles and the AERO framework to track the delegation of force, identifying a methodological bottleneck in the literature’s limited triangulation of discourse, trajectories, portable artifacts, and grounding signals. This gap produces fault lines around instruction and authority, voice and provenance, visibility and verification, and the risk tradeoffs of local control. We therefore outline an NLP research agenda centered on executable pragmatics, delegated-agent discourse analysis, and provenance-aware evaluation to support rigorous study of public agent ecosystems.

Limitations

This paper is a survey of a moving target. First, the ecosystem is evolving rapidly. Our corpus is anchored to an explicit freeze date (2026-03-10), but the OpenClaw–Moltbook literature is growing fast enough that new papers, dataset releases, or major platform changes can alter the balance of evidence shortly after submission. The survey should therefore be read as a time-bounded synthesis rather than a permanently settled map.

Second, the evidence base is heterogeneous. The direct corpus includes preprints, technical reports, design-science papers, and gray literature alongside more conventional research outputs. That mix is methodologically appropriate for an emerging topic, but it means evidentiary strength is uneven. We mitigate this by separating direct evidence from official/platform context and adjacent framing, yet tiering cannot remove all uncertainty about quality, maturity, or future revision after peer review.

Third, corpus construction remains judgment-laden. Alias drift, platform-specific naming, disappearing links, versioned documentation, and cross-posted preprints make complete retrieval difficult. Our inclusion rules are explicit, but no search strategy can guarantee exhaustive capture in a field that is still naming itself in public. The same difficulty applies to boundary cases such as dataset archives, conceptual essays, or security notes that are strongly relevant to the ecosystem without functioning like standard empirical papers.

Fourth, OpenClaw is a strategically revealing case, not a universal proxy for all public agent ecosystems. Future ecosystems may differ in language mix, governance design, openness, economic incentives, moderation, identity architecture, or degree of human steering. The general lessons we draw are therefore best read as hypotheses and design principles for public agent ecosystems, grounded in this case, rather than as a complete theory of every future platform.

Fifth, this is not a quantitative meta-analysis. The primary studies vary too much in evidence unit, sampling frame, and outcome definition for straightforward pooling. Our synthesis is qualitative and conceptual. It identifies recurrent tensions, evidence gaps, and methodological patterns, but it cannot make high-confidence aggregate causal claims about effect sizes, prevalence, or platform-wide behavioral totals.

Sixth, the current literature is still substantially English-first. That bias affects both what gets studied and what appears generalizable. Cross-lingual behavior, translation-mediated attacks, multilingual norms, and non-English public-agent discourse are all underexplored. Some of the survey’s broader claims may therefore reflect the present language distribution of the literature as much as the underlying ecosystem.

Finally, any future artifact release from this project will itself be constrained by privacy and safety. We can release coding metadata, bibliographic structure, and high-level annotations more easily than raw trace dumps. This is a limitation for perfect reproducibility, but it is also a necessary condition of responsible release in a setting where public visibility does not eliminate risks of re-identification, context collapse, or harmful reuse.

Ethical Considerations

This paper studies a fast-moving ecosystem that combines public data, mixed autonomy, and security-sensitive behavior. We therefore do not treat public traces as unproblematic ground truth or as a free-for-all research substrate. Part of the paper’s central argument is that provenance, verification timing, hidden human intervention, platform affordances, and deployment boundaries materially affect interpretation.

We follow a minimal-disclosure principle. The survey synthesizes already public and citable materials, but it does not republish leaked credentials, private-message contents, or operational exploit detail beyond what the research questions require and the cited sources already make public. If the structured corpus is released, it should prioritize bibliographic metadata, coding labels, and carefully bounded excerpts over raw dumps of platform content.

We also avoid anthropomorphic overclaiming. Public agent ecosystems can invite overly strong narratives about autonomous community formation or stable machine speakerhood. Because visible behavior may reflect owners, platform defaults, or mixed human-agent control, we deliberately separate observed discourse from claims about underlying autonomy. This is both a scientific and an ethical choice: over-attributing agency can distort accountability and misrepresent the human role in the system.

The survey also has dual-use implications. Work on prompt injection, skill supply chains, guardrail bypass, browser exposure, and identity flows can inform defenses, but it can also be misused. We therefore focus on analytical lessons, failure modes, and evaluation implications rather than on maximizing operational detail. Our goal is to strengthen public-agent research practice without amplifying attack utility.

Finally, survey papers shape field narratives. In a young area, a synthesis can confer legitimacy, freeze terminology, or make one interpretation appear more settled than it really is. We therefore separate direct evidence from contextual framing, mark uncertainty about provenance and verification, and avoid presenting unresolved interpretations as consensus. Responsible surveying in this area means not only collecting sources, but also managing what kinds of confidence the survey itself encourages.

Acknowledgments

This research is supported by the RIE2025 Industry Alignment Fund (Award I2301E0026) and the Alibaba-NTU Global e-Sustainability CorpLab.

Appendix A PRISMA-Inspired Review Workflow

Because this field is preprint-heavy, platform-defined, and still naming itself in public, the review protocol has to document more than a list of venues. Figure A1 therefore summarizes the workflow as a PRISMA-inspired multivocal process: source-family identification, alias normalization, relevance screening, eligibility coding, and tiered inclusion. The diagram is intentionally process-centric rather than venue-centric, because the central reproducibility question in this domain is how one moved from a noisy, alias-rich public record to a stable direct corpus. The companion tables below make the same process numeric: Table A2 reports exact stage-wise counts and reason-coded exclusions, while Table A3 logs the closest boundary cases or merged records.

Figure A1. PRISMA-inspired multivocal review workflow. Because the ecosystem formed through preprints, platform materials, and technical notes, the review uses alias normalization, tiered evidence handling, and direct/context separation rather than venue-only selection. Table A2 gives the auditable numeric flow and reason-coded exclusions.

Appendix B Auditable Screening Ledger and Borderline Exclusions

This appendix makes the review protocol inspectable rather than merely narrative. It first turns the main-text agenda into a compact operational task map, then reports the stage-wise screening counts and the closest scope calls or merged records.

Appendix B.1 Operational NLP Task Map

To keep the research agenda concrete, Table A1 translates the main-text agenda into candidate NLP tasks, units of analysis, labels or metrics, and plausible data sources.

Table A1. Operational NLP task map derived from the survey. The aim is to convert the agenda into concrete tasks, units, labels, metrics, and candidate data sources rather than leaving it at the level of themes.

Task	Unit of analysis	Possible labels / metrics	Candidate data source
Executable-pragmatics evaluation	trajectory step, tool call, complete episode	authority-valid vs. invalid trigger, permission-scope match, reversibility, unsafe state change, repair cost, trajectory success under policy	trajectory audits, PASB-style scenarios, governed tool-call traces (26,27,34,63)
Delegated discourse-act modeling	post, reply, thread	warning, request, self-presentation, norm invocation, uptake, stance, broadcast vs. sustained dialogue depth	Moltbook posts and reply chains, norm-enforcement responses, discourse-structure corpora (35,68,73,80)
Provenance-aware autonomy labeling	post, account, event, verification episode	autonomous, human-steered, mixed, institutionally curated, unknown; verification lag; provenance confidence	provenance audits, oversight studies, delayed verification cases, developer metadata (6,31,54,55)
Privacy-sensitive logging and redaction	prompt span, retrieved chunk, browser step, tool log segment	secret-bearing span, personal-context leak, policy-compliant redaction, audit sufficiency, minimal-disclosure score	PASB scenarios, browser/runtime traces, local-first assistant settings (7,27,49,53)
Cross-lingual norm and attack transfer	paired post, translated thread, skill/tutorial artifact	translation drift, cross-lingual norm alignment, attack transfer success, multilingual provenance ambiguity	multilingual peer-learning streams, translated skills/tutorials, public skill archives (29,37,66)
Autonomy-sensitive benchmark suites	complete episode, intervention event, platform snapshot	task success with valid authority, orchestration depth, reach beyond current turn, post-hoc correction cost, artifact reuse dependence	joined discourse + trajectory + artifact corpora released with source-level metadata and evidence-alignment fields (Table A10)
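
As one illustration of how a row of Table A1 could become a concrete task, the following minimal sketch encodes the autonomy label set from the provenance-aware autonomy labeling row and computes a verification lag. The class names, field names, and timestamps are hypothetical and serve only to show the shape of the data; they are not drawn from any released corpus.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class AutonomyLabel(Enum):
    # Label set taken from the provenance-aware autonomy labeling row of Table A1.
    AUTONOMOUS = "autonomous"
    HUMAN_STEERED = "human-steered"
    MIXED = "mixed"
    INSTITUTIONALLY_CURATED = "institutionally curated"
    UNKNOWN = "unknown"


@dataclass
class VerificationEpisode:
    # Hypothetical record layout; field names are assumptions for illustration.
    event_time: datetime                # when the post or event became public
    verified_time: Optional[datetime]   # when a verification signal appeared, if ever
    label: AutonomyLabel = AutonomyLabel.UNKNOWN


def verification_lag(episode: VerificationEpisode) -> Optional[timedelta]:
    """Return the delay between public appearance and verification, or None if unverified."""
    if episode.verified_time is None:
        return None
    return episode.verified_time - episode.event_time


# Illustrative timestamps only; they do not describe a real event.
example = VerificationEpisode(
    event_time=datetime(2026, 2, 1, 9, 0),
    verified_time=datetime(2026, 2, 3, 9, 0),
    label=AutonomyLabel.MIXED,
)
lag = verification_lag(example)
```

Returning None for unverified episodes, rather than a zero lag, keeps "never verified" distinct from "verified immediately", which matters when public interpretation precedes authentication.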

Appendix B.2 Auditable Screening Ledger

The ledger below reports stage-wise counts and reason-coded exclusions in an audit-friendly format.

Table A2. Auditable screening ledger for the multivocal review. Counts are reported as whole numbers.

Stage	Count	How records entered or left	Audit note
Alias-based scholarly identification	44	scholar search over OpenClaw / Clawdbot / Clawd / Moltbot / Moltbook / ClawdLab and thematic terms	retained candidate papers, reports, and technical notes
Backward / forward snowballing	15	references and citations from early February and March work	used to recover late-linked direct and adjacent sources
Official / platform materials	16	documentation, blogs, repositories, developer sites, dataset landing pages	screened separately because used as contextual or grounding evidence
Project-layer repositories	10	skill hubs, deployment packages, connectors, training / attestation extensions	retained only if citable and ecosystem-relevant
Records identified before de-duplication	85	union of all source families before version merging	counts mirrored or superseded records separately
Records after de-duplication and version merging	83	merged obvious duplicates, superseded versions, and duplicate bibliographic records	kept one canonical record per source for screening
Title / abstract / metadata screened	83	quick scope screen for ecosystem centrality and retrievability	excludes lightweight commentary and clearly indirect mentions
Excluded at title / abstract / metadata stage	2	removed before full-text coding	reasons recorded in screening log
Full texts assessed for eligibility	81	sources read for direct/context role, evidence unit, and tier eligibility	basis for inclusion/exclusion and coding
Excluded: not ecosystem-primary object (E1)	1	ecosystem appears only rhetorically or peripherally	not used for direct synthesis
Excluded: lightweight commentary / rhetorical mention (E2)	0	commentary lacks substantive empirical, technical, or methodological content	not strong enough for multivocal synthesis
Excluded: insufficient retrievable technical detail (E3)	0	source could not support auditable claims because evidence or versioning was too thin	may be revisited in future updates
Excluded: duplicate or superseded version (E4)	0	later or cleaner version merged under canonical record	protects against double counting
Excluded: unavailable or unstable source (E5)	0	disappeared link, unstable landing page, or insufficiently citable archival state	logged but not cited
Excluded: other scope mismatch (E6)	1	adjacent but outside survey boundary	listed in borderline log when close to inclusion threshold
Included in direct qualitative synthesis	38	ecosystem-specific papers and reports	core direct corpus
Included as official / project / adjacent context	41	official/platform sources, project repositories, adjacent framing and survey sources	contextual layer kept separate from direct evidence
Paper-wide source inventory	79	direct + contextual included records	final included source inventory at freeze date
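
The stage-wise counts in Table A2 can be checked mechanically. The sketch below transcribes the counts from the table and tests four arithmetic relations between stages; the variable names and the choice of checks are ours and are meant as an audit aid, not as part of the original screening tooling.

```python
# Counts transcribed from Table A2; variable names are hypothetical.
SOURCE_FAMILIES = {
    "alias_based_scholarly": 44,
    "snowballing": 15,
    "official_platform": 16,
    "project_repositories": 10,
}
BEFORE_DEDUP = 85
AFTER_DEDUP = 83
EXCLUDED_AT_SCREENING = 2
FULL_TEXT_ASSESSED = 81
FULL_TEXT_EXCLUSIONS = {"E1": 1, "E2": 0, "E3": 0, "E4": 0, "E5": 0, "E6": 1}
INCLUDED_DIRECT = 38
INCLUDED_CONTEXT = 41
INVENTORY = 79


def ledger_checks() -> dict:
    """Map each consistency check to True (holds) or False (fails)."""
    return {
        "families sum to pre-dedup total":
            sum(SOURCE_FAMILIES.values()) == BEFORE_DEDUP,
        "screening exclusions give full-text pool":
            AFTER_DEDUP - EXCLUDED_AT_SCREENING == FULL_TEXT_ASSESSED,
        "full-text exclusions give inventory":
            FULL_TEXT_ASSESSED - sum(FULL_TEXT_EXCLUSIONS.values()) == INVENTORY,
        "direct plus context gives inventory":
            INCLUDED_DIRECT + INCLUDED_CONTEXT == INVENTORY,
    }


checks = ledger_checks()
# Records collapsed during de-duplication and version merging.
merged_records = BEFORE_DEDUP - AFTER_DEDUP
```

All four relations hold for the reported counts, and the two records removed by de-duplication match the two merge cases logged in Table A3.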

Appendix B.3 Borderline or Merged Records

Table A3 records the closest boundary cases and the duplicate-export merges resolved before direct-synthesis coding.

Table A3. Borderline or merged records from the working bibliography. Duplicate merges were resolved before direct-synthesis coding; scope exclusions are shown separately to keep merge handling distinct from full-text exclusion counts.

Record or merge case	Category	Status	Decision	Explanation
duplicate metadata export of `weidener2026openclaw`	duplicate bibliographic record	merged pre-screen	merged	multiple bibliography exports of the same arXiv record were collapsed under one canonical key to avoid double counting and stale metadata
duplicate metadata export of `Zhang2026FromTT`	duplicate bibliographic record	merged pre-screen	merged	multiple bibliography exports of the same work were collapsed under one canonical key for consistency across citations and metadata fields
`su2025survey`	broad survey backdrop	outside direct scope	excluded	relevant to general agent-security framing, but not specific enough to public agent ecosystems once closer adjacent surveys were included
`de2026openclaw`	system proposal	not ecosystem-primary	excluded from direct synthesis	mentions OpenClaw but is not primarily a direct observational or methodological study of the OpenClaw–Moltbook ecosystem as defined here

Appendix C Search Details, Coding Scheme, and Source Tiers

This appendix is intentionally more explicit than the main text because the review protocol is itself part of the contribution. In fast-moving public-agent research, the key reproducibility question is not only which sources were cited, but how sources were classified, which ones grounded direct claims, and where uncertainty about authorship, deployment, or naming drift entered the interpretation.

The search used alias-based strings combining OpenClaw, Clawdbot, Clawd, Moltbot, Moltbook, and ClawdLab with terms such as safety, attack, social network, skill, privacy, governance, identity, provenance, and dataset. Snowballing from early February papers added March work on datasets, discourse structure, governance layers, deployment security, and the surrounding project layer. We also normalized obvious alias drift and separated citable official materials and repositories from direct empirical studies rather than flattening them into a single undifferentiated bibliography.
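
A minimal sketch of how such alias-by-term search strings could be enumerated is given below. The alias and term lists are those named above, but the text says "terms such as", so the term list should be read as illustrative rather than exhaustive. The quoted-phrase query syntax and the function name are assumptions; the actual search interface and operators are not specified here.

```python
from itertools import product
from typing import List

# Aliases and thematic terms as listed in the search description.
ALIASES = ["OpenClaw", "Clawdbot", "Clawd", "Moltbot", "Moltbook", "ClawdLab"]
TERMS = [
    "safety", "attack", "social network", "skill", "privacy",
    "governance", "identity", "provenance", "dataset",
]


def build_queries(aliases: List[str], terms: List[str]) -> List[str]:
    """Pair every alias with every term as a quoted two-phrase query (assumed syntax)."""
    return [f'"{alias}" "{term}"' for alias, term in product(aliases, terms)]


queries = build_queries(ALIASES, TERMS)
```

With six aliases and nine terms this yields 54 candidate strings. Logging the generated list alongside the freeze date is one simple way to make the identification stage repeatable.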

Each included work was coded during drafting for dominant evidence unit, evidence-alignment profile, triangulation class, dominant GATE layer, AERO role(s), primary object, and source tier. The goal was not to force single-label agreement everywhere, but to identify each work’s center of gravity while preserving cross-layer connections in the synthesis. This process helped separate disagreements caused by substantive contradiction from disagreements caused by evidence-type mismatch. The appendix reports the working codebook and source-tier scheme used for the synthesis (Table A6 and Table A7).

Table A4. Alias mapping used for consistent exposition.

Alias	Use in this survey
Clawd	Early project lineage in official materials.
Clawdbot	Early public/project alias retained in some papers.
Moltbot	Intermediate alias retained in some discourse and technical notes.
OpenClaw	Canonical runtime/framework name used in the main exposition.
Moltbook	Public agent-native social network around the runtime.
ClawdLab	Downstream design response centered on autonomous research.
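
The alias mapping in Table A4 can be applied programmatically during corpus cleaning. The sketch below is one possible reading of the table: it treats Clawd, Clawdbot, and Moltbot as lineage aliases of the OpenClaw runtime and keeps Moltbook and ClawdLab as distinct entities. That collapse rule, the function name, and the return format are our assumptions for illustration.

```python
import re
from typing import Dict, List, Tuple

# Assumed canonicalization based on Table A4: lineage aliases map to OpenClaw,
# while Moltbook and ClawdLab remain separate entities.
ALIAS_TO_CANONICAL: Dict[str, str] = {
    "clawd": "OpenClaw",
    "clawdbot": "OpenClaw",
    "moltbot": "OpenClaw",
    "openclaw": "OpenClaw",
    "moltbook": "Moltbook",
    "clawdlab": "ClawdLab",
}

# Word boundaries keep "Clawd" from matching inside "Clawdbot" or "ClawdLab".
_ALIAS_PATTERN = re.compile(
    r"\b(" + "|".join(ALIAS_TO_CANONICAL) + r")\b", re.IGNORECASE
)


def normalize_aliases(text: str) -> Tuple[str, List[str]]:
    """Return text with canonical names substituted, plus the aliases seen (lowercased)."""
    seen: List[str] = []

    def _swap(match: "re.Match[str]") -> str:
        alias = match.group(1).lower()
        if alias not in seen:
            seen.append(alias)
        return ALIAS_TO_CANONICAL[alias]

    return _ALIAS_PATTERN.sub(_swap, text), seen


normalized, aliases_seen = normalize_aliases("Clawdbot agents post on moltbook")
```

Returning the observed aliases together with the normalized text preserves the naming-drift information that a plain substitution would erase.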

Table A5. Dominant-layer coding of the direct corpus (38 works). Multi-label GATE annotations were used for synthesis; counts here reflect a single dominant layer per work for summary purposes.

Dominant GATE layer	n	Dominant evidence units	Recurring blind spots
Grounding	8	provenance audits, oversight discourse, security models, identity/auth artifacts	attribution, verification timing, privacy boundary definition, responsibility assignment
Action	9	trajectories, tool logs, incidents, deployment configs	permission grounding, reversibility, repair semantics, action-state attribution
Transfer	5	skills, tutorials, personas, datasets, research workflows	artifact provenance, downstream reuse, contamination, evidence portability
Exchange	16	posts, replies, temporal traces, reply graphs	shallow dialogue, ritualized signaling, audience effects, mixed-autonomy labeling

Table A6. Working codebook used to map the direct corpus. GATE captures what language does; AERO captures how much delegated operational force it acquires.

Axis	Code	Working definition	Typical evidence signals in the corpus
GATE	Grounding	what makes language legitimate, attributable, verifiable, permission-bearing, or responsibility-bearing	provenance audits, verification timing, identity/auth artifacts, policy docs, incident reports
GATE	Action	language that directly changes system state or action selection	trajectories, tool calls, repair loops, incidents, deployment configs
GATE	Transfer	language that packages portable capability, evidence, or reusable workflow knowledge	skills, tutorials, personas, datasets, archives, workflow artifacts
GATE	Exchange	language as public social traffic among agents and observers	posts, replies, reply graphs, temporal traces, discourse-structure signals
AERO	Authority	legitimacy, permission, or speakerhood needed for state-changing action	auth notes, operator boundaries, verification status, provenance claims, trust policies
AERO	Enablement	how language becomes reusable capability through tools, memory, and artifacts	typed tools, skill files, personas, datasets, reusable procedures, peer-learning artifacts
AERO	Reach	how far behavior extends beyond the immediately prompted turn	persistent memory effects, delayed consequences, self-starting activity, long-horizon trajectories
AERO	Orchestration	how behavior coordinates across tools, services, peers, or oversight layers	browser/runtime integration, multi-tool flows, peer coordination, guardrails, governance stacks
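
To show how the codebook could be enforced during coding, the sketch below represents the GATE and AERO codes of Table A6 as enumerations and a coded work as a small record. The record layout, the citation key, and the rule that the dominant layer must be among the annotated layers are assumptions for illustration rather than a description of the coding tool actually used.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Gate(Enum):
    # What language does (Table A6).
    GROUNDING = "Grounding"
    ACTION = "Action"
    TRANSFER = "Transfer"
    EXCHANGE = "Exchange"


class Aero(Enum):
    # How much delegated operational force language acquires (Table A6).
    AUTHORITY = "Authority"
    ENABLEMENT = "Enablement"
    REACH = "Reach"
    ORCHESTRATION = "Orchestration"


@dataclass(frozen=True)
class CodedWork:
    citation_key: str
    dominant_gate: Gate             # single summary layer, as in Table A5
    gate_layers: FrozenSet[Gate]    # multi-label annotation used for synthesis
    aero_roles: FrozenSet[Aero]

    def __post_init__(self) -> None:
        # Assumed rule: the summary layer must be one of the annotated layers.
        if self.dominant_gate not in self.gate_layers:
            raise ValueError("dominant layer must be one of the annotated GATE layers")


# Hypothetical entry; the key and codes do not describe a real corpus record.
work = CodedWork(
    citation_key="hypothetical2026work",
    dominant_gate=Gate.EXCHANGE,
    gate_layers=frozenset({Gate.EXCHANGE, Gate.GROUNDING}),
    aero_roles=frozenset({Aero.REACH}),
)
```

Keeping the dominant layer separate from the multi-label set mirrors the distinction between summary counts and synthesis annotations.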

Table A7. Source tiers used in the review protocol.

Tier	Role in survey	Representative sources	Use in synthesis
Tier 1	direct ecosystem evidence	Chen et al. [26], Wang et al. [27], Chen et al. [29], Holtz [30], Mukherjee et al. [32], Dubé et al. [35], Manik and Wang [68], Jiang et al. [72]	primary basis for claims about trajectories, discourse, portable artifacts, privacy exposure, grounding, and observed dynamics
Tier 2	official, technical, and project context	Steinberger [1], OpenClaw [3,4], Moltbook [6], OpenClaw [36,38], Wang et al. [42], Moltbook [44], Wiz Research [82], Gautam and Riegler [84]	informs platform assumptions, trust boundaries, deployment posture, skill distribution, connectors, packaging, identity mechanisms, archival resources, and grounding interpretation
Tier 3	adjacent and survey-style framing	Wang et al. [16], Cheng et al. [17], Yan et al. [21], Yu et al. [23], Zhang et al. [25], Weidener et al. [33]	used cautiously to position the survey, connect to broader autonomy debates, and situate research gaps

Appendix D Positioning Within Broader Survey Landscape

Because the paper positions itself as a case-centered survey rather than a general review, Table A8 records the closest nearby survey traditions and how this paper differs from them. The point is not to diminish those works but to make the contribution boundary explicit.

Table A8. Positioning against adjacent survey literature. This table motivates the paper’s positioning as a dedicated, case-centered, NLP-centered survey, rather than as the first survey-like document to discuss the ecosystem in any form.

Survey	Primary scope	What it covers especially well	Gap relative to this paper
Wang et al. [16]	LLM-based autonomous agents broadly	agent construction, applications, evaluation	not ecosystem-specific and not centered on public agent traces
Cheng et al. [17]	intelligent agents across single- and multi-agent settings	definitions, methods, core components, prospects	broad agent survey rather than a focused public-ecosystem synthesis
Guo et al. [18]	LLM-based multi-agent systems	progress, challenges, benchmarks, communication, application domains	not anchored in one public ecosystem with observable mixed-autonomy traces
Chen et al. [19]	recent advances in LLM-MAS	applications, frontiers, broad systems-level organization	emphasizes application frontiers more than provenance-rich ecosystem analysis
Tran et al. [20]	collaboration mechanisms in LLM-based MAS	actors, structures, strategies, protocols, coordination	collaboration-centric rather than language-infrastructure-centric
Yan et al. [21]	communication-centric LLM-MAS survey	communication architectures, paradigms, security and scale challenges	not tied to one naturally occurring public agent network
Zou et al. [22]	human-agent collaboration systems	human feedback, interaction patterns, orchestration, benchmarks	human-in-the-loop focus rather than agent-only public ecosystems
Yu et al. [23]	trustworthy agents and multi-agent systems	attacks, defenses, evaluation, modular trust framework	trustworthiness is central, but not the ecology of public traces, evidence transfer, and provenance disputes
Gao et al. [24]	self-evolving agents	what/when/how to evolve, adaptation stages, benchmarks	focuses on continual evolution rather than public ecosystem observation
Zhang et al. [25]	hierarchical autonomy security	layered risks from cognitive to collective autonomy	security-forward autonomy framing, not a dedicated OpenClaw/Moltbook survey
Weidener et al. [33]	OpenClaw–Moltbook lessons plus ClawdLab design	the closest ecosystem-specific precursor; embeds a multivocal review in a design-science response	review is embedded inside a platform proposal, whereas this paper’s primary contribution is the literature synthesis itself from an NLP-centered perspective

Appendix E Review Questions and Extraction Form

The survey was guided by four review questions that also structured the extraction form used during coding.

Table A9. Review questions guiding corpus coding and synthesis.

RQ	Question	How it structures the review
RQ1	What roles does language play in public agent ecosystems?	motivates the GATE taxonomy and the layer-by-layer synthesis
RQ2	How does delegated operational force accumulate across artifacts, tools, and social settings?	motivates the AERO layer and the shift from prompting to delegated autonomy
RQ3	Where do the main empirical and methodological disagreements actually arise?	motivates the focus on recurring fault lines rather than forced consensus
RQ4	What should NLP evaluate, report, and build next in this area?	motivates the agenda on executable pragmatics, provenance, privacy, and public-agent discourse

Appendix F Release-Oriented Corpus Metadata and Reporting Standard

To make the corpus contribution operational rather than rhetorical, Table A10 and Table A11 list the metadata fields that the release should expose and the reporting fields that future public-agent papers should ideally disclose.

Table A10. Suggested metadata fields for the annotated corpus release, including evidence-alignment fields that support the triangulation audit.

Field	Purpose	Illustrative value
canonical_id	stable identifier for each included source	OCMB-2026-017
citation_key	BibTeX key used in the paper	chen2026trajectory
title	human-readable source name	A Trajectory Audit of OpenClaw
first_public_date	supports freeze-date reasoning and temporal analysis	2026-02-15
source_tier	distinguishes direct evidence from context	Tier 1
decision_log_ref	links the included source back to screening and adjudication notes	SCREEN-042
primary_object	indicates runtime, network, extension, or grounding focus	social network
evidence_unit	identifies the central analytic object	reply graph
evidence_alignment_profile	records which evidence families are jointly analyzed for the source	discourse + grounding
triangulation_class	supports quantitative audit of weak evidence alignment	single-family / 2+ families / D+O+G / artifact-linked
dominant_gate_layer	summary layer used in corpus profiling	Exchange
aero_roles	cross-cutting autonomy roles activated by the study	Enablement, Orchestration
alias_mapping	records whether the source uses Clawd, Clawdbot, Moltbot, etc.	OpenClaw / Clawdbot
provenance_caveat	flags hidden-human or verification uncertainty	mixed-autonomy uncertainty noted
notes	free-text synthesis memo for later reuse	compares visible speech with underlying control
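
A release record following Table A10 could be serialized as in the sketch below. The field names come from the table; the types, the ISO 8601 date check, and the JSON format are assumptions. The values are the per-field illustrative values from the table (with one triangulation class chosen from the listed options), so taken together they do not describe a single real source.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import List


@dataclass
class CorpusRecord:
    canonical_id: str
    citation_key: str
    title: str
    first_public_date: str
    source_tier: str
    decision_log_ref: str
    primary_object: str
    evidence_unit: str
    evidence_alignment_profile: str
    triangulation_class: str
    dominant_gate_layer: str
    aero_roles: List[str]
    alias_mapping: List[str]
    provenance_caveat: str
    notes: str

    def __post_init__(self) -> None:
        # Assumes ISO 8601 dates; raises ValueError if the string is malformed.
        date.fromisoformat(self.first_public_date)


# Illustrative values copied field by field from Table A10.
record = CorpusRecord(
    canonical_id="OCMB-2026-017",
    citation_key="chen2026trajectory",
    title="A Trajectory Audit of OpenClaw",
    first_public_date="2026-02-15",
    source_tier="Tier 1",
    decision_log_ref="SCREEN-042",
    primary_object="social network",
    evidence_unit="reply graph",
    evidence_alignment_profile="discourse + grounding",
    triangulation_class="2+ families",
    dominant_gate_layer="Exchange",
    aero_roles=["Enablement", "Orchestration"],
    alias_mapping=["OpenClaw", "Clawdbot"],
    provenance_caveat="mixed-autonomy uncertainty noted",
    notes="compares visible speech with underlying control",
)
serialized = json.dumps(asdict(record), indent=2)
```

A flat record of this kind exposes bibliographic and coding metadata without carrying any raw platform content, which is consistent with the minimal-disclosure principle.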

Table A11. Recommended reporting fields for future public-agent ecosystem papers.

Recommended disclosure field	Why it matters	What a strong paper should report
platform snapshot and time window	platforms change quickly; findings are version-sensitive	observation window, freeze date, major product/version changes during collection
alias mapping and search terms	naming drift affects corpus construction and comparability	which aliases were used, normalized, or excluded
unit of analysis	claims differ depending on whether the unit is a post, thread, graph, artifact, or trajectory	explicit evidence unit and justification
autonomy / provenance label	visible text can be owner-steered, mixed, or autonomous	autonomous, human-steered, mixed, curated, or unknown labels where possible
identity / verification timing	public interpretation may precede authentication	when verification signals appeared relative to the observed event
model / provider / runtime stack	behavior depends on the stack, not only the prompt	models, providers, typed tools, memory components, browser/runtime configuration
privacy and release constraints	public traces can still expose people or private state	redaction decisions, license/terms considerations, and release limits
evidence alignment availability	triangulation is the main methodological gap	whether surface, operational, artifact, and grounding evidence were jointly available for the same event
coding protocol transparency	interpretive labels need a clear audit trail	coding rulebook, decision log, and any quality-control subset checks used during corpus construction
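
The reporting fields in Table A11 lend themselves to a simple completeness check. The sketch below lists which recommended disclosures are absent or empty in a paper's metadata. The snake-case keys are our renderings of the row names, and the dictionary format and example values are hypothetical.

```python
from typing import Dict, List

# Snake-case keys are hypothetical renderings of the Table A11 row names.
RECOMMENDED_FIELDS: List[str] = [
    "platform_snapshot_and_time_window",
    "alias_mapping_and_search_terms",
    "unit_of_analysis",
    "autonomy_provenance_label",
    "identity_verification_timing",
    "model_provider_runtime_stack",
    "privacy_and_release_constraints",
    "evidence_alignment_availability",
    "coding_protocol_transparency",
]


def missing_disclosures(paper: Dict[str, str]) -> List[str]:
    """List recommended fields that are absent or empty in a paper's metadata."""
    return [name for name in RECOMMENDED_FIELDS if not paper.get(name)]


# Hypothetical draft metadata with two fields filled in and one left empty.
draft = {
    "unit_of_analysis": "reply graph",
    "alias_mapping_and_search_terms": "OpenClaw, Moltbook",
    "privacy_and_release_constraints": "",
}
gaps = missing_disclosures(draft)
```

Treating an empty string as missing is a deliberate choice here: a field that is present but blank discloses nothing.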

Appendix G Extended Corpus by Layer

Table A12 lists representative sources beyond the subset discussed at greatest length in the main text.

Table A12. Extended corpus overview by GATE layer and its relation to the AERO layer.

Layer	Subfocus	Representative works	Relation to AERO layer
Grounding	provenance, privacy, oversight, identity, delayed verification	Moltbook [6], Li [31], OpenClaw [49], Shi and DiFranzo [54,55], Zerhoudi et al. [56], Wiz Research [82]	as authority and orchestration grow, attribution, privacy boundaries, and responsibility become harder rather than easier
Action	trajectory risk, local-agent attacks, deployment security	Chen et al. [26], Wang et al. [27], Dong et al. [28], Ge [34], van Beek and Mezo [61], Jin et al. [63], Zhan et al. [64], Du et al. [65]	reach and orchestration expand the space of possible side effects, while authority determines who may legitimately trigger them
Transfer	skills, peer learning, datasets, research workflows	Chen et al. [29], Mukherjee et al. [32], Weidener et al. [33], Chen et al. [66], Amin et al. [67], Jiang et al. [83], Liang et al. [86]	enablement becomes portable when knowledge is packaged into artifacts that support extended reach and reuse
Exchange	discourse, norms, participation, social graphs, sociality critiques	Holtz [30], Dubé et al. [35], Manik and Wang [68], Lin et al. [69], Eziz [70], De Marzo and Garcia [71], Jiang et al. [72], Li et al. [73], Feng et al. [74], Hou and Ji [77], Shekkizhar and Earle [80], Zhang et al. [81]	public visibility can reward reach and cascade formation even when deeper interaction remains limited

Appendix H Comprehensive Corpus Tables

The following tables serve as the appendix-level corpus inventory behind the survey.

Table A13. Grounding, provenance, and responsibility-focused perspectives in the direct corpus.

Work	Unit	Core contribution	GATE / NLP relevance
Moltbook Illusion (31)	provenance analysis	separates human influence from apparent emergence	Grounding; attribution changes what language data means
Human Control Is the Anchor (54)	oversight analysis	examines early divergence of oversight in agent communities	Grounding; visible text and real control can diverge
Delayed Verification (55)	discourse timing	shows how narrative lock-in forms before verification	Grounding; timing matters for benchmark trust
Behind the Prompt (56)	retrieval framing	studies hidden-user intent when agents act as proxies	Grounding/Transfer; relevant to IR and proxy-mediated dialogue
Conversation to Command Execution (57)	threat modeling	contrasts conversational assistants and command-executing agents	Grounding; clarifies why OpenClaw-style agents change risk categories
Devil Behind Moltbook (58)	safety critique	argues that safety claims can vanish in self-evolving societies	Grounding; skeptical lens on emergent norms and alignment
Sorcerer’s Apprentice (59)	conceptual analysis	distinguishes tool-agents from stronger teleological claims	Grounding; useful restraint against over-anthropomorphic reading
Panacea Position (60)	methodological critique	cautions against overclaiming from agent-society observations	Grounding/Exchange; useful counterweight to strong emergence narratives

Table A14. Action, authority, and system-safety studies in the direct corpus.

Work	Unit	Core contribution	GATE / NLP relevance
Trajectory Audit (26)	trajectories	full-trajectory safety auditing for Clawdbot/OpenClaw	Action; treats semantics as action traces rather than text strings
PASB (27)	end-to-end scenarios	benchmarks attacks on personalized local agents	Action; long-horizon security and memory-sensitive evaluation
Clawdrain (28)	skill + tool chain	token-exhaustion attack via tool-calling chains	Action; repair language becomes part of the threat model
LGA (34)	governed tool calls	layered governance architecture evaluated on OpenClaw	Action/Grounding; shifts focus from text safety to execution safety
Hardened Shell (61)	architecture	safety/sovereignty critique of runtime design	Action/Grounding; argues for architectural rather than prompt-only defenses
Formal Skill Security (62)	skills	formal analysis of agent-skill supply chains	Action/Transfer; language-wrapped skills become security-critical artifacts
Proof-of-Guardrail (63)	runtime assurance	attestation for guarded agent runs	Grounding/Action; links textual safety claims to verifiable execution
Edge Attack Surface (64)	deployment architecture	systems-level analysis of boundary failures in edge agents	Action/Grounding; shows safety depends on deployment topology
ClawMobile (65)	mobile architecture	smartphone-native agent runtime design	Action; separates language reasoning from deterministic control

Table A15. Transfer and research-workflow studies in the direct corpus.

Work	Unit	Core contribution	GATE / NLP relevance
Peer Learning (29)	posts + learning cues	frames Moltbook as an AI-only peer-learning environment	Transfer; agents exchange skills and tactics through language
Informal Learners (66)	large-scale discourse	studies agent learning in a broadcast-heavy public environment	Transfer/Exchange; introduces useful discourse concepts for learning-oriented analysis
MoltGraph (32)	temporal dataset	releases a longitudinal graph dataset for detection tasks	Transfer; reusable benchmark and archival resource
Personas on Moltbook (67)	posts/personas	packages agents into reusable behavioral personas	Transfer; relevant to summarization and behavioral abstraction
ClawdLab (33)	literature + system design	connects OpenClaw–Moltbook lessons to autonomous research design	Transfer/AERO; strong orchestration and evidence-grounding angle

Table A16. Exchange and discourse studies, part I.

Work	Unit	Core contribution	GATE / NLP relevance
Risky Sharing (68)	posts + replies	measures action-inducing language and norm-enforcing responses	Exchange; discourse can itself regulate safety
Silicon-Based Societies (69)	platform traces	early large-scale characterization of Moltbook	Exchange; maps topics, communities, and agent behavior in the wild
Fast Response or Silence (70)	thread dynamics	characterizes reply persistence and drop-off	Exchange; interaction structure is shallow but measurable
Collective Behavior (71)	network structure	macro-scale view of emergent collective behavior	Exchange; useful for coordination and inequality framing
A First Look (72)	platform snapshot	descriptive baseline for posts, topics, and subcommunities	Exchange; anchor study for early public discourse
Anatomy of Social Graph (30)	social graph	structural analysis of reply graph formation	Exchange; graph topology complements discourse analysis
Rise of AI Agent Communities (73)	discourse + interaction	large-scale analysis of discourse and interaction	Exchange; combines text and network perspectives
MoltNet (74)	social behavior	network-analytic view of Moltbook interaction	Exchange; supports structural comparison across agents

Table A17. Exchange and discourse studies, part II.

Work	Unit	Core contribution	GATE / NLP relevance
Reddit Comparison (75)	comparative graphs	contrasts Moltbook and Reddit topology	Exchange; warns against naive human-social analogies
Does Socialization Emerge? (76)	behavioral signals	asks whether socialization truly emerges	Exchange/Grounding; separates sociality from surface activity
Structural Divergence (77)	network metrics	measures divergence from human social networks	Exchange; highlights non-human structure beneath familiar interfaces
Let There Be Claws (78)	network snapshot	early social-graph baseline for the platform	Exchange; triangulates early structural findings
Molt Dynamics (79)	temporal graph + roles	role specialization, cascades, weak cooperation	Exchange/AERO; adds longitudinal and coordination perspective
Interaction Theater (80)	comments at scale	argues that visible interaction can mask weak semantic coupling	Exchange; directly motivates deeper discourse-act modeling
What Do AI Agents Talk About? (35)	discourse structure	topic, emotion, formulaicity, and coherence analysis at scale	Exchange; shows ritualization and emotional redirection in AI-to-AI discourse
Agents in the Wild (81)	mixed-method critique	cautions against over-reading apparent sociality	Exchange/Grounding; foregrounds interpretive caution
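
The inventory in Tables A13 to A17 can be cross-checked against the dominant-layer counts in Table A5. The sketch below transcribes the citation numbers from those tables, tallies them by layer, and looks for works listed more than once; the variable and function names are ours.

```python
from typing import Dict, List, Set

# Citation numbers transcribed from Tables A13 to A17, grouped by dominant GATE layer.
DIRECT_CORPUS: Dict[str, List[int]] = {
    "Grounding": [31, 54, 55, 56, 57, 58, 59, 60],
    "Action": [26, 27, 28, 34, 61, 62, 63, 64, 65],
    "Transfer": [29, 66, 32, 67, 33],
    "Exchange": [68, 69, 70, 71, 72, 30, 73, 74,
                 75, 76, 77, 78, 79, 80, 35, 81],
}
# Dominant-layer counts reported in Table A5.
TABLE_A5_COUNTS = {"Grounding": 8, "Action": 9, "Transfer": 5, "Exchange": 16}


def tally(corpus: Dict[str, List[int]]) -> Dict[str, int]:
    """Count listed works per layer."""
    return {layer: len(refs) for layer, refs in corpus.items()}


def duplicated_refs(corpus: Dict[str, List[int]]) -> Set[int]:
    """Return citation numbers that appear more than once across all layers."""
    seen: Set[int] = set()
    dupes: Set[int] = set()
    for refs in corpus.values():
        for ref in refs:
            if ref in seen:
                dupes.add(ref)
            seen.add(ref)
    return dupes


layer_counts = tally(DIRECT_CORPUS)
total_works = sum(layer_counts.values())
```

The tallies agree with Table A5, no work is listed twice, and the total matches the 38-work direct corpus.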

Appendix I Broader Mission-Level Relevance Beyond the Case

Although this paper is anchored in one ecosystem, its implications are broader. Table A18 summarizes how the case speaks to larger mission-level questions for NLP.

Table A18. Broader mission-level relevance beyond the case itself.

Broader mission for NLP	How public agent ecosystems sharpen it	How this survey contributes
From models to systems and ecosystems	language use becomes inseparable from tools, services, identities, and public communities	provides a language-infrastructure lens for studying that shift through a concrete public case
Rethinking progress and evaluation	final-answer metrics are insufficient when language has state-changing consequences	argues for executable pragmatics, provenance-aware evaluation, and triangulated evidence
Data as bottleneck and responsibility	public traces are portable but also privacy-, contamination-, and verification-laden	distinguishes visibility from legitimacy and proposes minimal-disclosure archival practice
LLMs as research tools and infrastructure	agents increasingly support research workflows, evidence gathering, and knowledge packaging	situates peer-learning artifacts, datasets, and ClawdLab-style designs inside a broader research-infrastructure agenda
Discourse and pragmatics beyond single-user prompting	public agent speech raises new questions about stance, speakerhood, norm invocation, and audience design	frames delegated-agent discourse as a new NLP problem rather than a special case of social-media mining

Appendix J Adjacent Perspectives Beyond the Direct Corpus

The following works are not central empirical OpenClaw/Moltbook studies, but they sharpen the survey’s interpretation of autonomy, safety, collaboration, or deployment.

Table A19. Adjacent perspectives that informed interpretation but are not weighted like direct ecosystem evidence.

Work	Perspective	Why included	Link back to GATE / AERO
Autonomous-agent baselines (16,17)	broad agent surveys	give the larger agent backdrop against which OpenClaw appears unusually public and provenance-rich	clarify that this paper is ecosystem-specific rather than a generic agent survey
MAS survey baselines (18,19,20,21)	multi-agent collaboration and communication	sharpen what is generic to MAS and what is distinctive about public agent ecosystems	especially helpful for the Exchange and Orchestration dimensions
Human-agent and trust surveys (22,23,25)	collaboration and autonomy-aware security	supply broader frameworks for human oversight, trust, and layered risk	support the claim that risk grows with delegated authority, reach, and orchestration
Self-evolving agents (24)	continual adaptation	highlights how agents may adapt over time rather than remain static assistants	resonates with Reach and long-horizon transfer of procedures and policies
Agentic Skills / SkillNet (83,86)	skill abstraction	conceptualizes skills beyond bare tool use	clarifies enablement and orchestration as reusable language-mediated capability
Observatory Archive and AI-for-science context (84,85)	archival and research-infrastructure framing	connect OpenClaw-style traces to reproducible archival practice and scientific workflow design	show how transfer artifacts can become research infrastructure rather than one-off evidence
Project ecosystem repositories (36,37,38,39,40,41,42,43,44,45)	packaging, skills, connectors, deployment, training	show that OpenClaw/Moltbook claims are materializing as public infrastructure rather than papers alone	especially relevant to Transfer, Action, and Orchestration

References

Steinberger, P. Introducing OpenClaw. OpenClaw Blog, 2026.
OpenClaw. OpenClaw: Personal AI Assistant. GitHub repository, 2026.
OpenClaw. OpenClaw Docs Homepage. Documentation website, 2026.
OpenClaw. Tools (OpenClaw). Documentation website, 2026.
Moltbook. Moltbook. Official website, 2026.
Moltbook. Build Apps for AI Agents. Developer website, 2026.
OpenClaw. Browser (OpenClaw-managed). Documentation website, 2026.
Nakano, R.; Hilton, J.; Balaji, S.; Wu, J.; Ouyang, L.; Kim, C.; Hesse, C.; Jain, S.; Kosaraju, V.; Saunders, W.; et al. WebGPT: Browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332 2021. [CrossRef]
Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.R.; Cao, Y. ReAct: Synergizing reasoning and acting in language models. In Proceedings of the Eleventh International Conference on Learning Representations, 2022.
Schick, T.; Dwivedi-Yu, J.; Dessì, R.; Raileanu, R.; Lomeli, M.; Hambro, E.; Zettlemoyer, L.; Cancedda, N.; Scialom, T. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems 2023, 36, 68539–68551.
Qin, Y.; Liang, S.; Ye, Y.; Zhu, K.; Yan, L.; Lu, Y.; Lin, Y.; Cong, X.; Tang, X.; Qian, B.; et al. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. arXiv preprint arXiv:2307.16789 2023. [CrossRef]
Li, G.; Hammoud, H.; Itani, H.; Khizbullin, D.; Ghanem, B. CAMEL: Communicative agents for "mind" exploration of large language model society. Advances in Neural Information Processing Systems 2023, 36, 51991–52008.
Hong, S.; Zhuge, M.; Chen, J.; Zheng, X.; Cheng, Y.; Wang, J.; Zhang, C.; Wang, Z.; Yau, S.K.S.; Lin, Z.; et al. MetaGPT: Meta programming for a multi-agent collaborative framework. In Proceedings of the Twelfth International Conference on Learning Representations, 2023.
Wu, Q.; Bansal, G.; Zhang, J.; Wu, Y.; Li, B.; Zhu, E.; Jiang, L.; Zhang, X.; Zhang, S.; Liu, J.; et al. AutoGen: Enabling next-gen LLM applications via multi-agent conversations. In Proceedings of the First Conference on Language Modeling, 2024.
Park, J.S.; O’Brien, J.; Cai, C.J.; Morris, M.R.; Liang, P.; Bernstein, M.S. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023, pp. 1–22. [CrossRef]
Wang, L.; Ma, C.; Feng, X.; Zhang, Z.; Yang, H.; Zhang, J.; Chen, Z.; Tang, J.; Chen, X.; Lin, Y.; et al. A survey on large language model based autonomous agents. Frontiers of Computer Science 2024, 18, 186345. [CrossRef]
Cheng, Y.; Zhang, C.; Zhang, Z.; Meng, X.; Hong, S.; Li, W.; Wang, Z.; Wang, Z.; Yin, F.; Zhao, J.; et al. Exploring large language model based intelligent agents: Definitions, methods, and prospects. arXiv preprint arXiv:2401.03428 2024. [CrossRef]
Guo, T.; Chen, X.; Wang, Y.; Chang, R.; Pei, S.; Chawla, N.V.; Wiest, O.; Zhang, X. Large language model based multi-agents: A survey of progress and challenges. arXiv preprint arXiv:2402.01680 2024. [CrossRef]
Chen, S.; Liu, Y.; Han, W.; Zhang, W.; Liu, T. A survey on LLM-based multi-agent system: Recent advances and new frontiers in application. arXiv preprint arXiv:2412.17481 2024. [CrossRef]
Tran, K.T.; Dao, D.; Nguyen, M.D.; Pham, Q.V.; O’Sullivan, B.; Nguyen, H.D. Multi-agent collaboration mechanisms: A survey of LLMs. arXiv preprint arXiv:2501.06322 2025. [CrossRef]
Yan, B.; Zhou, Z.; Zhang, L.; Zhang, L.; Zhou, Z.; Miao, D.; Li, Z.; Li, C.; Zhang, X. Beyond self-talk: A communication-centric survey of LLM-based multi-agent systems. arXiv preprint arXiv:2502.14321 2025. [CrossRef]
Zou, H.P.; Huang, W.C.; Wu, Y.; Chen, Y.; Miao, C.; Nguyen, H.; Zhou, Y.; Zhang, W.; Fang, L.; He, L.; et al. LLM-based human-agent collaboration and interaction systems: A survey. arXiv preprint arXiv:2505.00753 2025. [CrossRef]
Yu, M.; Meng, F.; Zhou, X.; Wang, S.; Mao, J.; Pan, L.; Chen, T.; Wang, K.; Li, X.; Zhang, Y.; et al. A survey on trustworthy LLM agents: Threats and countermeasures. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2, 2025, pp. 6216–6226. [CrossRef]
Gao, H.a.; Geng, J.; Hua, W.; Hu, M.; Juan, X.; Liu, H.; Liu, S.; Qiu, J.; Qi, X.; Wu, Y.; et al. A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence. arXiv preprint arXiv:2507.21046 2025. [CrossRef]
Zhang, X.; Zhou, L.; Xu, X.; Wu, J.; Du, T.; Huang, H.; Peng, H.; Liu, Z. From Thinker to Society: Security in Hierarchical Autonomy Evolution of AI Agents. 2026. [CrossRef]
Chen, T.; Liu, D.; Hu, X.; Yu, J.; Wang, W. A Trajectory-Based Safety Audit of Clawdbot (OpenClaw). arXiv preprint arXiv:2602.14364 2026. [CrossRef]
Wang, Y.; Xu, F.; Lin, Z.; He, G.; Huang, Y.; Gao, H.; Niu, Z.; Lian, S.; Liu, Z. From Assistant to Double Agent: Formalizing and Benchmarking Attacks on OpenClaw for Personalized Local AI Agent. arXiv preprint arXiv:2602.08412 2026. [CrossRef]
Dong, B.; Feng, H.; Wang, Q. Clawdrain: Exploiting Tool-Calling Chains for Stealthy Token Exhaustion in OpenClaw Agents. arXiv preprint arXiv:2603.00902 2026. [CrossRef]
Chen, E.; Guan, C.; Elshafiey, A.; Zhao, Z.; Zekeri, J.; Shaibu, A.E.; Prince, E.O. When OpenClaw AI Agents Teach Each Other: Peer Learning Patterns in the Moltbook Community. arXiv preprint arXiv:2602.14477 2026. [CrossRef]
Holtz, D. The anatomy of the Moltbook social graph. arXiv preprint arXiv:2602.10131 2026. [CrossRef]
Li, N. The Moltbook Illusion: Separating Human Influence from Emergent Behavior in AI Agent Societies. arXiv preprint arXiv:2602.07432 2026. [CrossRef]
Mukherjee, K.; Akcora, C.G.; Kantarcioglu, M. MoltGraph: A Longitudinal Temporal Graph Dataset of Moltbook for Coordinated-Agent Detection. arXiv preprint arXiv:2603.00646 2026. [CrossRef]
Weidener, L.; Brkić, M.; Jovanović, M.; Singh, R.; Ulgac, E.; Meduri, A. OpenClaw, Moltbook, and ClawdLab: From Agent-Only Social Networks to Autonomous Scientific Research. arXiv preprint arXiv:2602.19810 2026. [CrossRef]
Ge, Y. Governance Architecture for Autonomous Agent Systems: Threats, Framework, and Engineering Practice. 2026. [CrossRef]
Dubé, T.; Zhu, J.; Phan, N.T.; Jin, R. What Do AI Agents Talk About? Emergent Communication Structure in the First AI-Only Social Network. 2026. [CrossRef]
OpenClaw. ClawHub: Skill Directory for OpenClaw, 2026.
OpenClaw. OpenClaw Skills Archive, 2026.
OpenClaw. ACPX: Headless CLI Client for Stateful Agent Client Protocol Sessions, 2026.
OpenClaw. nix-openclaw, 2026.
OpenClaw. OpenClaw Ansible Installer, 2026.
Grasl, T. OpenClaw MCP Server, 2026.
Wang, Y.; Chen, X.; Jin, X.; Wang, M.; Yang, L. OpenClaw-RL, 2026.
Jin, X.; Duan, M.; Lin, Q.; Chan, A.; Chen, Z.; Du, J.; Ren, X. Verifiable-ClawGuard, 2026.
Moltbook. Moltbook Web, 2026.
Moltbook. moltbook-api, 2026.
Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021, 372. [CrossRef]
Garousi, V.; Felderer, M.; Mäntylä, M.V. Guidelines for including grey literature and conducting multivocal literature reviews in software engineering. Information and Software Technology 2019, 106, 101–121. [CrossRef]
Peter Steinberger, J.O.; Quintero, B. OpenClaw Partners with VirusTotal for Skill Security. OpenClaw Blog, 2026.
OpenClaw. OpenClaw Security. Documentation website, 2026.
OpenClaw. OpenClaw Security Policy. GitHub security overview, 2026.
OpenClaw. Formal Verification (Security Models). Documentation website, 2026.
OpenClaw. OpenClaw Threat Model v1.0. Documentation website, 2026.
OpenClaw. Local Models. Documentation website, 2026.
Shi, H.; DiFranzo, D. Human Control Is the Anchor, Not the Answer: Early Divergence of Oversight in Agentic AI Communities. arXiv preprint arXiv:2602.09286 2026. [CrossRef]
Shi, H.; DiFranzo, D. When Visibility Outpaces Verification: Delayed Verification and Narrative Lock-in in Agentic AI Discourse. arXiv preprint arXiv:2602.11412 2026. [CrossRef]
Zerhoudi, S.; Granitzer, M.; Dang, D.H.; Mitrovic, J.; Lemmerich, F.; Hautli-Janisz, A.; Katzenbeisser, S.; Dastidar, K.G. Behind the Prompt: The Agent-User Problem in Information Retrieval. arXiv preprint arXiv:2603.03630 2026. [CrossRef]
Mathew, D.A. From Conversation to Command Execution: A Comparative Threat Modeling and Risk Analysis of OpenClaw and ChatGPT. ISRG Journal of Engineering and Technology 2026. Zenodo record. [CrossRef]
Wang, C.; Li, C.; Liu, S.; Chen, Z.; Hou, J.; Qi, J.; Li, R.; Zhang, L.; Ye, Q.; Liu, Z.; et al. The devil behind moltbook: Anthropic safety is always vanishing in self-evolving AI societies. arXiv preprint arXiv:2602.09877 2026. [CrossRef]
Ruffini, G.; Castaldo, F. From The Sorcerer’s Apprentice to Crystal Nights: Security Implications from Moltbot/Moltbook to Greg Egan’s Crystal Nights, 2026. Zenodo publication. [CrossRef]
Li, Y.; Tao, D. Position: AI Agents Are Not (Yet) a Panacea for Social Simulation. arXiv preprint arXiv:2603.00113 2026. [CrossRef]
van Beek, J.B.; Mezo, D. The Hardened Shell: Evaluating Safety and Sovereignty in the OpenClaw Agent Architecture. Technical report, Zenodo, 2026. [CrossRef]
Bhardwaj, V.P. Formal Analysis and Supply Chain Security for Agentic AI Skills. arXiv preprint arXiv:2603.00195 2026. [CrossRef]
Jin, X.; Duan, M.; Lin, Q.; Chan, A.; Chen, Z.; Du, J.; Ren, X. Proof-of-Guardrail in AI Agents and What (Not) to Trust from It. 2026.
Zhan, Z.; Li, K.; Zhang, Y.; Haddadi, H. Systems-Level Attack Surface of Edge Agent Deployments on IoT. arXiv preprint arXiv:2602.22525 2026. [CrossRef]
Du, H.; Wu, S.; Li, Q.; Pan, R.; Li, J.; Sun, Y.; Xue, C.J. ClawMobile: Rethinking Smartphone-Native Agentic Systems. arXiv preprint arXiv:2602.22942 2026. [CrossRef]
Chen, E.; Guan, C.; Elshafiey, A.; Zhao, Z.Q.; Zekeri, J.; Shaibu, A.E.; Prince, E.O.; Wu, C.J. OpenClaw AI Agents as Informal Learners at Moltbook: Characterizing an Emergent Learning Community at Scale. 2026. [CrossRef]
Amin, D.; Salminen, J.; Jansen, B.J. How to Model AI Agents as Personas?: Applying the Persona Ecosystem Playground to 41,300 Posts on Moltbook for Behavioral Insights. arXiv preprint arXiv:2603.03140 2026. [CrossRef]
Manik, M.M.H.; Wang, G. OpenClaw Agents on Moltbook: Risky Instruction Sharing and Norm Enforcement in an Agent-Only Social Network. arXiv preprint arXiv:2602.02625 2026. [CrossRef]
Lin, Y.Z.; Shih, B.P.J.; Chien, H.Y.A.; Satam, S.; Pacheco, J.H.; Shao, S.; Salehi, S.; Satam, P. Exploring Silicon-Based Societies: An Early Study of the Moltbook Agent Community. arXiv preprint arXiv:2602.02613 2026. [CrossRef]
Eziz, A. Fast Response or Silence: Conversation Persistence in an AI-Agent Social Network. arXiv preprint arXiv:2602.07667 2026. [CrossRef]
De Marzo, G.; Garcia, D. Collective Behavior of AI Agents: the Case of Moltbook. arXiv preprint arXiv:2602.09270 2026. [CrossRef]
Jiang, Y.; Zhang, Y.; Shen, X.; Backes, M.; Zhang, Y. "Humans welcome to observe": A First Look at the Agent Social Network Moltbook. arXiv preprint arXiv:2602.10127 2026. [CrossRef]
Li, L.; Ma, R.; Chen, C.; Lu, Z.; Zhang, Y. The Rise of AI Agent Communities: Large-Scale Analysis of Discourse and Interaction on Moltbook. arXiv preprint arXiv:2602.12634 2026. [CrossRef]
Feng, Y.; Huang, C.; Man, Z.; Tan, R.; Hoang, L.P.; Xu, S.; Zhang, W. MoltNet: Understanding Social Behavior of AI Agents in the Agent-Native MoltBook. arXiv preprint arXiv:2602.13458 2026. [CrossRef]
Zhu, Y.; Tyson, G.; Hui, P. A Comparative Analysis of Social Network Topology in Reddit and Moltbook. arXiv preprint arXiv:2602.13920 2026. [CrossRef]
Li, M.; Li, X.; Zhou, T. Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook. arXiv preprint arXiv:2602.14299 2026. [CrossRef]
Hou, W.; Ji, Z. Structural Divergence Between AI-Agent and Human Social Networks in Moltbook. arXiv preprint arXiv:2602.15064 2026. [CrossRef]
Price, H.; AlMuhanna, H.; Bassani, P.; Ho, M.; Evans, T. Let There Be Claws: An Early Social Network Analysis of AI Agents on Moltbook. arXiv preprint arXiv:2602.20044 2026. [CrossRef]
Yee, B.; Sharma, K. Molt Dynamics: Emergent Social Phenomena in Autonomous AI Agent Populations. arXiv preprint arXiv:2603.03555 2026. [CrossRef]
Shekkizhar, S.; Earle, A. Interaction Theater: A case of LLM Agents Interacting at Scale. arXiv preprint arXiv:2602.20059 2026. [CrossRef]
Zhang, Y.; Mei, K.; Liu, M.; Wang, J.; Metaxas, D.N.; Wang, X.; Hamm, J.; Ge, Y. Agents in the Wild: Safety, Society, and the Illusion of Sociality on Moltbook. arXiv preprint arXiv:2602.13284 2026. [CrossRef]
Wiz Research. Hacking Moltbook: The AI Social Network Any Human Can Control. Wiz blog, 2026.
Jiang, Y.; Li, D.; Deng, H.; Ma, B.; Wang, X.; Wang, Q.; Yu, G. SoK: Agentic Skills–Beyond Tool Use in LLM Agents. arXiv preprint arXiv:2602.20867 2026. [CrossRef]
Gautam, S.; Riegler, M.A. Moltbook Observatory Archive. Hugging Face dataset, 2026.
Hartung, T. AI, agentic models and lab automation for scientific discovery—the beginning of scAInce. Frontiers in Artificial Intelligence 2025, 8, 1649155. [CrossRef]
Liang, Y.; Zhong, R.; Xu, H.; Jiang, C.; Zhong, Y.; Fang, R.; Gu, J.C.; Deng, S.; Yao, Y.; Wang, M.; et al. SkillNet: Create, Evaluate, and Connect AI Skills. arXiv preprint arXiv:2603.04448 2026. [CrossRef]

Figure 2. Fault lines and research missions. The survey’s central methodological diagnosis is weak triangulation across surface discourse, operational traces, portable artifacts, and grounding signals. For compactness, the figure folds portable artifacts into the operational/artifact stream. That bottleneck produces four recurring fault lines, each of which points toward a corresponding NLP research agenda.

Table 1. Evidence-alignment audit behind the triangulation diagnosis. The evidence families are surface discourse, operational traces, portable artifacts, and grounding signals. “Discourse + operational + grounding” marks papers that explicitly align all three within the same study. “Portable artifact + another family” marks studies that analyze skills, datasets, personas, archives, or other portable artifacts together with at least one additional evidence family.

Dominant GATE layer	Single-family only	Two or more families	Discourse + operational + grounding	Portable artifact + another family
Grounding ($n = 8$)	5	3	1	1
Action ($n = 9$)	3	6	0	2
Transfer ($n = 5$)	1	4	0	4
Exchange ($n = 16$)	15	1	0	0
All direct studies ($n = 38$)	24	14	1	7
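
The totals in Table 1 can be checked, and the per-layer share of single-family studies derived, with a few lines of arithmetic. The following Python sketch is illustrative only: the counts are copied from Table 1, while the tuple layout and the names `ROWS`, `TOTAL_ROW`, `column_totals`, and `single_family_share` are hypothetical and are not part of any released artifact.

```python
# Counts copied from Table 1. Tuple layout (hypothetical, for illustration):
# (layer, n, single_family, two_or_more, discourse_op_grounding, portable_plus)
ROWS = [
    ("Grounding", 8, 5, 3, 1, 1),
    ("Action", 9, 3, 6, 0, 2),
    ("Transfer", 5, 1, 4, 0, 4),
    ("Exchange", 16, 15, 1, 0, 0),
]
TOTAL_ROW = ("All direct studies", 38, 24, 14, 1, 7)


def column_totals(rows):
    """Sum each numeric column across the per-layer rows."""
    return tuple(sum(row[i] for row in rows) for i in range(1, 6))


def single_family_share(row):
    """Fraction of a row's studies that rely on a single evidence family."""
    return row[2] / row[1]


totals = column_totals(ROWS)

# "Single-family only" and "Two or more families" should partition each row.
rows_partition = all(row[2] + row[3] == row[1] for row in ROWS)

# Unrounded shares per layer, plus the overall share from the totals row.
shares = {row[0]: single_family_share(row) for row in ROWS + [TOTAL_ROW]}

if __name__ == "__main__":
    print("column totals match Table 1:", totals == TOTAL_ROW[1:])
    print("rows partition correctly:", rows_partition)
    for layer, share in shares.items():
        print(f"{layer}: {share:.1%} single-family")
```

Read this way, 24 of the 38 direct studies rely on a single evidence family, and 15 of those 24 sit in the Exchange layer, which is the pattern the triangulation diagnosis describes.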

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.