Submitted:
30 August 2025
Posted:
01 September 2025
Abstract

Keywords:
Highlights
- The Billion Agent Problem: By 2030, billions of AI agents will require coordination—without an OS-level abstraction, we risk computational chaos at unprecedented scale.
- Impossible Today, Essential Tomorrow: Deterministic scheduling for stochastic models and 10 ms HRT guarantees for LLMs may seem unattainable today, yet they define the research frontier that must be addressed.
- From Bare Metal to OS: Agent-OS lifts today’s “bare metal” agent programming into managed, portable applications with guaranteed behaviors and resource isolation.
- Security by Architecture: A zero-trust microkernel ensures no agent executes without contracts, capabilities, and audit trails—making trust enforceable, not optional.
- The Missing Layer: Between foundation models and applications lies a void—Agent-OS defines the substrate to make agents as reliable as today’s cloud services.
1. Introduction
2. Background and Related Work
3. Requirements for an Agent Operating System
3.1. Functional Requirements (FR)
3.2. Non-Functional Requirements (NFR) — Accessible, LLM-Aware
3.3. Latency Classes (Time Semantics)
3.4. Agent Contract (excerpt)
- Real-time enforcement (FR1, NFR7). The declared class.latency selects the scheduler and checks: HRT → EDF/RM with deadline/jitter admission tests; SRT → priority queues, streaming, barge-in; DT → best-effort queues. Unschedulable deployments are rejected at admit time.
- Tools (FR3, FR6). capabilities is the allowlist; everything else is blocked or requires consent in security.consent_for.
- Memory/RAG (FR2). memory fixes namespace, retention, and grounding rules; citations are required when require_grounding=true.
- Models/placement (FR7). modelPolicy constrains model choice and context limits; routing enforces these when composing prompts/completions.
- Compute budgets (FR1/7). compute informs admission control and prevents token/CPU starvation.
- Audit/telemetry (FR5, NFR8). observability ensures prompts, sources, and tool hashes are traceable for audits.
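The admit-time rejection of unschedulable HRT deployments described in the first item can be illustrated with the classical single-processor utilization tests for EDF and RM. This is only a sketch under strong assumptions: tasks are periodic, deadlines equal periods, and a worst-case execution time is known for each agent task, which is exactly what remains open for LLM workloads. The `Task` fields and function names are hypothetical and not part of the contract schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One periodic agent task; all times in milliseconds (hypothetical fields)."""
    name: str
    wcet_ms: float    # assumed worst-case execution time per period
    period_ms: float  # period, assumed equal to the relative deadline


def utilization(tasks):
    """Total processor utilization U = sum(C_i / T_i)."""
    return sum(t.wcet_ms / t.period_ms for t in tasks)


def admit_edf(tasks):
    """EDF with deadlines equal to periods: schedulable on one processor iff U <= 1."""
    return utilization(tasks) <= 1.0


def admit_rm(tasks):
    """Liu-Layland bound for rate-monotonic: U <= n(2^(1/n) - 1).

    The bound is sufficient but not necessary, so a rejection here is conservative.
    """
    n = len(tasks)
    if n == 0:
        return True
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)
```

With three tasks at a combined utilization of 0.75, both tests admit the deployment. Adding a fourth task that raises utilization to 0.95 is still accepted by the EDF test but rejected by the RM bound.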
4. Conceptual Architecture
4.1. Design Principles
4.2. Layered Model
(1) User & Application layer
(2) Workflow & Orchestration layer
(3) Agent Runtime layer
(4) Services layer
(5) Kernel layer
4.3. Service Composition and Orchestration of Multi-Agent Applications
How composition works (control plane).
How composition executes (data plane).
Semantics and guarantees.
Agents as services.
Concise example.
4.4. Contract Binding Semantics: Strict, Smooth, and Flexible
Strict binding (reject on mismatch).
Smooth binding (progressive, version-compatible).
Flexible binding (policy-bounded substitutions).
- Normalize request: Merge configuration layers hierarchically (request override → role template → tenant policy → platform defaults).
- Validate hard constraints: Reject immediately if any must requirement fails (e.g., HRT scheduling feasibility, mandated on-premise placement).
- Resolve smooth upgrades: Select highest compatible versions within specified ranges; activate canary deployment with automatic rollback.
- Apply flexible substitutions: Within prefer boundaries, optimize for cost/latency while maintaining SLOs; document reasoning in trace.
- Seal binding: Generate immutable binding manifest containing selected model/tool versions, placements, and SLOs; attach to OpenTelemetry trace.
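The five binding steps above can be sketched roughly as follows. All names here (`Candidate`, `normalize`, `bind`, and the request keys such as `must_placement` and `slo_p95_ms`) are illustrative assumptions, not the paper's schema. The layer merge is shallow, and canary deployment, rollback, and attachment to an OpenTelemetry trace are omitted.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A deployable model or tool version (all fields are illustrative)."""
    name: str
    version: tuple          # e.g. (1, 4, 0)
    placement: str          # e.g. "on_prem" or "cloud"
    cost: float             # relative cost per call
    p95_latency_ms: float


class BindingError(Exception):
    """Raised when a request cannot be bound."""


def normalize(platform_defaults, tenant_policy, role_template, request_override):
    """Step 1: merge layers; later (higher-precedence) layers win key by key."""
    merged = {}
    for layer in (platform_defaults, tenant_policy, role_template, request_override):
        merged.update(layer)
    return merged


def bind(request, candidates):
    trace = []
    # Step 2: hard constraints reject immediately.
    pool = [c for c in candidates if c.placement == request["must_placement"]]
    if not pool:
        raise BindingError("no candidate satisfies the mandated placement")
    # Step 3: per name, keep the highest version inside [version_min, version_max).
    latest = {}
    for c in pool:
        in_range = request["version_min"] <= c.version < request["version_max"]
        if in_range and (c.name not in latest or c.version > latest[c.name].version):
            latest[c.name] = c
    # Step 4: among candidates that keep the SLO, choose the cheapest.
    within_slo = [c for c in latest.values()
                  if c.p95_latency_ms <= request["slo_p95_ms"]]
    if not within_slo:
        raise BindingError("no compatible candidate meets the SLO")
    chosen = min(within_slo, key=lambda c: c.cost)
    trace.append(f"chose {chosen.name}: lowest cost within SLO")
    # Step 5: seal the manifest with a content digest so later changes are detectable.
    manifest = {
        "model": chosen.name,
        "version": ".".join(str(v) for v in chosen.version),
        "placement": chosen.placement,
        "slo_p95_ms": request["slo_p95_ms"],
        "trace": trace,
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["digest"] = hashlib.sha256(payload).hexdigest()
    return manifest
```

A tenant policy that sets `must_placement` to `"on_prem"` overrides a platform default of `"cloud"`, so cloud-only candidates are rejected in step 2 regardless of how cheap they are.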
4.5. Cross-Layer Responsibilities
Security and Governance
Multitenancy
Operations
Interoperability
4.6. End-to-End Flow Examples with LLM Agents
4.7. Deployment Topologies (LLM-Aware)
On the Device (Edge-Only)
A Mix of Device and Cloud (Hybrid)
Completely Offline (Air-Gapped On-Premises)
4.8. Analogy: Traditional OS vs. Agent-OS
5. Enabling the Architecture: Standards and Protocols
5.1. Mapping Protocols to the Agent-OS Layers
- Model Context Protocol (MCP) as the System Call Interface: MCP provides the schema-driven mechanism for agents to invoke external capabilities. In our architecture, it serves as the formal "system call" interface. When an agent in the Agent Runtime needs to perform an action, it formulates an MCP-compliant tool call. This request traverses the Kernel, which validates the call against the agent’s contract, before dispatching it to the Tool Service in the Services Layer for execution. This ensures every action is portable, auditable, and policy-checked.
- Agent-to-Agent (A2A) as the Inter-Process Communication Bus: The A2A protocol standardizes communication between agents, fulfilling the role of Inter-Process Communication (IPC). When the Workflow & Orchestration layer delegates a task from one agent to another, the agents (residing in the Agent Runtime) communicate via the A2A Bus within the Services Layer. This enables complex, multi-agent workflows where agents can negotiate, delegate, and securely exchange context, regardless of their underlying implementation.
- OpenTelemetry (OTel) as the Unified Observability Fabric: OTel provides the standardized framework for tracing, logging, and metrics. It is the backbone of the Observability component in the Services Layer. As a request flows downward from the User Layer through the Kernel and back upward as data, OTel captures a unified trace. This provides the end-to-end lineage crucial for debugging, auditing, and performance analysis, linking user prompts to tool calls and final outputs.
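The "system call" path in the first item, where the Kernel validates a tool call against the agent's contract before dispatching it, might look like the sketch below. The request dictionary is loosely modeled on an MCP `tools/call` JSON-RPC message. The `Contract` and `Kernel` classes, the in-memory `tools` table standing in for the Tool Service, and the audit record fields are assumptions made for illustration, and a plain list stands in for an OTel trace.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Contract:
    """Hypothetical slice of an Agent Contract relevant to tool calls."""
    agent_id: str
    capabilities: frozenset              # allowlisted tool names
    consent_for: frozenset = frozenset() # tools that need explicit consent


@dataclass
class Kernel:
    tools: dict                          # tool name -> callable (stand-in Tool Service)
    audit_log: list = field(default_factory=list)

    def invoke_tool(self, contract, request, consent=False):
        name = request["params"]["name"]
        args = request["params"].get("arguments", {})
        # Policy check: allowlist first, then consent-gated tools, else deny.
        if name in contract.capabilities:
            decision = "allow"
        elif name in contract.consent_for:
            decision = "allow" if consent else "needs_consent"
        else:
            decision = "deny"
        # Record every attempt, including rejected ones, with a hash of the arguments.
        args_hash = hashlib.sha256(
            json.dumps(args, sort_keys=True).encode()).hexdigest()
        self.audit_log.append({"agent": contract.agent_id, "tool": name,
                               "args_sha256": args_hash, "decision": decision})
        if decision != "allow":
            raise PermissionError(f"{name}: {decision}")
        return self.tools[name](**args)


def tool_call(name, arguments, call_id=1):
    """Build a request shaped like an MCP tools/call message."""
    return {"jsonrpc": "2.0", "id": call_id, "method": "tools/call",
            "params": {"name": name, "arguments": arguments}}
```

Logging before the permission check raises means denied calls are as traceable as successful ones, which fits the audit goal stated above.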
5.2. Architectural Scenarios in Action
- Hard Real-Time (HRT): Autonomous Robotics Coordination: In a smart factory, a robotic arm’s agent must operate with deterministic guarantees. Its Agent Contract specifies latency_class: HRT. The Kernel uses an EDF/RM scheduler to guarantee its 10 ms deadline. The robot coordinates with a nearby logistics agent via the A2A Bus to confirm a package’s position. When it needs to actuate its gripper, it issues an MCP-formatted tool call to the Tool Service. Every action is captured in an OTel trace, providing a verifiable audit log for safety-critical operations.
- Soft Real-Time (SRT): Interactive Citizen Services Assistant: A citizen interacts with a city services chatbot. The agent’s contract is set to latency_class: SRT, and the Kernel prioritizes its execution to ensure a sub-250 ms time-to-first-token. The agent uses an MCP tool call to the Services Layer to query a knowledge base for permit information. If the query requires information from another department, the Workflow layer orchestrates an A2A message to a specialist agent. The entire interaction, including RAG retrieval timings and tool latencies, is traced with OTel to monitor and maintain the quality of service.
- Delay-Tolerant (DT): City-Scale Planning Analytics: A municipal planning department runs an overnight job to analyze traffic patterns from diverse data sources. The orchestrator spawns multiple agents with latency_class: DT contracts. The Kernel schedules these as low-priority, preemptible tasks to optimize cost. The agents use MCP calls to access data stores and modeling tools in the Services Layer. The final report is compiled and stored, with OTel providing a complete data lineage trace that ensures the analysis is reproducible and auditable.
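The three scenarios differ mainly in how the Kernel orders ready work. One possible ordering is sketched below: HRT tasks by earliest deadline, SRT tasks by priority, DT tasks first-come first-served. The strict ranking of classes (HRT before SRT before DT), the `ClassAwareQueue` name, and its parameters are assumptions made for illustration. Preemption of running DT tasks is not modeled.

```python
import heapq
import itertools

# Assumed strict ordering between latency classes.
CLASS_RANK = {"HRT": 0, "SRT": 1, "DT": 2}


class ClassAwareQueue:
    """Ready queue: EDF within HRT, priority within SRT, FIFO within DT."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, name, latency_class, deadline_ms=None, priority=0):
        if latency_class == "HRT":
            if deadline_ms is None:
                raise ValueError("HRT tasks must declare a deadline")
            key = deadline_ms          # earliest deadline first
        elif latency_class == "SRT":
            key = -priority            # higher priority first
        elif latency_class == "DT":
            key = 0                    # best effort, arrival order only
        else:
            raise ValueError(f"unknown latency class: {latency_class}")
        heapq.heappush(
            self._heap, (CLASS_RANK[latency_class], key, next(self._seq), name))

    def pop(self):
        """Return the name of the next task to run."""
        return heapq.heappop(self._heap)[3]

    def __len__(self):
        return len(self._heap)
```

Submitting a DT report job, two SRT chat turns, and two HRT actuator tasks in any order yields the HRT tasks first (shortest deadline leading), then the higher-priority chat turn, and the DT job last.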
6. Challenges and Open Research Agenda
- Complexity and Performance Overhead: The structured, multi-layered design, while beneficial for security and modularity, introduces inherent complexity. Each layer of abstraction—from orchestration to kernel-level policy checks—can add latency. This performance penalty is especially problematic for Hard Real-Time (HRT) workloads where deterministic deadlines are non-negotiable, and for Soft Real-Time (SRT) applications where user-perceived lag can render a system unusable. The complexity may also create a steep learning curve, restricting adoption by developers accustomed to more direct, albeit brittle, agent pipelines.
- Ecosystem Fragmentation: The success of the Agent-OS model hinges on interoperability, which is threatened by a fragmented standards landscape. While open protocols like MCP and A2A are gaining traction, they compete with numerous proprietary APIs from major platform vendors. Without clear convergence, the ecosystem risks devolving into walled gardens, forcing developers to build for specific targets and undermining the core goal of agent portability.
- Scalable Governance and Safety: Enforcing security policies, managing consent, and maintaining auditable logs across millions of agents executing billions of tool calls is a monumental challenge. The computational and logistical overhead of such rigorous governance can become prohibitive at scale. Ensuring safety is not merely a matter of policy enforcement but also of defending against novel attack vectors like sophisticated prompt injection and data exfiltration, which require continuous adaptation.
- Stochastic Nature of LLMs: Unlike deterministic software components, large language models operate as probabilistic systems, producing outputs based on statistical inference rather than guaranteed logic. This stochasticity complicates integration with real-time systems, where predictability and bounded latency are paramount. For HRT scenarios, even small variations in token generation speed or output uncertainty can lead to missed deadlines or unsafe actions. Reconciling the non-deterministic behavior of LLMs with the deterministic guarantees required by real-time scheduling remains an open and fundamental research problem.
- Modular and Performant Kernels: Research into a true “microkernel” design for Agent-OS is paramount. This involves identifying the absolute minimal set of primitives for scheduling, policy enforcement, and context management that must reside in the trusted core, pushing all other functionality to efficient, replaceable user-space services. The goal is to minimize overhead on the critical path while maximizing flexibility.
- Verifiable Safety and Security: Moving beyond best-effort security requires developing verifiable safety layers. This involves using formal methods to mathematically prove that an agent’s execution will adhere to its contract and security policies under all conditions. Such guarantees are essential for deploying agents in safety-critical domains like autonomous vehicles and healthcare.
- Standardized Benchmarks: The field urgently needs standardized benchmarks for evaluating Agent-OS implementations. These benchmarks must be multi-faceted, measuring not only task completion but also performance against real-time SLOs (deadline misses, jitter, onset latency), security policy adherence, and cost-efficiency across HRT, SRT, and DT workloads.
- Open Ecosystem Foundations: To combat fragmentation, a concerted effort is needed to build shared ecosystem foundations. This includes contributing to open protocol development, creating shared repositories for certified tools and agents, and establishing common compliance suites to validate that different Agent-OS runtimes are truly interoperable.
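The real-time SLO metrics named in the benchmarking item (deadline misses, jitter, onset latency) can be computed from recorded latencies along these lines. The definitions are assumptions, since the text does not fix them: jitter is taken as the spread between the slowest and fastest observation, and percentiles use the nearest-rank method. Function names are hypothetical.

```python
import math


def deadline_miss_rate(latencies_ms, deadline_ms):
    """Fraction of runs that finished after the deadline."""
    if not latencies_ms:
        raise ValueError("no observations")
    misses = sum(1 for x in latencies_ms if x > deadline_ms)
    return misses / len(latencies_ms)


def jitter_ms(latencies_ms):
    """Peak-to-peak jitter: slowest minus fastest observation (assumed definition)."""
    if not latencies_ms:
        raise ValueError("no observations")
    return max(latencies_ms) - min(latencies_ms)


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    if not values:
        raise ValueError("no observations")
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_onset_slo(first_token_ms, target_ms, q=95):
    """True if the q-th percentile time-to-first-token is within the target."""
    return percentile(first_token_ms, q) <= target_ms
```

For an HRT workload the first two functions would be evaluated against the contract deadline, while an SRT assistant would more naturally be judged by `meets_onset_slo` on time-to-first-token samples.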
7. Conclusions
References
- Authors, U. Fundamentals and Practical Implications of Agentic AI 2025. [arXiv:cs.AI/2505.19443]. Survey on agentic AI, including OS designs for real-time monitoring.
- Authors, U. Large Model Agents: State-of-the-Art, Cooperation Paradigms, and Real-World Applications 2024. [arXiv:cs.AI/2409.14457]. Survey referencing AIOS-like OS for real-time multi-agent cooperation.
- Authors, U. AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways. ACM Computing Surveys 2025. Advocates Agent OS for scalable security in AI agents.
- Ge, Y.; Ren, Y.; Hua, W.; Xu, S.; Tan, J.; Zhang, Y. LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem 2023. [arXiv:cs.AI/2312.03815].
- Martin, D.L.; Cheyer, A.J.; Moran, D.B. The Open Agent Architecture: A Framework for Building Distributed Software Systems. Applied Artificial Intelligence 1999, 13, 91–128.
- Foundation for Intelligent Physical Agents. FIPA Agent Communication Language Specifications. Technical Report SC00061I, Foundation for Intelligent Physical Agents, 2002. This standard defines the syntax and semantics for agent communication, including message structure and performatives.
- Authors not specified in source. AIOS: LLM Agent Operating System 2024. [arXiv:cs.AI/2403.16971]. Academic architecture published on arXiv.
- Authors not specified in source. KAOS: Large Model Multi-Agent Operating System 2024. [arXiv:cs.AI/2406.11342]. Research prototype published on arXiv.
- Authors not specified in source. AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant 2024. [arXiv:cs.AI/2410.18603]. Research prototype published on arXiv.
- Daglis, A.; Novakovic, S.; Cledat, G.; Olma, W.; Grot, B. Siren: An Operating System for Real-Time Autonomous Systems. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS ’23), New York, NY, USA, 2023; pp. 89–103.
- Winikoff, M. Jack™ Intelligent Agents: An Industrial Strength Platform. In Multi-Agent Programming: Languages, Platforms and Applications; Bordini, R.H.; Dastani, M.; Dix, J.; El Fallah Seghrouchni, A., Eds.; Springer US: Boston, MA, 2005; pp. 175–193.
- Authors, U. AgentScope: Multi-Agent Systems Development in Focus. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS ’11), New York, NY, USA, May 2011; pp. 1–8. Foundational middleware for scalable multi-agent OS.
- Authors not specified in source. MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot 2024. [arXiv:cs.AI/2404.18074]. Research paper published on arXiv.
- Authors, U. Planet as a Brain: Towards Internet of AgentSites based on AIOS and Agent Migration 2025. [arXiv:cs.AI/2504.14411]. Extends AIOS to distributed agent migration for scalable architectures.
- Authors, U. Alita: Generalist Agent Enabling Scalable Agentic Reasoning via Autonomous Agent Operating System 2025. [arXiv:cs.AI/2505.20286]. Focuses on autonomous OS for scalable reasoning in generalist agents.
- Zhang.; Others. Eliza: A Web3 Friendly AI Agent Operating System 2025. [arXiv:cs.AI/2501.06781]. Blockchain-integrated Agent OS for decentralized, scalable AI agents.
- Lai, H.; Liu, X.; Others. A Survey on MLLM-based Agents for General Computing Devices Use 2025. [arXiv:cs.AI/2508.04482]. Reviews multimodal agents with OS-level support for real-time devices.
- PwC. PwC launches AI Agent Operating System for enterprises. Press Release, 2025. Commercial orchestration layer launch. For recent enhancements, see https://www.pwc.com/us/en/services/ai/agent-os/agent-os-recent-enhancements.html.
- Google DeepMind. Building the Operating System for AI Agents. Developer-focused infrastructure for multi-agent systems.
- Microsoft. Get started with Phi Silica in the Windows App SDK. Windows Copilot Runtime and local model for apps.
- Apple Developer. Integrating actions with Siri and Apple Intelligence. Apple’s approach to formalizing app actions.
- Anthropic. Introducing the Model Context Protocol. Anthropic Newsroom. Community-driven initiative for agent tool/data access.
- Google. Agent-to-Agent (A2A) Protocol. Conceptual proposal for standardized inter-agent communication.
- Open-source community. OpenTelemetry. Standard for structured tracing, logging, and metrics in AI domains.
- Yu, L.; Schmid, B.F. A Conceptual Framework for Agent Oriented and Role Based Workflow Modeling. In Proceedings of the 34th Annual Hawaii International Conference on System Sciences (HICSS-34). IEEE, 2001.
- Weyns, D.; Schumacher, M.; Helleboogh, A. Towards a Definition of an Agent-Based Operating System. In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS ’05), New York, NY, USA, 2005; pp. 1139–1140.

| Aspect | LLM as OS | AIOS | Agent-OS (ours) |
|---|---|---|---|
| Nature / Scope | Visionary concept: LLM as kernel; high-level OS analogies. | Academic architecture for LLM agents; efficiency-focused. | Generalized OS for agents: models + tools + environments + HITL. |
| Architecture | Analogies (kernel/memory/fs/tools); no layered stack. | Three layers: Application / Kernel / Hardware; kernel managers. | Five layers: Kernel; Resource & Service; Agent Runtime; Orchestration & Workflow; User & App; cross-cutting security/governance. |
| Requirements | No formal FR/NFR spec. | Empirical gains reported; no prioritized FR/NFR in public sources. | Prioritized spec (FR, NFR, DR, ER, DEP, GOV) as a portable blueprint. |
| Real-time | Not explicit. | Context/time-aware interrupts; latency reduction, no taxonomy. | Formal HRT/SRT/DT classes; class-tied scheduler/I/O/memory/networking and acceptance tests. |
| HITL & Safety | Human–agent interaction discussed conceptually. | User intervention for risky ops; access manager. | HITL in workflows + console; zero-trust RBAC, capability-scoped tools, encrypted memory, auditable traces. |
| Standards / Protocols | Calls for standards. | Unified calls via SDK inside runtime. | MCP as the system call interface; A2A for agent IPC. |
| Key Components | LLM (kernel), context window (memory), external storage, tools, prompts, agents. | Kernel managers (scheduler/context/memory/storage/tool/access), LLM core, SDK, apps. | Kernel (scheduler/context/action); services (memory/tool); runtime; orchestration; user layer; security/governance. |

| Classical OS | Role | Agent-OS Analogue |
|---|---|---|
| Process / thread | Schedulable unit | Agent / turn (contract-bound execution) |
| System calls | Kernel entry points | MCP-like tool calls & kernel APIs (checkpoint(), invoke_tool()) |
| IPC / sockets | Inter-process comms | A2A bus (typed messages, conversations) |
| Scheduler (CFS/RT) | CPU time allocation | Class-aware schedulers (EDF/RM, priority, best-effort) |
| Virtual memory | Address spaces | Context/memory service (LLM window, vector/KV stores) |
| File system | Persistent storage | Knowledge store with provenance (RAG artifacts) |
| Device drivers | Hardware adapters | Connectors (web/desktop/IoT/robotics; simulation) |
| Init/systemd | Service orchestration | Workflow engine (DAG/state machine, HITL) |
| Package manager | Install/update | Model/tool catalog (routing, versions, safety tiers) |
| SELinux/AppArmor | Policy enforcement | Policy engine (capabilities, consent, audit) |
| /var/log + tracing | Diagnostics | OTel telemetry + lineage (prompts, tools, sources) |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).