Submitted: 10 June 2026
Posted: 11 June 2026
Abstract
Keywords:
1. Introduction
- (i) A reference architecture that couples a ChatOps front end with a supervisor-orchestrated multi-agent system (specialized domain agents under an LLM supervisor that routes by reasoning and confidence and dispatches workers in parallel), an intelligent query router that arbitrates, by data-sensitivity class, between a locally hosted open-weight model and managed cloud models, a RAG subsystem over a network “source of truth,” and a heterogeneous tool layer spanning inventory, monitoring, control-plane, and configuration-management systems.
- (ii) The design of seven production operational use cases, each formulated as an agentic workflow combining planning, function calling, and retrieval, including a backbone link-flapping remediation workflow that fuses data-plane (SNMP-trap) and control-plane (BGP Monitoring Protocol) signals with digital-twin simulation prior to any configuration change.
- (iii) An evaluation methodology grounded in measured production data (per-event handling times, task frequencies, volumes, user feedback, and token-usage signals from Langfuse traces), together with an analytical model that links the potential MTTR reduction to per-service availability on a representative inter-continental topology.
- (iv) A quantitative, case-study demonstration, on real operational data from a major global network, of per-event handling-time reductions by factors ranging from roughly 10× to 2,400×, with a discussion of the security, safety, generalizability, and token-usage considerations that govern production adoption.
2. Related Work
2.1. Agentic AI Design Patterns
2.2. Retrieval-Augmented Generation and Its Evaluation
2.3. LLMs for Networking and IT Operations
2.4. Positioning of This Work
3. Materials and Methods
3.1. Problem Formulation
3.2. System Architecture
3.3. Multi-Agent Architecture: Supervisor and Specialized Workers
3.4. Model Selection and Deployment
3.5. Tool Integration and Data Sources
3.6. Retrieval-Augmented Subsystem
3.7. Operational Use Cases
3.7.1. Spare Locator
3.7.2. Console Information Retrieval
3.7.3. Packet Loss and Latency Analysis
3.7.4. Control-Plane Path Retrieval
3.7.5. Node-Isolation Detection
3.7.6. Vendor Knowledge-Base Search
3.7.7. Backbone Link-Flapping Detection and Closed-Loop Remediation
3.7.8. ChatOps Multi-Tool Aggregator
3.8. Evaluation Methodology
3.8.1. Experimental Environment and Data Collection
3.8.2. Time-Savings Measurement Protocol
3.8.3. Output-Quality Instrumentation
3.8.4. Availability Modeling
4. Results
4.1. Operational Efficiency
4.2. Observed Quality Signals
4.3. Offline RAG-Quality Benchmark
4.4. Service Availability Impact
5. Discussion
5.1. Principal Findings
5.2. Comparison with Related Work
5.3. Security, Privacy, and Responsible AI
5.4. Reliability and Safety of Autonomous Actions
5.5. Generalizability
5.6. Token Usage and Operational Considerations
5.7. Limitations and Threats to Validity
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| AIOps | Artificial Intelligence for IT Operations |
| API | Application Programming Interface |
| BB | Backbone |
| BGP | Border Gateway Protocol |
| BMP | BGP Monitoring Protocol |
| ChatOps | Chat-driven Operations |
| ECMP | Equal-Cost Multi-Path |
| IGP | Interior Gateway Protocol |
| LLM | Large Language Model |
| LSDB | Link-State Database |
| MTBF | Mean Time Between Failures |
| MTTR | Mean Time to Repair |
| MCP | Model Context Protocol |
| NMS | Network Management System |
| NOC | Network Operations Center |
| RAG | Retrieval-Augmented Generation |
| RCA | Root-Cause Analysis |
| REST | Representational State Transfer |
| SE | Service Edge |
| SFP | Small Form-factor Pluggable |
| SLA | Service Level Agreement |
| SNMP | Simple Network Management Protocol |
| SP | Spine |
| SPF | Shortest Path First |
| UUID | Universally Unique Identifier |
| YAML | YAML Ain’t Markup Language |
Appendix A. Reproducibility
Appendix A.1. Software Stack and Model-Serving Configuration
| Component | Configuration | Role |
|---|---|---|
| LangChain | Python framework | Agent/LLM orchestration framework |
| LangGraph | Stateful agent graph | Supervisor/worker routing and execution graph |
| langgraph-checkpoint-redis | Redis-backed checkpointing | Durable checkpointing of agent state |
| langchain-mcp-adapters | MCP adapter layer | MCP tool-layer integration |
| mcp | Model Context Protocol runtime | Typed tool interface for workers |
| Qdrant (qdrant_client) | Vector database client | Vector store for the RAG subsystem |
| Neo4j (neo4j) | Graph database client | Graph-structured agent state |
| Redis | In-memory data store | Checkpoint / short-term memory backend |
| FastAPI | API framework | Service / feedback API |
| boto3 | AWS SDK client | Amazon Bedrock client |
| opentelemetry-instrumentation-langchain | Tracing instrumentation | LangChain/LangGraph telemetry hooks |
| Langfuse [18] | Self-hosted trace and evaluation server | Trace store, judge and feedback scoring |
| NetworkX | Python graph library | SPF/ECMP topology digital twin |
| Models | | |
| Initial (local; sensitive data) | Open-weight instruction-tuned model served through Ollama [35] | Local GPU serving for confidential requests |
| Current (cloud; bulk) | Managed Anthropic model through Amazon Bedrock | High-volume interactions within a US cross-region inference profile |
| Current (cloud; reasoning) | Managed Anthropic model through Amazon Bedrock | Complex reasoning within a US cross-region inference profile |
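The split between a local tier for confidential requests and two managed cloud tiers can be illustrated with a small routing function. This is a minimal sketch, not the deployed router: the `Sensitivity` classes, the `needs_reasoning` flag, and the tier names are hypothetical labels introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    # Hypothetical data-sensitivity classes; the source does not enumerate them.
    CONFIDENTIAL = "confidential"
    NON_SENSITIVE = "non_sensitive"


class Tier(Enum):
    # Hypothetical tier labels mirroring the three model rows in the table.
    LOCAL = "local-open-weight"
    CLOUD_BULK = "cloud-bulk"
    CLOUD_REASONING = "cloud-reasoning"


@dataclass(frozen=True)
class Query:
    text: str
    sensitivity: Sensitivity
    needs_reasoning: bool = False  # assumption: set by an upstream classifier


def route(query: Query) -> Tier:
    """Pick a serving tier; sensitivity is checked before capability."""
    if query.sensitivity is Sensitivity.CONFIDENTIAL:
        return Tier.LOCAL  # confidential requests never leave local serving
    if query.needs_reasoning:
        return Tier.CLOUD_REASONING
    return Tier.CLOUD_BULK
```

Checking sensitivity first means a confidential request stays local even when it would otherwise qualify for the higher-capability tier.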
Appendix A.2. Supervisor Routing Schema
{ rationale: string,
  confidence: float [0..1],
  workers: [ {name: enum, args: object}, ...] }
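One way to enforce this schema on the supervisor's output is to parse and validate it before any worker is dispatched. The sketch below assumes the supervisor emits JSON; the worker names in `ALLOWED_WORKERS` and the `MIN_CONFIDENCE` threshold are hypothetical placeholders, not values from the deployed system.

```python
import json
from dataclasses import dataclass

# Hypothetical subset of the fixed worker set, for illustration only.
ALLOWED_WORKERS = {"inventory", "network_infrastructure", "vendor_support"}
MIN_CONFIDENCE = 0.6  # hypothetical threshold below which we ask for clarification


@dataclass
class WorkerCall:
    name: str
    args: dict


@dataclass
class RoutingDecision:
    rationale: str
    confidence: float
    workers: list


def parse_decision(raw: str) -> RoutingDecision:
    """Parse supervisor JSON and reject anything outside the schema."""
    data = json.loads(raw)
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    workers = []
    for item in data["workers"]:
        if item["name"] not in ALLOWED_WORKERS:
            raise ValueError(f"unknown worker: {item['name']}")
        workers.append(WorkerCall(item["name"], dict(item.get("args", {}))))
    return RoutingDecision(str(data["rationale"]), confidence, workers)


def needs_clarification(decision: RoutingDecision) -> bool:
    """Low confidence or an empty worker list triggers a clarification request."""
    return decision.confidence < MIN_CONFIDENCE or not decision.workers
```

Rejecting unknown worker names at parse time keeps a hallucinated worker from ever reaching the dispatch step.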
Appendix A.3. Representative Tool Schema (MCP)
name: get_control_plane_paths
description: “Return current shortest/ECMP paths and
              adjacency state between two nodes.”
parameters: { source_node: string (required),
              target_node: string (required),
              k_paths: integer (default 3),
              as_of: timestamp (optional) }
returns: { paths: [ {hops: [string], metric: int} ],
           isolated_nodes: [string], backup_available: bool }
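The request and response shapes of this tool can be mirrored as typed Python structures, which makes the required fields and the `k_paths` default explicit. This is a sketch under stated assumptions: the class names and the `summarize` helper are hypothetical, and deriving `backup_available` from "more than one path survives" is an illustrative rule, not the tool's documented behavior.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class PathRequest:
    source_node: str
    target_node: str
    k_paths: int = 3                   # default taken from the schema
    as_of: Optional[datetime] = None   # optional point-in-time query

    def __post_init__(self):
        if not self.source_node or not self.target_node:
            raise ValueError("source_node and target_node are required")
        if self.k_paths < 1:
            raise ValueError("k_paths must be a positive integer")


@dataclass
class Path:
    hops: list
    metric: int


@dataclass
class PathResponse:
    paths: list = field(default_factory=list)
    isolated_nodes: list = field(default_factory=list)
    backup_available: bool = False


def summarize(paths: list, request: PathRequest) -> PathResponse:
    """Keep the k lowest-metric paths.

    Assumption: a backup is considered available when more than one path remains.
    """
    best = sorted(paths, key=lambda p: p.metric)[: request.k_paths]
    return PathResponse(paths=best, backup_available=len(best) > 1)
```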
Appendix A.4. Retrieval Pipeline Configuration
- Sparse vectors: a learned sparse expansion model (SPLADE [43], naver/splade-cocondenser-ensembledistil) for conversational and technical sources, and BM25 for structured documentation.
- Fusion: Reciprocal Rank Fusion [44] with , with up to 50 candidates retrieved per collection.
- Reranking: a managed neural reranker served through Amazon Bedrock [47] reduces the fused candidates to a top-N (5 in the production configuration), discarding any with a relevance score below .
- Knowledge graph: an optional graph layer (Section 3.5) returns related entities and relationships for incident and ticket records, and degrades gracefully when unavailable.
- Synthesis: the configured serving model (Section 3.4; the offline benchmark of Section 4.3 used the primary managed-cloud tier) produces the final grounded answer with per-source citations.
- Indexing: Qdrant uses cosine similarity over the dense vectors; chunking, metadata filters, and embedding input-type (query vs. document) settings are configurable and follow the per-source ingestion pipeline.
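The fusion and cut-off stages listed above can be sketched in a few lines. Reciprocal Rank Fusion scores each document as the sum of 1/(k + rank) over the rankings that contain it. The value `RRF_K = 60` below is an assumption (a commonly used default), because the configured constant is not given here; `min_score` is left as a required argument for the same reason. The function names are hypothetical.

```python
from collections import defaultdict

RRF_K = 60                      # assumption: common default, not the configured value
CANDIDATES_PER_COLLECTION = 50  # from the pipeline description
TOP_N = 5                       # production top-N from the pipeline description


def rrf_fuse(rankings, k=RRF_K, limit=CANDIDATES_PER_COLLECTION):
    """Fuse ranked doc-id lists (best first) into [(doc_id, score)], best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking[:limit], start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def keep_top(reranked, min_score, top_n=TOP_N):
    """Drop reranked (doc_id, relevance) pairs below min_score, keep the top_n."""
    kept = [(doc_id, score) for doc_id, score in reranked if score >= min_score]
    return sorted(kept, key=lambda item: -item[1])[:top_n]
```

In use, `rrf_fuse([dense_ids, sparse_ids])` would feed the reranker, and `keep_top` would be applied to the reranker's scored output.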
Appendix A.5. Digital-Twin Validation (SPF/ECMP Pre-Check)
- Build twin. Construct a graph G from current topology state, with each edge weighted by its live IGP metric; mark nodes carrying an overload (do-not-transit) indicator.
- Baseline. Compute SPF/ECMP shortest paths over G for the affected source–target demands and record, for each demand, the set of viable next-hops (working and backup paths).
- Apply candidate. Form G′ from G by raising the impacted link’s metric to the proposed value (e.g., a large static cost), modeling the intended de-preference of the flapping link.
- Recompute. Recompute SPF/ECMP over G′ for the same demands.
- Isolation check. If any node reachable in the baseline becomes unreachable in G′ (that is, the change isolates a node), reject.
- Backup check. If, for any affected demand, no alternate viable path remains in G′ (the change removes the only backup, accounting for overload-marked nodes), reject.
- Decision. Only if every demand retains a valid failover path and no node is isolated, return success; otherwise return reject and notify the operations channel that no safe change exists.
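The steps above can be sketched with a plain Dijkstra pass over a dictionary-of-dictionaries graph. This is an illustrative sketch, not the production twin: it uses only the standard library rather than NetworkX, checks reachability rather than full ECMP next-hop sets, and interprets "alternate viable path" as a path that avoids the impacted link entirely. Function names are hypothetical. Note that raising a metric never removes reachability by itself, so the isolation check is kept mainly for parity with the listed steps.

```python
import heapq


def spf(graph, source, overloaded=frozenset()):
    """Dijkstra distances from source; overloaded nodes are reachable but not transited."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        if node in overloaded and node != source:
            continue  # do-not-transit: do not expand through this node
        for neighbor, metric in graph.get(node, {}).items():
            nd = d + metric
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def with_metric(graph, a, b, metric):
    """Copy the graph with link a-b set to metric; metric=None removes the link."""
    twin = {node: dict(nbrs) for node, nbrs in graph.items()}
    for u, v in ((a, b), (b, a)):
        if metric is None:
            twin.get(u, {}).pop(v, None)
        else:
            twin[u][v] = metric
    return twin


def pre_check(graph, link, new_metric, demands, overloaded=frozenset()):
    """Return "success" only if no demand is isolated and each keeps a backup path."""
    a, b = link
    candidate = with_metric(graph, a, b, new_metric)  # G' with the raised metric
    without_link = with_metric(graph, a, b, None)     # assumption: backup must avoid the link
    for src, dst in demands:
        reachable_before = dst in spf(graph, src, overloaded)
        if reachable_before and dst not in spf(candidate, src, overloaded):
            return "reject"  # isolation check failed
        if dst not in spf(without_link, src, overloaded):
            return "reject"  # backup check failed
    return "success"
```

For a triangle topology the cost-out of one side passes, while the same change is rejected once the only detour node is marked overloaded.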
Appendix A.6. Per-Event Timing-Extraction Procedure
Appendix A.7. Evaluation Rubric: User Feedback and Judge Scoring
Appendix A.8. Illustrative Agent-Prompt Skeletons
You are a routing supervisor for a network-operations
assistant. Given a user intent, select one or more
specialized workers from the fixed worker set and
provide arguments for each.
Return ONLY structured output with fields:
  rationale (why this routing), confidence in
  [0,1], and workers: [{name, args}].
If the intent is ambiguous or low-confidence, do not
guess: request a clarification or fall back to a
conservative default worker. Do not invent worker
names or tools outside the provided set.
You are a specialized worker for <domain>. Use only
the provided tools and the retrieved context to
answer; do not fabricate values, identifiers, or
state. Cite the retrieved sources and tool
observations that support your answer, and state
clearly when evidence is insufficient.
State-changing actions follow the configured
guardrails: general change-management writes are
not auto-submitted and require explicit human
approval, whereas the narrow pre-validated
self-healing action proceeds only after the
digital-twin pre-check (Appendix A.5)
succeeds, and always emits a notification with a
version-controlled, reversible configuration change.
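The guardrail wording in the last skeleton maps naturally onto a small policy gate that runs outside the model. The sketch below is illustrative only: the `Action` fields and the three action kinds are hypothetical names, and the real gate may consider more conditions than these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str                    # "read", "change_write", or "self_healing" (hypothetical kinds)
    human_approved: bool = False
    twin_check: str = "not_run"  # "success", "reject", or "not_run"
    reversible: bool = False     # version-controlled, reversible change


def gate(action: Action):
    """Return (allowed, reason) for a proposed action."""
    if action.kind == "read":
        return True, "read-only"
    if action.kind == "change_write":
        # General change-management writes are never auto-submitted.
        if action.human_approved:
            return True, "approved by a human"
        return False, "awaiting human approval"
    if action.kind == "self_healing":
        # The narrow self-healing action requires a successful digital-twin pre-check.
        if action.twin_check != "success":
            return False, "digital-twin pre-check did not succeed"
        if not action.reversible:
            return False, "change is not reversible"
        return True, "pre-validated; notify the operations channel"
    return False, "unknown action kind"
```

Keeping the gate in code rather than only in the prompt means a model that ignores the instruction still cannot submit an unapproved write.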
References
- Peci, F.; Hamiti, E.; Khan, I. Agentic AI with ChatOps for Large-Scale Network Operations. In Proceedings of the 2025 IEEE Conference on Artificial Intelligence (CAI); IEEE, 2025; pp. 1617–1626. Conference version; this article is an extended version.
- Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.; Cao, Y. ReAct: Synergizing Reasoning and Acting in Language Models. In Proceedings of the 11th International Conference on Learning Representations (ICLR), 2023.
- Shinn, N.; Cassano, F.; Gopinath, A.; Narasimhan, K.; Yao, S. Reflexion: Language Agents with Verbal Reinforcement Learning. Proc. Adv. Neural Inf. Process. Syst. 2023, 36.
- Madaan, A.; Tandon, N.; Gupta, P.; Hallinan, S.; Gao, L.; Wiegreffe, S.; Alon, U.; Dziri, N.; Prabhumoye, S.; Yang, Y.; et al. Self-Refine: Iterative Refinement with Self-Feedback. Proc. Adv. Neural Inf. Process. Syst. 2023, 36.
- Schick, T.; Dwivedi-Yu, J.; Dessì, R.; Raileanu, R.; Lomeli, M.; Hambro, E.; Zettlemoyer, L.; Cancedda, N.; Scialom, T. Toolformer: Language Models Can Teach Themselves to Use Tools. Proc. Adv. Neural Inf. Process. Syst. 2023, 36.
- Patil, S.G.; Zhang, T.; Wang, X.; Gonzalez, J.E. Gorilla: Large Language Model Connected with Massive APIs. Proc. Adv. Neural Inf. Process. Syst. 2024, 37, 126544–126565.
- Yang, Z.; Li, L.; Wang, J.; Lin, K.; Azarnasab, E.; Ahmed, F.; Liu, Z.; Liu, C.; Zeng, M.; Wang, L. MM-ReAct: Prompting ChatGPT for Multimodal Reasoning and Action. arXiv 2023, arXiv:2303.11381.
- Huang, X.; Liu, W.; Chen, X.; Wang, X.; Wang, H.; Lian, D.; Wang, Y.; Tang, R.; Chen, E. Understanding the Planning of LLM Agents: A Survey. arXiv 2024, arXiv:2402.02716.
- Packer, C.; Wooders, S.; Lin, K.; Fan, V.; Patil, S.G.; Stoica, I.; Gonzalez, J.E. MemGPT: Towards LLMs as Operating Systems. arXiv 2023, arXiv:2310.08560.
- Wang, L.; Ma, C.; Feng, X.; Zhang, Z.; Yang, H.; Zhang, J.; Chen, Z.; Tang, J.; Chen, X.; Lin, Y.; et al. A Survey on Large Language Model Based Autonomous Agents. Front. Comput. Sci. 2024, 18, 186345.
- Wu, Q.; Bansal, G.; Zhang, J.; Wu, Y.; Li, B.; Zhu, E.; Jiang, L.; Zhang, X.; Zhang, S.; Liu, J.; et al. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. LLMAgents @ ICLR 2024 workshop (oral), 2024.
- CrewAI. CrewAI: Framework for Orchestrating Role-Playing Autonomous AI Agents. 2024. Available online: https://www.crewai.com/ (accessed on 2025-05-01).
- LangChain. LangGraph: Building Stateful, Multi-Actor Applications with LLMs. 2026. Available online: https://docs.langchain.com/oss/python/langgraph/overview (accessed on 2026-06-01).
- Gao, Y.; Xiong, Y.; Gao, X.; Jia, K.; Pan, J.; Bi, Y.; Dai, Y.; Sun, J.; Wang, M.; Wang, H. Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv 2023, arXiv:2312.10997.
- Es, S.; James, J.; Espinosa-Anke, L.; Schockaert, S. RAGAS: Automated Evaluation of Retrieval Augmented Generation. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, 2024; pp. 150–158.
- Saad-Falcon, J.; Khattab, O.; Potts, C.; Zaharia, M. ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2024; pp. 338–354.
- TruEra, Snowflake. TruLens: Evaluation and Tracking for LLM Experiments. 2024. Available online: https://www.trulens.org/ (accessed on 2025-05-01).
- Langfuse. Langfuse: Open-Source LLM Engineering Platform—Tracing, Evaluation, and Observability. 2024. Available online: https://langfuse.com/ (accessed on 2025-05-01).
- Yu, W.; Zhang, H.; Pan, X.; Cao, P.; Ma, K.; Li, J.; Wang, H.; Yu, D. Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024; pp. 14672–14685.
- Huang, L.; Yu, W.; Ma, W.; Zhong, W.; Feng, Z.; Wang, H.; Chen, Q.; Peng, W.; Feng, X.; Qin, B.; et al. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. ACM Trans. Inf. Syst. 2025, 43, 42:1–42:55.
- Huang, Y.; Du, H.; Zhang, X.; Niyato, D.; Kang, J.; Xiong, Z.; Wang, S.; Huang, T. Large Language Models for Networking: Applications, Enabling Techniques, and Challenges. arXiv 2023, arXiv:2311.17474.
- Maatouk, A.; Piovesan, N.; Ayed, F.; De Domenico, A.; Debbah, M. Large Language Models for Telecom: Forthcoming Impact on the Industry. IEEE Commun. Mag. 2025, 63, 62–68.
- Wu, D.; Wang, X.; Qiao, Y.; Wang, Z.; Jiang, J.; Cui, S.; Wang, F. NetLLM: Adapting Large Language Models for Networking. In Proceedings of the ACM SIGCOMM 2024 Conference, 2024.
- Wang, J.; Zhang, L.; Yang, Y.; Zhuang, Z.; Qi, Q.; Sun, H.; Lu, L.; Feng, J.; Liao, J. Network Meets ChatGPT: Intent Autonomous Management, Control and Operation. J. Commun. Inf. Netw. 2023, 8, 239–255.
- Bandlamudi, J.; Mukherjee, K.; Agarwal, P.; Dechu, S.; Huo, S.; Isahagian, V.; Muthusamy, V.; Purushothaman, N.; Sindhgatta, R. Towards Hybrid Automation by Bootstrapping Conversational Interfaces for IT Operation Tasks. Proc. AAAI Conf. Artif. Intell. 2023, 37, 15654–15660.
- Wulf, J.; Meierhofer, J. Exploring the Potential of Large Language Models for Automation in Technical Customer Service. In Digital Service Innovation: Redefining Provider-Customer Interactions, Proceedings of the Spring Servitization Conference, 2024; pp. 146–157.
- Wang, Z.; Liu, Z.; Zhang, Y.; Zhong, A.; Wang, J.; Yin, F.; Fan, L.; Wu, L.; Wen, Q. RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM), 2024; pp. 4966–4974.
- Zhang, D.; Zhang, X.; Bansal, C.; Las-Casas, P.; Fonseca, R.; Rajmohan, S. PACE-LM: Prompting and Augmentation for Calibrated Confidence Estimation with GPT-4 in Cloud Incident Root Cause Analysis. arXiv 2023, arXiv:2309.05833.
- An, K.; Yang, F.; Lu, J.; Li, L.; Ren, Z.; Huang, H.; Wang, L.; Zhao, P.; Kang, Y.; Ding, H.; et al. Nissist: An Incident Mitigation Copilot Based on Troubleshooting Guides. In Proceedings of ECAI 2024: 27th European Conference on Artificial Intelligence, 2024.
- Jiang, Y.; Zhang, C.; He, S.; Yang, Z.; Ma, M.; Qin, S.; Kang, Y.; Dang, Y.; Rajmohan, S.; Lin, Q.; et al. Xpert: Empowering Incident Management with Query Recommendations via Large Language Models. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering (ICSE), 2024.
- Shetty, M.; Bansal, C.; Upadhyayula, S.P.; Radhakrishna, A.; Gupta, A. AutoTSG: Learning and Synthesis for Incident Troubleshooting. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), 2022.
- Roy, D.; Zhang, X.; Bhave, R.; Bansal, C.; Las-Casas, P.; Fonseca, R.; Rajmohan, S. Exploring LLM-Based Agents for Root Cause Analysis. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (FSE Companion), 2024; pp. 208–219.
- Ferrag, M.A.; Battah, A.; Tihanyi, N.; Jain, R.; Maimuţ, D.; Alwahedi, F.; Lestable, T.; Thandi, N.S.; Mechri, A.; Debbah, M.; et al. SecureFalcon: Are We There Yet in Automated Software Vulnerability Detection with LLMs? IEEE Trans. Softw. Eng. 2025, 51, 1248–1265.
- Jiang, A.Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D.S.; de las Casas, D.; Bressand, F.; Lengyel, G.; Lample, G.; Saulnier, L.; et al. Mistral 7B. arXiv 2023, arXiv:2310.06825.
- Mistral AI. Mistral-Small-24B-Instruct-2501 Model Card. 2025. Available online: https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501 (accessed on 2026-06-01).
- Touvron, H.; Martin, L.; Stone, K.; et al. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv 2023, arXiv:2307.09288.
- Amazon Web Services. Amazon Bedrock — Anthropic Claude models. 2026. Available online: https://docs.aws.amazon.com/bedrock/latest/userguide/model-cards-anthropic.html (accessed on 2026-06-01).
- NetBox Labs. NetBox: The Premier Network Source of Truth. 2024. Available online: https://netboxlabs.com/docs/netbox/ (accessed on 2026-06-01).
- Ciena. Ciena Route Explorer. 2026. Available online: https://www.ciena.com/insights/data-sheets/Route-Explorer.html (accessed on 2026-06-01).
- Selector AI. Selector AI Network Observability Platform. 2026. Available online: https://www.selector.ai/ (accessed on 2026-06-01).
- Case, J.; Fedor, M.; Schoffstall, M.; Davin, J. A Simple Network Management Protocol (SNMP); RFC 1157; IETF, 1990.
- Scudder, J.; Fernando, R.; Stuart, S. BGP Monitoring Protocol (BMP); RFC 7854; IETF, 2016.
- Formal, T.; Piwowarski, B.; Clinchant, S. SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2021; pp. 2288–2292.
- Cormack, G.V.; Clarke, C.L.A.; Büttcher, S. Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2009; pp. 758–759.
- Amazon Web Services. Amazon Bedrock — Cohere Embed Multilingual. 2026. Available online: https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-cohere-embed-multilingual.html (accessed on 2026-06-01).
- Amazon Web Services. Amazon Bedrock — Cohere Embed v3 Model Parameters. 2026. Available online: https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-embed-v3.html (accessed on 2026-06-01).
- Amazon Web Services. Amazon Bedrock — Supported Regions and Models for Reranking. 2026. Available online: https://docs.aws.amazon.com/bedrock/latest/userguide/rerank-supported.html (accessed on 2026-06-01).

| System | Primary domain | Multi-tool | Local LLM | RAG | Prod. eval. | Closed-loop |
|---|---|---|---|---|---|---|
| RCAgent [27] | Cloud RCA | ✓ | ✗ | ∼ | ∼ | ✗ |
| Nissist [29] | Incident mitigation | ∼ | ✗ | ✓ | ∼ | ✗ |
| Xpert [30] | Incident query rec. | ∼ | ✗ | ∼ | ✓ | ✗ |
| NetLLM [23] | Networking (multi-task) | ✗ | ∼ | ✗ | ∼ | ✗ |
| Network-ChatGPT [24] | Intent mgmt. | ∼ | ✗ | ✗ | ✗ | ∼ |
| AutoTSG [31] | Troubleshooting synth. | ∼ | ✗ | ∼ | ∼ | ✗ |
| This work | Carrier network ops | ✓ | ✓ | ✓ | ✓a | ✓b |
| Deployment | Models |
|---|---|
| Initial (conference [1]) | Open-weight instruction-tuned model served locally for confidential requests [34,35], with a cloud model used only for non-sensitive inference |
| Current (managed cloud) | Managed Anthropic model tiers served through Amazon Bedrock: a lower-latency tier for high-volume interactions and a higher-capability tier for complex reasoning, within the operator’s cloud tenancy |
| Worker / agent | Operational function |
|---|---|
| Supervisor-routed domain workers (14) | |
| General troubleshooting assistant | Broad Q&A and diagnostics with access to the full tool set |
| Inventory & infrastructure | Device, circuit, and spare-location lookup over the source of truth |
| Service & connection lookup | Customer-facing interconnection services and connection state |
| Network infrastructure | Control-plane topology, path tracing, and traffic-engineering metrics |
| IP / DNS | IP address management and DNS operations |
| Change requests | Change-request creation, scheduling, and metadata retrieval |
| Service & incident management | Incident/ticket creation, ITSM and ticketing workflow, case lookups |
| Vendor support | Vendor technical-support (TAC) case management |
| Vendor insights | Vendor product and knowledge insights |
| RMA tracking | Return-merchandise-authorization case operations and status |
| Shipment / logistics | Hardware shipment and logistics tracking |
| Device-access management | Out-of-band/console device-access information |
| Service-inventory posting | Service-inventory record creation and updates |
| Utilities & knowledge base | Knowledge-base retrieval, operational CLIs, and notifications |
| Expert agents (pattern-matched) | |
| Change validation | Pre-/post-change validation against a baseline; validation reports |
| Configuration audit | Configuration and compliance auditing |
| Autonomous fast-path agents | |
| Self-healing | Detection-triggered closed-loop remediation (e.g., link-flap cost-out) |
| Network troubleshooting / RCA | Root-cause analysis, device-stability/flap detection, telemetry correlation |
| Incident-management gate | Cross-agent policy gate invoked by the autonomous agents |
| Task | Manual (s/event) | Agentic (s/event) | Reduction factor |
|---|---|---|---|
| Spare locator | 900 | 10 | 90× |
| Console information | 30 | 3 | 10× |
| Packet loss / latency | 3,600 | 15 | 240× |
| Path retrieval | 3,600 | 5 | 720× |
| Node isolation | 7,200 | 3 | 2,400× |
| Vendor search | 10,800 | 30 | 360× |
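The reduction factors in this table are simply the ratio of manual to agentic handling time per event, which a short script can recompute from the tabulated values. The dictionary below copies the table; the function name is a hypothetical helper.

```python
# (manual s/event, agentic s/event) per task, copied from the table above.
TIMES = {
    "Spare locator": (900, 10),
    "Console information": (30, 3),
    "Packet loss / latency": (3600, 15),
    "Path retrieval": (3600, 5),
    "Node isolation": (7200, 3),
    "Vendor search": (10800, 30),
}


def reduction_factors(times):
    """Return manual/agentic handling-time ratio for each task."""
    return {task: manual / agentic for task, (manual, agentic) in times.items()}


factors = reduction_factors(TIMES)
for task, factor in factors.items():
    print(f"{task}: {factor:.0f}x")
```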
| Measurement | Value |
|---|---|
| Interactions traced (90 days) | ∼7,400 |
| Interaction channel | |
| via ChatOps messaging channel | ∼91% |
| other (automated tool invocations) | ∼9% |
| Task class (partitions the ∼7,400 traces) | |
| automated self-healing/monitoring | ∼58% (∼4,300) |
| engineer-initiated queries | ∼42% (∼3,100) |
| Rated responses (user feedback) | 168 |
| positive / negative | 104 / 64 |
| Positive share of rated responses | 61.9% (104/168) |
| 95% CI (normal approx.) | [54.5%, 69.2%] |
| response rate | ∼2.3% (168/∼7,400) |
| Model-token spend (Bedrock) | Prompt and completion tokens recorded per trace |
| Primary / reasoning serving tiers | Managed model tiers on Amazon Bedrock |
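The normal-approximation interval for the positive-feedback share can be recomputed from the two counts in the table. This is a sketch of the standard Wald interval with z = 1.96; it should agree with the tabulated interval to within about 0.1 percentage point, depending on where rounding is applied. The function name is hypothetical.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# 104 positive ratings out of 168 rated responses, as reported in the table.
share, low, high = wald_interval(104, 168)
print(f"positive share {share:.1%}, 95% CI [{low:.1%}, {high:.1%}]")
```

With only 168 rated responses out of roughly 7,400 traces, the interval is wide, so the positive share is best read as a coarse signal rather than a precise satisfaction estimate.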
| Stage (representative 30-day counts) | Count |
|---|---|
| Raw signals (each stream measured independently) | |
| anomalies | ∼1,919,632 |
| monitoring-system alerts | ∼715,848 |
| combined raw signals | ∼2.64 M |
| Distilled categories (each measured independently) | |
| high-impact alerts | 7,885 |
| actionable events | 645 |
| incidents created | 4,382 (∼146/day) |
| Judge dimension | Mean | SD | 95% CI |
|---|---|---|---|
| Relevance | 0.847 | 0.157 | [0.825, 0.869] |
| Context relevance | 0.788 | 0.228 | [0.756, 0.820] |
| Correctness | 0.739 | 0.236 | [0.707, 0.772] |
| Groundedness / no-hallucination score | 0.707 | 0.302 | [0.665, 0.749] |
| Helpfulness | 0.653 | 0.198 | [0.625, 0.680] |
| Conciseness | 0.332 | 0.071 | [0.322, 0.342] |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).