Submitted:
06 May 2026
Posted:
06 May 2026
Abstract
Keywords:
1. Introduction
- The first category, represented by ToolLLM [17] and similar vector-retrieval systems [7], builds centralized registries or global indexes. For every query, the system computes similarity against all known tools, so computational cost grows with the size of the registry and latency becomes unacceptable at scale.
- The second category, embodied by frameworks like OpenClaw [25,27], injects all tool descriptions into the context window of the large language model (LLM) [33]. This strategy becomes infeasible when the tool set grows beyond a few hundred entries, as the context window is exhausted and inference cost grows quadratically.
- We put forth ToolDNS, an infrastructure-native hierarchical architecture for AI tool discovery. By encoding functional, domain, and protocol attributes into a semantic namespace under the .tools TLD, we retrofit tool discovery onto the existing DNS hierarchy. This design bypasses the traditional path of rebuilding centralized registries at a new layer 8 or 9, achieving low engineering migration costs, bidirectional compatibility with standard DNS clients and resolvers, and global high availability inherited from the DNS infrastructure itself. The hierarchical structure reduces per-query search complexity from to .
- We introduce a delegated trust and governance mechanism based on logical subdomains and standard DNS NS records. By inheriting the hierarchical endorsement of parent domains, subdomains provide verifiable identity and security governance through the established reputations of authoritative institutions. At the same time, this mechanism circumvents the administrative monopoly of traditional DNS by allowing multiple entities to manage their own independent resources under shared functional namespaces. Agents can restrict discovery queries to trusted subdomains only, achieving decentralized trust management, flexible security isolation, and equal-footing multi-tenant governance without a central authority.
- We design an LLM-augmented, index-free retrieval protocol using EDNS0 extensions. Instead of maintaining a global vector index, our method performs on-the-fly semantic pruning: the recursive resolver carries the user’s natural language intent and a K parameter to each authoritative server, which returns the top-K most relevant subdomains via lightweight LLM scoring. This avoids periodic index retraining and naturally accommodates dynamic tool repositories. New tools simply appear as new entries in the lowest-level subdomains and are discovered at query time, eliminating synchronization lag. This protocol turns each DNS iteration into an adaptive, semantic narrowing step, making the discovery process independent of the underlying tool repository’s update frequency.
- We create and release a large-scale heterogeneous benchmark dataset comprising real-world tools spanning MCP, A2A, RESTful, and Skill protocols. Using this dataset, we empirically validate that ToolDNS reduces the per-query search space by through only two layers of hierarchical pruning, while achieving retrieval accuracy comparable to state-of-the-art vector retrieval baselines. Furthermore, ToolDNS improves response latency by orders of magnitude compared to exhaustive scan methods, demonstrating its ability to balance high precision and low overhead in ultra-large-scale deployment scenarios.
2. Problem Formulation
2.1. AI Tool Discovery
- A functional description , typically a natural language string that specifies what the tool does, its input/output schema, and usage constraints.
- A protocol specification , where is the set of supported invocation protocols (e.g., MCP, A2A, RESTful, Skill).
- An access endpoint , typically a network address (IP and port) or a URI.
- An organizational trust anchor , denoting the entity that publishes and vouches for the tool (e.g., HKU, Google).
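The four attributes above can be pictured as one record per tool. The following is a minimal sketch, assuming field and class names (`ToolRecord`, `Protocol`, `trust_anchor`) and example values that are purely illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Protocol(Enum):
    """Invocation protocols named in the text."""
    MCP = "mcp"
    A2A = "a2a"
    REST = "rest"
    SKILL = "skill"


@dataclass(frozen=True)
class ToolRecord:
    """One discoverable tool; field names are illustrative assumptions."""
    description: str    # natural-language function, I/O schema, usage constraints
    protocol: Protocol  # how the tool is invoked
    endpoint: str       # network address (host:port) or URI
    trust_anchor: str   # organization that publishes and vouches for the tool


# Hypothetical example values.
tool = ToolRecord(
    description="Returns historical daily temperatures for a city.",
    protocol=Protocol.MCP,
    endpoint="api.history.weather.tools:443",
    trust_anchor="HKU",
)
print(tool.protocol.value, tool.endpoint)
```

Freezing the dataclass reflects the idea that a published record is replaced rather than mutated when a tool changes, which is a design choice of this sketch rather than a requirement stated in the text.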
- The system must operate over the existing Internet infrastructure without requiring new global services, custom SDKs, or modifications to client network stacks beyond widely available features [20].
- No single entity should have unilateral control over tool registration or discovery [10]. Different organizations must be able to manage their own tool namespaces independently, while still interoperating under a common logical root.
- Agents must be able to enforce security policies (e.g., “only use tools endorsed by my organization or by well-known security auditors”) without relying on a single global trust anchor.
- The discovery mechanism must not depend on the specific invocation protocol of the tool [6]. Tools using MCP, A2A, REST, or future protocols must be discoverable through the same interface.
- Tools appear, disappear, and change their descriptions frequently. The discovery system should reflect these changes with low latency (ideally, at query time) and without expensive global re-indexing.
2.2. Limitations of Existing Paradigms
3. ToolDNS System Design
3.1. Overview
- Client (agent): Any agent that supports standard DNS queries and EDNS0 can act as a client. The client formulates an intent q and issues a service query (SRV) for a special domain name under .tools. No custom SDK, protocol adaptation, or central registration is required.
- Recursive resolver: The resolver performs iterative resolution on behalf of the client. It maintains a cache of delegation records (NS records) and, when necessary, traverses the hierarchy by following referrals. In ToolDNS, the resolver also handles partially unfolded domain names, a construct that encodes the current search position within the domain name itself, and passes the EDNS0 payload unchanged to each authoritative server.
- Root and TLD servers: The root servers are unchanged; they only need to contain NS records for the .tools TLD. The TLD servers for .tools are enhanced with a semantic matching module: given a query with an EDNS0 payload and a partially unfolded name, they return the top-K most relevant subdomains (e.g., weather.tools, nlp.tools) instead of a single exactly matched tool.
- Intermediate authoritative servers: These servers manage subdomains deeper in the hierarchy (e.g., history.weather.tools). Their behavior mirrors that of TLD servers: they receive a partially unfolded name, use the EDNS0 payload to select the top-K matching child subdomains, and return NS records pointing to the next-level authoritative servers.
- Leaf authoritative servers: These servers directly host tool instances. They store SRV records for fully expanded domain names, along with tool metadata (description, protocol, endpoint). Upon receiving a query that reaches the leaf level, they perform a final semantic match over the local tool list and return the top-K tool endpoints.
3.2. Hierarchical Semantic Namespace
3.2.1. Functional Hierarchy
3.2.2. Logical Subdomains for Decentralized Trust
1. Public common class: These subdomains typically consist of “official” followed immediately by a functional prefix, with “official” usually being hidden (e.g., official.weather.tools → weather.tools). They serve as open entry points and may be managed by a community body or the .tools registry. Their purpose is to provide a neutral discovery path for agents that do not have specific trust requirements.
2. Certified trust class: These are subdomains that include an organizational identifier (e.g., hku.weather.tools). The identifier is typically placed immediately to the left of the functional prefix. By resolving through such a subdomain, an agent receives tools that are directly endorsed and managed by the named organization. This design enables verifiable, accountable service discovery without centralization.
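The two classes can be told apart by looking at the label immediately to the left of the functional prefix. The sketch below assumes the agent already holds a set of known functional domains (here the hypothetical `FUNCTIONAL_DOMAINS`); the helper names `classify` and `is_trusted` are illustrative and not part of ToolDNS.

```python
from typing import NamedTuple, Optional


class TrustInfo(NamedTuple):
    trust_class: str             # "public" or "certified"
    organization: Optional[str]  # None for the public common class
    functional_domain: str       # e.g. "weather.tools"


def classify(name: str, functional_domains: set[str]) -> TrustInfo:
    """Classify a subdomain by the label left of its functional prefix."""
    labels = name.rstrip(".").lower().split(".")
    for i in range(len(labels)):  # i = 0 tries the longest suffix first
        suffix = ".".join(labels[i:])
        if suffix in functional_domains:
            if i == 0 or labels[i - 1] == "official":
                return TrustInfo("public", None, suffix)
            return TrustInfo("certified", labels[i - 1], suffix)
    raise ValueError(f"{name!r} is not under a known functional domain")


def is_trusted(name: str, functional_domains: set[str], trusted_orgs: set[str]) -> bool:
    """Policy check: accept only subdomains certified by a trusted organization."""
    info = classify(name, functional_domains)
    return info.trust_class == "certified" and info.organization in trusted_orgs


FUNCTIONAL_DOMAINS = {"weather.tools", "nlp.tools"}  # assumed, for illustration
print(classify("hku.weather.tools.", FUNCTIONAL_DOMAINS))
print(is_trusted("weather.tools", FUNCTIONAL_DOMAINS, {"hku"}))
```

Because organizational and functional labels share one namespace, a check of this kind only works if the functional domains are known in advance; otherwise a label such as `history` could be mistaken for an organization.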
3.3. Query Protocol and Semantic Encoding
3.3.1. Protocol Support
3.3.2. Partially Unfolded Domain Names
- Fully expanded domain name: This follows the standard SRV record format: _service._proto.domain. (with no leading underscore before domain). It indicates that the resolution has reached a leaf node, and the domain part can be resolved to an IP address via A/AAAA records. For example, _mcp._tcp.api.history.weather.tools. is a fully expanded name.
- Partially unfolded domain name: This extends the SRV format by inserting an extra underscore label before the domain part, so that the name takes the form _service._proto._.domain. (trailing root dot included). The underscore acts as a search cursor: it marks that the path is still under construction and that the resolver should continue traversing deeper. The cursor is a placeholder that will be replaced by the next matched subdomain label.
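The cursor mechanics can be sketched with plain string handling. This is one reading of the description above, under the assumption that a new cursor is re-inserted to the left of each matched label until a leaf is reached; the function names are hypothetical.

```python
CURSOR = "_"


def split_name(name: str) -> tuple[str, str, list[str]]:
    """Split an SRV-style name into (service, proto, remaining labels)."""
    labels = name.rstrip(".").split(".")
    if len(labels) < 3 or not (labels[0].startswith("_") and labels[1].startswith("_")):
        raise ValueError(f"not an SRV-style name: {name!r}")
    return labels[0], labels[1], labels[2:]


def is_partially_unfolded(name: str) -> bool:
    """True when a bare '_' cursor label follows _service._proto."""
    _, _, rest = split_name(name)
    return bool(rest) and rest[0] == CURSOR


def descend(name: str, label: str, leaf: bool = False) -> str:
    """Replace the cursor with the matched label; keep a new cursor unless at a leaf."""
    service, proto, rest = split_name(name)
    if not rest or rest[0] != CURSOR:
        raise ValueError("name is already fully expanded")
    new_rest = [label] + rest[1:]
    if not leaf:
        new_rest = [CURSOR] + new_rest  # assumption: cursor moves one level deeper
    return ".".join([service, proto] + new_rest) + "."


step1 = descend("_mcp._tcp._.weather.tools.", "history")
step2 = descend(step1, "api", leaf=True)
print(step1)
print(step2)
```

Starting from `_mcp._tcp._.weather.tools.`, two steps reach the fully expanded example name given in the text, `_mcp._tcp.api.history.weather.tools.`.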
3.3.3. EDNS0 Semantic Payload
- Version (8 bits): Currently set to 0x00. Future revisions of the payload format can increment this field while using the same option code, ensuring backward compatibility.
- Length (16 bits): The length in bytes of the following Payload field (not including Version, Length, or K). This allows payloads up to 2^16 − 1 = 65,535 bytes, sufficient for complex natural language queries.
- K (8 bits): The number (unsigned integer) of top results requested at each semantic pruning step. A value of is reserved for special use (e.g., cache warming, as discussed in Section 4.1).
- Payload (variable): A UTF-8 encoded string containing the agent’s intent. For version 0, this is plain text; future versions may support compressed or simple structured formats (e.g., CBOR).
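The field layout above maps directly onto a fixed four-byte header followed by the UTF-8 payload. The sketch below assumes the fields appear on the wire in the order listed (Version, Length, K, Payload) and in network byte order; it covers only the option data, and the EDNS0 option code itself is left out because the text does not give one.

```python
import struct

HEADER = struct.Struct("!BHB")  # Version (8 bits), Length (16 bits), K (8 bits)
VERSION = 0x00


def encode_option_data(intent: str, k: int) -> bytes:
    """Serialize the semantic payload for version 0 (plain UTF-8 text)."""
    payload = intent.encode("utf-8")
    if len(payload) > 0xFFFF:
        raise ValueError("payload exceeds the 16-bit Length field")
    if not 0 <= k <= 0xFF:
        raise ValueError("K must fit in 8 bits")
    return HEADER.pack(VERSION, len(payload), k) + payload


def decode_option_data(data: bytes) -> tuple[int, int, str]:
    """Parse option data and return (version, k, intent)."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    version, length, k = HEADER.unpack_from(data)
    if version != VERSION:
        raise ValueError(f"unsupported version {version}")
    payload = data[HEADER.size:]
    if len(payload) != length:
        raise ValueError("Length field does not match payload size")
    return version, k, payload.decode("utf-8")


blob = encode_option_data("historical weather for Hong Kong", k=3)
print(decode_option_data(blob))
```

In practice the usable payload is likely smaller than the Length field permits, since the enclosing EDNS0 option (RFC 6891) carries its own 16-bit length that must also cover the four header bytes.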
3.4. Iterative Resolution Algorithm
Algorithm 1: ToolDNS discovery at the recursive resolver.
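As a rough illustration of the resolver-side loop described in Sections 3.1 and 3.5, the sketch below simulates the hierarchy in memory. It is not the authors' Algorithm 1: the `Node` tree, the word-overlap `score` function (a stand-in for real semantic matching), and all names and data are assumptions made for this example, and no DNS traffic is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A zone in a toy namespace; leaves carry tools instead of children."""
    summary: str
    children: dict[str, "Node"] = field(default_factory=dict)
    tools: dict[str, str] = field(default_factory=dict)  # endpoint -> description


def score(intent: str, text: str) -> int:
    """Stand-in for semantic matching: count shared lowercase words."""
    return len(set(intent.lower().split()) & set(text.lower().split()))


def top_k(intent: str, candidates: dict[str, str], k: int) -> list[str]:
    """Return the k candidate keys whose text best matches the intent."""
    ranked = sorted(candidates, key=lambda name: score(intent, candidates[name]), reverse=True)
    return ranked[:k]


def discover(root: Node, intent: str, k: int) -> list[str]:
    """Follow top-k referrals level by level, then rank the tools returned by leaves."""
    frontier, found = [root], {}
    while frontier:
        next_frontier = []
        for node in frontier:
            if node.tools:  # leaf server: final match over its local tool list
                for endpoint in top_k(intent, node.tools, k):
                    found[endpoint] = node.tools[endpoint]
                continue
            summaries = {label: child.summary for label, child in node.children.items()}
            next_frontier += [node.children[label] for label in top_k(intent, summaries, k)]
        frontier = next_frontier
    return top_k(intent, found, k)


root = Node("tools", children={
    "weather": Node("weather forecast climate temperature", children={
        "history": Node("historical weather records", tools={
            "api.history.weather.tools:443": "historical daily temperature records by city"}),
        "forecast": Node("weather forecast next days", tools={
            "api.forecast.weather.tools:443": "seven day weather forecast by city"}),
    }),
    "nlp": Node("text translation summarization language", children={
        "translate": Node("translate text", tools={
            "api.translate.nlp.tools:443": "translate text between languages"}),
    }),
})
print(discover(root, "historical temperature records for a city", k=1))
```

Each pass of the `while` loop corresponds to one level of the hierarchy, and every node only ever ranks its own children, which is the property the hierarchical design relies on.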
3.5. LLM-Augmented Semantic Pruning at Authoritative Servers
3.5.1. Non-Leaf Servers (TLD and Intermediate)
- Embedding-based cosine similarity: Pre-compute embeddings for each child summary; at query time, embed I and compute dot products. This yields high accuracy but requires an embedding model.
- Keyword matching: Use TF-IDF or BM25 on the summaries. Faster but less accurate.
- Small LLM scoring: For maximum adaptability, the server can invoke a small LLM (e.g., a model with fewer than 10B active parameters) to score the top few candidates.
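The first option, embedding-based cosine similarity, can be sketched in a few lines. The three-dimensional vectors below are made-up numbers standing in for real embeddings, and `top_k_children` is a hypothetical helper name; a deployment would obtain the vectors from an embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_children(intent_vec: list[float], child_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Rank child subdomains by similarity to the intent and keep the best k."""
    ranked = sorted(child_vecs, key=lambda label: cosine(intent_vec, child_vecs[label]), reverse=True)
    return ranked[:k]


# Toy vectors; in practice these are pre-computed from each child's summary.
CHILD_VECS = {
    "weather": [0.9, 0.1, 0.0],
    "nlp": [0.0, 0.2, 0.9],
    "calendar": [0.3, 0.9, 0.1],
}
intent_vec = [0.8, 0.3, 0.1]
print(top_k_children(intent_vec, CHILD_VECS, k=2))
```

If the child vectors are normalized to unit length when they are pre-computed, the cosine reduces to a dot product at query time.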
3.5.2. Leaf Servers
4. Governance and Practical Considerations
4.1. Caching and Performance Optimization
1. Delegation (NS) records for non-leaf nodes: These records have a long Time-To-Live (TTL), typically hours or days, because the hierarchical structure (e.g., which subdomains exist under weather.tools) changes infrequently. Once cached, the resolver can skip one or more RTTs when answering subsequent queries.
2. Tool instance (SRV) records for leaf nodes: These have a shorter TTL (minutes) to reflect the dynamic nature of tool availability and endpoints.
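The two TTL regimes can be illustrated with a small expiring cache. The class below is a sketch with an injectable clock so that expiry can be demonstrated without waiting; the TTL values (one day for NS, five minutes for SRV) and the server and endpoint names are illustrative choices within the ranges mentioned above, not values from the paper.

```python
import time


class TTLCache:
    """Tiny record cache with per-entry expiry; the clock is injectable for testing."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (name, rtype) -> (expires_at, value)

    def put(self, name, rtype, value, ttl_seconds):
        self._entries[(name, rtype)] = (self._clock() + ttl_seconds, value)

    def get(self, name, rtype):
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:  # expired: drop it and report a miss
            del self._entries[(name, rtype)]
            return None
        return value


now = [0.0]  # fake clock, in seconds
cache = TTLCache(clock=lambda: now[0])
cache.put("weather.tools", "NS", "ns1.weather.tools", ttl_seconds=86400)
cache.put("_mcp._tcp.api.history.weather.tools", "SRV", "api.history.weather.tools:443", ttl_seconds=300)

now[0] = 600.0  # ten minutes later
print(cache.get("weather.tools", "NS"))
print(cache.get("_mcp._tcp.api.history.weather.tools", "SRV"))
```

After ten simulated minutes the delegation is still served from cache while the tool instance record has expired and must be fetched again.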
- Server address search: If the resolver already knows part of the domain, it first looks in the cache for the longest suffix match that corresponds to a fully expanded or partially unfolded name. If the match ends at a non-leaf node of the tree, the resolver continues by sending requests to the real authoritative DNS servers for the next subdomain. If the entire path to a leaf is cached, the resolver can query the leaf server directly without contacting intermediate servers.
- Query mock: If the resolver knows nothing about the target domain, it does not start by sending a request to a DNS server. Instead, it answers the first steps from the cache tree, running the top-K semantic match over the cached subdomains itself until it moves beyond the tree. From that point on, it sends requests to the real authoritative DNS servers for the next subdomain.
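The longest-suffix lookup used in the server address search can be sketched as follows. The cache is modeled as a plain set of zone names, which is an assumption of this example; a real resolver would store the NS records and their TTLs alongside.

```python
from typing import Optional


def longest_cached_suffix(name: str, cached_zones: set[str]) -> Optional[str]:
    """Return the deepest cached zone that is a suffix of `name`, or None."""
    labels = name.rstrip(".").lower().split(".")
    for i in range(len(labels)):  # i = 0 tries the longest suffix first
        candidate = ".".join(labels[i:])
        if candidate in cached_zones:
            return candidate
    return None


CACHED = {"tools", "weather.tools"}  # hypothetical cache contents
print(longest_cached_suffix("history.weather.tools.", CACHED))
print(longest_cached_suffix("translate.nlp.tools.", CACHED))
```

The returned zone is the point from which the resolver resumes querying authoritative servers; a `None` result means resolution starts from the root.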
4.2. Practical Considerations
4.3. Forward Compatibility
5. Experiments
5.1. Heterogeneous Dataset Construction
- RESTful API: Sourced from the G1 task set of the ToolBench benchmark, representing traditional Web API specifications;
- MCP Tools: Collected from the MCP toolset on MCPZoo [29], representing the emerging ecosystem of model context protocols;
- OpenClaw Skills: Based on the community-maintained Awesome OpenClaw Skills list, representing skill invocation standards;
- A2A: Based on the official A2A protocol examples released by Google, representing communication specifications between agents.
5.2. End-to-End Query Hit Rate Comparison
5.3. Computational Complexity and Search Space Reduction
5.4. Comparative Evaluation of Network Efficiency
5.5. Effectiveness of Hierarchical Structure against Attention Dilution
1. Hierarchical resolution: The model makes a step-wise decision, mimicking the ToolDNS delegation logic. At each level, it evaluates only the children of the current node.
2. Flat retrieval: All leaf subdomains are expanded into a single, undifferentiated list. The model is asked to select the most relevant item from this exhaustive set.
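The difference in list sizes between the two settings can be made concrete with a small count. The sketch assumes a complete tree with uniform branching, which real namespaces will not have; the branching factor of 14 mirrors the number of categories in Appendix B, while the depth of 2 and the function names are chosen only for illustration.

```python
def flat_candidates(branching: int, depth: int) -> int:
    """Items in the single flat list of all leaves of a complete tree."""
    return branching ** depth


def hierarchical_candidates(branching: int, depth: int, k: int = 1) -> int:
    """Upper bound on items scored in total when every visited node ranks
    only its own children and the k best are followed at each level."""
    k = min(k, branching)
    return sum(branching * k ** level for level in range(depth))


def largest_single_list(branching: int) -> int:
    """Most items any one hierarchical decision has to consider at once."""
    return branching


b, d = 14, 2
print("flat list size:", flat_candidates(b, d))
print("hierarchical total, k=1:", hierarchical_candidates(b, d))
print("hierarchical total, k=3:", hierarchical_candidates(b, d, k=3))
print("largest single list:", largest_single_list(b))
```

Under these assumptions the flat setting presents 196 items in one list, whereas no single hierarchical decision sees more than 14, which is the mechanism by which the hierarchy is expected to limit attention dilution.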
6. Conclusion
Appendix A. Prompt Templates

Appendix B. List of Tool Categories
1. Web_Search_and_SEO.
2. IoT.
3. Religion_and_Spirituality.
4. Agriculture_or_Horticulture.
5. Logistics.
6. Blockchain.
7. Coding.
8. Calendar.
9. Social_Media.
10. Weather_and_Climate.
11. Video.
12. Security.
13. Email.
14. Audio.
References
- [1] Roy Arends, Rob Austein, Matt Larson, Dan Massey, and Scott Rose. 2005. RFC 4033: DNS security introduction and requirements.
- [2] Kenneth P Birman. 2005. Reliable distributed systems: technologies, web services, and applications. Springer.
- [3] Enfang Cui, Yujun Cheng, Rui She, Dan Liu, Zhiyuan Liang, Minxin Guo, Tianzheng Li, Qian Wei, Wenjuan Xing, and Zhijie Zhong. 2025. AgentDNS: A Root Domain Naming System for LLM Agents. arXiv:2505.22368 [cs.AI] https://arxiv.org/abs/2505.22368.
- [4] Hongwei Cui, Yuyang Du, Qun Yang, Yulin Shao, and Soung Chang Liew. 2024. LLMind: Orchestrating AI and IoT with LLM for complex task execution. IEEE Communications Magazine 63, 4 (2024), 214–220.
- [5] Joao Damas, Michael Graff, and Paul Vixie. 2013. RFC 6891: Extension mechanisms for DNS (EDNS(0)).
- [6] Yunus Durmus and Ertan Onur. 2015. Service knowledge discovery in smart machine networks. Wireless Personal Communications 81, 4 (2015), 1455–1480.
- [7] Abul Ehtesham, Aditi Singh, Gaurav Kumar Gupta, and Saket Kumar. 2025. A survey of agent interoperability protocols: Model context protocol (MCP), agent communication protocol (ACP), agent-to-agent protocol (A2A), and agent network protocol (ANP). arXiv preprint arXiv:2505.02279 (2025).
- [8] Robert Elz and Randy Bush. 1997. RFC 2181: Clarifications to the DNS Specification.
- [9] Roy Thomas Fielding. 2000. Architectural styles and the design of network-based software architectures. University of California, Irvine.
- [10] De Filippi et al. 2016. The invisible politics of Bitcoin: governance crisis of a decentralised infrastructure. Internet Policy Review 5, 3 (2016).
- [11] Arnt Gulbrandsen, Paul Vixie, and Levon Esibov. 2000. RFC 2782: A DNS RR for specifying the location of services (DNS SRV).
- [12] Mark Handley. 2006. Why the Internet only just works. BT Technology Journal 24, 3 (2006), 119–129.
- [13] Paul Hoffman and Patrick McManus. 2018. RFC 8484: DNS queries over HTTPS (DoH).
- [14] Zi Hu, Liang Zhu, John Heidemann, Allison Mankin, Duane Wessels, and Paul Hoffman. 2016. RFC 7858: Specification for DNS over transport layer security (TLS).
- [15] Ken Huang, Vineeth Sai Narajala, Idan Habler, and Akram Sheriff. 2026. Agent name service (ANS): A universal directory for secure AI agent discovery and interoperability. In International Conference on AI in Cybersecurity (ICAIC). IEEE, 1–9.
- [16] Shishir G Patil, Tianjun Zhang, Xin Wang, and Joseph E Gonzalez. 2024. Gorilla: Large language model connected with massive APIs. Advances in Neural Information Processing Systems 37 (2024), 126544–126565.
- [17] Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al. 2023. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. arXiv preprint arXiv:2307.16789 (2023).
- [18] Ramesh Raskar, Pradyumna Chari, John Zinky, Mahesh Lambe, Jared James Grogan, Sichao Wang, Rajesh Ranjan, Rekha Singhal, Shailja Gupta, Robert Lincourt, et al. 2025. Beyond DNS: Unlocking the internet of AI agents via the nanda index and verified agentfacts. arXiv preprint arXiv:2507.14263 (2025).
- [19] Partha Pratim Ray. 2025. A survey on model context protocol: Architecture, state-of-the-art, challenges and future directions. Authorea Preprints (2025).
- [20] David P Reed. 2010. End-to-end arguments: The Internet and beyond. In USENIX Security Symposium.
- [21] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems 36 (2023), 68539–68551.
- [22] Yulin Shao, Qi Cao, and Deniz Gündüz. 2024. A theory of semantic communication. IEEE Transactions on Mobile Computing 23, 12 (2024), 12211–12228.
- [23] Yulin Shao, Deniz Gündüz, and Soung Chang Liew. 2021. Federated edge learning with misaligned over-the-air computation. IEEE Transactions on Wireless Communications 21, 6 (2021), 3951–3964.
- [24] Ion Stoica and Scott Shenker. 2021. From cloud computing to sky computing. In Proceedings of the Workshop on Hot Topics in Operating Systems. 26–32.
- [25] OpenClaw Team. 2024. OpenClaw Documentation. https://docs.openclaw.ai/. Accessed: 2026-03-30.
- [26] Qwen Team. 2025. Qwen3 Technical Report. arXiv:2505.09388 [cs.CL] https://arxiv.org/abs/2505.09388.
- [27] Lukas Weidener, Marko Brkić, Phillip Lee, Martin Karlsson, Kevin Noessler, and Paul Kohlhaas. 2026. From agent-only social networks to autonomous scientific research: Lessons from OpenClaw and Moltbook, and the architecture of ClawdLab and Beach.Science. arXiv preprint arXiv:2602.19810 (2026).
- [28] Niklaus Wirth. 1995. A plea for lean software. Computer 28, 2 (1995), 64–68.
- [29] Mengying Wu, Pei Chen, Geng Hong, Baichao An, Jinsong Chen, Binwang Wan, Xudong Pan, Jiarun Dai, and Min Yang. 2025. MCPZoo: A Large-Scale Dataset of Runnable Model Context Protocol Servers for AI Agent. arXiv preprint arXiv:2512.15144 (2025).
- [30] Renjun Xu and Yang Yan. 2026. Agent skills for large language models: Architecture, acquisition, security, and the path forward. arXiv preprint arXiv:2602.12430 (2026).
- [31] John Yang, Carlos E Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. 2024. SWE-agent: Agent-computer interfaces enable automated software engineering. Advances in Neural Information Processing Systems 37 (2024), 50528–50652.
- [32] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. 2022. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations.
- [33] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223 1, 2 (2023), 1–124.









| Step | Removed | Remaining | Rate (%) |
|---|---|---|---|
| Raw Data | - | 54700 | 0.00 |
| Step 2 | 3730 | 50970 | 6.82 |
| Step 4 | 7359 | 43611 | 13.45 |
| Step 6 | 9923 | 33688 | 18.14 |
| Total | 21012 | 33688 | 38.41 |
| Scheme | Request Total (MB) | Request Packets | Request Bytes (KB)/Query | Request Packets/Query | Response Total (MB) | Response Packets | Response Bytes (KB)/Query | Response Packets/Query | Delay (ms)/Query |
|---|---|---|---|---|---|---|---|---|---|
| AgentDNS | 15.69 | 145,107 | 1.16 | 10.77 | 24.10 | 133,020 | 1.79 | 9.87 | 2035.51 |
| ANS | 14.66 | 145,108 | 1.09 | 10.77 | 18.96 | 133,022 | 1.41 | 9.87 | 2042.68 |
| ToolDNS | 9.70 | 40,198 | 0.65 | 2.98 | 4.40 | 40,198 | 0.33 | 2.98 | 5.62 |
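To make the comparison across schemes easier to read, the per-query figures can be expressed as ratios relative to ToolDNS. The short script below copies the per-query values from the table; the dictionary keys and the helper name `ratio_vs_tooldns` are labels invented for this sketch.

```python
# Per-query figures copied from the network-efficiency table.
PER_QUERY = {
    "AgentDNS": {"req_kb": 1.16, "resp_kb": 1.79, "req_pkts": 10.77, "delay_ms": 2035.51},
    "ANS":      {"req_kb": 1.09, "resp_kb": 1.41, "req_pkts": 10.77, "delay_ms": 2042.68},
    "ToolDNS":  {"req_kb": 0.65, "resp_kb": 0.33, "req_pkts": 2.98,  "delay_ms": 5.62},
}


def ratio_vs_tooldns(metric: str) -> dict[str, float]:
    """How many times larger each baseline's value is than ToolDNS's."""
    base = PER_QUERY["ToolDNS"][metric]
    return {name: row[metric] / base for name, row in PER_QUERY.items() if name != "ToolDNS"}


for metric in ("req_kb", "resp_kb", "req_pkts", "delay_ms"):
    ratios = ratio_vs_tooldns(metric)
    print(metric, {name: round(value, 2) for name, value in ratios.items()})
```

On these numbers the per-query delay of both baselines is a few hundred times that of ToolDNS, while the traffic volume and packet counts differ by single-digit factors.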
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
