Submitted: 07 June 2026
Posted: 09 June 2026
Abstract
Keywords:
1. Introduction
2. Major Application Scenarios of Multi-Agent Systems in Clinical Medicine
2.1. Clinical Applications in Medicine
2.1.1. Diagnosis and Differential Diagnosis
2.1.2. Treatment and Decision Support
2.1.3. Medical Imaging
2.1.4. Patient Monitoring and Care Management
2.1.5. Surgical Assistance and Surgical Robotics
2.2. Supporting Applications in Medicine
2.2.1. Hospital Workflow Automation
2.2.2. Evidence Synthesis and Clinical Research
2.2.3. Medical Education and Clinical Training
2.2.4. Clinical Safety Governance
3. Multi-Agent System Architectures for Clinical Workflows
3.1. Clinical Collaboration and Communication Design
3.2. Knowledge Augmentation and External Memory Architectures for Clinical Evidence Chains
3.3. Multimodal Information Integration
3.4. Cross-Institutional Collaboration and Privacy-Preserving Coordination
3.5. Reinforcement Learning–Based Optimization and Evolution of Medical Agents
4. Evaluation Systems and Clinical Validation Benchmarks
4.1. Evaluation Metrics and Methods
4.2. Comparative Analysis of Multi-Agent Collaboration Effectiveness
4.3. Advanced Validation Methods: Human Comparison and Temporal Backtesting
4.4. Robustness, Operational Efficiency, and Dataset Summary
5. Challenges and Ethics
5.1. Technical Summary and Challenges
5.1.1. Medical Hallucination and Cascaded Error Amplification
5.1.2. Bottlenecks in Collaborative Scheduling and Risk of Consensus Bias
5.1.3. Clinical Memory Management and Pressure on Privacy Boundaries
5.2. Ethical and Privacy Issues
5.2.1. Expansion of Privacy Boundaries
5.2.2. Complexification of Accountability Structures
5.2.3. Chain Amplification of Bias Propagation
6. Future Directions
6.1. Paths for Technical Innovation
6.2. Clinical Application Expansion
7. Conclusions
Acknowledgments
Author Contributions
Declaration of Interests
Declaration of Generative AI and AI-Assisted Technologies in the Writing Process
References
- Lee, P. Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine. N. Engl. J. Med. 2023, 388, 1233–1239.
- Eriksen, A.V.; Möller, S.; Ryg, J. Use of GPT-4 to Diagnose Complex Clinical Cases. N. Engl. J. Med. 2024, 1.
- Bean, A.M.; Payne, R.E.; Parsons, G.; et al. Reliability of LLMs as medical assistants for the general public: A randomized preregistered study. Nat. Med. 2026, 32, 609–615.
- Liu, F.; Niu, Y.; Zhang, Q.; et al. A foundational architecture for AI agents in healthcare. Cell Rep. Med. 2025, 6, 102374.
- Fan, W.; Chen, P.; Shi, D.; Guo, X.; Kou, L. Multi-agent modeling and simulation in the AI age. Tsinghua Sci. Technol. 2021, 26, 608–624.
- Gao, S.; Fang, A.; Huang, Y.; et al. Empowering biomedical discovery with AI agents. Cell 2024, 187, 6125–6151.
- Klang, E.; Omar, M.; Raut, G.; et al. Orchestrated multi agents sustain accuracy under clinical-scale workloads compared to a single agent. npj Health Syst. 2026, 3, 23.
- Sorka, M.; Gorenshtein, A.; Abramovitch, H.; Soontrapa, P.; Shelly, S.; Aran, D. AI vs. Human Performance in Conversational Hospital-Based Neurological Diagnosis. medRxiv 2025.
- Almansoori, M.; Kumar, K.; Cholakkal, H. Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions. arXiv 2025.
- Zhao, Y.; Wang, H.; Zheng, Y.; Wu, X. A Layered Debating Multi-Agent System for Similar Disease Diagnosis.
- Chen, X.; Yi, H.; You, M.; et al. Enhancing diagnostic capability with multi-agents conversational large language models. npj Digit. Med. 2025, 8, 159.
- Neeley, M.B.; Mao, D. Survey and Improvement Strategies for Gene Prioritization with Large Language Models.
- Zhou, X.; Ren, Y.; Zhao, Q.; et al. An LLM-Driven Multi-Agent Debate System for Mendelian Diseases. arXiv 2025.
- Zhao, W.; Wu, C.; Fan, Y.; et al. An Agentic System for Rare Disease Diagnosis with Traceable Reasoning. arXiv 2026.
- Chen, K.; Li, X.; Yang, T.; Wang, H.; Dong, W.; Gao, Y. MDTeamGPT: A Self-Evolving LLM-based Multi-Agent Framework for Multi-Disciplinary Team Medical Consultation. arXiv 2025.
- Esteitieh, Y.; Mandal, S.; Laliotis, G. Towards Metacognitive Clinical Reasoning: Benchmarking MD-PIE Against State-of-the-Art LLMs in Medical Decision-Making. medRxiv 2025.
- Chen, K.; Qi, J.; Huo, J.; et al. A Self-Evolving Framework for Multi-Agent Medical Consultation Based on Large Language Models. In ICASSP 2025—2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); IEEE: New York, NY, USA, 2025; pp. 1–5.
- Peng, Q.; Cui, J.; Xie, J.; Cai, Y.; Li, Q. Tree-of-Reasoning: Towards Complex Medical Diagnosis via Multi-Agent Reasoning with Evidence Tree. arXiv 2025.
- Madrid-García, A.; Benavent, D.; Merino-Barbancho, B. From chat to act: Large language model agents and agentic AI as the next frontier of AI in rheumatology. EULAR Rheumatol. Open 2025, 1, 147–156.
- Xu, G.; Meng, Y.; Wang, R.; Qi, G. Collaborating LLMs and PLMs for Medical Tasks. In 2024 IEEE International Conference on Knowledge Graph (ICKG); IEEE: New York, NY, USA, 2024; pp. 428–431.
- Iapascurta, V.; Fiodorov, I.; Belii, A.; Bostan, V. Multi-Agent Approach for Sepsis Management. Healthc. Inf. Res. 2025, 31, 209–214.
- Chen, Y.J.; Albarqawi, A.; Chen, C.S. Enhancing Clinical Decision-Making: Integrating Multi-Agent Systems with Ethical AI Governance. arXiv 2025.
- Ayub, U.; Naqvi, S.A.A.; Jajja, S.A.; et al. A large language model (LLM)-based multi-agent framework for risk stratification and treatment recommendations in localized prostate cancer (locPCa).
- Liu, Z.; Xiao, L.; He, M.; Zhu, R.; Yang, H.; Chen, J. PICOAS: A clinical knowledge linking model for delivering up-to-date, interrelated, and personalized decision support. In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); IEEE: New York, NY, USA, 2024; pp. 6589–6596.
- Liu, S.; Huang, S.S.; McCoy, A.B.; Wright, A.P.; Horst, S.; Wright, A. Optimizing Order Sets With a Large Language Model–Powered Multiagent System. JAMA Netw. Open 2025, 8, e2533277.
- Alam, H.M.T.; Srivastav, D.; Kadir, M.A.; Sonntag, D. Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG. In Advances in Information Retrieval; Springer: Cham, Switzerland, 2025; Volume 15574, pp. 201–209.
- Yi, Z.; Liu, J.; Xiao, T.; Albert, M.V. A Multi-Agent System for Complex Reasoning in Radiology Visual Question Answering. arXiv 2025.
- Zhang, Z.; Lee, K.; Jing, P.; et al. GEMA-Score: Granular Explainable Multi-Agent Scoring Framework for Radiology Report Evaluation. arXiv 2025.
- Bani-Harouni, D.; Navab, N.; Keicher, M. MAGDA: Multi-agent Guideline-Driven Diagnostic Assistance. In Foundation Models for General Medical AI; Lecture Notes in Computer Science; Deng, Z., Shen, Y., Kim, H.J., Jeong, W.-K., Aviles-Rivero, A.I., He, J., Zhang, S., Eds.; Springer Nature: Cham, Switzerland, 2025; Volume 15184, pp. 163–172.
- Hu, X.; Huang, J.; Liu, M.; et al. FetalAgents: A Multi-Agent System for Fetal Ultrasound Image and Video Analysis. arXiv 2026.
- Ghezloo, F.; Seyfioglu, M.S.; Soraki, R.; et al. PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology. arXiv 2025.
- Chen, C.; Weishaupt, L.L.; Williamson, D.F.K.; et al. Evidence-based diagnostic reasoning with multi-agent copilot for human pathology.
- Seyfioglu, M.S. Towards Autonomous Histopathological Diagnosis: An End-to-End Multi-Agent AI Framework for Diagnostic Decision-Making and Interpretation.
- Yi, Z.; Xiao, T.; Albert, M.V. A Multimodal Multi-Agent Framework for Radiology Report Generation. arXiv 2025.
- Li, J.; Zhou, T.; Zhou, Z.; et al. Experience-guided multi-agent interpretable framework for radiology report summarization. Comput. Methods Programs Biomed. 2026, 273, 109078.
- Wang, Z.; Zhu, Y.; Zhao, H.; et al. ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven Multi-Agent Collaboration. In Proceedings of the ACM on Web Conference 2025; Association for Computing Machinery: New York, NY, USA, 2025; pp. 2250–2261.
- Li, R.; Wang, X.; Berlowitz, D.; Mez, J.; Lin, H.; Yu, H. CARE-AD: A multi-agent large language model framework for Alzheimer’s disease prediction using longitudinal clinical notes. npj Digit. Med. 2025, 8, 541.
- Gawade, S.; Akhouri, S.; Kulkarni, C.; et al. Multi Agent based Medical Assistant for Edge Devices.
- Yao, Z.; Chafekar, T.; Wang, J.; et al. ChatCLIDS: Simulating Persuasive AI Dialogues to Promote Closed-Loop Insulin Adoption in Type 1 Diabetes Care. arXiv 2025.
- Sun, B.; Hu, D. CTG-Insight: A Multi-Agent Interpretable LLM Framework for Cardiotocography Analysis and Classification. arXiv 2025.
- Yao, T.; Xu, Y.; Wang, H.; Qiu, X.; Althoefer, K.; Qi, P. Multi-Agent Fuzzy Reinforcement Learning with LLM for Cooperative Navigation of Endovascular Robotics.
- Zhao, L.; Bai, J.; Bian, Z.; et al. Autonomous Multi-Modal LLM Agents for Treatment Planning in Focused Ultrasound Ablation Surgery. arXiv 2025.
- Chen, H.; Gutt, M.; Belker, O.A.; et al. Proof of concept for voice based MRI scanner control using large language models in real time guided interventions. Sci. Rep. 2025, 15, 31206.
- Fang, C.; Yue, X.; Zhao, Z.; Guo, S. The Multi-Agentization of a Dual-Arm Nursing Robot Based on Large Language Models. Bioengineering 2025, 12, 448.
- Zhao, Z.; Yue, X.; Xie, J.; Fang, C.; Shao, Z.; Guo, S. A Dual-Agent Collaboration Framework Based on LLMs for Nursing Robots to Perform Bimanual Coordination Tasks. IEEE Robot. Autom. Lett. 2025, 10, 2942–2949.
- Ruiz Mejia, J.M.; Rawat, D.B. MedScrubCrew: A Medical Multi-Agent Framework for Automating Appointment Scheduling Based on Patient-Provider Profile Resource Matching. Healthcare 2025, 13, 1649.
- Lu, M.; Ho, B.; Ren, D.; Wang, X. TriageAgent: Towards Better Multi-Agents Collaborations for Large Language Model-Based Clinical Triage. In Findings of the Association for Computational Linguistics: EMNLP 2024; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 5747–5764.
- Han, S.; Choi, W. Development of a Large Language Model-based Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments.
- Kim, Y.; Jeong, H.; Park, C.; et al. Tiered Agentic Oversight: A Hierarchical Multi-Agent System for Healthcare Safety. arXiv 2025.
- Tu, T.; Schaekermann, M.; Palepu, A.; et al. Towards conversational diagnostic artificial intelligence. Nature 2025, 642, 442–450.
- Akinseloyin, O.; Jiang, X.; Palade, V. An LLM-based Multi-Agent Collaborative Approach for Abstract Screening towards Automated Systematic Reviews.
- Wu, H.; Zhu, Y.; Wang, Z.; et al. EHRFlow: A Large Language Model-Driven Iterative Multi-Agent Electronic Health Record Data Analysis Workflow.
- Angulo, J.; Yeste, V. Notebook for the BioASQ Task 13b Lab at CLEF 2025.
- Israni, M.; Renuse, S.; V, P. AutoMed: Multi-Agent AI System for Personalized Medical Knowledge Retrieval and Summarization. In 2025 International Conference on Data Science, Agents & Artificial Intelligence (ICDSAAI); IEEE: New York, NY, USA, 2025; pp. 1–6.
- Gorenshtein, A.; Shihada, K.; Sorka, M.; Aran, D.; Shelly, S. LITERAS: Biomedical literature review and citation retrieval agents. Comput. Biol. Med. 2025, 192, 110363.
- Li, H.; Pan, W.; Rajendran, S.; Zang, C.; Wang, F. TrialGenie: Empowering Clinical Trial Design with Agentic Intelligence and Real World Data.
- Yue, L.; Xing, S.; Chen, J.; Fu, T. ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning. In Proceedings of the 15th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics; ACM: New York, NY, USA, 2024; pp. 1–10.
- Moran, J. EAGLE-AI: A large language model workflow for automated extraction and scoring of literature evidence linking genes to autism spectrum disorder.
- Wysocki, O.; Wysocka, M.; Jacobo, M.; Unsworth, H.; Freitas, A. Biomedical reasoning in action: Multi-agent System for Auditable Biomedical Evidence Synthesis. arXiv 2025.
- Livieratos, A. MetaMind: A Multi-Agent Transformer-Driven Framework for Automated Network Meta-Analyses.
- Wei, H.; Qiu, J.; Yu, H.; Yuan, W. MEDCO: Medical Education Copilots Based on A Multi-Agent Framework.
- Sangwon, K.L. A Multi-AI Agent Framework for Interactive Neurosurgical Education and Evaluation: From Vignettes to Virtual Conversations.
- Awasthi, A.; Chang, B.V.; Vu, A.M.; et al. MAARTA: Multi-Agentic Adaptive Radiology Teaching Assistant.
- Sangwon, K.L. Evaluating Large Language Model Diagnostic Performance on JAMA Clinical Challenges via a Multi-Agent Conversational Framework.
- Altermatt, F.R.; Neyem, A.; Sumonte, N.; Mendoza, M.; Villagran, I.; Lacassie, H.J. Performance of single-agent and multi-agent language models in Spanish language medical competency exams. BMC Med. Educ. 2025, 25, 666.
- Lim, E.; He, Y.V.; Joselowitz, J.; et al. MATRIX: Multi-Agent simulaTion fRamework for safe Interactions and conteXtual clinical conversational evaluation. arXiv 2025.
- Giuffre, M.; Kresevic, S.; Ajcevic, M.; Crocè, L.; Shung, D. Large Language Model Agent-Based Framework for automated Treatment Prescription in Patients with Chronic Hepatitis C Virus Infection. Dig. Liver Dis. 2025, 57, S46–S47.
- Sabel, J.; Wingren, M.; Lundell, A.; Andersson, S. Medication counseling with large language models: Improving self-evaluation through multi-agent systems.
- Stein, S.; Pilgermann, M.; Weber, S.; Sedlmayr, M. Leveraging MDS2 and SBOM data for LLM-assisted vulnerability analysis of medical devices. Comput. Struct. Biotechnol. J. 2025, 28, 267–280.
- Chen, Z.; Peng, Z.; Liang, X.; et al. MAP: Evaluation and Multi-Agent Enhancement of Large Language Models for Inpatient Pathways. arXiv 2025.
- Lee, Y.; Wang, X.; Yang, C.C. Automated Clinical Problem Detection from SOAP Notes using a Collaborative Multi-Agent LLM Architecture. arXiv 2025.
- Klang, E.; Omar, M.; Raut, G.; et al. Orchestrated multi agents sustain accuracy under clinical-scale workloads compared to a single agent. npj Health Syst. 2026, 3, 23.
- Ke, Y.; Yang, R.; Lie, S.A.; et al. Mitigating Cognitive Biases in Clinical Decision-Making Through Multi-Agent Conversations Using Large Language Models: Simulation Study. J. Med. Internet Res. 2024, 26, e59439.
- Liu, P.R.; Bansal, S.; Dinh, J.; et al. MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models. In 2025 IEEE 8th International Conference on Multimedia Information Processing and Retrieval (MIPR); IEEE: New York, NY, USA, 2025; pp. 456–462.
- Wang, Q.; Wang, Z.; Li, M.; et al. A feasibility study of automating radiotherapy planning with large language model agents. Phys. Med. Biol. 2025, 70, 075007.
- De Maio, C.; Fenza, G.; Furno, D.; Grauso, T.; Loia, V. Privacy-Preserving Healthcare Data Interactions: A Multi-Agent Approach Using LLMs. JCOMSS 2025, 21, 13–22.
- Chen, K.; Zhen, T.; Wang, H.; et al. MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems. arXiv 2025.
- Chan, T.K.; Dinh, N.D. ENTAgents: AI Agents for Complex Knowledge Otolaryngology. medRxiv 2025.
- Feng, Y.; Wang, J.; Zhou, L.; Lei, Z.; Li, Y. DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue. arXiv 2025.
- Xia, P.; Wang, J.; Peng, Y.; et al. MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning. arXiv 2026.
- Zhuang, Y.; Jiang, W.; Zhang, J.; Yang, Z.; Zhou, J.T.; Zhang, C. Learning to Be A Doctor: Searching for Effective Medical Agent Architectures. arXiv 2025.
- Croxford, E.; Gao, Y.; First, E.; et al. Automating Evaluation of AI Text Generation in Healthcare with a Large Language Model (LLM)-as-a-Judge.
- Zhao, H.; Zhu, Y.; Wang, Z.; Wang, Y.; Gao, J.; Ma, L. ConfAgents: A Conformal-Guided Multi-Agent Framework for Cost-Efficient Medical Diagnosis. arXiv 2025.
- Wang, W.; Ma, Z.; Wang, Z.; et al. A Survey of LLM-based Agents in Medicine: How far are we from Baymax?
- Qiu, J.; Lam, K.; Li, G.; et al. LLM-based agentic systems in medicine and healthcare. Nat. Mach. Intell. 2024, 6, 1418–1420.
- Ong, J.C.L.; Chang, S.Y.H.; William, W.; et al. Ethical and regulatory challenges of large language models in medicine. Lancet Digit. Health 2024, 6, e428–e432.
- Ethics and Governance of Artificial Intelligence for Health: Large Multi-Modal Models. WHO Guidance, 1st ed.; World Health Organization: Geneva, Switzerland, 2024.
- Yagoubi, F.E.; Mallah, R.A.; Badu-Marfo, G. AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems. arXiv 2026.
- Prakash, C.; Lind, M.; Sisodia, A. Agentic AI Governance and Lifecycle Management in Healthcare. arXiv 2026.
- Kaissis, G.A.; Makowski, M.R.; Rückert, D.; Braren, R.F. Secure, privacy-preserving and federated machine learning in medical imaging. Nat. Mach. Intell. 2020, 2, 305–311.
- Pati, S.; Kumar, S.; Varma, A.; et al. Privacy preservation for federated learning in health care. Patterns 2024, 5, 100974.
- Juneja, G.; Pasupulati, J.N.S.; Albalak, A.; Hua, W.; Wang, W.Y. MAGPIE: A benchmark for Multi-AGent contextual PrIvacy Evaluation. arXiv 2025.
- Chen, Y.J.; Albarqawi, A.; Chen, C.S. Reinforcing Clinical Decision Support through Multi-Agent Systems and Ethical AI Governance. arXiv 2025.
- Liu, X.; Glocker, B.; McCradden, M.M.; Ghassemi, M.; Denniston, A.K.; Oakden-Rayner, L. The medical algorithmic audit. Lancet Digit. Health 2022, 4, e384–e397.
- Nouis, S.C.; Uren, V.; Jariwala, S. Evaluating accountability, transparency, and bias in AI-assisted healthcare decision-making: A qualitative study of healthcare professionals’ perspectives in the UK. BMC Med. Ethics 2025, 26, 89.
- Kaissis, G.A.; Makowski, M.R.; Rückert, D.; Braren, R.F. Secure, privacy-preserving and federated machine learning in medical imaging. Nat. Mach. Intell. 2020, 2, 305–311.
- Pati, S.; Kumar, S.; Varma, A.; et al. Privacy preservation for federated learning in health care. Patterns 2024, 5, 100974.
- Li, H.; Cheng, X.; Zhang, X. Accurate Insights, Trustworthy Interactions: Designing a Collaborative AI-Human Multi-Agent System with Knowledge Graph for Diagnosis Prediction. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems; ACM: New York, NY, USA, 2025; pp. 1–15.
- Tan, S.Y.; Sumner, J.; Wang, Y.; Wenjun Yip, A. A systematic review of the impacts of remote patient monitoring (RPM) interventions on safety, adherence, quality-of-life and cost-related outcomes. npj Digit. Med. 2024, 7, 192.
- Roberts, M.C.; Holt, K.E.; Del Fiol, G.; Baccarelli, A.A.; Allen, C.G. Precision public health in the era of genomics and big data. Nat. Med. 2024, 30, 1865–1873.
- Rehan, M.W.; Rehan, M.M. Survey, taxonomy, and emerging paradigms of societal digital twins for public health preparedness. npj Digit. Med. 2025, 8, 520.
- Zhang, K.; Yang, X.; Wang, Y.; et al. Artificial intelligence in drug development. Nat. Med. 2025, 31, 45–59.
- Xu, Z.; Ren, F.; Wang, P.; et al. A generative AI-discovered TNIK inhibitor for idiopathic pulmonary fibrosis: A randomized phase 2a trial. Nat. Med. 2025, 31, 2602–2610.
- Tudor, B.H.; Shargo, R.; Gray, G.M.; et al. A scoping review of human digital twins in healthcare applications and usage patterns. npj Digit. Med. 2025, 8, 587.
- Kovatchev, B.P.; Colmegna, P.; Pavan, J.; et al. Human-machine co-adaptation to automated insulin delivery: A randomised clinical trial using digital twin technology. npj Digit. Med. 2025, 8, 253.
- Rosenthal, J.T.; Beecy, A.; Sabuncu, M.R. Rethinking clinical trials for medical AI with dynamic deployments of adaptive systems. npj Digit. Med. 2025, 8, 252.
- Butler, P.M.; Yang, J.; Brown, R.; et al. Smartwatch- and smartphone-based remote assessment of brain health and detection of mild cognitive impairment. Nat. Med. 2025, 31, 829–839.
- Teodoro, D.; Naderi, N.; Yazdani, A.; Zhang, B.; Bornet, A. A Scoping Review of Artificial Intelligence Applications in Clinical Trial Risk Assessment. medRxiv 2025.






| Feature | General-Purpose LLM | LLM-Based Single Agent | LLM-Based MAS |
|---|---|---|---|
| Core positioning | General-purpose foundation model | Agent encapsulated around a single goal | Collaborative system composed of multiple role-based agents |
| Task execution | One-shot generation or short-chain execution | Can plan and invoke tools, but the same agent undertakes the main subtasks | Decomposes complex tasks across different agents and integrates results through orchestration/negotiation |
| Context management | Mainly single-turn or short-session context | Can maintain task-level context and short-term memory | Can maintain shared states, private memories, and cross-role context |
| External knowledge/tool use | Mainly relies on prompting or attached knowledge bases | Can actively retrieve information and call APIs or tools | Different agents can invoke heterogeneous tools, databases, and knowledge sources according to role |
| Error control | Mainly relies on user review | Limited self-checking; vulnerable to single-point errors | Can reduce single-point errors through cross-review, voting, adjudication, and supervisory agents |
| Typical medical scenarios | Medical question answering, document generation, single-step explanation | Lightweight decision support and single-process automation | MDT-like diagnosis and treatment, complex workflows, dynamic monitoring, evidence synthesis |
| Main limitations | Lacks workflow and responsibility structures | Role mixing; easily degrades as workload increases | Complex system design, higher communication costs, more difficult evaluation and governance |
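
The "Error control" row lists cross-review, voting, adjudication, and supervisory agents as ways an LLM-based MAS can reduce single-point errors. As a minimal sketch of how voting with a supervisory fallback could be wired together, the Python below takes a list of agent answers, returns the strict-majority answer when one exists, and otherwise defers to an adjudicator. The names `aggregate_opinions`, `conservative_adjudicator`, the `min_agreement` threshold, and the `"escalate_to_clinician"` label are all hypothetical and are not drawn from any system in the reference list.

```python
from collections import Counter
from typing import Callable, Sequence


def aggregate_opinions(
    opinions: Sequence[str],
    adjudicate: Callable[[Sequence[str]], str],
    min_agreement: float = 0.5,
) -> str:
    """Return the majority answer, or defer to an adjudicator when agreement is weak."""
    if not opinions:
        raise ValueError("at least one agent opinion is required")

    ranked = Counter(opinions).most_common(2)
    top_answer, top_count = ranked[0]
    # A tie between the two most common answers is never treated as a majority.
    tied = len(ranked) > 1 and ranked[1][1] == top_count

    # Accept the vote only if the top answer exceeds the agreement threshold.
    if not tied and top_count / len(opinions) > min_agreement:
        return top_answer

    # Otherwise hand the full set of opinions to a supervisory step.
    return adjudicate(opinions)


def conservative_adjudicator(opinions: Sequence[str]) -> str:
    """Hypothetical supervisory policy: escalate to a human instead of guessing."""
    return "escalate_to_clinician"


if __name__ == "__main__":
    votes = ["pneumonia", "pneumonia", "bronchitis"]
    print(aggregate_opinions(votes, conservative_adjudicator))
```

In this sketch the adjudicator is a plain function so that it could be swapped for a supervisory agent or a human-review queue without changing the voting logic.
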
| Dataset/Benchmark | Data Source and Type | Main Task Scenario | Primary Capability Evaluated | Common Metrics | Representative References |
|---|---|---|---|---|---|
| JAMA Clinical Challenges | JAMA clinical challenge cases (815 selected from 1519 cases) | Dynamic diagnosis and progressive information disclosure | Inquiry organization, information acquisition, dynamic diagnostic convergence | Accuracy, dialogue rounds | [64] |
| MIMIC series (MIMIC-III/IV) | Real inpatient EHR, ICU data, and progress notes | Inpatient pathways, task orchestration, entity extraction, causal analysis, cost-sensitive diagnosis | Workflow continuity, shared-state modeling, operational efficiency | Accuracy, F1, AUC, token use, latency | [56,70,72,83] |
| PDSQI-9 + MIMIC-III/ProbSum | Medical document summaries and quality scales | Automated review of medical text-generation quality | Expert agreement and reliability of automated review | ICC, Krippendorff’s alpha, Gwet’s AC2 | [82] |
| VHA longitudinal clinical notes (CARE-AD) | Long-term medical records and clinical notes from the U.S. Veterans Health Administration | Early prediction of Alzheimer’s disease | Long-term memory, shared state, time-sliced prediction | Accuracy, Precision, Recall, F1 | [37] |
| VUMC real-world order sets | Real order sets and knowledge base from Vanderbilt University Medical Center | Order-set optimization and assessment of clinical executability | Output executability, human-machine agreement, expert filtering | Expert ratings, Cohen’s kappa | [25] |
| Clinically scaled mixed-task set | PubMed literature, EHR discharge summaries, and dosage-calculation tasks | Mixed workload involving retrieval, extraction, and calculation | Stability, scalability, operational efficiency | Accuracy, latency, token cost | [72] |
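
Several benchmarks in the table report human-machine agreement with Cohen's kappa, which corrects the observed agreement rate for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal, dependency-free sketch of that calculation for two raters. The function name `cohen_kappa` and the example labels are illustrative assumptions, and returning 1.0 in the degenerate case where chance agreement is already 1 is a convention chosen here, not something prescribed by the cited studies.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)
    # Observed agreement: share of items given identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    expert = ["accept", "accept", "reject", "reject"]
    system = ["accept", "reject", "reject", "reject"]
    print(cohen_kappa(expert, system))
```

For the example ratings, observed agreement is 3/4 and chance agreement is (2·1 + 2·3)/16 = 0.5, giving kappa = 0.5.
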

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).