Submitted:
25 May 2025
Posted:
27 May 2025
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. System Architecture
2.2. Decision-Making Algorithms
2.3. Multiagent Reinforcement Learning (MARL)
2.4. Communication Protocols
- Message-passing and blackboard systems for scalable data sharing
- Adaptive throttling, which dynamically adjusts communication frequency based on system load, reducing latency and network congestion by 35%
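As a rough illustration of adaptive throttling, the sketch below lengthens an agent's minimum send interval as a measured load value rises. The class name `AdaptiveThrottle`, the load signal in [0, 1], and the interval bounds are assumptions made for illustration; the paper does not specify how its throttle was implemented.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveThrottle:
    """Hypothetical throttle: the send interval grows linearly with system load."""
    min_interval: float = 0.1          # seconds between messages when idle (assumed)
    max_interval: float = 2.0          # seconds between messages at full load (assumed)
    last_sent: float = float("-inf")   # no message has been sent yet

    def interval(self, load: float) -> float:
        """Map a load in [0, 1] to a send interval; out-of-range loads are clamped."""
        load = min(max(load, 0.0), 1.0)
        return self.min_interval + load * (self.max_interval - self.min_interval)

    def try_send(self, now: float, load: float) -> bool:
        """Return True (and record the send) if enough time has passed for this load."""
        if now - self.last_sent >= self.interval(load):
            self.last_sent = now
            return True
        return False


throttle = AdaptiveThrottle()
print(throttle.try_send(now=0.0, load=0.2))  # True: the first message is always allowed
print(throttle.try_send(now=0.3, load=0.9))  # False: high load lengthens the interval
```

A linear mapping is only one possible choice; a real deployment might instead derive the interval from queue depth or observed round-trip time.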
2.5. Simulation Environment and Use Case
- Triage Agent – assesses and categorizes patients
- Scheduling Agent – allocates clinical staff shifts
- Treatment Planner – recommends individualized care paths
This setup enabled the study of coordination, adaptability, and performance in realistic healthcare scenarios.
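One way to picture the three-agent setup is a shared blackboard that each agent reads from and writes to in turn. The sketch below is illustrative only: the `Blackboard` dictionary, the severity threshold, the staffing rule, and the care-path labels are hypothetical placeholders, not the simulation's actual logic.

```python
from typing import Any, Dict, List

Blackboard = Dict[str, Any]  # hypothetical shared store that all agents read and write


def triage_agent(board: Blackboard) -> None:
    """Categorize each patient; the severity >= 7 cutoff is an assumed placeholder."""
    board["triage"] = {
        p["id"]: "urgent" if p["severity"] >= 7 else "routine"
        for p in board["patients"]
    }


def scheduling_agent(board: Blackboard) -> None:
    """Assumed rule: one clinician per urgent patient plus one covering routine cases."""
    urgent = sum(1 for category in board["triage"].values() if category == "urgent")
    board["staff_needed"] = urgent + 1


def treatment_planner(board: Blackboard) -> None:
    """Recommend a care path per patient from the triage category (placeholder labels)."""
    board["care_paths"] = {
        pid: "immediate-assessment" if category == "urgent" else "standard-queue"
        for pid, category in board["triage"].items()
    }


def run_cycle(patients: List[Dict[str, Any]]) -> Blackboard:
    """Run the three agents once, in order, over a fresh blackboard."""
    board: Blackboard = {"patients": patients}
    for agent in (triage_agent, scheduling_agent, treatment_planner):
        agent(board)
    return board


board = run_cycle([{"id": "p1", "severity": 9}, {"id": "p2", "severity": 3}])
print(board["triage"], board["staff_needed"], board["care_paths"])
```

Running the agents in a fixed order keeps the sketch short; agents that learn or negotiate would replace these fixed rules.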

3. Results
3.1. Cooperative Scenarios
- Task Completion Rate: The MAS completed 91% of tasks, compared with 68% for the baseline rule-based system, indicating that coordinated agents handle and complete tasks more effectively.
- Policy Convergence: Agent strategies in the MAS stabilized in approximately 3,000 episodes, about twice as fast as standard Q-learning, which suggests that the cooperative MARL approach speeds up learning and adaptation to the environment.
- Scalability: System efficiency remained stable up to 50 agents and dropped by approximately 8% beyond that, which is attributed to increased message congestion among agents.
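For reference, the standard Q-learning baseline named above updates one agent's action-value table by moving Q(s, a) toward r + gamma * max Q(s', a'). The sketch below shows that tabular update only. The state and action names, learning rate, and discount factor are illustrative assumptions, and the paper's cooperative MARL variant is not reproduced here.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.1,   # learning rate (assumed value)
    gamma: float = 0.9,   # discount factor (assumed value)
) -> float:
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q: QTable = defaultdict(float)          # unseen state-action pairs start at 0.0
actions = ["admit", "defer"]            # hypothetical action names
q_update(q, "waiting", "admit", reward=1.0, next_state="treated", actions=actions)
```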
3.2. Adversarial Scenarios
- Robustness Drop: Introducing adversarial agents reduced the task completion rate by 27%, highlighting the system's vulnerability to disruption.
- Resilience Improvement: Adversarial training recovered 19% of the lost performance, demonstrating that the system can adapt and regain efficiency under adversarial conditions.
- Negotiation Protocols: Robust negotiation protocols reduced system deadlocks by 43%, reflecting improved coordination and conflict resolution among agents.
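The percentages above can be read in more than one way, so a small calculation helps make one reading explicit. The sketch below assumes that the 27% drop is relative to the 91% cooperative completion rate from Section 3.1 and that the 19% recovery applies to the lost portion. Both are interpretive assumptions, since the baseline is not stated, and the resulting figures are illustrative rather than reported results.

```python
from typing import Tuple


def adversarial_rates(baseline: float, drop: float, recovered: float) -> Tuple[float, float]:
    """Return (rate under attack, rate after adversarial training).

    Assumes `drop` is a relative decrease from `baseline` and `recovered`
    is the fraction of the lost performance that training wins back.
    """
    attacked = baseline * (1.0 - drop)
    lost = baseline - attacked
    after_training = attacked + recovered * lost
    return attacked, after_training


attacked, after = adversarial_rates(baseline=0.91, drop=0.27, recovered=0.19)
print(f"under attack: {attacked:.1%}, after adversarial training: {after:.1%}")
```

If the 27% were instead meant as percentage points, the intermediate values would differ, which is why the assumed reading is spelled out in the function's docstring.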
3.3. Communication Efficiency
- Bandwidth Optimization: Adaptive throttling reduced bandwidth usage by 35%, indicating more efficient communication among agents.
- Consensus Latency: Consensus latency increased linearly with the number of agents, suggesting that hierarchical message routing structures may be necessary to maintain efficiency in larger systems.
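To see why hierarchical routing might help, consider a simplified model of one consensus round. A flat coordinator contacts every agent in turn, while a two-level hierarchy has a root contact each cluster head in turn and then lets all heads contact their members in parallel. This model, including the function names and the cluster size, is an assumption for illustration and is not the paper's measurement setup.

```python
import math


def flat_round_steps(n_agents: int) -> int:
    """Sequential steps when a single coordinator contacts every agent in turn."""
    return n_agents


def hierarchical_round_steps(n_agents: int, cluster_size: int) -> int:
    """Sequential steps for a two-level hierarchy under the simplified model.

    The root contacts each cluster head in turn (one step per cluster), then
    every head contacts its members at the same time (at most `cluster_size`
    steps, since clusters work in parallel).
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    n_clusters = math.ceil(n_agents / cluster_size)
    return n_clusters + min(cluster_size, n_agents)


for n in (25, 50, 100):
    print(n, flat_round_steps(n), hierarchical_round_steps(n, cluster_size=10))
```

Under this model the flat cost grows linearly with the agent count, while the hierarchical cost grows with the number of clusters plus the cluster size.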
3.4. Healthcare Use Case Outcomes
- Patient Wait Time: MAS-based triage reduced patient wait times by 22%, demonstrating the system's potential to improve patient flow and reduce delays in emergency care.
- Staff Utilization: Dynamic scheduling improved staff utilization by 18%, indicating that the system can optimize resource allocation and enhance staff productivity.
- Clinical Trial Recruitment: Agent collaboration improved match rates for clinical trial recruitment by 30%, suggesting that it can make patient selection for research studies more efficient.
4. Discussion
- Interoperability: Real-world deployment will require standardized communication formats and APIs.
- Interpretability: Clinicians need transparent, explainable AI outputs for trust and usability.
- Data Privacy: Protecting sensitive patient data in decentralized systems is paramount and requires robust data security and privacy-preserving mechanisms.
- Latency Management: Hierarchical or clustered architectures may be necessary for real-time responsiveness.
5. Conclusions and Recommendations
- Implement adversarial training for improved resilience
- Adopt context-aware communication protocols to minimize overhead
- Use hierarchical agent structures to support scalability
- Develop user-friendly human-AI interfaces for clinicians
- Combine symbolic reasoning with machine learning for transparency and interpretability
Acknowledgements
Conflicts of Interest
References
- Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C.
- Stone, P., & Veloso, M. (2000). Multiagent systems: A survey from a machine learning perspective. Autonomous Robots.
- Shoham, Y., & Leyton-Brown, K. (2009). Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press.
- Pan, W., et al. (2021). Multi-agent deep reinforcement learning in healthcare: Opportunities and challenges. Artificial Intelligence in Medicine, 113, 102030.
- Zhang, C., et al. (2022). Privacy-preserving multi-agent reinforcement learning for clinical decision support. IEEE Journal of Biomedical and Health Informatics, 26, 2005–2014.
- Lee, H., & Kim, Y. (2023). Communication protocols in multi-agent healthcare systems: A systematic review. Journal of Biomedical Informatics, 135, 104299.
- Nguyen, T., et al. (2020). Towards interpretable multi-agent systems in healthcare. AI in Healthcare Journal, 17, 86–101.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
