Submitted: 04 June 2026
Posted: 05 June 2026
Abstract
Keywords:
1. Introduction
- A faithful, deterministic, seeded re-implementation of a hybrid RRT + Behavior Cloning decentralized navigation framework that reuses the original trained BC network, RRT planner, and route datasets, with the non-scalable multiprocessing prototype replaced by a reproducible single-process scheduler (code and raw results provided with the submission).
- A quantification of environment-specific skill transfer: an environment-adapted route library cuts collisions by 37–85% (Mann–Whitney U on per-episode collision counts, all ) and task failures by 43–69% relative to a Map2-naive library across five fleet sizes (2, 4, 6, 8, 10 robots), recovering most of the navigation quality of an online RRT baseline at a fraction of its planning cost.
- A repeated-seed evaluation of collision-history sharing—indirect coordination through a shared map, with no direct messaging—within the full framework, finding only a limited benefit (full participation: a nominally significant 13% fewer collisions on Map1, , which does not survive Holm correction; selective participation and a second layout: not significant). This corrects the originating thesis’s preliminary finding—a near-halving of collisions inferred from a single unseeded run—and shows that the effect is undetectable at low task throughput, where too few tasks accumulate to populate the shared map; we document this regime explicitly.
- A transparent, reproducible methodology in which every figure and number regenerates from the provided code and raw results, including the route-filtering and throughput conditions that determine whether each effect appears.
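
The reproducibility claim in the first contribution rests on a single-process, seeded scheduler. A minimal sketch of that idea follows, assuming a fixed robot order per tick and one private random generator per episode. The function `run_episode` and the random-walk "robots" are hypothetical stand-ins, not the framework's own code.

```python
import random

def run_episode(seed, n_robots=3, steps=5):
    """Tick every robot once per scheduler step, always in the same order.

    Hypothetical sketch: each robot only draws a random move, so the
    determinism of the schedule itself can be checked in isolation.
    """
    rng = random.Random(seed)              # private generator, no global state
    moves = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    positions = [(0, 0)] * n_robots
    trace = []
    for tick in range(steps):
        for i in range(n_robots):          # fixed order, no processes or threads
            dx, dy = rng.choice(moves)
            x, y = positions[i]
            positions[i] = (x + dx, y + dy)
            trace.append((tick, i, positions[i]))
    return trace

# Re-running with the same seed reproduces the full trace.
print(run_episode(7) == run_episode(7))
```

Because nothing depends on process scheduling or wall-clock time, two runs with the same seed produce identical traces, which is the property a repeated-seed evaluation needs.
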
2. Related Work
3. Materials and Methods
3.1. Environment and Metrics
3.2. Hybrid RRT + Behavior Cloning Framework
Algorithm 1: Per-robot decision step, run independently by each robot i on every scheduler tick. A predicted blocked-move is a counted event (Section 3.1), not a physical crash, and does not by itself halt the robot.
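
The caption of Algorithm 1 describes a counted, non-halting event. The sketch below illustrates only that counting behavior, assuming a route stored as a list of grid cells and a three-step look-ahead over it. All names (`predicted_blocked_move`, `step_robot`, the `state` dictionary) are hypothetical, and whatever the robot does in response to the event (for example replanning) is left out.

```python
def predicted_blocked_move(route, index, blocked_cells, other_robots, horizon=3):
    """Return True if any of the next `horizon` route cells is occupied."""
    upcoming = route[index + 1 : index + 1 + horizon]
    return any(c in blocked_cells or c in other_robots for c in upcoming)

def step_robot(state, blocked_cells, other_robots):
    """One tick for one robot: count a predicted blocked-move, then try to advance."""
    route, index = state["route"], state["index"]
    if predicted_blocked_move(route, index, blocked_cells, other_robots):
        state["blocked_moves"] += 1        # counted event, not a physical crash
    if index + 1 < len(route):
        nxt = route[index + 1]
        if nxt not in blocked_cells and nxt not in other_robots:
            state["index"] = index + 1     # the count alone does not halt the robot
    return state

route = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
state = {"route": route, "index": 0, "blocked_moves": 0}
step_robot(state, blocked_cells={(3, 0)}, other_robots=set())
print(state["index"], state["blocked_moves"])
```

In this toy run the cell three steps ahead is occupied, so the event is counted while the robot still advances one cell.
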
3.3. Route Libraries and Skill Transfer
3.4. Collision-History Sharing
3.5. Experimental Design
4. Results
4.1. Environment-Adapted Route Libraries Transfer the Navigation Skill
4.2. Throughput, Collision Composition, and Planning Cost Across Fleet Density
4.3. Collision-History Sharing Gives Only a Limited Benefit
5. Discussion
5.1. What Governs Navigation Quality
5.2. The Role of Collision-History Sharing
5.3. Generality and Relation to Planning–Learning Hybrids
5.4. Limitations
5.5. Future Work
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest

| Component | Setting |
|---|---|
| Grid size | occupancy grid |
| Obstacle inflation | 1 cell (robot footprint) |
| Robot control | decentralized; no direct inter-robot messaging |
| Collision-history sharing | optional, via a shared map H (written and read; no messaging) |
| Collision definition | predicted blocked-move (three-step look-ahead) |
| BC input | robot position, target route waypoint, local occupancy window |
| BC architecture | 29-input MLP, hidden layers 256–256–64 |
| Episode budget | 2000 scheduler steps |
| Per-task step cap | 300 steps (else counted as a failure) |
| Online RRT iteration cap | 8000 (hybrid reconnect) / 3000 (online baseline) |
| Seeds per condition | 15 |
| Fleet sizes | 2, 4, 6, 8, 10 robots |
| Sharing fraction | 0, 0.6, 1.0, and a 0–1 sweep |
| Statistical tests | Mann–Whitney U, Spearman |
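
The settings table lists the Mann–Whitney U test as the main comparison between conditions. A minimal pure-Python sketch of that test is shown here, assuming midranks for ties and a plain normal approximation (no tie or continuity correction on the variance), so it may differ slightly from the implementation used for the reported p-values. The sample values in the usage line are made up for illustration.

```python
import math

def mann_whitney_u(x, y):
    """Return (U statistic of sample x, approximate two-sided p-value)."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1          # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail of the standard normal
    return u1, p

# Illustrative per-episode collision counts (not data from this study).
print(mann_whitney_u([12, 15, 11, 14], [20, 22, 19, 25]))
```

With 15 seeds per condition, `x` and `y` would each hold 15 per-episode collision counts.
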

| Library | Generated on | Raw routes | Filtered routes | Median length | Free cells covered | Cells in obstacles | Routes overlapping |
|---|---|---|---|---|---|---|---|
| Map2-naive | Map1 | 3543 | 3304 | 33 | 1207 | 8.1% | 58% |
| Map2-adapted | Map2 | 3592 | 3256 | 37 | 1363 | 0.0% | 0% |
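
The last two columns of the route-library table can be read as overlap statistics between a library and the map it is deployed on. The sketch below shows one way such figures could be computed, assuming that "Cells in obstacles" is the share of route cells lying inside obstacles and that "Routes overlapping" is the share of routes with at least one such cell. Both readings, and the function name `library_overlap`, are assumptions for illustration.

```python
def library_overlap(routes, obstacle_cells):
    """Return (share of route cells in obstacles, share of routes touching an obstacle)."""
    total_cells = sum(len(route) for route in routes)
    bad_cells = sum(1 for route in routes for cell in route if cell in obstacle_cells)
    bad_routes = sum(1 for route in routes if any(cell in obstacle_cells for cell in route))
    return bad_cells / total_cells, bad_routes / len(routes)

# Toy library: two routes, one of which crosses a single obstacle cell.
routes = [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]
print(library_overlap(routes, obstacle_cells={(1, 1)}))
```

A library generated on the deployment map itself would score zero on both measures, which matches the Map2-adapted row.
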

| Experiment | Map | Library | N | Sharing fraction | Steps |
|---|---|---|---|---|---|
| Skill transfer | Map2 | naive vs. adapted | 2,4,6,8,10 | 0 | 2000 |
| Online RRT baseline | Map2 | none (online) | 2,4,6,8,10 | — | 2000 |
| Sharing ablation | Map1, Map2 | native | 5 | 0, 0.6, 1.0 | 2000 |
| Access-fraction sweep | Map1 | native | 5 | 0–1 (step ) | 2000 |
| Sharing scalability | Map1 | native | 2,4,6,8,10 | 0, 0.6 | 2000 |
| Throughput dependence | Map1 | native | 5 | 0, 0.6, 1.0 | 300–2000 |

| N | Method | Tasks | Collisions | Coll./task [95% CI] | Fail rate | Reduction | p | |
|---|---|---|---|---|---|---|---|---|
| 2 | Map2-naive hybrid | 93 | | 3.05 [2.86, 3.22] | 0.20 | — | — | — |
| 2 | Map2-adapted hybrid | 99 | | 0.42 [0.34, 0.52] | 0.06 | 85% | | |
| 2 | Online RRT | 120 | | 0.23 [0.19, 0.28] | 0.01 | 90% | | |
| 4 | Map2-naive hybrid | 177 | | 4.00 [3.75, 4.26] | 0.23 | — | — | — |
| 4 | Map2-adapted hybrid | 191 | | 1.08 [0.98, 1.19] | 0.09 | 71% | | |
| 4 | Online RRT | 221 | | 0.82 [0.75, 0.88] | 0.04 | 74% | | |
| 6 | Map2-naive hybrid | 254 | | 4.98 [4.76, 5.17] | 0.26 | — | — | — |
| 6 | Map2-adapted hybrid | 276 | | 2.09 [1.92, 2.26] | 0.12 | 54% | | |
| 6 | Online RRT | 312 | | 1.37 [1.28, 1.47] | 0.06 | 66% | | |
| 8 | Map2-naive hybrid | 327 | | 5.97 [5.63, 6.36] | 0.29 | — | — | — |
| 8 | Map2-adapted hybrid | 353 | | 2.92 [2.76, 3.10] | 0.16 | 47% | | |
| 8 | Online RRT | 399 | | 1.98 [1.90, 2.06] | 0.09 | 59% | | |
| 10 | Map2-naive hybrid | 394 | | 6.83 [6.62, 7.06] | 0.32 | — | — | — |
| 10 | Map2-adapted hybrid | 426 | | 3.99 [3.84, 4.15] | 0.18 | 37% | | |
| 10 | Online RRT | 468 | | 2.69 [2.60, 2.80] | 0.12 | 53% | | |
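
The Reduction column of the skill-transfer table can be cross-checked from the other printed values. The sketch below assumes that the reduction is taken on total collisions, estimated as tasks × collisions per task, relative to the Map2-naive row at the same fleet size. That reading is an inference from the rounded numbers, not a definition stated in the table.

```python
# Values copied from the skill-transfer table: fleet size -> method -> (tasks, coll./task).
TABLE = {
    2: {"naive": (93, 3.05), "adapted": (99, 0.42), "online": (120, 0.23)},
    4: {"naive": (177, 4.00), "adapted": (191, 1.08), "online": (221, 0.82)},
    6: {"naive": (254, 4.98), "adapted": (276, 2.09), "online": (312, 1.37)},
    8: {"naive": (327, 5.97), "adapted": (353, 2.92), "online": (399, 1.98)},
    10: {"naive": (394, 6.83), "adapted": (426, 3.99), "online": (468, 2.69)},
}
# Reduction percentages as printed in the same table.
REPORTED = {
    2: {"adapted": 85, "online": 90},
    4: {"adapted": 71, "online": 74},
    6: {"adapted": 54, "online": 66},
    8: {"adapted": 47, "online": 59},
    10: {"adapted": 37, "online": 53},
}

def reduction_pct(n, method):
    """Percent drop in estimated total collisions relative to the Map2-naive row."""
    base_tasks, base_rate = TABLE[n]["naive"]
    tasks, rate = TABLE[n][method]
    return 100.0 * (1.0 - (tasks * rate) / (base_tasks * base_rate))

for n in TABLE:
    for method in ("adapted", "online"):
        print(n, method, round(reduction_pct(n, method), 1), REPORTED[n][method])
```

Under this reading every recomputed value falls within one percentage point of the printed one. A ratio of the per-task rates alone would give about 67% rather than 59% for Online RRT at N = 8, which suggests the printed reductions refer to total collisions.
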

| N | Method | Static-obstacle/task | Robot–robot/task | Static fraction |
|---|---|---|---|---|
| 2 | Map2-naive hybrid | 2.69 | 0.36 | 88% |
| 2 | Map2-adapted hybrid | 0.00 | 0.42 | 0% |
| 2 | Online RRT | 0.00 | 0.23 | 0% |
| 4 | Map2-naive hybrid | 2.70 | 1.30 | 68% |
| 4 | Map2-adapted hybrid | 0.00 | 1.08 | 0% |
| 4 | Online RRT | 0.00 | 0.82 | 0% |
| 6 | Map2-naive hybrid | 2.77 | 2.21 | 56% |
| 6 | Map2-adapted hybrid | 0.00 | 2.09 | 0% |
| 6 | Online RRT | 0.00 | 1.37 | 0% |
| 8 | Map2-naive hybrid | 2.89 | 3.09 | 48% |
| 8 | Map2-adapted hybrid | 0.00 | 2.92 | 0% |
| 8 | Online RRT | 0.00 | 1.98 | 0% |
| 10 | Map2-naive hybrid | 2.90 | 3.93 | 43% |
| 10 | Map2-adapted hybrid | 0.00 | 3.99 | 0% |
| 10 | Online RRT | 0.00 | 2.69 | 0% |
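
The Static fraction column of the collision-composition table appears to be the static-obstacle share of all collisions per task. A short check under that assumption, using the Map2-naive rows (the only rows with a nonzero static component):

```python
# (N, static-obstacle/task, robot-robot/task, printed static fraction in %).
NAIVE_ROWS = [
    (2, 2.69, 0.36, 88),
    (4, 2.70, 1.30, 68),
    (6, 2.77, 2.21, 56),
    (8, 2.89, 3.09, 48),
    (10, 2.90, 3.93, 43),
]

def static_fraction_pct(static, robot):
    """Static-obstacle collisions as a percentage of all collisions per task."""
    return 100.0 * static / (static + robot)

for n, static, robot, printed in NAIVE_ROWS:
    print(n, round(static_fraction_pct(static, robot), 1), printed)
```

The static component stays near 2.7 to 2.9 collisions per task at every fleet size, so the falling fraction reflects growth in robot–robot events rather than any drop in static-obstacle events.
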

| N | Method | Runtime (s) | RRT calls/task | Coll./task | Fail rate |
|---|---|---|---|---|---|
| 2 | Map2-naive hybrid | | 0.07 | 3.05 | 0.20 |
| 2 | Map2-adapted hybrid | | 0.02 | 0.42 | 0.06 |
| 2 | Online RRT | | 1.16 | 0.23 | 0.01 |
| 4 | Map2-naive hybrid | | 0.08 | 4.00 | 0.23 |
| 4 | Map2-adapted hybrid | | 0.03 | 1.08 | 0.09 |
| 4 | Online RRT | | 1.44 | 0.82 | 0.04 |
| 6 | Map2-naive hybrid | | 0.12 | 4.98 | 0.26 |
| 6 | Map2-adapted hybrid | | 0.04 | 2.09 | 0.12 |
| 6 | Online RRT | | 1.74 | 1.37 | 0.06 |
| 8 | Map2-naive hybrid | | 0.14 | 5.97 | 0.29 |
| 8 | Map2-adapted hybrid | | 0.05 | 2.92 | 0.16 |
| 8 | Online RRT | | 2.06 | 1.98 | 0.09 |
| 10 | Map2-naive hybrid | | 0.16 | 6.83 | 0.32 |
| 10 | Map2-adapted hybrid | | 0.05 | 3.99 | 0.18 |
| 10 | Online RRT | | 2.45 | 2.69 | 0.12 |

| Map | Sharing fraction | Collisions | Coll./task | Tasks | Change | p | Significant | |
|---|---|---|---|---|---|---|---|---|
| Map1 | 0.0 | | 1.42 | 255 | — | — | — | — |
| Map1 | 0.6 | | 1.41 | 248 | | | no | |
| Map1 | 1.0 | | 1.29 | 246 | | | no | |
| Map2 | 0.0 | | 1.68 | 231 | — | — | — | — |
| Map2 | 0.6 | | 1.64 | 231 | | | no | |
| Map2 | 1.0 | | 1.63 | 233 | | | no | |
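
The sharing comparisons are judged after Holm correction, which is why a nominally significant result can end up marked "no". A minimal sketch of the Holm step-down adjustment follows. The four p-values in the usage line are hypothetical placeholders, not the values behind this table.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])   # multiplier shrinks from m down to 1
        running_max = max(running_max, candidate)        # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw p-values for four sharing comparisons.
print(holm_adjust([0.03, 0.40, 0.60, 0.20]))
```

In this toy case the smallest raw value, 0.03, is multiplied by the number of comparisons and becomes about 0.12, so it no longer passes a 0.05 threshold.
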

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).