Submitted:
07 April 2026
Posted:
08 April 2026
Abstract
Keywords:
1. Introduction
1.1. What Is Physical AI?
Fundamental challenges.
1.2. Evolution Toward Physical AI
1.3. Scope, Contributions, and Organisation
1. Full-stack integration perspective. We trace the complete Physical AI pipeline from multimodal sensing through edge hardware, world models, foundation policies, and fleet deployment (Figure 1), exposing cross-cutting dependencies—between simulation fidelity and policy robustness, between hardware cost curves and deployment breadth, between governance frameworks and scalability—that single-topic surveys cannot capture.
2. Deployment-grounded analysis. We systematically document commercial deployments with measured outcomes across logistics, manufacturing, autonomous vehicles, and emerging domains, and propose a four-phase maturity taxonomy (Section 7) that characterises adoption trajectories from research prototypes to fleet-scale operations.
3. Critical gap identification and roadmap. We identify six coupled challenges—data ecosystems, sim-to-real resilience, lifelong adaptation, safety assurance, workforce integration, and sustainable hardware—and anchor a concrete research roadmap to regulatory milestones through 2030 (Section 8).
Survey methodology.
2. Hardware Platforms and System Architectures
2.1. Platform Landscape and Cost Curves
2.2. Integrated Sensing and Actuation Stacks
2.3. Edge Compute and Neuromorphic Accelerators
2.4. Simulation Infrastructure and Digital Twins
2.5. Middleware: ROS 2 and Alternatives
2.6. Hardware-Accelerated Perception Pipelines
3. World Modeling and Reasoning
3.1. Mapping and Localization
3.2. 3D Scene Understanding and Semantic Mapping
3.3. Intuitive Physics and Causal Reasoning
3.4. Planning and Decision-Making
- Hierarchical planners break down tasks into abstract skills and low-level execution, yielding interpretable plans and allowing reuse of primitives across tasks. However, they may struggle with continuous refinement when subtask boundaries are ambiguous.
- Sampling- and optimization-based planning formulates motion generation as a search or constrained-optimization problem. Sampling-based methods such as RRT [59] explore the configuration space for collision-free paths, while optimization-based methods such as CHOMP [60] refine trajectories subject to dynamics and collision constraints. These approaches provide smooth trajectories but require accurate models and can be computationally expensive; a minimal planner sketch follows the comparison table below. ABB’s collaborative robots use CHOMP-based planning to safely navigate shared workspaces with human operators.
- Morphology-based planning accounts for changes in the robot’s physical configuration. Some tasks benefit from altering morphology—for example, a robot that can switch between walking and rolling or adjust limb length. Morphology planning has been explored with adaptive shape-shifting robots and multi-modal locomotion.
- Vision- or transformer-based planning leverages large neural networks for end-to-end policy generation. CLIPort [63] combines CLIP’s semantic understanding with Transporter Networks to map visual and language inputs directly to spatial action primitives for tabletop manipulation. Vision-language transformers operate on latent representations rather than explicit state spaces and are trained with large datasets, enabling generalization across tasks and environments.
| Planning Method | Key Strengths | Main Limitations | Primary Applications |
|---|---|---|---|
| Hierarchical | Interpretable plans, reusable skills | Ambiguous subtask bounds | Task decomposition, long-horizon |
| Sampling/optimization-based (RRT, CHOMP) | Smooth trajectories, constraints | Accurate models required, compute-heavy | Manipulation, motion planning |
| Morphology-based | Adaptive form, multi-modal locomotion | Complex control | Shape-shifting, terrain adaptation |
| Vision-Transformer (CLIPort) | End-to-end learning, generalizes | Black-box, data-hungry | Vision-language, manipulation |
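To make the sampling-based planning entry concrete, the following is a minimal 2-D RRT sketch in Python. It is illustrative only: the workspace bounds, step size, and circular-obstacle collision check are assumptions for this sketch, not details of any cited planner.

```python
import random
import math

# Minimal 2-D RRT sketch (illustrative; obstacle model and parameters are assumed).
STEP = 0.5                      # maximum extension per iteration
GOAL_TOL = 0.5                  # distance at which the goal counts as reached
OBSTACLES = [(5.0, 5.0, 1.5)]   # circular obstacles: (cx, cy, radius)

def collision_free(p):
    return all(math.hypot(p[0] - cx, p[1] - cy) > r for cx, cy, r in OBSTACLES)

def steer(src, dst):
    """Move from src toward dst by at most STEP."""
    d = math.dist(src, dst)
    if d <= STEP:
        return dst
    return (src[0] + STEP * (dst[0] - src[0]) / d,
            src[1] + STEP * (dst[1] - src[1]) / d)

def rrt(start, goal, bounds=(0.0, 10.0), iters=5000):
    nodes = {start: None}  # node -> parent
    for _ in range(iters):
        sample = (random.uniform(*bounds), random.uniform(*bounds))
        nearest = min(nodes, key=lambda n: math.dist(n, sample))
        new = steer(nearest, sample)
        if not collision_free(new):
            continue
        nodes[new] = nearest
        if math.dist(new, goal) < GOAL_TOL:
            path, n = [], new
            while n is not None:          # walk parent pointers back to the start
                path.append(n)
                n = nodes[n]
            return path[::-1]
    return None  # no path found within the iteration budget

print(rrt((1.0, 1.0), (9.0, 9.0)))
```

In practice the raw RRT path would be smoothed or handed to an optimizer such as CHOMP, which is the sampling-then-refinement split reflected in the table above.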
3.5. Simulation and Digital Twins
3.6. World Foundation Models
3.7. Systems-Level Simulation and Operations Optimization
4. Perception, VLMs/VLAs, and Generalist Policies
4.1. Perception Foundations
4.2. LLMs for Robot Planning
Interactive dialogue and code as action.
4.3. Vision-Language-Action Models
4.4. Generalist Agents and Cross-Embodiment Policies
Design tradeoff analysis.
4.5. Benchmarking and Evaluation
5. Learning in Physical AI
5.1. Reinforcement Learning
Critical analysis.
5.1.1. Scaling RL: Fleet Learning and Closed-Loop Control
5.1.2. Multi-Robot Coordination and Multi-Agent Learning
5.1.3. RL in Dynamic and Contact-Rich Settings
5.1.4. Legged Locomotion and Whole-Body Control
5.1.5. Dexterous Manipulation and In-Hand Skills
5.2. Imitation Learning and Learning from Demonstrations
5.3. Self-Supervised and Unsupervised Learning
Learning from human video.
| Paradigm | Data Source | Efficiency | Reward | Best Suited For |
|---|---|---|---|---|
| Reinforcement Learning | Autonomous Interaction | Low | Required | Optimization, dynamics |
| Imitation Learning | Expert Demos | High | Not needed | Fast bootstrap, complex behavior |
| Self-Supervised | Autonomous Interaction | Medium | Not needed | Representation learning |
| Hybrid (IL + RL) | Demos + Interaction | Med-High | Minimal | Robust deployment |
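To ground the imitation-learning row of the table, the snippet below shows a minimal behaviour-cloning update in PyTorch: the policy regresses expert actions from observations. The network sizes, dataset tensors, and hyperparameters are placeholders chosen for illustration, not values from any surveyed system.

```python
import torch
import torch.nn as nn

# Minimal behaviour-cloning sketch (dimensions and data are illustrative placeholders).
OBS_DIM, ACT_DIM = 32, 7   # assumed observation and action sizes

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

# Stand-in for a dataset of expert (observation, action) pairs.
demo_obs = torch.randn(1024, OBS_DIM)
demo_act = torch.randn(1024, ACT_DIM)

for step in range(100):
    idx = torch.randint(0, demo_obs.shape[0], (64,))    # sample a minibatch
    loss = nn.functional.mse_loss(policy(demo_obs[idx]), demo_act[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A hybrid IL + RL pipeline, as in the last row of the table, typically uses this kind of supervised bootstrap before fine-tuning the same policy with a reinforcement-learning objective.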
5.4. Sim-to-Real Transfer and Domain Adaptation
5.5. Generalization and Transfer Learning
5.6. Cross-Embodiment Learning and Large-Scale Robot Datasets
The data bottleneck: quantity, quality, and modality
6. Safety, Ethics, and Deployment
6.1. Assurance for Model-Mediated Control
6.2. Governance, Oversight and Transparency
6.3. Operational Risk Management
6.4. Societal and Workforce Considerations
6.5. MLOps and Deployment Infrastructure
6.6. Fleet Learning and Continuous Improvement
6.7. Operations, Testing and Rollout Automation
7. Applications and Case Studies
7.1. Humanoids and Flexible Assembly
7.2. Warehousing, Logistics and Retail
7.3. Autonomous Mobility and Field Robotics
7.4. Science, Healthcare and Hazardous Environments
| Industry | Organization | Technology Stack | Measured Outcomes |
|---|---|---|---|
| Manufacturing and Assembly | |||
| Automotive | BMW + Figure | Figure 02, VLA | Multi-year pilot; material handling |
| Automotive | Tesla | Optimus | Battery handling; internal deploy |
| Retail/Service | Sanctuary AI | Phoenix, dexterous manip. | Back-of-store ops; SKU handling |
| Research | Unitree | R1 ($5.9k) | Research labs; whole-body control |
| Warehousing and Logistics | |||
| Fulfillment | Amazon | Isaac Sim, FMs | 1M+ robots; workflow validation |
| Picking | Covariant | RFM-1 VLA | Novel SKU recognition, cluttered bins |
| Distribution | DHL | VLA picking | 40% throughput gain; low errors |
| Sortation | FedEx | AMRs + AI | 100k+ pkg/hr; sub-1% misroutes |
| Autonomous Mobility and Field Ops | |||
| Auto Driving | Wayve | RouteDrive, E2E | Fleet learning: Asda, Post, Nissan |
| Agriculture | John Deere | Auto tractors, VLM | cm-level GPS; NL task spec |
| Construction | Built Robotics | Quadrupeds, excavators | 60% survey cost cut; safety gains |
| Laboratory and Healthcare | |||
| Lab Auto | GAMORA | LLM liquid handlers | Closed-loop expts; NL goals |
7.5. Deployment Maturity and Industry Trajectories
8. Challenges, Open Problems and Outlook
8.1. Data, Benchmarks and Evaluation
8.2. Sim-to-Real Transfer and the Gen2Real Gap
8.3. Robustness and Continual Learning
8.4. Safety Assurance and Governance
8.5. Hardware, Energy and Supply Chains
8.6. Limitations of Current Systems
8.7. Near-Term Outlook and Integration Priorities
8.8. Future Directions
- Resilient sim-to-real. Deliver self-healing simulation stacks that load telemetry from deployed fleets each night, tune contact and sensor parameters automatically, and push validated updates back to production robots with provable guarantees on performance drift [116].
- Safety and assurance. Transition from ad hoc safety cases to continuously updated “living dossiers” that fuse simulation stress tests, runtime monitoring and governance checkpoints, with regulators accepting these dossiers as evidence for high-risk certification under acts such as the EU AI Act [147,149,170].
- Ethics and labour. Embed participatory design and workforce reskilling into deployment roadmaps so that automation augments rather than displaces frontline teams, supported by transparent reporting on job transitions and access to new technical roles [152].
- Sustainable hardware. Achieve circular supply chains for actuators, batteries and sensors, with recycling and remanufacturing targets codified into procurement; pair energy-aware planning with recyclable materials to halve embodied carbon relative to 2024 installations [169].
9. Conclusion
Acknowledgments
Appendix A. Sensor Technologies and Perception
| Sensor Type | Range | Accuracy | Weather | Cost | Primary Applications |
|---|---|---|---|---|---|
| RGB Camera | 0–50m | Medium | Poor | $ | Object recognition, inspection |
| RGB-D Camera | 0–6m | High | Poor | $$ | Bin picking, manipulation |
| Event Camera | 0–50m | High | Excellent | $$$ | High-speed tracking, drones |
| 2D LiDAR | 0–30m | Very High | Good | $$ | Navigation, SLAM, pallets |
| 3D LiDAR | 0–300m | Very High | Good | $$$$ | Autonomous vehicles, mapping |
| Radar (77 GHz) | 0–300m | Medium | Excellent | $$ | All-weather obstacle detection |
| Tactile Array | Contact | Very High | N/A | $$ | Manipulation, force feedback |
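The trade-offs in the table above (for example, LiDAR’s precision versus radar’s weather robustness) are usually resolved by fusing modalities rather than choosing one. A minimal inverse-variance fusion of two range estimates is sketched below; the noise figures are illustrative assumptions, not calibrated sensor specifications.

```python
# Inverse-variance fusion of range measurements (noise values are illustrative).
def fuse(measurements):
    """measurements: list of (value, variance) pairs -> fused (value, variance)."""
    weights = [1.0 / var for _, var in measurements]
    fused = sum(w * v for (v, _), w in zip(measurements, weights)) / sum(weights)
    return fused, 1.0 / sum(weights)

# Clear weather: the LiDAR reading dominates (low variance); in rain its variance
# would be inflated and the radar reading would carry more weight.
lidar = (24.8, 0.05)   # range in metres, variance (assumed)
radar = (25.3, 0.60)
print(fuse([lidar, radar]))   # ≈ (24.84, 0.046), dominated by the LiDAR
```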
Appendix A.1. Frontier Sensing Modalities
RGB-D and structured-light cameras
Event cameras
Force-torque and tactile sensors
Appendix A.2. Sensor Fusion and Multi-Modal Architectures

References
- Moravec, H. Mind Children: The Future of Robot and Human Intelligence; Harvard University Press: Cambridge, MA, 1988.
- Figure AI. F.02 Contributed to the Production of 30,000 Cars at BMW, 2025. 11-month deployment trial at BMW Group Plant Spartanburg.
- Waymo. Meet the 6th-generation Waymo Driver. https://waymo.com/blog/2024/08/meet-the-6th-generation-waymo-driver, 2024. Autonomous vehicle sensor suite with 13 cameras, 4 LiDAR, 6 radar sensors. Accessed: 2025-10-31.
- Intuitive Surgical. 20 Million Patients Benefit from da Vinci Surgery Globally. GlobeNewsWire, 2026. Cumulative milestone of 20 million da Vinci procedures worldwide. Accessed: 2026-02-01.
- John Deere. Autonomous Tractor. https://www.deere.com/en/autonomous/, 2024. Fully autonomous tractor with GPS and vision systems for precision agriculture. Accessed: 2025-10-31.
- NVIDIA. NVIDIA Expands Omniverse With Generative Physical AI, 2025. Press release, January 6, 2025.
- Brooks, R.A. Intelligence without representation. Artificial Intelligence 1991, 47, 139–159. [Google Scholar] [CrossRef]
- Tobin, J.; Fong, R.; Ray, A.; Schneider, J.; Zaremba, W.; Abbeel, P. Domain randomization for transferring deep neural networks from simulation to the real world. In Proceedings of the 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, 2017, pp. 23–30.
- Nilsson, N.J. Shakey the Robot. Technical Note 323, SRI International, Artificial Intelligence Center, Menlo Park, CA, 1984.
- Fikes, R.E.; Nilsson, N.J. STRIPS: A new approach to the application of theorem proving to problem solving. Artificial intelligence 1971, 2, 189–208. [Google Scholar] [CrossRef]
- Brooks, R.A. A robust layered control system for a mobile robot. IEEE Journal on Robotics and Automation 1986, 2, 14–23. [Google Scholar] [CrossRef]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
- Todorov, E.; Erez, T.; Tassa, Y. MuJoCo: A physics engine for model-based control. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012, pp. 5026–5033. [CrossRef]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International conference on machine learning. PMLR, 2021, pp. 8748–8763.
- Ahn, M.; Brohan, A.; Brown, N.; Chebotar, Y.; Cortes, O.; David, B.; Finn, C.; Fu, C.; Gopalakrishnan, K.; Hausman, K.; et al. Do as i can, not as i say: Grounding language in robotic affordances. In Proceedings of the Conference on Robot Learning (CoRL). PMLR, 2023, Vol. 205, pp. 287–318.
- Liang, J.; Huang, W.; Xia, F.; Xu, P.; Hausman, K.; Ichter, B.; Florence, P.; Zeng, A. Code as policies: Language model programs for embodied control. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 9493–9500.
- Zitkovich, B.; Yu, T.; Xu, S.; Xu, P.; Xiao, T.; Xia, F.; Wu, J.; Wohlhart, P.; Welker, S.; Wahid, A.; et al. Rt-2: Vision-language-action models transfer web knowledge to robotic control. In Proceedings of the Conference on Robot Learning. PMLR, 2023, pp. 2165–2183.
- Driess, D.; Xia, F.; Sajjadi, M.S.; Lynch, C.; Chowdhery, A.; Wahid, A.; Tompson, J.; Vuong, Q.; Yu, T.; Huang, W.; et al. PaLM-E: An Embodied Multimodal Language Model. In Proceedings of the 40th International Conference on Machine Learning. PMLR, 2023, Vol. 202, pp. 8469–8488.
- Black, K.; Brown, N.; Driess, D.; Esmail, A.; Equi, M.; Finn, C.; Fusai, N.; Groom, L.; Hausman, K.; Ichter, B.; et al. π0: A Vision-Language-Action Flow Model for General Robot Control. In Proceedings of the Robotics: Science and Systems (RSS), 2025.
- Reed, S.; Zolna, K.; Parisotto, E.; Colmenarejo, S.G.; Novikov, A.; Barth-Maron, G.; Gimenez, M.; Sulsky, Y.; Kay, J.; Springenberg, J.T.; et al. A generalist agent. Transactions on Machine Learning Research (TMLR) 2022.
- Intrinsic. Unlocking New Value in Industrial Automation with AI. https://www.intrinsic.ai/, 2024. Accessed: 2025-10-30.
- Intrinsic and NVIDIA. NVIDIA and Google’s Intrinsic Developing Next-Generation Robots. https://roboticsandautomationnews.com/2024/05/16/nvidia-and-googles-intrinsic-developing-next-generation-robots/, 2024. Accessed: 2025-10-30.
- Cainiao. Alibaba’s Cainiao Launches Enterprise Smart Warehouse Solution. https://techwireasia.com/2022/03/alibabas-cainiao-launches-enterprise-smart-warehouse-solution/, 2022. Accessed: 2025-10-30.
- Baidu. Baidu Apollo Launches 6th-Gen Robotaxi with 60% Lower Cost. https://cnevpost.com/2024/05/15/baidu-apollo-launches-6th-gen-robotaxi/, 2024. Accessed: 2025-10-30.
- Glasner, J. The Year of Humanoid Robots. Crunchbase News, 2024. Reports $7.2 billion in robotics venture funding in 2024. Accessed: 2025-10-31.
- Kawaharazuka, K.; Oh, J.; Yamada, J.; Posner, I.; Zhu, Y. Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications. IEEE Access 2025, 13, 162467–162504. [Google Scholar] [CrossRef]
- Liu, Y.; Chen, W.; Bai, Y.; Liang, X.; Li, G.; Gao, W.; Lin, L. Aligning Cyber Space With Physical World: A Comprehensive Survey on Embodied AI. IEEE/ASME Transactions on Mechatronics 2025, 30, 7253–7274. [Google Scholar] [CrossRef]
- Sapkota, R.; Cao, Y.; Roumeliotis, K.I.; Karkee, M. Vision-Language-Action (VLA) Models: Concepts, Progress, Applications and Challenges. arXiv 2025, arXiv:2505.04769. [Google Scholar]
- Sriram, A. Function over flash: Specialized robots attract billions with efficient task handling. https://www.reuters.com/business/finance/function-over-flash-specialized-robots-attract-billions-with-efficient-task-2025-05-22/, 2025. Accessed: 2025-10-28.
- International Federation of Robotics. Global Robot Demand in Factories Doubles Over 10 Years. https://ifr.org/ifr-press-releases/news/global-robot-demand-in-factories-doubles-over-10-years, 2025. Market forecast for industrial robot shipments. Accessed: 2025-10-31.
- BMW Group. BMW Group Invests in AI Robotics Start-Up Figure. https://www.bmwgroup.com/en/news/general/2024/humanoid-robots.html, 2024. Partnership for humanoid robot deployment in manufacturing. Accessed: 2025-10-31.
- Unitree Robotics. Unitree R1 Humanoid Robot. https://www.unitree.com/R1/, 2025. Compact general-purpose humanoid robot (1.2 m, 25 kg, 26 joints) priced from $4,900 (R1 AIR). Unveiled July 2025. Accessed: 2025-10-31.
- Shakir, U. Tesla’s Optimus bot makes a scene at the robotaxi event. The Verge, 2024. Musk estimated Optimus retail price at $20,000–$30,000. Accessed: 2025-10-31.
- NVIDIA Corporation. NVIDIA Announces Omniverse Microservices to Supercharge Physical AI. https://www.globenewswire.com/news-release/2024/06/17/2899696/0/en/NVIDIA-Announces-Omniverse-Microservices-to-Supercharge-Physical-AI.html, 2024. Sensor RTX microservices for physically accurate sensor simulation. Announced June 2024. Accessed: 2025-10-31.
- NVIDIA Corporation. NVIDIA Blackwell-Powered Jetson Thor Now Available, Accelerating the Age of General Robotics. https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-thor/, 2025. Platform delivering 2,070 FP4 TFLOPS for humanoid and mobile robots. Announced August 2025. Accessed: 2025-10-31.
- TrendForce. NVIDIA Jetson Thor Targets Advanced Humanoid Applications. https://www.trendforce.com/presscenter/news/20250826-12685.html, 2025. Market analysis of Jetson Thor performance improvements. Accessed: 2025-10-31.
- Qualcomm. Qualcomm Launches World’s First 5G and AI-Enabled Robotics Platform. https://www.qualcomm.com/news/releases/2020/06/qualcomm-launches-worlds-first-5g-and-ai-enabled-robotics-platform, 2020. Robotics RB5 platform with 5G connectivity and 15 TOPS AI performance. Accessed: 2025-10-31.
- Intel Labs. Intel Loihi 2 Neuromorphic Research Chip. https://www.intel.com/content/www/us/en/research/neuromorphic-computing-loihi-2-technology-brief.html, 2024. Second-generation neuromorphic processor for energy-efficient AI. Accessed: 2025-10-31.
- Intel Corporation. Intel Builds World’s Largest Neuromorphic System to Enable More Sustainable AI. https://newsroom.intel.com/artificial-intelligence/intel-builds-worlds-largest-neuromorphic-system-to-enable-more-sustainable-ai, 2024. 1.15 billion neuron neuromorphic system deployed at Sandia National Laboratories. Accessed: 2025-10-31.
- NVIDIA Corporation. NVIDIA Announces Omniverse Real-Time Physics Digital Twins With Industry Software Leaders. https://www.globenewswire.com/de/news-release/2024/11/18/2983079/0/en/NVIDIA-Announces-Omniverse-Real-Time-Physics-Digital-Twins-With-Industry-Software-Leaders.html, 2024. Digital twin platform transforming manufacturing with 1,200x faster simulations. Announced November 2024. Accessed: 2025-10-31.
- ROS 2 Control Contributors. Welcome to the ros2_control documentation! ROS 2 Control Documentation (Rolling), 2025. Accessed: September 29, 2025.
- Lumpp, F.; Panato, M.; Bombieri, N.; Fummi, F. A Design Flow Based on Docker and Kubernetes for ROS-based Robotic Software Applications. ACM Transactions on Embedded Computing Systems 2024, 23, 74:1–74:24. [Google Scholar] [CrossRef]
- Open Source Robotics Foundation. Jazzy Jalisco (jazzy) — ROS 2 Documentation. OSRF Technical Documentation, 2024.
- NVIDIA. NVIDIA Brings Generative AI Tools, Simulation and Perception Workflows to ROS Developer Ecosystem. NVIDIA Newsroom, 2024. Announcement for ROSCon 2024.
- NVIDIA. Isaac Perceptor for Autonomous Mobile Robot Development. Technical specification, NVIDIA Corporation, 2024.
- NVIDIA. NVIDIA Implementation of Type Adaptation and Negotiation (NITROS). NVIDIA Corporation, 2024. Isaac ROS Documentation.
- Thrun, S. Probabilistic robotics. Communications of the ACM 2002, 45, 52–57. [Google Scholar] [CrossRef]
- Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.; Tardós, J.D. Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam. IEEE transactions on robotics 2021, 37, 1874–1890. [Google Scholar] [CrossRef]
- Teed, Z.; Deng, J. Droid-slam: Deep visual slam for monocular, stereo, and rgb-d cameras. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2021, Vol. 34, pp. 16558–16569.
- Ušinskis, V.; Nowicki, M.; Dzedzickis, A.; Bučinskas, V. Sensor-Fusion Based Navigation for Autonomous Mobile Robot. Sensors 2025, 25, 1248. [Google Scholar] [CrossRef] [PubMed]
- OpenAI. GPT-4V(ision) System Card. https://openai.com/index/gpt-4v-system-card/, 2023. Accessed: 2025-10-30.
- Jatavallabhula, K.M.; Kuwajerwala, A.; Gu, Q.; Omama, M.; Chen, T.; Maalouf, A.; Li, S.; Iyer, G.; Saryazdi, S.; Keetha, N.; et al. Conceptfusion: Open-set multimodal 3d mapping. In Proceedings of the Robotics: Science and Systems (RSS), 2023.
- Physical Intelligence. The π0 Foundation Model. https://www.physicalintelligence.company, 2024. Accessed: 2025-10-27.
- Coumans, E.; Bai, Y. PyBullet, a Python module for physics simulation for games, robotics and machine learning. http://pybullet.org, 2016–2021.
- Battaglia, P.; Pascanu, R.; Lai, M.; Jimenez Rezende, D.; et al. Interaction networks for learning about objects, relations and physics. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2016, Vol. 29.
- Hasani, R.; Lechner, M.; Amini, A.; Rus, D.; Grosu, R. Liquid time-constant networks. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021, Vol. 35, pp. 7657–7666.
- Chahine, M.; Hasani, R.; Kao, P.; Ray, A.; Shubert, R.; Lechner, M.; Amini, A.; Rus, D. Robust flight navigation out of distribution with liquid neural networks. Science Robotics 2023, 8, eadc8892. [Google Scholar] [CrossRef] [PubMed]
- Hart, P.E.; Nilsson, N.J.; Raphael, B. A formal basis for the heuristic determination of minimum cost paths. IEEE transactions on Systems Science and Cybernetics 1968, 4, 100–107. [Google Scholar] [CrossRef]
- LaValle, S.M. Rapidly-exploring random trees: A new tool for path planning. Technical Report TR 98-11, Department of Computer Science, Iowa State University, 1998.
- Ratliff, N.; Zucker, M.; Bagnell, J.A.; Srinivasa, S. CHOMP: Gradient optimization techniques for efficient motion planning. In Proceedings of the 2009 IEEE international conference on robotics and automation. IEEE, 2009, pp. 489–494.
- Sutton, R.S. Dyna, an integrated architecture for learning, planning, and reacting. ACM Sigart Bulletin 1991, 2, 160–163. [Google Scholar] [CrossRef]
- Kaelbling, L.P.; Lozano-Pérez, T. Hierarchical task and motion planning in the now. In Proceedings of the 2011 IEEE international conference on robotics and automation. IEEE, 2011, pp. 1470–1477.
- Shridhar, M.; Manuelli, L.; Fox, D. Cliport: What and where pathways for robotic manipulation. In Proceedings of the Conference on robot learning. PMLR, 2021, pp. 894–906.
- NVIDIA. Isaac Sim: GPU-Accelerated Robot Simulation. NVIDIA Developer Documentation, 2024.
- Pixar Animation Studios. Universal Scene Description (OpenUSD). https://openusd.org, 2016. Open-source framework for 3D scene interchange and collaboration. Accessed: 2025-10-31.
- NVIDIA. PhysX 5 SDK. https://github.com/NVIDIA-Omniverse/PhysX, 2022. Open-source GPU-accelerated physics engine for real-time simulation. Accessed: 2025-10-31.
- Amazon Robotics and NVIDIA. Amazon Robotics Builds Digital Twins of Warehouses with NVIDIA Omniverse and Isaac Sim. https://resources.nvidia.com/en-us-omniverse-enterprise/amazon-robotics, 2024. Case study on warehouse digital twin deployment at scale. Accessed: 2025-10-31.
- Siemens. Understanding Your Whole Factory with the Comprehensive Digital Twin. https://blogs.sw.siemens.com/thought-leadership/2024/12/26/understanding-your-whole-factory-with-the-comprehensive-digital-twin/, 2024. Accessed: 2025-10-30.
- Siemens and NVIDIA. Siemens and NVIDIA Expand Partnership to Accelerate AI Capabilities in Manufacturing. https://press.siemens.com/global/en/pressrelease/siemens-and-nvidia-expand-partnership-accelerate-ai-capabilities-manufacturing, 2025. Accessed: 2025-10-30.
- Puig, X.; Undersander, E.; Szot, A.; Cote, M.D.; Yang, T.Y.; Partsey, R.; Desai, R.; Clegg, A.W.; Hlavac, M.; Min, S.Y.; et al. Habitat 3.0: A co-habitat for humans, avatars and robots. In Proceedings of the International Conference on Learning Representations (ICLR), 2024.
- Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An open urban driving simulator. In Proceedings of the Conference on robot learning. PMLR, 2017, pp. 1–16.
- Koenig, N.; Howard, A. Design and use paradigms for Gazebo, an open-source multi-robot simulator. In Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2004, Vol. 3, pp. 2149–2154.
- Jakob, W.; Speierer, S.; Roussel, N.; Vicini, D. Dr.Jit: A Just-In-Time Compiler for Differentiable Rendering. ACM Transactions on Graphics (Proceedings of SIGGRAPH) 2022, 41. Core compiler of the Mitsuba 3 rendering system. [CrossRef]
- Liu, M.; Grabli, S.; Speierer, S.; Sarafianos, N.; Bode, L.; Chiang, M.; Hery, C.; Davis, J.; Aliaga, C. Controllable Biophysical Human Faces. Computer Graphics Forum 2025, 44, e70170. [Google Scholar] [CrossRef]
- Sang, S.; Zhi, T.; Song, G.; Liu, M.; Lai, C.; Liu, J.; Wen, X.; Davis, J.; Luo, L. Agileavatar: Stylized 3d avatar creation via cascaded domain bridging. In Proceedings of the SIGGRAPH Asia 2022 Conference Papers, 2022, pp. 1–8.
- Agarwal, N.; Ali, A.; Bala, M.; Balaji, Y.; Barker, E.; Cai, T.; Chattopadhyay, P.; Chen, Y.; Cui, Y.; Ding, Y.; et al. Cosmos world foundation model platform for physical ai. arXiv 2025, arXiv:2501.03575. [Google Scholar] [CrossRef]
- Ali, A.; Bai, J.; Bala, M.; Balaji, Y.; Blakeman, A.; et al. World Simulation with Video Foundation Models for Physical AI. arXiv 2025, arXiv:2511.00062. [Google Scholar]
- Azzolini, A.; Bai, J.; Brandon, H.; Cao, J.; Chattopadhyay, P.; et al. Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning. arXiv 2025, arXiv:2503.15558. [Google Scholar] [CrossRef]
- Genesis Authors. Genesis: A Generative and Universal Physics Engine for Robotics and Beyond, 2024.
- Simio LLC. Simio Digital Twin Simulation Software. https://www.simio.com, 2024. Process digital twin platform for discrete event simulation. Accessed: 2025-10-31.
- Consumer Goods Technology. P&G Taps into AI and Automation for Faster, Smarter Operations, 2024. Describes P&G’s Control Tower virtual twin reducing deadhead movements by 15%.
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment Anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023.
- Liu, S.; Zeng, Z.; Ren, T.; Li, F.; Zhang, H.; Yang, J.; Li, C.; Yang, J.; Su, H.; Zhu, J.; et al. Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), 2024.
- Shen, W.; Yang, G.; Yu, A.; Wong, J.; Kaelbling, L.P.; Isola, P. Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation. In Proceedings of the Conference on Robot Learning (CoRL), 2023. Best Paper Award.
- Chen, T.; Shorinwa, O.; Bruno, J.; Swann, A.; Yu, J.; Zeng, W.; Nagami, K.; Dames, P.; Schwager, M. Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps. IEEE Transactions on Robotics 2025.
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2017.
- Hu, Y.; Lin, F.; Zhang, T.; Yi, L.; Gao, Y. Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning. arXiv 2023, arXiv:2311.17842. [Google Scholar]
- Brohan, A.; Brown, N.; Carbajal, J.; Chebotar, Y.; Dabis, J.; Finn, C.; Gopalakrishnan, K.; Hausman, K.; Herzog, A.; Hsu, J.; et al. Rt-1: Robotics transformer for real-world control at scale. In Proceedings of the Robotics: Science and Systems (RSS), 2023.
- Kim, M.; Pertsch, K.; Karamcheti, S.; Xiao, T.; Balakrishna, A.; Nair, S.; Rafailov, R.; Foster, E.; Sanketi, P.; Vuong, Q.; et al. OpenVLA: An Open-Source Vision-Language-Action Model. In Proceedings of the Conference on Robot Learning (CoRL). PMLR, 2024, Vol. 270, pp. 2679–2713.
- Octo Model Team; Ghosh, D.; Walke, H.; Pertsch, K.; Black, K.; Mees, O.; Dasari, S.; Hejna, J.; Xu, C.; Luo, J.; et al. Octo: An Open-Source Generalist Robot Policy. In Proceedings of Robotics: Science and Systems, Delft, Netherlands, 2024.
- Liu, S.; Wu, L.; Li, B.; Tan, H.; Chen, H.; Wang, Z.; Xu, K.; Su, H.; Zhu, J. RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation. In Proceedings of the International Conference on Learning Representations (ICLR), 2025.
- Li, X.; Zhang, M.; Geng, Y.; Geng, H.; Long, Y.; Shen, Y.; Zhang, R.; Liu, J.; Dong, H. ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 18061–18070.
- Tang, W.; Jing, D.; Pan, J.H.; Lu, Z.; Liu, Y.H.; Li, L.E.; Ding, M.; Fu, C.W. Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation, 2025, [arXiv:cs.AI/2505.12744].
- Physical Intelligence Team. Open Sourcing π0. Physical Intelligence Blog, 2025.
- Bjorck, J.; Castañeda, F.; Cherniadev, N.; Da, X.; Ding, R.; et al. GR00T N1: An Open Foundation Model for Generalist Humanoid Robots. arXiv 2025, arXiv:2503.14734. [Google Scholar] [CrossRef]
- Figure AI. Helix: A vision-language-action model for generalist humanoid control. Figure AI Blog, 2025.
- Gemini Robotics Team. Gemini Robotics: Bringing AI into the Physical World, 2025.
- Gemini Robotics Team. Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer, 2025.
- Guruprasad, P.; Sikka, H.; Song, J.; Wang, Y.; Liang, P.P. Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks, 2024, [arXiv:cs.RO/2411.05821].
- Kalashnikov, D.; Irpan, A.; Pastor, P.; Ibarz, J.; Herzog, A.; Jang, E.; Quillen, D.; Holly, E.; Kalakrishnan, M.; Vanhoucke, V.; et al. Scalable deep reinforcement learning for vision-based robotic manipulation. In Proceedings of the Conference on robot learning. PMLR, 2018, pp. 651–673.
- Cheng, X.; Shi, K.; Agarwal, A.; Pathak, D. Extreme parkour with legged robots. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 11443–11450.
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning (ICML). PMLR, 2018, pp. 1861–1870.
- Fujimoto, S.; van Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the International conference on machine learning. PMLR, 2018, pp. 1587–1596.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. In Proceedings of the International Conference on Learning Representations (ICLR), 2016.
- Hansen, N.; Su, H.; Wang, X. TD-MPC2: Scalable, Robust World Models for Continuous Control. In Proceedings of the International Conference on Learning Representations (ICLR), 2024.
- Chebotar, Y.; Vuong, Q.; Hausman, K.; Xia, F.; Lu, Y.; Irpan, A.; Kumar, A.; Yu, T.; Herzog, A.; Pertsch, K.; et al. Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions. In Proceedings of the Conference on Robot Learning (CoRL), 2023.
- Wang, Z.; Hunt, J.J.; Zhou, M. Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning. In Proceedings of the International Conference on Learning Representations (ICLR), 2023.
- Zhu, Y.; Wan Hasan, W.Z.; Ramli, H.R.H.; Norsahperi, N.M.H.; Mohd Kassim, M.S.; Yao, Y. Deep Reinforcement Learning of Mobile Robot Navigation in Dynamic Environment: A Review. Sensors 2025, 25, 3394. [Google Scholar] [CrossRef] [PubMed]
- Zhu, Y.; Wong, J.; Mandlekar, A.; Martín-Martín, R.; Joshi, A.; Nasiriany, S.; Zhu, Y. robosuite: A modular simulation framework and benchmark for robot learning. arXiv 2020, arXiv:2009.12293. [Google Scholar]
- Yu, T.; Quillen, D.; He, Z.; Julian, R.; Hausman, K.; Finn, C.; Levine, S. Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning. In Proceedings of the Conference on robot learning. PMLR, 2019, pp. 1094–1100.
- Rashid, T.; Samvelyan, M.; Schroeder de Witt, C.; Farquhar, G.; Foerster, J.; Whiteson, S. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018, pp. 4295–4304.
- Yu, C.; Velu, A.; Vinitsky, E.; Gao, J.; Wang, Y.; Bayen, A.; Wu, Y. The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2022.
- Zhou, Y.; Xiao, J.; Zhou, Y.; Loianno, G. Multi-Robot Collaborative Perception With Graph Neural Networks. IEEE Robotics and Automation Letters 2022, 7, 2289–2296. [Google Scholar] [CrossRef]
- Liu, K.; Tang, Z.; Wang, D.; Wang, Z.; Zhao, B.; Li, X. COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models. In Proceedings of the 2025 IEEE International Conference on Robotics and Automation (ICRA), 2025. [CrossRef]
- Li, P.; An, Z.; Abrar, S.; Zhou, L. Large Language Models for Multi-Robot Systems: A Survey. arXiv 2025, arXiv:2502.03814. [Google Scholar] [CrossRef]
- Torne, M.; Simeonov, A.; Li, Z.; Chan, A.; Chen, T.; Gupta, A.; Agrawal, P. Reconciling reality through simulation: A real-to-sim-to-real approach for robust manipulation. In Proceedings of the Robotics: Science and Systems (RSS), 2024.
- Miki, T.; Lee, J.; Hwangbo, J.; Wellhausen, L.; Koltun, V.; Hutter, M. Learning robust perceptive locomotion for quadrupedal robots in the wild. Science Robotics 2022, 7. [Google Scholar] [CrossRef]
- Kumar, A.; Fu, Z.; Pathak, D.; Malik, J. RMA: Rapid Motor Adaptation for Legged Robots. In Proceedings of the Robotics: Science and Systems (RSS), 2021.
- Hoeller, D.; Rudin, N.; Sako, D.; Hutter, M. ANYmal parkour: Learning agile navigation for quadrupedal robots. Science Robotics 2024, 9. [Google Scholar] [CrossRef]
- Radosavovic, I.; Xiao, T.; Zhang, B.; Darrell, T.; Malik, J.; Sreenath, K. Real-world humanoid locomotion with reinforcement learning. Science Robotics 2024, 9. [Google Scholar] [CrossRef]
- Cheng, X.; Ji, Y.; Chen, J.; Yang, R.; Yang, G.; Wang, X. Expressive Whole-Body Control for Humanoid Robots. In Proceedings of the Robotics: Science and Systems (RSS), 2024.
- Shaw, K.; Agarwal, A.; Pathak, D. LEAP Hand: Low-Cost, Efficient, and Anthropomorphic Hand for Robot Learning. In Proceedings of the Robotics: Science and Systems (RSS), 2023.
- Yang, M.; Lu, C.; Church, A.; Lin, Y.; Ford, C.J.; Li, H.; Psomopoulou, E.; Barton, D.A.; Lepora, N.F. AnyRotate: Gravity-Invariant In-Hand Object Rotation with Sim-to-Real Touch. In Proceedings of the Conference on Robot Learning (CoRL), 2024.
- Wang, C.; Shi, H.; Wang, W.; Zhang, R.; Fei-Fei, L.; Liu, C.K. DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation. In Proceedings of the Robotics: Science and Systems (RSS), 2024.
- Ross, S.; Gordon, G.; Bagnell, D. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, 2011, pp. 627–635.
- Argall, B.D.; Chernova, S.; Veloso, M.; Browning, B. A survey of robot learning from demonstration. Robotics and autonomous systems 2009, 57, 469–483. [Google Scholar] [CrossRef]
- Mandlekar, A.; Xu, D.; Wong, J.; Nasiriany, S.; Wang, C.; Kulkarni, R.; Fei-Fei, L.; Savarese, S.; Zhu, Y.; Martín-Martín, R. What matters in learning from offline human demonstrations for robot manipulation. In Proceedings of the Conference on Robot Learning (CoRL). PMLR, 2021, pp. 1678–1690.
- Chi, C.; Feng, S.; Du, Y.; Xu, Z.; Cousineau, E.; Burchfiel, B.; Song, S. Diffusion Policy: Visuomotor Policy Learning via Action Diffusion. In Proceedings of the Robotics: Science and Systems (RSS), 2023.
- Zhao, T.Z.; Kumar, V.; Levine, S.; Finn, C. Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware. In Proceedings of the Robotics: Science and Systems (RSS), 2023.
- Fu, Z.; Zhao, T.Z.; Finn, C. Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation. In Proceedings of the Conference on Robot Learning (CoRL), 2024.
- Sermanet, P.; Lynch, C.; Chebotar, Y.; Hsu, J.; Jang, E.; Schaal, S.; Levine, S. Time-contrastive networks: Self-supervised learning from video. In Proceedings of the 2018 IEEE international conference on robotics and automation (ICRA). IEEE, 2018, pp. 1134–1141.
- Ha, D.; Schmidhuber, J. Recurrent world models facilitate policy evolution. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2018, Vol. 31.
- Levine, S.; Pastor, P.; Krizhevsky, A.; Ibarz, J.; Quillen, D. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International journal of robotics research 2018, 37, 421–436. [Google Scholar] [CrossRef]
- Nair, S.; Rajeswaran, A.; Kumar, V.; Finn, C.; Gupta, A. R3M: A Universal Visual Representation for Robot Manipulation. In Proceedings of the Conference on Robot Learning (CoRL), 2022.
- Ma, Y.J.; Liang, W.; Du, G.; Jayaraman, D.; et al. VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training. In Proceedings of the International Conference on Learning Representations (ICLR), 2023.
- Karamcheti, S.; Nair, S.; Chen, A.S.; Kollar, T.; Finn, C.; Sadigh, D.; Liang, P. Language-Driven Representation Learning for Robotics. In Proceedings of the Robotics: Science and Systems (RSS), 2023.
- Yang, S.; Du, Y.; Ghasemipour, K.; Tompson, J.; Kaelbling, L.; Schuurmans, D.; Abbeel, P. Learning Interactive Real-World Simulators. In Proceedings of the International Conference on Learning Representations (ICLR), 2024.
- Tan, J.; Zhang, T.; Coumans, E.; Iscen, A.; Bai, Y.; Hafner, D.; Bohez, S.; Vanhoucke, V. Sim-to-real: Learning agile locomotion for quadruped robots. In Proceedings of the Robotics: Science and Systems (RSS), 2018.
- James, S.; Wohlhart, P.; Kalakrishnan, M.; Kalashnikov, D.; Irpan, A.; Ibarz, J.; Levine, S.; Hadsell, R.; Bousmalis, K. Sim-to-real via sim-to-sim: Data-efficient robotic grasping via randomized-to-canonical adaptation networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 12627–12637.
- Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International conference on machine learning. PMLR, 2017, pp. 1126–1135.
- Open X-Embodiment Collaboration. Open X-Embodiment: Robotic Learning Datasets and RT-X Models. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024. Co-winner, Best Conference Paper Award.
- Chen, H.; Wang, J.; Shah, A.; Tao, R.; Wei, H.; Xie, X.; Sugiyama, M.; Raj, B. Understanding and mitigating the label noise in pre-training on downstream tasks. In Proceedings of the International Conference on Learning Representations (ICLR), 2024. Spotlight.
- Chen, H.; Shah, A.; Wang, J.; Tao, R.; Wang, Y.; Li, X.; Xie, X.; Sugiyama, M.; Singh, R.; Raj, B. Imprecise label learning: A unified framework for learning with various imprecise label configurations. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2024, Vol. 37, pp. 59621–59654.
- Liu, M.; Di, Z.; Wei, J.; Wang, Z.; Zhang, H.; Xiao, R.; Wang, H.; Pang, J.; Chen, H.; Shah, A.; et al. Automatic dataset construction (adc): Sample collection, data curation, and beyond. arXiv 2024, arXiv:2408.11338. [Google Scholar] [CrossRef]
- Liu, M.; Wei, J.; Liu, Y.; Davis, J. Human and AI perceptual differences in image classification errors. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025, Vol. 39, pp. 14318–14326.
- Chen, R.; Sun, Y.; Wang, J.; Lv, M.; Zhang, Q.; Zeng, Y. SafeMind: Benchmarking and Mitigating Safety Risks in Embodied LLM Agents. arXiv 2025, arXiv:2509.25885. [Google Scholar] [CrossRef]
- Dalrymple, D.; Skalse, J.; Bengio, Y.; Russell, S.; Tegmark, M.; Seshia, S.; Omohundro, S.; Szegedy, C.; Goldhaber, B.; Ammann, N.; et al. Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems. arXiv 2024, arXiv:2405.06624. [Google Scholar] [CrossRef]
- EU Artificial Intelligence Act. High-level Summary of the AI Act. https://artificialintelligenceact.eu/high-level-summary/, 2024. Summary of compliance obligations for high-risk AI systems. Accessed: 2025-10-31.
- European Commission. AI Act: Regulatory Framework for Artificial Intelligence. https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai, 2024. Official EU AI Act policy page with entry into force date August 1, 2024. Accessed: 2025-10-31.
- Anthropic. Announcing our updated Responsible Scaling Policy. https://www.anthropic.com/news/announcing-our-updated-responsible-scaling-policy, 2024. AI Safety Level framework and ASL-3 protections announced October 2024. Accessed: 2025-10-31.
- Bai, Y.; et al. Constitutional AI: Harmlessness from AI Feedback. arXiv 2022, arXiv:2212.08073. [Google Scholar] [CrossRef]
- World Economic Forum. The Future of Jobs Report 2025. https://reports.weforum.org/docs/WEF_Future_of_Jobs_Report_2025.pdf, 2025. Workforce impact analysis projecting 92M jobs displaced and 170M created by 2030. Accessed: 2025-10-31.
- Vats, V.; Binta Nizam, M.; Liu, M.; Wang, Z.; Ho, R.; Sai Prasad, M.; Titterton, V.; Venkat Malreddy, S.; Aggarwal, R.; Xu, Y.; et al. A Survey on Human-AI Collaboration with Large Foundation Models. arXiv 2024, arXiv:2403.04931. [Google Scholar]
- Ichnowski, J.; Chen, K.; Dharmarajan, K.; Adebola, S.; Danielczuk, M.; Mayoral-Vilches, V.; Jha, N.; Zhan, H.; Llontop, E.; Xu, D.; et al. FogROS 2: An Adaptive and Extensible Platform for Cloud and Fog Robotics Using ROS 2. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 5493–5500. [CrossRef]
- Chen, K.; Wang, M.; Gualtieri, M.; Tian, N.; Juette, C.; Ren, L.; Ichnowski, J.; Kubiatowicz, J.; Goldberg, K. FogROS2-LS: A Location-Independent Fog Robotics Framework for Latency Sensitive ROS2 Applications. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 10581–10587. [CrossRef]
- Hu, A.; Russell, L.; Yeo, H.; Murez, Z.; Fedoseev, G.; Kendall, A.; Shotton, J.; Corrado, G. Gaia-1: A generative world model for autonomous driving. arXiv 2023, arXiv:2309.17080. [Google Scholar] [CrossRef]
- Wayve. Nissan to Launch Next-Generation Autonomous Driving Technology in FY2027. https://wayve.ai/press/nissan-announcement/, 2025. Partnership to integrate Wayve’s AI Driver software into Nissan ProPILOT. Accessed: 2025-10-31.
- Tesla. Optimus: Tesla’s General-Purpose Humanoid Robot. https://www.tesla.com/we-robot, 2024. General-purpose humanoid robot demonstrated at We, Robot event October 2024. Accessed: 2025-10-31.
- Sanctuary AI. Phoenix Seventh Generation Humanoid Robot. https://www.sanctuary.ai/blog/sanctuary-ai-unveils-the-next-generation-of-ai-robotics, 2024. General-purpose humanoid with 24-hour task learning capability. Released April 2024. Accessed: 2025-10-31.
- DENSO Robotics. DENSO and Microsoft Azure Partnership for Cloud-Connected Industrial Robots. DENSO Press Release, 2024.
- Amazon Staff. An update on how we’re accelerating the use of AI in robotics at scale. Amazon, 2024.
- Covariant. Introducing RFM-1: Giving robots human-like reasoning capabilities. Covariant Technical Blog, 2024.
- Covariant. The Covariant Brain: Powering the future of automation. Technical documentation, Covariant, 2024.
- DHL Supply Chain. DHL Supply Chain Continues to Innovate With Orchestration, Robotics, and AI in 2024. https://www.dhl.com/us-en/home/press/press-archive/2024/dhl-supply-chain-continues-to-innovate-with-orchestration-robotics-and-ai-in-2024.html, 2024. Deployment of 7,000+ robots including AMRs and collaborative systems. Accessed: 2025-10-31.
- FedEx and Nimble. FedEx Announces Expansion of FedEx Fulfillment With Nimble Alliance. https://newsroom.fedex.com/newsroom/global-english/fedex-announces-expansion-of-fedex-fulfillment-with-nimble-alliance, 2024. Partnership for autonomous fulfillment robots in warehouse operations. Announced September 2024. Accessed: 2025-10-31.
- Built Robotics. Autonomous Construction Equipment. https://www.builtrobotics.com, 2024. AI-powered autonomous systems for bulldozers, excavators, and construction machinery. Accessed: 2025-10-31.
- Wasay, F.A.; Rahman, M.A.; Ghouse, H. GAMORA: A Gesture Articulated Meta Operative Robotic Arm for Hazardous Material Handling in Containment-Level Environments. arXiv 2025, arXiv:2506.14513. [Google Scholar] [CrossRef]
- Colan, J.; Davila, A.; Yamada, Y.; Hasegawa, Y. Human-Robot collaboration in surgery: Advances and challenges towards autonomous surgical assistants, 2025, [arXiv:cs.RO/2507.11460].
- Interact Analysis. Industrial Robot Forecast Update: Wide Variations Across Robot Types, Regions and Industries. https://interactanalysis.com/insight/industrial-robot-forecast-update-wide-variations-across-robot-types-regions-and-industries/, 2024. Market forecast for industrial robot shipments. Accessed: 2025-10-31.
- European Commission. EU AI Act Implementation Timeline. https://ai-act-service-desk.ec.europa.eu/en/ai-act/eu-ai-act-implementation-timeline, 2025. Accessed: 2025-10-28.
- Chee, F.Y. Code of Practice to Help Companies with AI Rules May Come End 2025, EU Says. https://www.reuters.com/business/media-telecom/code-practice-help-companies-with-ai-rules-may-come-end-2025-eu-says-2025-07-03/, 2025. Accessed: 2025-10-28.
- Intel. Intel RealSense Depth Camera D455, 2020.
- Microsoft. Azure Kinect DK. https://azure.microsoft.com/en-us/products/kinect-dk, 2019. RGB-D camera with time-of-flight depth sensor for collaborative robotics. Accessed: 2025-10-31.
- Gallego, G.; Delbrück, T.; Orchard, G.; Bartolozzi, C.; Taba, B.; Censi, A.; Leutenegger, S.; Davison, A.J.; Conradt, J.; Daniilidis, K.; et al. Event-based vision: A survey. IEEE transactions on pattern analysis and machine intelligence 2022, 44, 154–180. [Google Scholar] [CrossRef]
- Rebecq, H.; Ranftl, R.; Koltun, V.; Scaramuzza, D. High speed and high dynamic range video with an event camera. IEEE transactions on pattern analysis and machine intelligence 2021, 43, 1964–1980. [Google Scholar] [CrossRef]
- Robotiq. Most Popular Uses for Force Torque Sensors in Industry, 2015. Survey of industrial force-torque sensor applications. Last updated 2025.
- Analog Devices. A Technical Note on a Tactile Sensor Prototype. https://www.analog.com, 2025. Accessed: 2025-10-27.
- Liu, Z.; Tang, H.; Amini, A.; Yang, X.; Mao, H.; Rus, D.; Han, S. Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 2774–2781.
| Survey | Sensors | HW | World Models | VLAs | Learning | Safety | Deploy. |
|---|---|---|---|---|---|---|---|
| Kawaharazuka et al. 2025 [26] | — | — | ⃝ | ● | ⃝ | — | — |
| Liu et al. 2025 [27] | ⃝ | — | ⃝ | ⃝ | ● | — | — |
| Sapkota et al. 2025 [28] | — | — | — | ● | ⃝ | — | — |
| This survey | ● | ● | ● | ● | ● | ● | ● |

● indicates substantial coverage; ⃝ indicates partial or incidental treatment; — indicates not covered.
| Platform (specs) | Can run today | Cannot run / key gap |
|---|---|---|
| Jetson Thor ∼2000 TOPS / 130 W | Multiple VLA policies concurrently; transformer-class models on a mobile base at control-loop rates | Unproven at scale; supply-constrained; requires significant thermal engineering |
| Jetson Orin 275 TOPS / 60 W | YOLO-class detection at ∼100 FPS; lightweight SLAM; backbone of most deployed AMRs | VLMs at only 0.1–0.4 FPS (well below real-time thresholds); forces split architecture |
| Qualcomm RB5 15 TOPS / 15 W | On-device object detection; 5G enables cloud offload; drones and service robots | Zero foundation-model inference on-device; entirely cloud-dependent for VLA workloads |
| Loihi 2 Event-driven / <1 W | Spiking tactile controllers; event-driven reflexes at sub-watt power | Cannot run dense transformers; no mature toolchain for VLA-class models; research-only |
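The “split architecture” noted in the Orin row refers to partitioning the stack between a fast on-board control loop and slower reasoning that runs at a lower rate or is offloaded. The sketch below shows the kind of budget check that motivates such a split; the stage names, latency figures, and 20 Hz target are assumptions made for illustration.

```python
# Sketch: check whether a perception-reasoning-control pipeline fits a control
# budget on-device, or must be split/offloaded.  All latencies are assumed values.
CONTROL_HZ = 20
BUDGET_MS = 1000.0 / CONTROL_HZ

# (stage, on-device latency in ms, offloaded latency in ms incl. network round trip)
stages = [
    ("detection",      12.0,  80.0),
    ("vlm_reasoning", 900.0, 250.0),   # too slow locally -> candidate for offload
    ("policy_decode",  18.0,  90.0),
]

plan, total = [], 0.0
for name, local_ms, cloud_ms in stages:
    where, cost = ("local", local_ms) if local_ms <= cloud_ms else ("cloud", cloud_ms)
    plan.append((name, where, cost))
    total += cost

print(plan)
print(f"total {total:.0f} ms vs budget {BUDGET_MS:.0f} ms -> "
      f"{'fits in one loop' if total <= BUDGET_MS else 'needs a split (dual-rate) architecture'}")
```

Here even the cheapest placement exceeds the 50 ms period, which is why deployed systems run reasoning asynchronously at a few hertz while a lightweight reactive loop maintains the control rate.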
| Model | Params | Vision Enc. | LM Backbone | Action Decoder | Hz | Training Scale | Emb. | Open |
|---|---|---|---|---|---|---|---|---|
| Early VLA Systems (2022–2023) | ||||||||
| Gato [20] | 1.2B | ResNet | Unified xfmr | Autoregressive tokens | — | 604 tasks (multi-modal) | —a | No |
| RT-1 [88] | 35M | EffNet-B3 | FiLM cond. | Discrete (256 bins) | 3 | 130K eps, 700+ tasks | 1 | Yes |
| RT-2 [17] | 55B | ViT-22B | PaLI-X | Discrete text tokens | 1–3 | 130K eps + web VLM | 1 | No |
| Open-Source Cross-Embodiment Models (2024) | ||||||||
| Octo [90] | 93M | CNN patches | T5-Base | Diffusion (MLP) | — | 800K eps (OXE) | 9 | Yes |
| OpenVLA [89] | 7B | DINOv2+SigLIP | Llama 2 | Discrete tokens | 5–6 | 970K eps (OXE) | 22 | Yes |
| RDT-1B [91] | 1.2B | SigLIP | T5-XXL | Diffusion xfmr | 6 | 1M+ eps, 46 datasets | 46 | Yes |
| Industrial Generalist Agents (2024–2025) | ||||||||
| π0 [19] | 3.3B | SigLIP | Gemma 2B | Flow matching | 50 | 10K hrs, 68 tasks | 7 | Yes |
| GR00T N1 [95] | 3B | SigLIP-2 (Eagle) | SmolLM2 | DiT + flow match | 10/120b | 7.4K hrs (real+sim) | 5+ | Yes |
| Helix [96] | ∼7B | Undisclosed | 7B VLM | Dual-systemc | 9/200b | 500 hrs teleop | 1 | No |
| Gemini Rob. [97] | — | Gemini 2.0 | Gemini 2.0 | Cloud + local dec. | 50 | Thousands of hrs | 3+ | No |
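Several rows in the table emit discrete action tokens, typically 256 bins per action dimension. A minimal sketch of that discretisation and its inverse is shown below; the normalised action bounds and bin count are assumptions for illustration rather than settings taken from any listed model.

```python
import numpy as np

# Uniform discretisation of continuous actions into 256 bins per dimension,
# as used by discrete-token policies (bounds and bin count are assumed here).
N_BINS = 256
LOW, HIGH = -1.0, 1.0   # assumed normalised action range

def to_tokens(action):
    """Map a continuous action vector to integer tokens in [0, N_BINS - 1]."""
    clipped = np.clip(action, LOW, HIGH)
    return np.round((clipped - LOW) / (HIGH - LOW) * (N_BINS - 1)).astype(int)

def from_tokens(tokens):
    """Map integer tokens back to bin-centre continuous values."""
    return LOW + tokens / (N_BINS - 1) * (HIGH - LOW)

a = np.array([0.03, -0.72, 0.5, 0.0, 0.0, 0.0, 1.0])  # e.g. 6-DoF delta + gripper
tok = to_tokens(a)
print(tok, from_tokens(tok))   # round-trip error bounded by half a bin width
```

Diffusion- and flow-based decoders in the table avoid this quantisation by predicting continuous actions directly, at the cost of a more expensive sampling step.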
| Algorithm | Type | Efficiency | Stability | Key Innovation & Limitations |
|---|---|---|---|---|
| PPO [12] | On-policy | Medium | High | Clipped surrogate objective; high sample cost limits real-robot use |
| SAC [102] | Off-policy | High | High | Maximum-entropy exploration; sensitive to reward shaping |
| TD3 [103] | Off-policy | High | Medium | Twin Q-networks, delayed updates; supersedes DDPG [104] |
| QT-Opt [100] | Off-policy | High | Medium | Fleet-scale Q-learning; requires >500k real interactions |
| TD-MPC2 [105] | Model-based | Very High | High | Latent trajectory optimisation; single 317M-param agent learns 80 tasks |
| Q-Transformer [106] | Offline | High | High | Autoregressive Q-learning over action tokens; offline data only |
| Diffusion-QL [107] | Offline | High | High | Diffusion model as policy class; SoTA on D4RL benchmarks |
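The “clipped surrogate objective” that defines PPO in the first row can be written in a few lines; the function below is a minimal PyTorch version, with the clip range set to a common default (an assumption here, not a value reported by the surveyed deployments).

```python
import torch

def ppo_clipped_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate policy loss from PPO (minimal sketch).

    log_probs, old_log_probs, advantages: 1-D tensors over sampled actions.
    """
    ratio = torch.exp(log_probs - old_log_probs)           # pi_theta / pi_theta_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()           # negate to minimise

# Toy call with random tensors standing in for rollout statistics.
lp = torch.randn(8)
print(ppo_clipped_loss(lp, lp + 0.05 * torch.randn(8), torch.randn(8)))
```

The clipping keeps each on-policy update small and stable, but it is also why PPO needs many fresh environment samples, the sample-cost limitation flagged in the table.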
| Category | Failure Mode | Example / Symptom | Mitigation |
|---|---|---|---|
| Perception | Object hallucination | VLM detects nonexistent obstacle; grasps empty space | Multi-sensor fusion; confidence thresholds; redundant modalities |
| Perception | Sensor degradation | LiDAR rain scatter; camera occlusion or glare | Graceful degradation; radar backup; digital twin replay for diagnosis |
| Planning | Infeasible plan | LLM proposes physically impossible action sequence | Affordance grounding (PaLM-SayCan); physics-aware verification |
| Planning | Goal misinterpretation | Ambiguous NL instruction yields unintended behaviour | Clarification dialogue; conservative defaults; human-in-the-loop |
| Control | Distribution shift | Policy encounters unseen object geometry or dynamics | Domain randomisation; online adaptation (RMA); fleet retraining |
| Control | Latency-induced error | VLM response too slow for dynamic scene changes | Dual-system architecture (Helix); reactive safety fallbacks |
| Hardware | Actuator fault | Joint failure mid-task; gripper slip | Torque monitoring; redundant actuators; safe-stop protocols |
| Hardware | Compute overload | Edge GPU thermal throttle; inference timeout | Model distillation; workload partitioning; cloud offload (FogROS2) |
| Integration | Sim-to-real gap | Policy trained in simulation fails on real contacts | Iterative calibration (RialTo); hybrid sim-real training |
| Integration | Fleet inconsistency | Model update degrades subset of heterogeneous fleet | Canary deployments; per-embodiment regression testing |
| Human–Robot | Intent misalignment | Robot optimises proxy metric, not user’s true goal | Constitutional AI constraints; value alignment; audit logging |
| Human–Robot | Trust miscalibration | Operator over-relies on or under-trusts autonomy | Transparent confidence displays; graduated autonomy levels |
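One of the perception mitigations listed above, confidence thresholds combined with redundant modalities, reduces to a gating rule applied before a detection is allowed to influence action. The sketch below illustrates the idea; the threshold value and the cross-modal agreement rule are assumptions for this example, not the specification of any deployed system.

```python
# Gate camera/VLM detections on confidence and cross-modal corroboration before acting.
# The threshold and agreement rule are illustrative assumptions.
CONF_THRESHOLD = 0.6

def accept_detection(camera_conf, lidar_hit, radar_hit):
    """Accept only confident detections corroborated by at least one other modality."""
    if camera_conf < CONF_THRESHOLD:
        return False                      # low confidence: treat as possible hallucination
    return lidar_hit or radar_hit         # require independent physical evidence

print(accept_detection(0.9, lidar_hit=False, radar_hit=False))  # rejected: uncorroborated
print(accept_detection(0.7, lidar_hit=True,  radar_hit=False))  # accepted
```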
| Phase | Industries | Transition Criteria | Key Characteristics | Primary Barriers |
|---|---|---|---|---|
| Phase 1: At Scale | Logistics, warehousing | >100 units; >12 mo uptime; documented ROI | 100s–1000s deployed; 24/7 operations | Infrastructure costs; skilled workforce |
| Phase 2: Pilot | Manufacturing (humanoids, cobots) | >10 units; >3 mo trial; partner-validated tasks | 10s–100s units; supervised; focused tasks | Reliability, safety cert., integration |
| Phase 3: Regulatory | Autonomous vehicles | Safety case accepted; limited commercial licence | Technically ready; limited commercial ops | Liability, public acceptance |
| Phase 4: Research | Agriculture, construction, healthcare | Published demo; >1 real-world trial | Demos and niche deployments; high supervision | Unstructured envs, economics |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).