Submitted: 07 March 2026
Posted: 10 March 2026
Abstract
Keywords:
1. Introduction
- Instead of focusing on specific applications or introductory conceptual discussions, we offer an in-depth and systematic overview of world modeling paradigms, methodologies, key functions, and their relationships.
- We summarize the key evolutionary developments of existing major world models (WMs) and their core mathematical formulations across branches from a broader perspective.
- Rather than concentrating solely on commonly studied application domains, we comprehensively review the application areas of world models explored to date: robotics, autonomous driving, scientific discovery, virtual game simulation, GUI-based agents, and interpretability and trustworthiness.
- We provide a more thorough and inclusive overview of benchmark datasets, evaluation metrics, simulation platforms, and comparative experiments across WMs.
2. Foundational World Models
2.1. Observation-Level Generative World Models
2.1.1. Language Observations
2.1.2. Visual Observations
2.1.3. 3D and 4D Observations
2.2. Latent-Space World Models
2.3. Reinforcement Learning (RL)-Based World Models
2.4. Object-Centric World Models
2.5. Discussion of Expected World Models
3. Applications of World Models in AI
3.1. World Models for Robotics
3.1.1. Manipulation
3.1.2. Navigation
3.1.3. Policy Learning
3.1.4. Locomotion
3.2. World Models for Autonomous Driving
3.2.1. Predictive Modeling
3.2.2. Action-Conditioned Imagination
3.2.3. Decision-Centric Integration
3.3. World Models for Science
3.3.1. Social Science and Socioeconomic Systems
3.3.2. Physical and Natural Sciences
3.4. World Models for Virtual Game Simulation
3.4.1. 2D Pixel-Level Observation Prediction
3.4.2. 3D Mesh-Level Observation Prediction
3.5. World Models for GUI-Based Agents
3.6. Interpretable and Trustworthy World Models
3.7. Limitations of WMs in Downstream Applications
4. Benchmark of World Models
4.1. Benchmark Datasets & Evaluation Metrics
4.1.1. Pretrained Video Benchmarks
4.1.2. Benchmarks on Downstream Tasks
4.1.3. Designing General Metrics for World Models
- Generalization: Given a world model and a task metric (e.g., success rate or accuracy), we measure cross-domain generalization capability by comparing the task metric obtained when the WM is evaluated on training samples with the metric obtained on testing samples. The training and testing datasets are drawn from distributions that exhibit a significant discrepancy.
- Causal Reasoning: Given a set of interventions and a distance measure, and inspired by Pearl’s do-calculus [283], we adopt a counterfactual intervention operator (do(·)) to measure causal reasoning capability. The measure applies the distance to the expected outputs of the WM under the two inputs being contrasted.
- Long-Horizon Consistency: For multi-step execution tasks, we compare the incremental deviation between the actual trajectory and the imagined trajectory under identical conditions and policies to evaluate the long-horizon consistency of existing foundational WMs. The deviation is taken between the ground-truth trajectory and the predicted trajectory (generated based on Eq. (1)) at each time t.
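As an illustration only, the three criteria above can be instantiated numerically. The sketch below is written under assumed definitions and is not the formulation used in this survey: the generalization score is assumed to be the difference between mean training and mean testing metric, the causal score is assumed to be the mean distance between the model output on an input and on its intervened counterpart, and consistency is assumed to use the per-step Euclidean deviation between trajectories. All function names are hypothetical.

```python
import numpy as np


def generalization_gap(train_scores, test_scores):
    """Assumed definition: mean task metric on training samples
    minus mean task metric on testing samples (smaller is better)."""
    return float(np.mean(train_scores) - np.mean(test_scores))


def causal_score(model, inputs, interventions, distance):
    """Assumed definition: average distance between the model output on
    an input x and on its intervened version do(x), over all pairs."""
    dists = [
        distance(model(x), model(do(x)))
        for x in inputs
        for do in interventions
    ]
    return float(np.mean(dists))


def stepwise_deviation(true_traj, pred_traj):
    """Euclidean deviation between ground-truth and predicted states
    at each time step; both inputs have shape (T, state_dim)."""
    true_traj = np.asarray(true_traj, dtype=float)
    pred_traj = np.asarray(pred_traj, dtype=float)
    return np.linalg.norm(true_traj - pred_traj, axis=-1)


def incremental_deviation(true_traj, pred_traj):
    """Change in deviation from one step to the next; sustained positive
    values indicate compounding error over the horizon."""
    return np.diff(stepwise_deviation(true_traj, pred_traj))


# Toy data, hypothetical values chosen only to exercise the functions.
gap = generalization_gap([0.9, 0.8, 1.0], [0.6, 0.7, 0.5])

score = causal_score(
    model=lambda x: 2 * x,               # stand-in for a world model
    inputs=[1.0, 2.0],
    interventions=[lambda x: x + 1],     # stand-in for do(.)
    distance=lambda a, b: abs(a - b),
)

true_traj = [[0, 0], [1, 0], [2, 0]]
pred_traj = [[0, 0], [1, 1], [2, 2]]
drift = incremental_deviation(true_traj, pred_traj)

print(gap, score, drift)
```

Reporting the step-to-step change rather than only the final error separates a model that is wrong by a constant offset from one whose error grows with the horizon.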
4.2. Physics Engines & Simulation Platforms
4.3. Performance Comparison
5. Challenges & Future Directions
5.1. Scientific Modeling
5.2. Long-Horizon Consistency & Causal Reasoning
5.3. Grounding in Physical and Semantic Constraints
5.4. Generalization & Scalability in Real-World
6. Conclusions
References
- Zhu, F.; et al. Irasim: Learning interactive real-robot action simulators. arXiv 2024.
- Wang, J.; Ma, A.; Cao, K.; et al. WISA: World simulator assistant for physics-aware text-to-video generation. In Proceedings of the NeurIPS, 2025.
- Hafner, D.; Pasukonis, J.; et al. Mastering diverse control tasks through world models. Nature 2025, 1–7.
- Ha, D.R.; Schmidhuber, J. World Models. arXiv 2018.
- Zhao, C.; Zhang, R.; et al. World Models for Cognitive Agents: Transforming Edge Intelligence in Future Networks. arXiv 2025.
- Janner, M.; et al. Planning with Diffusion for Flexible Behavior Synthesis. In Proceedings of the ICML, 2022.
- Assran, M.; Bardes, A.; et al. V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning. arXiv 2025.
- Ghaemi, H.; et al. seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models. In Proceedings of the NeurIPS, 2025.
- Hafner, D.; Lillicrap, T.; et al. Dream to Control: Learning Behaviors by Latent Imagination. In Proceedings of the ICLR, 2020.
- Hafner, D.; et al. Mastering Atari with Discrete World Models. In Proceedings of the ICLR, 2021.
- Hansen, N.; Su, H.; Wang, X. TD-MPC2: Scalable, Robust World Models for Continuous Control. In Proceedings of the ICLR, 2024.
- Yang, S.; Du, Y.; Dai, B.; et al. Probabilistic Adaptation of Black-Box Text-to-Video Models. In Proceedings of the ICLR, 2024.
- Kotar, K.; Lee, W.; Venkatesh, R.; et al. World Modeling with Probabilistic Structure Integration. arXiv 2025.
- Kang, B.; Yue, Y.; Lu, R.; et al. How Far Is Video Generation from World Model: A Physical Law Perspective. In Proceedings of the ICML, 2025.
- Yang, H. Utilizing World Models for Adaptively Covariate Acquisition Under Limited Budget for Causal Decision Making. In Proceedings of the ICLR Workshop, 2025.
- Feng, F.; Lippe, P.; Magliacane, S. Learning Interactive World Model for Object-Centric Reinforcement Learning. arXiv 2025.
- Lee, H.; Lee, Y.; et al. Hyperspherical Normalization for Scalable Deep Reinforcement Learning. In Proceedings of the ICML, 2025.
- Chen, H.; et al. VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models. In Proceedings of the CVPR, 2024.
- Yu, H.X.; Duan, H.; et al. WonderWorld: Interactive 3D Scene Generation from a Single Image. In Proceedings of the CVPR, 2024.
- Zhang, W.; Jelley, A.; McInroe, T.; et al. Objects Matter: Object-Centric World Models Improve Reinforcement Learning in Visually Complex Environments. In Proceedings of the RLC Workshop, 2025.
- Zhou, S.; Zhou, T.; et al. WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents. arXiv 2025.
- Zhang, B.; Wang, R.; Xiao, W.; et al. DyMoDreamer: World Modeling with Dynamic Modulation. In Proceedings of the NeurIPS, 2025.
- Chi, X.; Jia, P.; et al. Wow: Towards a world omniscient world model through embodied interaction. arXiv 2025.
- Samsami, M.R.; et al. Mastering Memory Tasks with World Models. In Proceedings of the ICLR, 2024.
- Wang, Y.; Wan, S.; Gan, L.; et al. AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors. In Proceedings of the ICML, 2024.
- Hansen, N.A.; et al. Temporal Difference Learning for Model Predictive Control. In Proceedings of the ICML, 2022.
- Zhou, G.; Pan, H.; LeCun, Y.; et al. DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning. In Proceedings of the ICML, 2025.
- FAIR CodeGen team; Copet, J.; et al. CWM: An Open-Weights LLM for Research on Code Generation with World Models. arXiv 2025.
- Brooks, T.; Peebles, B.; Holmes, C.; et al. Video generation models as world simulators. Technical Report 2024.
- Acuaviva, P.; et al. From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models. arXiv 2025.
- Wang, Z.; Wei, X.; Li, B.; et al. VideoVerse: How Far is Your T2V Generator from a World Model? arXiv 2025.
- Karypidis, E.; Kakogeorgiou, I.; et al. DINO-Foresight: Looking into the Future with DINO. In Proceedings of the NeurIPS, 2025.
- Wang, X.; Zhang, X.; Luo, Z.; et al. Emu3: Next-Token Prediction is All You Need. arXiv 2024.
- Gkountouras, J.; et al. Language Agents Meet Causality – Bridging LLMs and Causal World Models. In Proceedings of the ICLR, 2025.
- Zhang, Y.; et al. Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective. In Proceedings of the NeurIPS, 2025.
- Mattes, P.; Schlosser, R.; Herbrich, R. Hieros: Hierarchical Imagination on Structured State Space Sequence World Models. In Proceedings of the ICML, 2024.
- Brito, C.S.; et al. World Models as Reference Trajectories for Rapid Motor Adaptation. In Proceedings of the NeurIPS, 2025.
- Levy, G.; Colas, C.; et al. WorldLLM: Improving LLMs’ world modeling using curiosity-driven theory-making. arXiv 2025.
- Wu, Z.; Dvornik, N.; et al. SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models. In Proceedings of the ICLR, 2023.
- Locatello, F.; Weissenborn, D.; et al. Object-centric learning with slot attention. In Proceedings of the NeurIPS, 2020.
- Chae, H.; Kim, N.; Ong, K.T.; et al. Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation. In Proceedings of the ICLR, 2025.
- Hao, S.; Gu, Y.; Ma, H.; et al. Reasoning with Language Model is Planning with World Model. In Proceedings of the EMNLP, 2023.
- Foffano, D.; Russo, A.; Proutiere, A. Adversarial Diffusion for Robust Reinforcement Learning. In Proceedings of the NeurIPS, 2025.
- Wu, T.; Yang, S.; Po, R.; et al. Video World Models with Long-term Spatial Memory. In Proceedings of the NeurIPS, 2025.
- Wang, Z.; et al. Dyn-O: Building Structured World Models with Object-Centric Representations. In Proceedings of the NeurIPS, 2025.
- Lee, V.; Abbeel, P.; et al. DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing. In Proceedings of the ICLR, 2024.
- Alonso, E.; Jelley, A.; et al. Diffusion for world modeling: Visual details matter in atari. In Proceedings of the NeurIPS, 2024.
- Li, L.; Fan, Z.; Cong, W.; et al. Martian World Model: Controllable Video Synthesis with Physically Accurate 3D Reconstructions. In Proceedings of the NeurIPS, 2025.
- Park, B.; Go, H.; et al. SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering. In Proceedings of the ICCV, 2025.
- Che, H.; et al. GameGen-X: Interactive Open-world Game Video Generation. In Proceedings of the ICLR, 2025.
- Jin, Y.; et al. Pyramidal Flow Matching for Efficient Video Generative Modeling. In Proceedings of the ICLR, 2025.
- Barhdadi, M.R.; et al. PhysicsNeRF: Physics-Guided 3D Reconstruction from Sparse Views. In Proceedings of the ICML Workshop, 2025.
- Chen, B.; Monsó, D.M.; et al. Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion. In Proceedings of the NeurIPS, 2024.
- Assran, M.; et al. Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture. In Proceedings of the CVPR, 2023.
- Bardes, A.; Garrido, Q.; Ponce, J.; et al. Revisiting Feature Prediction for Learning Visual Representations from Video. arXiv 2024.
- Bardhan, J.; Agrawal, R.; et al. HEP-JEPA: A foundation model for collider physics. In Proceedings of the ICLR Workshop, 2025.
- Wang, B.; Meng, X.; et al. EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling. arXiv 2025.
- Hao, C.; et al. Neural Motion Simulator Pushing the Limit of World Models in Reinforcement Learning. In Proceedings of the CVPR, 2025.
- Park, K.; Lee, Y. Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning. In Proceedings of the ICLR, 2025.
- Lin, Z.; Wu, Y.F.; Peri, S.; et al. Improving Generative Imagination in Object-Centric World Models. In Proceedings of the ICML, 2020.
- GX-Chen, A.; Marino, K.; Fergus, R. Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction. In Proceedings of the ICLR, 2025.
- Lu, J.; Huang, Z.; Yang, Z.; et al. Wovogen: World volume-aware diffusion for controllable multi-camera driving scene generation. In Proceedings of the ECCV, 2024.
- Gao, S.; Zhou, S.; et al. AdaWorld: Learning Adaptable World Models with Latent Actions. In Proceedings of the ICML, 2025.
- Zhang, X.; et al. Social World Model-Augmented Mechanism Design Policy Learning. In Proceedings of the NeurIPS, 2025.
- Yue, Y.; Wang, Y.; et al. CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning. In Proceedings of the CVPR, 2025.
- Hafner, D.; Lillicrap, T.; Fischer, I.; et al. Learning latent dynamics for planning from pixels. In Proceedings of the ICML, 2019.
- Gao, S.; Yang, J.; Chen, L.; et al. Vista: A generalizable driving world model with high fidelity and versatile controllability. In Proceedings of the NeurIPS, 2024.
- Cheng, J.; Ge, Y.; et al. Animegamer: Infinite anime life simulation with next game state prediction. In Proceedings of the ICCV, 2025.
- Qiao, S.; Fang, R.; et al. Agent planning with world knowledge model. In Proceedings of the NeurIPS, 2024.
- Zhang, Y.; Su, Y.; et al. CellFlux: Simulating Cellular Morphology Changes via Flow Matching. In Proceedings of the ICML, 2025.
- Zhao, Z.; et al. From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction. In Proceedings of the NeurIPS, 2025.
- Agro, B.; Sykora, Q.; et al. Uno: Unsupervised occupancy fields for perception and forecasting. In Proceedings of the CVPR, 2024.
- Chua, K.; et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models. In Proceedings of the NeurIPS, 2018.
- Yang, Y.; et al. Medical world model: Generative simulation of tumor evolution for treatment planning. arXiv 2025.
- Feng, T.; Wang, W.; et al. A survey of world models for autonomous driving. arXiv 2025.
- Fu, A.; Zhou, Y.; Zhou, T.; et al. Exploring the interplay between video generation and world models in autonomous driving: A survey. arXiv 2024.
- Guan, Y.; et al. World models for autonomous driving: An initial survey. IEEE Transactions on Intelligent Vehicles 2024.
- Tu, S.; Zhou, X.; et al. The role of world models in shaping autonomous driving: A comprehensive survey. arXiv 2025.
- Zhao, J.; Zhao, W.; et al. Autonomous driving system: A comprehensive survey. Expert Systems with Applications 2024.
- Zablocki, É.; Ben-Younes, H.; Pérez, P.; Cord, M. Explainability of deep vision-based autonomous driving systems: Review and challenges. IJCV 2022.
- Kong, L.; Yang, W.; et al. 3D and 4D World Modeling: A Survey. arXiv 2025.
- Xie, N.; et al. From 2D to 3D Cognition: A Brief Survey of General World Models. arXiv 2025.
- Baraldi, L.; et al. The Safety Challenge of World Models for Embodied AI Agents: A Review. arXiv 2025.
- Zhu, Z.; et al. Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond. arXiv 2025.
- Lin, M.; et al. Exploring the Evolution of Physics Cognition in Video Generation: A Survey. arXiv 2025.
- Liu, Y.; et al. Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI. arXiv 2025.
- Li, X.; He, X.; et al. A Comprehensive Survey on World Models for Embodied AI. arXiv 2025.
- Long, X.; et al. A Survey: Learning Embodied Intelligence from Physical Simulators and World Models. arXiv 2025.
- Fung, P.; Bachrach, Y.; et al. Embodied AI Agents: Modeling the World. arXiv 2025.
- Zhang, P.F.; et al. A Step Toward World Models: A Survey on Robotic Manipulation. arXiv 2025.
- Sun, J.; et al. Integrating World Models into Vision Language Action and Navigation: A Comprehensive Survey. TechRxiv 2025.
- Ding, J.; et al. Understanding World or Predicting Future? A Comprehensive Survey of World Models. ACM Comput. Surv. 2025, 58.
- Ser, J.D.; et al. World Models in Artificial Intelligence: Sensing, Learning, and Reasoning Like a Child. arXiv 2025.
- Xu, K.; Zhao, H.; et al. From Specialist to Generalist: A Comprehensive Survey on World Models. TechRxiv 2026.
- Xie, K.; et al. Making Large Language Models into World Models with Precondition and Effect Knowledge. arXiv 2024.
- Bahmani, S.; et al. 4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling. In Proceedings of the CVPR, 2023.
- Liu, H.; Yan, W.; Zaharia, M.; et al. World model on million-length video and language with blockwise ringattention. arXiv 2024.
- Yu, H.X.; Duan, H.; Hur, J.; et al. WonderJourney: Going from Anywhere to Everywhere. In Proceedings of the CVPR, 2023.
- OpenAI; Achiam, J.; et al. GPT-4 Technical Report. arXiv 2024.
- Lu, G.; et al. Manigaussian: Dynamic gaussian splatting for multi-task robotic manipulation. In Proceedings of the ECCV, 2024; pp. 349–366.
- Wang, R.; Todd, G.; Xiao, Z.; et al. Can Language Models Serve as Text-Based World Simulators? In Proceedings of the ACL, 2024.
- Grattafiori, A.; et al. The Llama 3 Herd of Models. arXiv 2024.
- Lyu, B.; Huang, S.; Liang, Z. SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors. In Proceedings of the EMNLP, 2025.
- Wang, A.; Ai, B.; Wen, B.; et al. Wan: Open and Advanced Large-Scale Video Generative Models. arXiv 2025.
- Liu, H.; Li, C.; Wu, Q.; Lee, Y.J. Visual Instruction Tuning. In Proceedings of the NeurIPS, 2023.
- Huang, S.; Wu, J.; Zhou, Q.; et al. Vid2World: Crafting Video Diffusion Models to Interactive World Models. arXiv 2025.
- Wang, Y.; Zhang, F.; et al. Co-Evolving Latent Action World Models. arXiv 2025.
- Liang, A.; Liu, Y.; Yang, Y.; et al. LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences. In Proceedings of the AAAI, 2025.
- Fridman, R.; et al. SceneScape: Text-Driven Consistent Scene Generation. In Proceedings of the NeurIPS, 2023.
- Höllein, L.; Cao, A.; et al. Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models. In Proceedings of the ICCV, 2023.
- Engstler, P.; Vedaldi, A.; et al. Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting. In Proceedings of the 3DV, 2024.
- Huang, W.; Chao, Y.W.; et al. PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation. arXiv 2026.
- Bardes, A.; et al. MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features. arXiv 2023.
- Baldassarre, F.; et al. Back to the Features: DINO as a Foundation for Video World Models. arXiv 2024.
- Delliaux, T.; Vu, N.K.; Francois-Lavet, V.; et al. Learning Abstract World Models with a Group-Structured Latent Space. In Proceedings of the EWRL, 2025.
- Cohen, L.; Wang, K.; Kang, B.; et al. Improving Token-Based World Models with Parallel Observation Prediction. In Proceedings of the ICML, 2024.
- Ma, H.; Wu, J.; Feng, N.; et al. HarmonyDream: Task Harmonization Inside World Models. In Proceedings of the ICML, 2024.
- Huang, D.; Wang, J.; Li, Y.; et al. PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning. In Proceedings of the ICML, 2025.
- Wang, Q.; Yang, J.; Wang, Y.; et al. Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning. In Proceedings of the NeurIPS, 2024.
- Chen, R.; Chen, X.H.; et al. Policy-conditioned Environment Models are More Generalizable. In Proceedings of the ICML, 2024.
- Li, S.; Huang, Z.; Su, H. Reward-free World Models for Online Imitation Learning. In Proceedings of the ICML, 2025.
- Rigter, M.; Jiang, M.; Posner, I. Reward-Free Curricula for Training Robust World Models. In Proceedings of the ICLR, 2024.
- Georgiev, I.; Giridhar, V.; Hansen, N.; et al. PWM: Policy Learning with Multi-Task World Models. In Proceedings of the ICLR, 2025.
- Zheng, R.; Wang, J.; et al. FLARE: Robot Learning with Implicit World Modeling. arXiv 2025.
- Ajay, A.; Du, Y.; Gupta, A.; et al. Is Conditional Generative Modeling all you need for Decision Making? In Proceedings of the ICLR, 2023.
- Zhang, K.; et al. PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation. In Proceedings of the NeurIPS, 2024.
- Liao, Y.; Zhou, P.; et al. Genie envisioner: A unified world foundation platform for robotic manipulation. arXiv 2025.
- Zhang, C.; Wu, Z.; Lu, G.; et al. iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation. arXiv 2025.
- Wu, J.; Yin, S.; et al. ivideogpt: Interactive videogpts are scalable world models. In Proceedings of the NeurIPS, 2024.
- Guo, J.; Ma, X.; Wang, Y.; et al. FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation. arXiv 2025.
- Zhen, H.; Sun, Q.; et al. TesserAct: Learning 4D Embodied World Models. arXiv 2025.
- Ferraro, S.; Nakano, A.; et al. When Object-Centric World Models Meet Policy Learning: From Pixels to Policies, and Where It Breaks. arXiv 2025.
- Kapl, F.; et al. Object-Centric Representations Generalize Better Compositionally with Less Compute. In Proceedings of the ICLR Workshop, 2025.
- Chen, H.; Wang, B.; et al. World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation. arXiv 2025.
- Hamdan, S.; Güney, F. CarFormer: Self-driving with Learned Object-Centric Representations. In Proceedings of the ECCV, 2025.
- Jeong, Y.; Chun, J.; et al. Object-centric world model for language-guided manipulation. arXiv 2025.
- Liang, W.; Sun, G.; et al. PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model. arXiv 2025.
- Barcellona, L.; et al. Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination. In Proceedings of the ICLR, 2025.
- Goswami, R.G.; et al. OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation. arXiv 2025.
- López Escoriza, A.; et al. Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning. In Proceedings of the ICML, 2025.
- Qi, H.; Yin, H.; et al. Strengthening Generative Robot Policies through Predictive World Modeling. arXiv 2025.
- Ajay, A.; Han, S.; et al. Compositional Foundation Models for Hierarchical Planning. In Proceedings of the NeurIPS, 2023.
- Pezzato, C.; et al. Mobile Manipulation with Active Inference for Long-Horizon Rearrangement Tasks. arXiv 2025.
- Nhu, A.N.; et al. Time-Aware World Model for Adaptive Prediction and Control. In Proceedings of the ICML, 2025.
- Luo, Y.; Sun, C.; et al. Potential Based Diffusion Motion Planning. In Proceedings of the ICML, 2024.
- Li, Y.; Wei, X.; et al. ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance. arXiv 2025.
- Chen, Z.; Huo, J.; Chen, Y.; Gao, Y. RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation. arXiv 2025.
- Song, Z.; Qin, S.; Chen, T.; Lin, L.; Wang, G. Physical Autoregressive Model for Robotic Manipulation without Action Pretraining. arXiv 2025.
- Lykov, A.; Sam, J.; et al. PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models. arXiv 2025.
- Zhou, S.; Du, Y.; Chen, J.; et al. RoboDreamer: Learning Compositional World Models for Robot Imagination. In Proceedings of the ICML, 2024; pp. 61885–61896.
- Luo, Y.; Du, Y. Grounding Video Models to Actions through Goal Conditioned Exploration. In Proceedings of the ICLR, 2025.
- Routray, S.; Pan, H.; et al. ViPRA: Video Prediction for Robot Actions. arXiv 2025.
- Yang, X.; Li, B.; et al. ORV: 4D Occupancy-centric Robot Video Generation. arXiv 2025.
- Qian, Z.; Chi, X.; et al. WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation. arXiv 2025.
- Fu, X.; Wang, X.; et al. Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control. arXiv 2025.
- Feng, Y.; Tan, H.; et al. Vidar: Embodied Video Diffusion Model for Generalist Bimanual Manipulation. arXiv 2025.
- Huang, Y.; Zhang, J.; et al. LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation. arXiv 2025.
- Li, S.; Hao, Q.; Shang, Y.; Li, Y. KeyWorld: Key Frame Reasoning Enables Effective and Efficient World Models. arXiv 2025.
- Bar, A.; Zhou, G.; Tran, D.; Darrell, T.; LeCun, Y. Navigation world models. In Proceedings of the CVPR, 2025.
- Yang, Y.; Liu, J.; Zhang, Z.; et al. MindJourney: Test-Time Scaling with World Models for Spatial Reasoning. arXiv 2025.
- Hu, Y.; et al. Imaginative World Modeling with Scene Graphs for Embodied Agent Navigation. arXiv 2025.
- Yao, X.; et al. NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments. arXiv 2025.
- Dong, Y.; et al. Unified World Models: Memory-Augmented Planning and Foresight for Visual Navigation. arXiv 2025.
- Shah, D.; et al. Rapid Exploration for Open-World Navigation with Latent Goal Models. arXiv 2021.
- Nie, D.; et al. WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation. arXiv 2025.
- Wang, W.; et al. Deductive Chain-of-Thought Augmented Socially-aware Robot Navigation World Model. arXiv 2025.
- Alcedo, K.; et al. Perspective-Shifted Neuro-Symbolic World Models: A Framework for Socially-Aware Robot Navigation. arXiv 2025.
- Li, H.; et al. Scaling Inference-Time Search with Vision Value Models for Improved Visual Comprehension. In Proceedings of the ICLR Workshop, 2025.
- Damm, E.R.; et al. Kinodynamic Motion Planning for Mobile Robot Navigation across Inconsistent World Models. In Proceedings of the RSS Workshop on Resilient Off-road Autonomous Robotics, 2025.
- Liu, W.; et al. X-mobility: End-to-end generalizable navigation via world modeling. In Proceedings of the ICRA, 2025; pp. 7569–7576.
- Miller, T.; et al. FalconWing: An Ultra-Light Fixed-Wing Platform for Indoor Aerial Applications. In Proceedings of the NeurIPS Workshop, 2025.
- Deng, Y.; Hanna, J.P. Abstract Sim2Real through Approximate Information States. In Proceedings of the NeurIPS Workshop, 2025.
- Zhou, S.; et al. Learning 3D Persistent Embodied World Models. arXiv 2025.
- Yoo, M.; et al. World Model Implanting for Test-time Adaptation of Embodied Agents. arXiv 2025.
- Yokozawa, R.; et al. Deep Active Inference with Diffusion Policy and Multiple Timescale World Model for Real-World Exploration and Navigation. arXiv 2025.
- Zhu, F.; Yan, Z.; et al. WMPO: World Model-based Policy Optimization for Vision-Language-Action Models. arXiv 2025.
- Jiang, Z.; Liu, K.; Qin, Y.; et al. World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation. arXiv 2025.
- Li, Z.; Han, X.; et al. DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions. arXiv 2025.
- Alles, M.; et al. Latent Action World Models for Control with Unlabeled Trajectories. arXiv 2025.
- Zhang, L.; Kan, M.; et al. Prelar: World model pre-training with learnable action representation. In Proceedings of the ECCV, 2024.
- Goswami, R.G.; Bar, A.; et al. World Models Can Leverage Human Videos for Dexterous Manipulation. arXiv 2025.
- He, Z.; Ai, B.; et al. Scaling Cross-Embodiment World Models for Dexterous Manipulation. arXiv 2025.
- Lee, S.; Jung, Y.; et al. TraceGen: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos. arXiv 2025.
- Zhi, H.; Chen, P.; et al. 3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model. arXiv 2025.
- Guo, Y.; Shi, L.X.; et al. Ctrl-world: A controllable generative world model for robot manipulation. arXiv 2025.
- Li, C.; et al. Robotic world model: A neural network simulator for robust policy optimization in robotics. arXiv 2025.
- Quevedo, J.; Sharma, A.K.; et al. WorldGym: World Model as An Environment for Policy Evaluation. arXiv 2025.
- Li, Y.; et al. WorldEval: World Model as Real-World Robot Policies Evaluator. arXiv 2025.
- Zhang, L.; Xiong, Y.; Yang, Z.; et al. Copilot4d: Learning unsupervised world models for autonomous driving via discrete diffusion. In Proceedings of the ICLR, 2024.
- Min, C.; Zhao, D.; Xiao, L.; Nie, Y.; Dai, B. Uniworld: Autonomous driving pre-training via world models. arXiv 2023.
- Wang, Y.; He, J.; Fan, L.; et al. Driving into the future: Multiview visual forecasting and planning with world model for autonomous driving. In Proceedings of the CVPR, 2024.
- Hu, A.; Russell, L.; Yeo, H.; et al. Gaia-1: A generative world model for autonomous driving. arXiv 2023.
- Li, Q.; Jia, X.; Wang, S.; et al. Think2drive: Efficient reinforcement learning by thinking with latent world model for autonomous driving (in carla-v2). In Proceedings of the ECCV, 2024.
- Wang, H.; Ye, X.; et al. Adawm: Adaptive world model based planning for autonomous driving. In Proceedings of the ICLR, 2025.
- Li, Y.; Shang, S.; et al. DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving. arXiv 2025.
- Lai, H.; Cao, J.; et al. World model-based perception for visual legged locomotion. In Proceedings of the ICRA, 2025.
- Sun, W.; Chen, L.; et al. Learning humanoid locomotion with world model reconstruction. arXiv 2025.
- Raja, G.; Agishev, R.; et al. ProTerrain: Probabilistic Physics-Informed Rough Terrain World Modeling. arXiv 2025.
- Gu, X.; Wang, Y.J.; et al. Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning. In Proceedings of the RSS, 2024.
- Liu, H.; Gao, Y.; et al. Ego-Vision World Model for Humanoid Contact Planning. arXiv 2025.
- Wu, Z.; Ni, J.; et al. Holodrive: Holistic 2d-3d multi-modal street scene generation for autonomous driving. arXiv 2024.
- Huang, Z.; Zhang, J.; et al. Neural volumetric world models for autonomous driving. In Proceedings of the ECCV, 2024.
- Yan, Z.; Dong, W.; Shao, Y.; et al. Renderworld: World model with self-supervised 3d label. In Proceedings of the ICRA, 2025.
- Zhang, H.; et al. Machine learning methods for weather forecasting: A survey. Atmosphere 2025.
- Zhang, Y.; et al. Skilful nowcasting of extreme precipitation with NowcastNet. Nature 2023.
- Min, C.; Zhao, D.; Xiao, L.; et al. Driveworld: 4d pre-trained scene understanding via world models for autonomous driving. In Proceedings of the CVPR, 2024.
- Zheng, W.; Chen, W.; et al. Occworld: Learning a 3d occupancy world model for autonomous driving. In Proceedings of the ECCV, 2024.
- Wang, X.; et al. Drivedreamer: Towards real-world-drive world models for autonomous driving. In Proceedings of the ECCV, 2024.
- Zhao, G.; Wang, X.; et al. Drivedreamer-2: Llm-enhanced world models for diverse driving video generation. In Proceedings of the AAAI, 2025.
- Guo, X.; Ding, C.; et al. Infinitydrive: Breaking time limits in driving world models. arXiv 2024.
- Lyu, J.; Li, Z.; et al. DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation. arXiv 2025.
- Zheng, W.; Xia, Z.; et al. Doe-1: Closed-loop autonomous driving with large world model. arXiv 2024. [Google Scholar] [CrossRef]
- Yu, J.; et al. Gamefactory: Creating new games with generative interactive videos. arXiv 2025. [Google Scholar] [CrossRef]
- Zhou, X.; Liu, J.; et al. Social World Models. arXiv 2025. [Google Scholar] [PubMed]
- Team, F.D. SocioVerse: A World Model for Social Simulation Powered by LLM Agents and a Pool of 10 Million Real-World Users. arXiv 2025. [Google Scholar]
- Yang, Y.; et al. TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets. In Proceedings of the NeurIPS, 2025. [Google Scholar]
- Zhang, C.; Shi, J.; Sui, Y. A Virtual Reality-Integrated System for Behavioral Analysis in Neurological Decline. In Proceedings of the ICLR Workshop, 2025. [Google Scholar]
- Sun, L.; Huang, H.; et al. Bidding for Influence: Auction-Driven Diffusion Image Generation. In Proceedings of the ICML Workshop, 2025. [Google Scholar]
- Cao, D.Y.; et al. Effectively Designing 2-Dimensional Sequence Models for Multivariate Time Series. In Proceedings of the ICLR Workshop, 2025. [Google Scholar]
- Lab, L.; et al. ODesign: A World Model for Biomolecular Interaction Design. arXiv 2025. [Google Scholar] [CrossRef]
- Yang, Z.; Song, X.; et al. Xray2Xray: World Model from Chest X-rays with Volumetric Context. arXiv 2025. [Google Scholar]
- Yue, Y.; Wang, Y.; Jiang, H.; et al. EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance. In Proceedings of the CVPR, 2025; pp. 25993–26003. [Google Scholar]
- Koju, S.; Bastola, S.; et al. Surgical vision world model. In Proceedings of the MICCAI Workshop, 2025. [Google Scholar]
- Wu, H.; Gao, Y.; et al. Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models. arXiv 2025. [Google Scholar] [CrossRef]
- Park, K.; et al. PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data. In Proceedings of the ICLR Workshop, 2025. [Google Scholar]
- Luo, X.; et al. Reconstructing Dynamics from Steady Spatial Patterns with Partial Observations. In Proceedings of the ICLR Workshop, 2025. [Google Scholar]
- Team, H.; et al. Hunyuanworld 1.0: Generating immersive, explorable, and interactive 3d worlds from words or pixels. arXiv 2025. [Google Scholar] [CrossRef]
- Valevski, D.; Leviathan, Y.; et al. Diffusion Models Are Real-Time Game Engines. In Proceedings of the ICLR, 2025. [Google Scholar]
- Decart; Etched; McIntyre, Q.; et al. Oasis: A universe in a transformer. Technical Report 2024. [Google Scholar]
- Zhang, Y.; Peng, C.; et al. Matrix-Game: Interactive World Foundation Model. arXiv 2025. [Google Scholar] [CrossRef]
- He, X.; Peng, C.; et al. Matrix-game 2.0: An open-source, real-time, and streaming interactive world model. arXiv 2025. [Google Scholar]
- Sun, W.; Wei, F.; et al. From Virtual Games to Real-World Play. arXiv 2025. [Google Scholar] [CrossRef]
- Yang, Z.; et al. Matrix-3d: Omnidirectional explorable 3d world generation. arXiv 2025. [Google Scholar] [CrossRef]
- Gu, Y.; Zhang, K.; Ning, Y.; et al. Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents. Transactions on Machine Learning Research 2025. [Google Scholar]
- Gao, Y.; Ye, J.; Wang, J.; Sang, J. Websynthesis: World-model-guided mcts for efficient webui-trajectory synthesis. arXiv 2025. [Google Scholar]
- Fang, T.; Zhang, H.; et al. WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model. arXiv 2025. [Google Scholar]
- Deng, M.; Hou, J.; et al. SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model. arXiv 2025. [Google Scholar]
- Yu, X.; Peng, B.; et al. Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents. arXiv 2025. [Google Scholar] [CrossRef]
- Rivard, L.; Sun, S.; et al. Neuralos: Towards simulating operating systems via neural generative models. arXiv 2025. [Google Scholar] [CrossRef]
- Luo, D.; et al. ViMo: A Generative Visual GUI World Model for App Agents. arXiv 2025. [Google Scholar]
- Yin, X.; Luo, X.; et al. Unlocking Smarter Device Control: Foresighted Planning with a World Model-Driven Code Execution Approach. arXiv 2025. [Google Scholar] [CrossRef]
- Mei, K.; et al. R-WoM: Retrieval-augmented World Model For Computer-use Agents. arXiv 2025. [Google Scholar]
- Richens, J.; et al. General agents need world models. In Proceedings of the ICML, 2025. [Google Scholar]
- Spies, A.F.; et al. Transformers Use Causal World Models in Maze-Solving Tasks. In Proceedings of the ICLR Workshop, 2025. [Google Scholar]
- Rohekar, R.Y.; et al. A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment. arXiv 2024. [Google Scholar]
- Zhang, T.; et al. When Do Neural Networks Learn World Models? arXiv 2025. [Google Scholar] [CrossRef]
- Tehenan, M.; et al. Linear Spatial World Models Emerge in Large Language Models. arXiv 2025. [Google Scholar] [CrossRef]
- Yuan, Y.; et al. Revisiting the Othello World Model Hypothesis. arXiv 2025. [Google Scholar] [CrossRef]
- Zhao, W.; et al. Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey. IEEE Symposium Series on Computational Intelligence 2020. [Google Scholar]
- OpenAI; et al. Solving Rubik’s Cube with a Robot Hand. arXiv 2019. [Google Scholar]
- Tobin, J.; et al. Domain randomization for transferring deep neural networks from simulation to the real world. IROS 2017. [Google Scholar]
- Tao, Z.; et al. A Survey on Self-Evolution of Large Language Models. arXiv 2024. [Google Scholar] [CrossRef]
- Wu, T.; Yuan, W.; et al. Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge. In Proceedings of the EMNLP, 2025. [Google Scholar]
- Madaan, A.; et al. Self-Refine: Iterative Refinement with Self-Feedback. In Proceedings of the NeurIPS, 2023. [Google Scholar]
- Weng, Y.; et al. Large Language Models are Better Reasoners with Self-Verification. In Proceedings of the EMNLP, 2023. [Google Scholar]
- Fu, S.; et al. Self-Verification Provably Prevents Model Collapse in Recursive Synthetic Training. In Proceedings of the NeurIPS, 2025. [Google Scholar]
- Yuan, W.; et al. Self-Rewarding Language Models. In Proceedings of the ICML, 2024. [Google Scholar]
- Chen, Z.; et al. Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models. In Proceedings of the ICML, 2024. [Google Scholar]
- Wang, Y.; et al. Self-play reinforcement learning guides protein engineering. Nature Machine Intelligence 2023. [Google Scholar] [CrossRef]
- Wang, Y.; Kordi, Y.; et al. Self-Instruct: Aligning Language Models with Self-Generated Instructions. In Proceedings of the ACL, 2022. [Google Scholar]
- Wu, Y.; et al. Self-Play Preference Optimization for Language Model Alignment. In Proceedings of the ICLR, 2025. [Google Scholar]
- Fu, S.; Wang, Y.; et al. A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops. In Proceedings of the ICLR, 2025. [Google Scholar]
- DeepSeek-AI; et al. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature 2025.
- Pearce, T.; et al. Scaling Laws for Pre-training Agents and World Models. arXiv 2024. [Google Scholar] [CrossRef]
- Radji, W.; et al. How Hard is it to Confuse a World Model? arXiv 2025. [Google Scholar] [CrossRef]
- Bain, M.; Nagrani, A.; Varol, G.; et al. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval. In Proceedings of the ICCV, 2021. [Google Scholar]
- Chen, T.S.; et al. Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers. CVPR 2024. [Google Scholar]
- Grauman, K.; Westbury, A.; et al. Ego4D: Around the World in 3,000 Hours of Egocentric Video. In Proceedings of the CVPR, 2022. [Google Scholar]
- Miech, A.; et al. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips. ICCV 2019. [Google Scholar]
- Duan, H.; Yu, H.X.; Chen, S.; et al. WorldScore: A Unified Evaluation Benchmark for World Generation. arXiv 2025. [Google Scholar] [CrossRef]
- Padalkar, A.; Pooley, A.; Jain, A.; et al. Open X-Embodiment: Robotic Learning Datasets and RT-X Models: Open X-Embodiment Collaboration. ICRA 2024. [Google Scholar]
- Ku, A.; Anderson, P.; et al. Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding. In Proceedings of the EMNLP, 2020. [Google Scholar]
- Yue, H.; Huang, S.; et al. EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models. arXiv 2025. [Google Scholar] [CrossRef]
- Yu, T.; et al. Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning. arXiv 2019. [Google Scholar]
- Caesar, H.; Bankiti, V.; Lang, A.H.; et al. nuScenes: A Multimodal Dataset for Autonomous Driving. CVPR 2020. [Google Scholar]
- Arai, H.; Ishihara, K.; et al. ACT-Bench: Towards Action Controllable World Models for Autonomous Driving. arXiv 2024. [Google Scholar] [CrossRef]
- Chandrasekaran, S.N.; Ackerman, J.; et al. JUMP Cell Painting dataset: morphological impact of 136,000 chemical and genetic perturbations. bioRxiv 2023. [Google Scholar]
- Morshid, A.; Elsayes, K.M.; et al. A machine learning model to predict hepatocellular carcinoma response to transcatheter arterial chemoembolization. Radiology: Artificial Intelligence 2019. [Google Scholar] [CrossRef] [PubMed]
- Bellemare, M.G.; et al. The Arcade Learning Environment: An Evaluation Platform for General Agents. arXiv 2012. [Google Scholar] [CrossRef]
- Guss, W.H.; Houghton, B.; et al. MineRL: A Large-Scale Dataset of Minecraft Demonstrations. In Proceedings of the IJCAI, 2019. [Google Scholar]
- Xie, T.; Zhang, D.; et al. OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. In Proceedings of the NeurIPS, 2024. [Google Scholar]
- Bonatti, R.; Zhao, D.; et al. Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale. In Proceedings of the ICML, 2025. [Google Scholar]
- Hüyük, A.; et al. Reasoning Elicitation in Language Models via Counterfactual Feedback. In Proceedings of the ICLR, 2025. [Google Scholar]
- Xiang, X.; Chen, Y.; et al. Macro-from-Micro Planning for High-Quality and Parallelized Autoregressive Long Video Generation. arXiv 2025. [Google Scholar]
- Li, J.; Feng, W.; et al. T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback. In Proceedings of the NeurIPS, 2024. [Google Scholar]
- Xing, J.; Xia, M.; et al. DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors. ECCV 2024. [Google Scholar]
| 2 | Multibody dynamics for biomedical research: PDF. |
| 3 | NVIDIA PhysX SDK document: Docs. |
| 4 | Design and use paradigms for Gazebo: PDF. |
| 5 | Professional mobile robot simulation: PDF. |
| 7 | PyBullet project site: Pybullet.org. PyBullet quickstart guide: PDF. |
| 11 | Gen-3 project site: Github.io. |











[Legend (icons not recovered): Simulation, Planning, Decision-making.]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).








