Computer Science and Mathematics

Review
Computer Science and Mathematics
Robotics

William Lawless

Abstract: In this review article, we introduce the problem of team interaction, cover the mathematics, results, discussion, conclusions, and a path forward. To begin, cognitive science assumes a 1:1 relationship between beliefs and actions, whether with games, concepts, preferences, rational choices, eyewitness accounts, or self-reported pain. Unfortunately, this assumption generalizes to reinforcement for generative-AI (gen-AI), a lower form of learning which cannot account for higher-level cognition, the resistance of biases to be rectified, or the inability to predict successfully without updated information, and which supports Planck’s lament that physics evolves one funeral at a time. The problem with 1:1 beliefs-to-reality is that observations of social interaction only produce separable independent and identically distributed (i.i.d.) data, which, by definition, cannot reconstruct the interactions observed. Presently, gen-AI uses separable i.i.d. data, preventing gen-AI models from replicating interdependent human systems. Failing to account for interdependence, classical models of teams do not generalize, nor do their models predict advantages. Solving this problem is critical to advancing the science of teams arbitrarily composed of human-AI-machine-robot members. In contrast, based on interdependence, choosing how to “squeeze” uncertainty in our quantum-like (Q-L) model of teams generalizes (e.g., vulnerability, espionage, time-energy tradeoffs), models self-organization’s ability to provide advantages (e.g., innovation) not possible under command decision-making (CDM; viz., authoritarianism), and may solve the hard-to-find connection between mind and reality. Our results suggest that humans have dual cognitive systems, one being cognition and the other embodied, but hidden, interdependence, which Simon was unable to capture and Kahneman had begun to address, our exemplar being Einstein’s decade-long struggle to construct his concept of general relativity. In the future, we propose that coupled tuning “squeezes” interdependent information to produce the advantages we have found over CDM and current AI risks.

Article
Computer Science and Mathematics
Robotics

Ashwin Kumar, P. Bavithra Matharasi

Abstract: Nano-UAVs weighing under 50g have become useful IoT platforms for GPS-denied navigation, but fitting a neural network into their sub-512kB memory and sub-100mW power budget remains an open engineering problem. PULP-Dronet v3 tackles this with depthwise separable (D+P) blocks and a channel-reduction factor γ. Even so, its most compressed variant (γ = /8, 1.1M MACs) loses 6 percentage points of collision accuracy versus the full model. Methods: We swap the 5×5 first convolution for a 3×3 depthwise + 1×1 pointwise pair, and retrain with cosine-annealing scheduling and per-epoch color-jitter augmentation. Results: At γ = /4 the model has 6409 parameters, needs only 540K MACs, and scores 83.97% collision accuracy with 0.372 steering RMSE on the official benchmark—+2.97pp over the same-γ baseline at 4.4× less compute. The full γ = /1 model (12M MACs) reaches 84%; our model nearly matches it with 22× fewer operations. Conclusions: Factorizing the stem and adjusting the training recipe recovers most of the accuracy lost to aggressive channel reduction, without adding inference cost.
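The compute saving from factorizing a convolution into a depthwise and a pointwise stage can be checked with simple arithmetic. The sketch below is only an illustration: the feature-map size and channel counts are hypothetical placeholders, not the actual PULP-Dronet stem dimensions, so the resulting numbers do not reproduce the MAC figures quoted in the abstract.

```python
def conv_macs(h_out, w_out, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h_out x w_out map."""
    return h_out * w_out * c_in * c_out * k * k


def depthwise_separable_macs(h_out, w_out, c_in, c_out, k):
    """MACs for a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = h_out * w_out * c_in * k * k   # one k x k filter per input channel
    pointwise = h_out * w_out * c_in * c_out   # 1 x 1 conv mixes channels
    return depthwise + pointwise


# Hypothetical stem shape (assumption, not taken from the paper):
# 1 input channel, 8 output channels, 100 x 100 output feature map.
H, W, C_IN, C_OUT = 100, 100, 1, 8

standard_5x5 = conv_macs(H, W, C_IN, C_OUT, k=5)
factorized_3x3 = depthwise_separable_macs(H, W, C_IN, C_OUT, k=3)

print(standard_5x5, factorized_3x3, standard_5x5 / factorized_3x3)
```

Under these assumed shapes the factorized stem needs roughly an order of magnitude fewer MACs; the real ratio depends on the network's actual dimensions and stride.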

Article
Computer Science and Mathematics
Robotics

Jiawei Li, Jiarui Yang, Peidong Liu, Shu-Tao Xia, Liang Lin

Abstract: World models aim to enable agents to perceive states, predict future outcomes, and reason for decision-making by simulating real-world environments, and are widely regarded as a crucial pathway toward artificial general intelligence (AGI). Video, as one of the most accessible and intuitively representative media of dynamic environments, naturally contains rich implicit representations of the physical world. Consequently, learning world models from videos has become a prominent research direction. However, a significant gap remains between video data and the real physical world: videos capture only superficial visual phenomena and lack explicit representations of three-dimensional structure, physical properties, and causal mechanisms. This limitation severely constrains the physical consistency and practical applicability of world models. Motivated by this, the present work provides a prospective study of recent research in this domain, encompassing: (1) key challenges arising from the video–physical world gap and representative solutions; (2) three major construction paradigms of physical world models; (3) a thorough summary of existing evaluation benchmarks; and (4) future research directions and discussions. It is noteworthy that this study is the first to systematically examine video-driven world model research from the perspective of the physical world. In contrast to prior studies that primarily focus on generative modeling or provide broad overviews, this work emphasizes world models with tangible physical grounding, explicitly excluding generative tasks such as video synthesis or 3D/4D modeling that diverge conceptually from the goal of modeling the physical world. Adopting a problem-oriented perspective, this study aims to provide subsequent researchers with a systematic framework and decision-making guidance for understanding existing work, designing innovative methods, and facilitating the deployment of world models in real-world applications.

Article
Computer Science and Mathematics
Robotics

Shuang Liu, Lei Wei, Xiaoqing Li

Abstract: Autonomous tracked amphibious robotic systems operating across water and land environments are essential for coastal inspection, disaster response, environmental monitoring, and complex terrain exploration. However, discontinuous water-land dynamics, unstable medium switching, and safety-critical control under environmental uncertainty pose significant challenges to existing amphibious navigation and path planning methods, where global reachability and adaptive decision-making are difficult to unify. Motivated by these challenges, this paper proposes CD-HSSRL, a Cross-Domain Hierarchical Safe-Switching Reinforcement Learning framework for autonomous tracked amphibious navigation. Specifically, a Cross-Domain Global Reachability Planner is developed to construct unified cost representations across heterogeneous water-land environments, a Hierarchical Safe Switching Policy enables stable medium-transition decision-making through option-based policy decomposition with switching regularization, and a Safety-Constrained Continuous Controller integrates action safety projection and risk-sensitive reward shaping to ensure collision-free control during complex shoreline interactions. These components are jointly optimized in an end-to-end manner to achieve robust cross-domain navigation. Comprehensive experiments on WaterScenes, MVTD, BARN, and Gazebo cross-domain benchmarks demonstrate that CD-HSSRL consistently outperforms state-of-the-art baselines, achieving up to 15% improvement in cross-domain transition success rate and 40% reduction in collision rate. Robustness and ablation studies further verify the effectiveness of hierarchical switching and safety-constrained control mechanisms. Overall, this work establishes a unified solution for safe and reliable cross-domain navigation of tracked amphibious robotic systems, providing new insights into hierarchical safe-switching architectures for multi-medium autonomous robots.

Review
Computer Science and Mathematics
Robotics

Fatma A.S. Alwafi, Reza Saatchi

Abstract: Path planning is critical for multi-robot systems (MRS), directly affecting task efficiency, execution time and operational cost. Despite extensive research and the successful application of numerous algorithms, achieving globally optimal solutions in cluttered or dynamic environments remains a significant challenge. Issues such as scalability with increasing numbers of robots, computational efficiency, system robustness, and coordination complexity continue to drive the development of more reliable approaches. This study reviews modelling approaches, optimisation criteria, and solution algorithms based on roadmap planning methods that are widely used for multi-robot path planning (MRPP). It focuses on three graph-based algorithms: Multi-Robot Path Planning algorithm, Central Algorithm (CA), and the Optimisation Central Algorithm (OCA). These algorithms utilise visibility graphs (VG) for environment representation and Dijkstra’s algorithm for shortest-path computation, while incorporating algebraic connectivity to improve coordination, safety and scalability. In addition, the technological context and implementation platforms, including simulation environments, cloud robotics, and AI-based frameworks, are conceptually examined. The potential applications of these methods in assistive robotics are highlighted, particularly in supporting safe and reliable navigation in healthcare and human-centered environments. The paper synthesises theoretical and practical insights, identifies current limitations and challenges, and outlines future research directions for efficient, scalable and robust MRPP.

Article
Computer Science and Mathematics
Robotics

Jack M. Vice, Gita Sukthankar

Abstract: Traditional social navigation systems often treat perception and motion as decoupled tasks, leading to reactive behaviors and perceptual surprise due to limited field of view. While active vision—the ability to choose where to look—offers a solution, most existing frameworks decouple sensing from execution to simplify the learning process. This article introduces a novel joint reinforcement learning (RL) framework (Active Vision for Social Navigation) that unifies locomotion and discrete gaze control within a single, end-to-end policy. Unlike existing factored approaches, our method leverages a model-based RL architecture with a latent world model to explicitly address the credit assignment problem inherent in active sensing. Experimental results in cluttered, dynamic environments demonstrate that our joint policy outperforms factored sensing-action approaches by prioritizing viewpoints specifically relevant to social safety, such as checking blind spots and tracking human trajectories. Our findings suggest that tight sensorimotor coupling is essential for reducing perceptual surprise and ensuring safe, socially aware navigation in unstructured spaces.

Review
Computer Science and Mathematics
Robotics

Yunjia Sun, Tao Wang

Abstract: Autonomous robots are increasingly integrated into social contexts, making affective human–robot interaction (HRI) critical for their effectiveness and acceptance. However, existing research remains dispersed across domains and techniques, lacking a unified framework to characterize core robotic capabilities. To address this gap, we adopt a capability-oriented perspective and conduct a comprehensive literature review, through which we propose a structured taxonomy of capabilities for robots in affective HRI. The taxonomy comprises five core dimensions: Perception (recognizing human internal states), Strategy (planning responses based on human states and context), Expression (conveying robot lifelikeness and social presence), Sustainability (maintaining effective and reliable operation over time), and Ethics (ensuring behavior within ethical constraints). By organizing diverse research efforts into a structured framework, this taxonomy provides a systematic foundation for designing socially competent robots and guiding future research.

Communication
Computer Science and Mathematics
Robotics

Jingshu Shi, Hongwu Zhu, Yifei Yang, Bowen Liu, Xingjun Wang

Abstract: Data-driven exoskeletons promise adaptive augmentation of human mobility. Yet their widespread adoption is hindered by labor-intensive biomechanical data collection and extensive manual tuning. This study presents a highly efficient, simulation-generated synthetic data approach. It also designs a model-free algorithm for variable-speed walking to validate the method. We leveraged an Adversarial Motion Priors (AMP) agent to learn stylized walking within a massively parallel, physics-based simulation. The resulting high-fidelity data were collected and validated against OpenSim inverse dynamics pipelines. A novel CNN-Transformer architecture was developed to map contralateral swing-phase sensor data to variable-length push-off torque profiles. This enables real-time, adaptive torque assistance for exoskeletons. Experimental validation on a custom ankle exoskeleton demonstrated robust sim-to-real transferability. The system achieved approximately 85% torque prediction accuracy across speeds ranging from 0.6 to 1.75 m·s⁻¹. The controller significantly reduced user ankle positive mechanical work, thereby lowering metabolic demand. Furthermore, our multi-sensor configuration exhibited inherent fault tolerance, ensuring safe operation even under partial sensor failure. By replacing handcrafted control strategies with a scalable, data-driven approach, this work offers a practical pathway toward deploying autonomous exoskeletons in unconstrained, real-world environments.

Article
Computer Science and Mathematics
Robotics

Xianglong Liu, Zhangsong Shi, Huihui Xu, Hao Wu, Jialuo Jiang

Abstract: This paper proposes an adaptive prescribed performance sliding mode control method to address the input saturation problem in quadrotor UAVs. An offset function is designed to ensure that the initial values of the system errors always lie within the performance envelope. A first-order system is introduced to analyze error violation and compensate for the performance bounds, thereby enhancing system stability. An anti-windup auxiliary system and a non-singular fast terminal backstepping sliding mode controller are developed to mitigate the adverse effects of input saturation. A piecewise variable rate reaching law is designed to reduce controller chattering. An RBF neural network observer is constructed to compensate online for system modeling uncertainties and external disturbances. The uniform ultimate boundedness of all state errors is rigorously proved using Lyapunov theory. Simulation results demonstrate that, compared to traditional adaptive sliding mode control and PID control, the proposed method reduces the RMSE of the desired trajectory tracking error by 18.5% and 12.9%, respectively, and decreases the IAE by 34.3% and 23.3%, respectively, validating the effectiveness and superiority of the algorithm.

Article
Computer Science and Mathematics
Robotics

Kentaro Yamada, Nicholas Campbell

Abstract: Human-Robot Collaboration (HRC) holds significant potential but is hindered by real-world complexity, dynamism, and ambiguous human instructions. This paper introduces CADE-HRI, a novel multi-modal HRC system enabling natural and flexible interaction for assembly tasks. CADE-HRI integrates diverse sensor inputs—natural language, gesture, real-time visual perception (e.g., object pose, gaze), and force/torque feedback—fusing them into a Multi-modal Large Language Model (MM-LLM). The MM-LLM serves as central intelligence, orchestrating dynamic task planning, autonomous adaptation to anomalies, and intelligent conflict resolution to generate robust robot actions. Our methodology emphasizes system integration and prompt engineering with pre-trained models. Experimental validation, using fictitious data, demonstrates CADE-HRI significantly outperforms traditional scripted, NLP-Only, and VLM-Adapt baselines in task completion, efficiency, and robustness across complex assembly tasks with dynamic changes and ambiguous instructions. Human-centric evaluations indicate superior user satisfaction, and ablation studies confirm the synergistic contribution of multi-modal inputs. This work affirms the efficacy of integrating multi-modal perception with MM-LLM-driven dynamic planning to enhance collaborative robot performance and user experience in complex, unstructured workspaces.

Article
Computer Science and Mathematics
Robotics

Reza Arablouei

Abstract: We present an efficient incremental SLAM back-end that achieves the accuracy of full batch optimization while substantially reducing computational cost. The proposed approach combines two complementary ideas: information-guided gating (IGG) and selective partial optimization (SPO). IGG employs an information-theoretic criterion based on the log-determinant of the information matrix to quantify the contribution of new measurements, triggering global optimization only when a significant information gain is observed. This avoids unnecessary relinearization and factorization when incoming data provide little additional information. SPO executes multi-iteration Gauss-Newton (GN) updates but restricts each iteration to the subset of variables most affected by the new measurements, dynamically refining this active set until convergence. Together, these mechanisms retain all measurements to preserve global consistency while focusing computation on parts of the graph where it yields the greatest benefit. We provide a theoretical local perturbation analysis showing that, under standard regularity assumptions for GN, the proposed approach tracks full GN up to a neighborhood whose size is controlled by the approximation thresholds. Moreover, when the effective approximation error introduced by localization and screening vanishes asymptotically, it recovers the same local minimizer and asymptotic convergence rate as full GN. Extensive experiments on benchmark SLAM datasets show that our approach consistently matches the estimation accuracy of batch solvers, while achieving significant computational savings compared to conventional incremental approaches. Such efficiency is particularly important for mobile robots operating under onboard compute constraints, where timely state estimation is critical for localization, mapping, and downstream navigation and control. The results indicate that the proposed approach offers a principled balance between accuracy and efficiency, making it a robust and scalable solution for real-time robotic localization and mapping in dynamic, data-rich environments.
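A log-determinant gating rule of the kind the abstract describes can be sketched in a few lines. The abstract does not give the exact criterion, so the form used here is an assumption: the gain from one scalar measurement is taken as half the increase in the log-determinant of the information matrix, and optimization is triggered when that gain exceeds a threshold. The function names, the example matrix, and the threshold are all hypothetical.

```python
import math


def cholesky_logdet(a):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def information_gain(info, jacobian_row, sigma):
    """Gain of one scalar measurement: 0.5 * (logdet(info + J^T J / sigma^2) - logdet(info))."""
    n = len(info)
    updated = [
        [info[i][j] + jacobian_row[i] * jacobian_row[j] / sigma ** 2 for j in range(n)]
        for i in range(n)
    ]
    return 0.5 * (cholesky_logdet(updated) - cholesky_logdet(info))


# Hypothetical 2-variable information matrix and gating threshold (assumptions).
INFO = [[4.0, 0.0], [0.0, 4.0]]
THRESHOLD = 0.05

precise_gain = information_gain(INFO, [1.0, 0.0], sigma=1.0)   # informative measurement
noisy_gain = information_gain(INFO, [1.0, 0.0], sigma=10.0)    # nearly uninformative

trigger_precise = precise_gain > THRESHOLD
trigger_noisy = noisy_gain > THRESHOLD
print(precise_gain, trigger_precise, noisy_gain, trigger_noisy)
```

In this toy case the precise measurement would trigger a global update and the noisy one would be absorbed without re-optimizing. A practical back-end would operate on sparse factors rather than dense lists.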

Review
Computer Science and Mathematics
Robotics

Utkarsh Grover, Ravi Ranjan, Mingyang Mao, Trung Tien Dong, Satvik Praveen, Zhenqi Wu, Morris Chang, Tinoosh Mohsenin, Yi Sheng, Agoritsa Polyzou, +2 authors

Abstract: Deploying foundation models in embodied edge systems is fundamentally a systems problem, not just a problem of model compression. Real-time control must operate within strict size, weight, and power constraints, where memory traffic, compute latency, timing variability, and safety margins interact directly. The Deployment Gauntlet organizes these constraints into eight coupled barriers that determine whether embodied foundation models can run reliably in practice. Across representative edge workloads, autoregressive Vision-Language-Action policies are constrained primarily by memory bandwidth, whereas diffusion-based controllers are limited more by compute latency and sustained execution cost. Reliable deployment therefore depends on system-level co-design across memory, scheduling, communication, and model architecture, including decompositions that separate fast control from slower semantic reasoning.

Article
Computer Science and Mathematics
Robotics

Qinglin Yang, Sheng Liu

Abstract: Elastic couplings and flexible joints introduce lightly damped vibration modes that significantly complicate stabilization of nonlinear, underactuated systems. This paper studies a spring-coupled cart–inverted-pendulum benchmark inspired by the Quanser Linear Flexible Joint with Inverted Pendulum platform, where a motor-driven cart excites a passive cart through a spring–damper connection and the pendulum is mounted on the passive cart. The control objective is to stabilize the pendulum near the upright equilibrium while simultaneously regulating spring deflection and suppressing vibration. To avoid manual derivation of high-order analytical dynamics for this coupled system, we adopt a model-based reinforcement learning framework that learns task-oriented latent dynamics and performs online receding-horizon planning. Concretely, we implement Task-Oriented Latent Dynamics (TOLD) for learning a compact latent model and Temporal-Difference Model Predictive Control (TD-MPC) for MPPI-style trajectory optimization in latent space. We evaluate TD-MPC in a high-fidelity Isaac Sim / Isaac Lab simulation and compare it against a model-free PPO baseline under the same observation and action interfaces. Training curves of physical variables and returns show that TD-MPC learns coordinated balancing and spring regulation with stable convergence behavior, while PPO achieves competitive balancing performance with more pronounced non-monotonic training dynamics and transient regressions. The study highlights when online planning with learned latent models is advantageous for elastically coupled mechanisms.

Article
Computer Science and Mathematics
Robotics

Israel Kolaïgué Bayaola, Jean Louis Ebongué Kedieng Fendji, Blaise Omer Yenke, Marcellin Atemkeng, Ibidun Christiana Obagbuwa

Abstract: The rapid proliferation of unmanned aerial vehicles (UAVs) in energy-intensive applications (such as autonomous logistics, continuous surveillance, and mobile edge computing) has driven a critical need for highly reliable energy consumption models. However, selecting an appropriate modeling strategy remains an ad-hoc process; researchers must frequently navigate complex, undocumented trade-offs among required predictive accuracy, empirical data availability, and access to aerodynamic testing infrastructure. This study proposes a systematic, two-stage decision-making framework designed to standardize UAV energy model selection. In the first stage, a qualitative decision tree is inductively derived from a comprehensive corpus of recent literature (an 80% training split), explicitly mapping infrastructural and informational constraints to five distinct modeling regimes, ranging from novel white-box derivations to deep-learning black-box applications. This structural logic is subsequently validated against an independent 20% literature holdout set, achieving a 100% predictive match. In the second stage, the Analytic Hierarchy Process (AHP) is applied to quantitatively rank the feasible alternatives based on context-specific criteria: accuracy, interpretability, development cost, and customization adaptability. Crucially, this quantitative scoring introduces "fallback flexibility," allowing researchers to seamlessly pivot to mathematically adjacent alternative models when unforeseen experimental roadblocks occur. Embedded within an open-source Python graphical interface, this framework mitigates methodological ambiguity, prevents the over-allocation of research resources, and fosters greater reproducibility within the energy-aware UAV research community.
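The AHP ranking step can be illustrated with a small sketch. The four criteria are those named in the abstract; everything else is an assumption made for illustration: the pairwise comparison values, the two alternative names, and their per-criterion scores are hypothetical, and the row geometric-mean method is used as a common approximation of the AHP priority vector rather than as the authors' stated procedure.

```python
import math

CRITERIA = ["accuracy", "interpretability", "development cost", "customization adaptability"]

# Hypothetical pairwise comparison matrix: PAIRWISE[i][j] says how much more
# important criterion i is than criterion j (reciprocal by construction).
PAIRWISE = [
    [1.0, 2.0, 4.0, 4.0],
    [0.5, 1.0, 2.0, 2.0],
    [0.25, 0.5, 1.0, 1.0],
    [0.25, 0.5, 1.0, 1.0],
]


def ahp_priorities(pairwise):
    """Approximate the AHP priority vector with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def rank_alternatives(weights, scores):
    """Rank alternatives by weighted sum of per-criterion scores, best first."""
    totals = {
        name: sum(w * s for w, s in zip(weights, values))
        for name, values in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


weights = ahp_priorities(PAIRWISE)

# Hypothetical per-criterion scores in [0, 1] for two modeling regimes.
SCORES = {
    "white-box": [0.6, 0.9, 0.3, 0.5],
    "black-box": [0.9, 0.2, 0.6, 0.7],
}
ranking = rank_alternatives(weights, SCORES)
print(weights, ranking)
```

Keeping the full ranked list, rather than only the winner, is what makes a fallback to the next-best alternative straightforward. A complete AHP would also derive the alternative scores from pairwise comparisons and check the consistency ratio, which this sketch omits.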

Article
Computer Science and Mathematics
Robotics

Olena Pavliuk, Myroslav Mishchuk

Abstract: This article addresses the problem of finding the shortest route for Automated Guided Vehicles (AGVs) in a production environment with constrained battery state-of-charge (SoC) and time-dependent operating conditions. The route map is divided into a uniform grid containing stationary obstacles and two types of dynamic obstacles: human, for which AGV transportation is prohibited, and inanimate (moving objects), which impose a penalty function. A key contribution of the proposed methodology is the introduction of a battery residual charge matrix, which embeds cell-level energy feasibility into the environment representation by determining minimum admissible SoC constraints and accounting for transition-dependent energy costs. This matrix directly restricts the set of traversable cells under low-energy conditions. The proposed approach is based on the A* and D* Lite algorithms, providing shortest-path construction that explicitly integrates battery SoC into the spatio-temporal cost function. To avoid collisions in a multi-agent environment during routing, a simplified hybrid scheme with M* elements performs local coordination and adaptive trajectory replanning. The effectiveness of the proposed methodology was assessed using travel time, temporal complexity, and spatial complexity metrics. Simulation results on a 10×10 grid showed that agents with sufficient battery completed routes of 8 and 11 cells with travel times of 7.2 to 10.7 conventional units. A critically low-energy agent was initially unable to move, but after adjusting the minimum SoC constraint, all agents completed their routes with travel times up to 11.4 conventional units, demonstrating the direct impact of energy constraints on system performance.
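The idea of a residual charge matrix restricting traversable cells can be sketched with a plain A* search. This is a simplified illustration under stated assumptions, not the authors' method: every move costs the same fixed amount of charge, dynamic obstacles and the D* Lite and M* components are omitted, and the function name, grid, and SoC values are hypothetical.

```python
import heapq


def a_star_with_soc(grid, min_soc, start, goal, start_soc, step_cost):
    """Shortest 4-connected path where entering a cell requires SoC >= min_soc[cell].

    grid[r][c] == 1 marks a static obstacle. Returns the path as a list of
    (row, col) cells, or None if no energy-feasible path exists.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    best_steps = {start: 0}
    while open_heap:
        _, steps, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if steps > best_steps[cell]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == 1:
                continue
            soc_on_arrival = start_soc - (steps + 1) * step_cost
            if soc_on_arrival < min_soc[nxt[0]][nxt[1]]:
                continue  # cell is not energy-feasible for this agent
            if steps + 1 < best_steps.get(nxt, float("inf")):
                best_steps[nxt] = steps + 1
                came_from[nxt] = cell
                heapq.heappush(open_heap, (steps + 1 + heuristic(nxt), steps + 1, nxt))
    return None


# Hypothetical 3 x 3 map with one obstacle in the centre and a uniform
# minimum admissible SoC of 20 (percent) for every cell.
GRID = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
MIN_SOC = [[20] * 3 for _ in range(3)]

charged_path = a_star_with_soc(GRID, MIN_SOC, (0, 0), (2, 2), start_soc=100, step_cost=10)
depleted_path = a_star_with_soc(GRID, MIN_SOC, (0, 0), (2, 2), start_soc=50, step_cost=10)
print(charged_path, depleted_path)
```

With a uniform step cost, the shortest route to a cell is also the one that arrives with the most charge, so searching over cells alone is sufficient. Transition-dependent energy costs, as the abstract describes, would require tracking SoC as part of the search state.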

Review
Computer Science and Mathematics
Robotics

Zecheng Li, Xiaolin Meng, Xu He, Youdong Zhang, Wenxuan Yin

Abstract: The ability to autonomously navigate and explore complex 3D environments in a purposeful manner, while integrating visual perception with natural language interaction in a human-like way, represents a longstanding research objective in Artificial Intelligence (AI) and embodied cognition. Vision-Language Navigation (VLN) has evolved from geometry-driven to semantics-driven and, more recently, knowledge-driven approaches. With the introduction of Large Language Models (LLMs) and Vision-Language Models (VLMs), recent methods have achieved substantial improvements in instruction interpretation, cross-modal alignment, and reasoning-based planning. However, existing surveys primarily focus on traditional VLN settings and offer limited coverage of LLM-based VLN, particularly in relation to Sim2Real transfer and edge-oriented deployment. This paper presents a structured review of LLM-enabled VLN, covering four core components: instruction understanding, environment perception, high-level planning, and low-level control. Edge deployment and implementation requirements, datasets, and evaluation protocols are summarized, along with an analysis of task evolution from path-following to goal-oriented and demand-driven navigation. Key challenges, including reasoning complexity, spatial cognition, real-time efficiency, robustness, and Sim2Real adaptation, are examined. Future research directions, such as knowledge-enhanced navigation, multimodal integration, and world-model-based frameworks, are discussed. Overall, LLM-driven VLN is progressing toward deeper cognitive integration, supporting the development of more explainable, generalizable, and deployable embodied navigation systems.

Article
Computer Science and Mathematics
Robotics

Lucas Pereira, Martina Kovács, Ahmed El-Masry, Feidlimid Shyama

Abstract: Multimodal Large Vision–Language Models (LVLMs) have emerged as a central paradigm in contemporary artificial intelligence, enabling machines to jointly perceive, reason, and communicate across visual and linguistic modalities at unprecedented scale. By integrating advances in large language models with powerful visual representation learning, LVLMs offer a unifying framework that bridges perception, cognition, and interaction. This capability is particularly consequential for Human–Computer Interaction (HCI) and robotic applications, where effective intelligence must be grounded in sensory input, responsive to human intent, and robust in dynamic, real-world environments. This review provides a comprehensive and in-depth examination of LVLMs from the perspective of interactive and embodied systems. We begin by situating LVLMs within the broader evolution of multimodal learning, highlighting the theoretical foundations and mathematical formulations that underpin vision–language alignment, representation fusion, and autoregressive generation. We then analyze dominant architectural paradigms, including dual-encoder models, fusion-based designs, and unified token-based transformers, discussing their respective trade-offs in terms of scalability, grounding fidelity, computational efficiency, and suitability for interaction-driven and robotic contexts. Building on these foundations, the review surveys a wide range of applications in HCI and robotics. In HCI, LVLMs enable visually grounded conversational agents, intelligent user assistance, explainable interfaces, and novel forms of human–AI co-creation that lower barriers to interaction and expand accessibility. In robotics, they support language-guided manipulation, navigation, exploration, and human–robot interaction by linking high-level natural language instructions with perceptual understanding and physical action. Across both domains, LVLMs facilitate generalization, adaptability, and more natural communication, while also exposing new challenges related to reliability, safety, and user trust. We further provide a critical analysis of current limitations and open research problems, including hallucination and weak grounding, limited temporal and causal reasoning, high computational cost, lack of interpretability, dataset bias, and insufficient evaluation methodologies for long-term interaction and embodied performance. These challenges highlight the gap between impressive benchmark results and the demands of real-world deployment. Finally, we outline key future research directions, emphasizing stronger grounding mechanisms, temporal and memory-aware modeling, efficiency and sustainability, human-centered and ethical design, and interdisciplinary evaluation and governance. By synthesizing insights across machine learning, HCI, and robotics, this review frames LVLMs not merely as technical artifacts but as interactive agents embedded in social and physical contexts. Our goal is to provide researchers and practitioners with a holistic understanding of the state of the field, clarify the opportunities and risks associated with deploying LVLMs in interactive and embodied systems, and chart a path toward multimodal AI technologies that are powerful, trustworthy, and aligned with human values.

Review
Computer Science and Mathematics
Robotics

Biyuan Liu, Daigang Xu, Lei Jiang, Wenjun Guo, Ping Chen

Abstract: As the application of Embodied AI Agents in avatars, wearable devices, and robotic systems continues to deepen, their core research challenges have gradually shifted from physical environment interaction to the accurate understanding of social interactions. Traditional physical world models (PWM) focus on quantifiable physical attributes such as space and motion, failing to meet the needs of social intelligence modeling. In contrast, the Mental World Model (MWM), as a structured representation of humans’ internal mental states, has become the critical cognitive foundation for embodied agents to achieve natural human-machine collaboration and dynamic social adaptation. However, current MWM research faces significant bottlenecks, such as a fragmented conceptual framework with vague boundaries between MWM and PWM, disjointed reasoning mechanisms for the technical pathways and applicable scenarios of different Theory of Mind (ToM) reasoning paradigms, and detachment between evaluation and practice. To address these issues, this review systematically synthesizes over 100 authoritative studies to provide a comprehensive overview of MWM research for embodied AI. Its core contributions are threefold: First, it constructs a complete theoretical framework for MWM for the first time. Specifically, it distinguishes the essential differences between MWM and PWM. Second, it systematically defines the key components of MWM through two paradigms for mental element representation. Third, it comprehensively analyzes two core ToM reasoning paradigms with 19 ToM methods. Finally, it also clarifies the integration trend of neuro-symbolic hybrid architectures, and synthesizes 26 ToM evaluation benchmarks. This work aims to promote the integration of embodied agents into human society and advance the in-depth development of human-machine collaborative interaction.

Article
Computer Science and Mathematics
Robotics

Alexander Krasavin, Gaukhar Nazenova, Adema Dairbekova, Albina Kadyroldina, Tamás Haidegger, Darya Alontseva

Abstract: This article investigates the trajectory-tracking control of a differential-drive two-wheeled mobile robot (DDWMR) using its kinematic model. A nonlinear-to-linear transformation based on differential flatness is employed to convert the original nonlinear system into two fully decoupled linear subsystems, enabling a simple and robust controller design. Unlike conventional flatness-based methods that rely on exact feedforward linearization around a reference trajectory, the proposed approach performs plant linearization, ensuring reliable tracking across a wide range of trajectories. The resulting two-loop architecture consists of an inner nonlinear loop implementing state prolongation and static feedback, and an outer linear controller performing trajectory tracking of the linearized system. Simulation results on a circular reference trajectory demonstrate high tracking accuracy, with a maximum transient deviation of 0.28 m, a settling time of approximately 120 s, and a steady-state mean tracking error below 0.01 m. These results confirm that the plant-linearization-based framework provides superior accuracy, robustness, and practical applicability for DDWMR trajectory tracking.
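Differential flatness of the differential-drive kinematic model means the robot's inputs can be recovered from the position (x, y) and its time derivatives. The sketch below shows only that textbook relation, evaluated on a circular reference; it is not the plant-linearization controller proposed in the article, and the radius, angular rate, and function names are hypothetical.

```python
import math


def flat_outputs_to_inputs(dx, dy, ddx, ddy):
    """Recover linear speed v and turn rate omega from derivatives of the flat outputs (x, y).

    Uses v = sqrt(dx^2 + dy^2) and omega = (dx * ddy - dy * ddx) / (dx^2 + dy^2),
    which is valid only while the robot is moving (v != 0).
    """
    speed_sq = dx * dx + dy * dy
    v = math.sqrt(speed_sq)
    omega = (dx * ddy - dy * ddx) / speed_sq
    return v, omega


def circle_derivatives(radius, rate, t):
    """First and second derivatives of x = R cos(w t), y = R sin(w t)."""
    dx = -radius * rate * math.sin(rate * t)
    dy = radius * rate * math.cos(rate * t)
    ddx = -radius * rate ** 2 * math.cos(rate * t)
    ddy = -radius * rate ** 2 * math.sin(rate * t)
    return dx, dy, ddx, ddy


# Hypothetical circular reference: radius 2 m, angular rate 0.5 rad/s.
RADIUS, RATE = 2.0, 0.5
v_ref, omega_ref = flat_outputs_to_inputs(*circle_derivatives(RADIUS, RATE, t=1.3))
print(v_ref, omega_ref)
```

On a circle the recovered inputs are constant: the speed equals radius times angular rate and the turn rate equals the angular rate, which gives a quick sanity check on the formula.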

Article
Computer Science and Mathematics
Robotics

Élise Martin, Julien Moreau, Claire Dupont, Thomas Bernard

Abstract: This study proposes a blueprint-driven semantic interpretation framework that constructs high-level structural diagrams for textual content without any annotated labels. A blueprint generator synthesizes structural templates from syntactic patterns, and a constraint-matching engine aligns sentence components to blueprint slots. Tests on MixedNews-ZS, LegalText-ZS, and ScienceCorpus-ZS show improved structural consistency, with F1 gains of 11.7%, 13.4%, and 15.2% over syntax-only baselines. The model reduces role ambiguity errors by 26.9% and achieves a 19.8% improvement in zero-annotation entity function resolution. Cross-lingual experiments further demonstrate that blueprint mapping maintains 87.1% performance when transferring from English to German.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated