Computer Science and Mathematics

Article
Computer Science and Mathematics
Robotics

Lucas Pereira, Martina Kovács, Ahmed El-Masry, Feidlimid Shyama

Abstract: Multimodal Large Vision–Language Models (LVLMs) have emerged as a central paradigm in contemporary artificial intelligence, enabling machines to jointly perceive, reason, and communicate across visual and linguistic modalities at unprecedented scale. By integrating advances in large language models with powerful visual representation learning, LVLMs offer a unifying framework that bridges perception, cognition, and interaction. This capability is particularly consequential for Human–Computer Interaction (HCI) and robotic applications, where effective intelligence must be grounded in sensory input, responsive to human intent, and robust in dynamic, real-world environments. This review provides a comprehensive and in-depth examination of LVLMs from the perspective of interactive and embodied systems. We begin by situating LVLMs within the broader evolution of multimodal learning, highlighting the theoretical foundations and mathematical formulations that underpin vision–language alignment, representation fusion, and autoregressive generation. We then analyze dominant architectural paradigms, including dual-encoder models, fusion-based designs, and unified token-based transformers, discussing their respective trade-offs in terms of scalability, grounding fidelity, computational efficiency, and suitability for interaction-driven and robotic contexts. Building on these foundations, the review surveys a wide range of applications in HCI and robotics. In HCI, LVLMs enable visually grounded conversational agents, intelligent user assistance, explainable interfaces, and novel forms of human–AI co-creation that lower barriers to interaction and expand accessibility. In robotics, they support language-guided manipulation, navigation, exploration, and human–robot interaction by linking high-level natural language instructions with perceptual understanding and physical action. Across both domains, LVLMs facilitate generalization, adaptability, and more natural communication, while also exposing new challenges related to reliability, safety, and user trust. We further provide a critical analysis of current limitations and open research problems, including hallucination and weak grounding, limited temporal and causal reasoning, high computational cost, lack of interpretability, dataset bias, and insufficient evaluation methodologies for long-term interaction and embodied performance. These challenges highlight the gap between impressive benchmark results and the demands of real-world deployment. Finally, we outline key future research directions, emphasizing stronger grounding mechanisms, temporal and memory-aware modeling, efficiency and sustainability, human-centered and ethical design, and interdisciplinary evaluation and governance. By synthesizing insights across machine learning, HCI, and robotics, this review frames LVLMs not merely as technical artifacts but as interactive agents embedded in social and physical contexts. Our goal is to provide researchers and practitioners with a holistic understanding of the state of the field, clarify the opportunities and risks associated with deploying LVLMs in interactive and embodied systems, and chart a path toward multimodal AI technologies that are powerful, trustworthy, and aligned with human values.
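The abstract mentions the mathematical formulations behind vision–language alignment. One common alignment formulation is the symmetric contrastive (InfoNCE) loss used by CLIP-style dual-encoder models. It appears here only as a generic illustration and is not necessarily the review's own notation. The symbols are assumptions: $v_i$ and $t_i$ are the L2-normalized image and text embeddings of the $i$-th of $N$ paired examples in a batch, and $\tau$ is a temperature.

```latex
\mathcal{L}_{\text{con}} = -\frac{1}{2N}\sum_{i=1}^{N}\left[
  \log\frac{\exp\!\left(v_i^{\top} t_i/\tau\right)}{\sum_{j=1}^{N}\exp\!\left(v_i^{\top} t_j/\tau\right)}
  + \log\frac{\exp\!\left(v_i^{\top} t_i/\tau\right)}{\sum_{j=1}^{N}\exp\!\left(v_j^{\top} t_i/\tau\right)}
\right]
```

The first term scores each image against every caption in the batch, and the second scores each caption against every image. Together they pull matched pairs together and push mismatched pairs apart.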

Review
Computer Science and Mathematics
Robotics

Biyuan Liu, Daigang Xu, Lei Jiang, Wenjun Guo, Ping Chen

Abstract: As Embodied AI Agents are increasingly deployed in avatars, wearable devices, and robotic systems, the core research challenge has gradually shifted from physical environment interaction to the accurate understanding of social interactions. Traditional physical world models (PWMs) focus on quantifiable physical attributes such as space and motion and therefore fail to meet the needs of social intelligence modeling. In contrast, the Mental World Model (MWM), a structured representation of humans' internal mental states, has become the critical cognitive foundation for embodied agents to achieve natural human–machine collaboration and dynamic social adaptation. However, current MWM research faces significant bottlenecks: a fragmented conceptual framework with vague boundaries between MWMs and PWMs; a disjointed understanding of the technical pathways and applicable scenarios of different Theory of Mind (ToM) reasoning paradigms; and a disconnect between evaluation and practice. To address these issues, this review systematically synthesizes over 100 authoritative studies to provide a comprehensive overview of MWM research for embodied AI. Its core contributions are threefold. First, it constructs, for the first time, a complete theoretical framework for MWM, specifically distinguishing the essential differences between MWMs and PWMs. Second, it systematically defines the key components of MWM through two paradigms for mental element representation. Third, it comprehensively analyzes two core ToM reasoning paradigms covering 19 ToM methods. In addition, it clarifies the trend toward integrating neuro-symbolic hybrid architectures and synthesizes 26 ToM evaluation benchmarks. This work aims to promote the integration of embodied agents into human society and advance the in-depth development of human–machine collaborative interaction.

Article
Computer Science and Mathematics
Robotics

Alexander Krasavin, Gaukhar Nazenova, Adema Dairbekova, Albina Kadyroldina, Tamás Haidegger, Darya Alontseva

Abstract: This article investigates the trajectory-tracking control of a differential-drive two-wheeled mobile robot (DDWMR) using its kinematic model. A nonlinear-to-linear transformation based on differential flatness is employed to convert the original nonlinear system into two fully decoupled linear subsystems, enabling a simple and robust controller design. Unlike conventional flatness-based methods that rely on exact feedforward linearization around a reference trajectory, the proposed approach performs plant linearization, ensuring reliable tracking across a wide range of trajectories. The resulting two-loop architecture consists of an inner nonlinear loop implementing state prolongation and static feedback, and an outer linear controller performing trajectory tracking of the linearized system. Simulation results on a circular reference trajectory demonstrate high tracking accuracy, with a maximum transient deviation of 0.28 m, a settling time of approximately 120 s, and a steady-state mean tracking error below 0.01 m. These results confirm that the plant-linearization-based framework provides superior accuracy, robustness, and practical applicability for DDWMR trajectory tracking.
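The two-loop structure described above can be illustrated with the textbook dynamic-extension (state prolongation) controller for a unicycle-type robot. This is a minimal sketch under stated assumptions, not the authors' implementation:

- The flat outputs are taken to be the position (x, y).
- The forward speed v is prolonged as an extra state.
- The gains, the initial state, and the circle (radius 2 m, 0.2 rad/s) are hypothetical.
- Forward-Euler integration stands in for a proper simulator.

The inner loop inverts the prolonged kinematics, which leaves two decoupled double integrators for the outer PD loop.

```python
import math


def track_circle(T=60.0, dt=0.01, R=2.0, om=0.2, kp=1.0, kd=2.0):
    """Return the position tracking error at each step (all parameters hypothetical)."""
    # Prolonged state: position (x, y), heading th, forward speed v (must stay != 0).
    x, y, th, v = 2.3, -0.2, math.pi / 2, 0.3
    errors = []
    for k in range(int(round(T / dt))):
        t = k * dt
        c, s = math.cos(om * t), math.sin(om * t)
        xd, yd = R * c, R * s                       # reference position
        vxd, vyd = -R * om * s, R * om * c          # reference velocity
        axd, ayd = -R * om**2 * c, -R * om**2 * s   # reference acceleration
        errors.append(math.hypot(xd - x, yd - y))
        # Outer linear loop: PD law on the two decoupled double integrators.
        vx, vy = v * math.cos(th), v * math.sin(th)
        mu1 = axd + kd * (vxd - vx) + kp * (xd - x)
        mu2 = ayd + kd * (vyd - vy) + kp * (yd - y)
        # Inner nonlinear loop: static feedback on the prolonged model, where
        # x_dd = a*cos(th) - v*w*sin(th) and y_dd = a*sin(th) + v*w*cos(th).
        # Solving for (a, w) gives x_dd = mu1, y_dd = mu2; singular at v = 0.
        a = mu1 * math.cos(th) + mu2 * math.sin(th)
        w = (mu2 * math.cos(th) - mu1 * math.sin(th)) / v
        # Forward-Euler step of the unicycle kinematics plus the speed integrator.
        x += dt * v * math.cos(th)
        y += dt * v * math.sin(th)
        th += dt * w
        v += dt * a
    return errors
```

Calling `errs = track_circle()` returns the tracking error at each 10 ms step. The inversion divides by v, so the initial speed is chosen to be nonzero.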

Article
Computer Science and Mathematics
Robotics

Élise Martin, Julien Moreau, Claire Dupont, Thomas Bernard

Abstract: This study proposes a blueprint-driven semantic interpretation framework that constructs high-level structural diagrams for textual content without any annotated labels. A blueprint generator synthesizes structural templates from syntactic patterns, and a constraint-matching engine aligns sentence components to blueprint slots. Tests on MixedNews-ZS, LegalText-ZS, and ScienceCorpus-ZS show improved structural consistency, with F1 gains of 11.7%, 13.4%, and 15.2% over syntax-only baselines. The model reduces role ambiguity errors by 26.9% and achieves a 19.8% improvement in zero-annotation entity function resolution. Cross-lingual experiments further demonstrate that blueprint mapping maintains 87.1% performance when transferring from English to German.

Article
Computer Science and Mathematics
Robotics

Oliver Bennett, Amelia Wright, James Holloway

Abstract: This work proposes a hypothesis-driven retrieval paradigm for discovering unseen knowledge schemata from text corpora without supervision. The framework generates hypothetical semantic signals derived from linguistic regularities and validates them through evidence retrieval across the corpus. A signal verification module filters inconsistent hypotheses, while a semantic clustering component consolidates validated structures. The system delivers notable improvements on OpenSchema-ZS, CorpusGraph-ZS, and Narrative-ZS datasets, achieving +19.5%, +21.3%, and +17.8% gains in structural retrieval accuracy. It also reduces schema hallucination errors by 28.4%. Human judges rate the extracted structures 24.9% higher on interpretability metrics.

Article
Computer Science and Mathematics
Robotics

Haruka Tanaka, Kenta Mori, Shiori Fujimoto

Abstract: Robotic perception in factories dominated by metallic surfaces is hindered by glare, dynamic reflections, and inconsistent lighting. This paper proposes a multi-spectral reflectance encoding method that captures stable reflectance signatures through narrow-band illumination and spectral normalization. A reflectance-stabilization module reconstructs geometry-consistent features under fluctuating brightness conditions. Evaluations on the MetalBench-2025 and SteelLine-Vision datasets show a 21.4% improvement in recognition accuracy and a 33.7% reduction in reflection-induced errors. When deployed on a welding robot, the system improves joint alignment precision by 18.6% and reduces failure rates in reflective surface detection by 27.3%.
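The abstract does not detail the spectral normalization step. One common way to obtain brightness-invariant signatures is to scale each pixel's multi-band reflectance vector to unit length. A uniform change in illumination intensity multiplies every band by the same factor, so this normalization cancels it out. Signatures can then be compared by their spectral angle. This sketch is purely an assumed illustration, not the paper's method.

```python
import math


def normalize_spectrum(bands, eps=1e-12):
    """Scale one pixel's multi-band reflectance vector to unit L2 norm (assumed scheme)."""
    norm = math.sqrt(sum(b * b for b in bands))
    if norm < eps:
        return [0.0] * len(bands)  # dark pixel: no usable signature
    return [b / norm for b in bands]


def spectral_angle(a, b):
    """Angle in radians between two signatures; 0 means identical spectral shape."""
    na, nb = normalize_spectrum(a), normalize_spectrum(b)
    dot = sum(x * y for x, y in zip(na, nb))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp for floating-point safety
```

For example, a pixel read as `[1.0, 2.0, 2.0]` under dim light and `[5.0, 10.0, 10.0]` under bright light yields a spectral angle of essentially zero.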

Article
Computer Science and Mathematics
Robotics

Mohamad Al Mdfaa, Raghad Salameh, Geesara Kulathunga, Sergey Zagoruyko, Gonzalo Ferrer

Abstract: Panoptic maps enable robots to reason about both geometry and semantics. However, open-vocabulary models repeatedly produce closely related labels that split panoptic entities and degrade volumetric consistency. The proposed UPPM advances open-world scene understanding by leveraging foundation models to introduce a panoptic Dynamic Descriptor that reconciles open-vocabulary labels with a unified category structure and geometric size priors. These dynamic descriptors are fused within a multi-resolution multi-TSDF map using language-guided open-vocabulary panoptic segmentation and semantic retrieval, resulting in a persistent and promptable panoptic map without additional model training. In our evaluation experiments, UPPM shows the best overall performance in terms of map reconstruction accuracy and panoptic segmentation quality. An ablation study investigates the contribution of each component of UPPM (custom NMS, blurry-frame filtering, and unified semantics) to overall system performance. Overall, UPPM preserves open-vocabulary interpretability while delivering strong geometric and panoptic accuracy. The code will be released upon acceptance.
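UPPM fuses its descriptors into TSDF volumes, but the abstract does not specify the per-voxel update. As background only, the sketch below shows the classic weighted running-average TSDF fusion rule (in the style of Curless and Levoy) for a single voxel of one volume. The truncation distance, observation weight, and weight cap are hypothetical values.

```python
def tsdf_update(D, W, sdf, trunc=0.05, w_obs=1.0, w_max=100.0):
    """Fuse one signed-distance observation into a voxel via a weighted running average.

    D, W: current truncated SDF value (normalized to [-1, 1]) and accumulated weight.
    sdf:  signed distance (metres) from the voxel to the observed surface along the
          camera ray, positive in front of the surface.
    """
    if sdf < -trunc:
        return D, W  # voxel far behind the observed surface: leave it untouched
    d = min(1.0, sdf / trunc)  # truncate and normalize the observation
    D_new = (W * D + w_obs * d) / (W + w_obs)
    W_new = min(W + w_obs, w_max)  # cap the weight so the map can still adapt
    return D_new, W_new
```

Repeated observations of the same voxel average toward the surface estimate, while the capped weight keeps older evidence from dominating indefinitely.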

Article
Computer Science and Mathematics
Robotics

Junhao Wei, Yanzhao Gu, Ran Zhang, Wenxuan Zhu, Shuai Wu, Yapeng Wang, Ngai Cheong, Zhiwen Wang, Sio-Kei Im, Xu Yang

Abstract: Aiming to improve the convergence rate and path feasibility of the traditional Rapidly-exploring Random Tree (RRT) algorithm, this paper proposes an enhanced RRT with heuristic search (AHRRT). AHRRT combines four strategies: adaptive step size, target bias, attraction–repulsion, and pruning. First, the adaptive step size strategy reduces collisions caused by large step sizes and improves the feasibility and safety of the path. Second, the target bias strategy improves tree expansion efficiency by directing the search tree toward the target, reducing computational overhead. Third, the attraction–repulsion strategy improves obstacle avoidance, making the path smoother and avoiding oscillations or invalid sampling caused by large step sizes. Finally, the pruning strategy further optimizes the initial path by removing redundant nodes. Simulation experiments in 2D and 3D validate the effectiveness of AHRRT, demonstrating significant improvements over traditional RRT, RRT variants, A*, and the ant colony optimization (ACO) algorithm in terms of path quality, convergence speed, and computational efficiency, especially in complex urban environments, which enhances its practical applicability to urban UAV path planning.
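The sketch below covers two of the four strategies, target bias and a step size that adapts to obstacle clearance, followed by a greedy pruning pass. It is not AHRRT itself:

- The attraction–repulsion strategy is omitted.
- The obstacle layout and the clearance-based step rule are hypothetical.
- Collision checking is approximated by sampling each segment.

```python
import math
import random

# Hypothetical 2D workspace: circular obstacles as ((cx, cy), radius).
OBSTACLES = [((5.0, 5.0), 1.5), ((3.0, 7.0), 1.0)]


def clearance(p):
    """Distance from p to the nearest obstacle boundary (negative inside an obstacle)."""
    return min(math.dist(p, c) - r for c, r in OBSTACLES)


def segment_free(p, q, res=0.05):
    """Approximate collision check: sample the segment every `res` units."""
    n = max(1, int(math.dist(p, q) / res))
    return all(
        clearance((p[0] + k / n * (q[0] - p[0]), p[1] + k / n * (q[1] - p[1]))) > 0.0
        for k in range(n + 1)
    )


def rrt(start, goal, lo=0.0, hi=10.0, goal_bias=0.1, min_step=0.2, max_step=1.0,
        goal_tol=0.5, max_iter=5000, seed=0):
    rng = random.Random(seed)
    nodes, parent = [start], [None]
    for _ in range(max_iter):
        # Target bias: with probability goal_bias, steer straight toward the goal.
        if rng.random() < goal_bias:
            sample = goal
        else:
            sample = (rng.uniform(lo, hi), rng.uniform(lo, hi))
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        # Adaptive step (hypothetical rule): step length shrinks with obstacle clearance.
        step = min(d, max(min_step, min(max_step, clearance(near))))
        new = (near[0] + step * (sample[0] - near[0]) / d,
               near[1] + step * (sample[1] - near[1]) / d)
        if not segment_free(near, new):
            continue
        nodes.append(new)
        parent.append(i_near)
        if math.dist(new, goal) <= goal_tol and segment_free(new, goal):
            path, k = [goal], len(nodes) - 1
            while k is not None:  # walk back to the root
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None


def prune(path):
    """Greedy pruning: jump to the farthest node reachable by a free straight segment."""
    out, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not segment_free(path[i], path[j]):
            j -= 1
        out.append(path[j])
        i = j
    return out
```

`prune(rrt((1.0, 1.0), (9.0, 9.0)))` returns a shortened waypoint list, assuming the planner finds a path. The fixed seed makes runs repeatable. Pruning can never lengthen the path, because each shortcut replaces a chain of segments with a straight line.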

Article
Computer Science and Mathematics
Robotics

Sebastian Schmidt, Tobias Greiler, Stefan Fischer, Domenic Sommer, Florian Wahl

Abstract: The ageing population, the shortage of nurses, and manual work processes are limiting healthcare provision, particularly in rural areas. At the same time, service robots are emerging. While service robots could save hospital staff time and increase efficiency, their use in patient care is still limited. Obstacles include the robots' task-specific inflexibility and their lack of interoperability and integration into the building infrastructure. Occasionally, single robots that fulfill all functional requirements are used, but they are consequently very complex and expensive. In contrast, we propose to use and combine existing service robots for different tasks. We argue that, for better integration of service robots in healthcare, a dedicated service robot management system has become a necessity. We propose hospOS, a centralized system for orchestrating service robots in healthcare facilities. hospOS fills this gap by providing a modular, flexible, and user-friendly platform that seamlessly integrates service robots into the hospital IT infrastructure, alleviating the shortage of caregivers and thus improving patient care. The platform was developed with a focus on interoperability, modularity, and compliance with regulations and industry standards. We installed hospOS in two rural hospitals and evaluated three use cases: telemedicine, transportation, and orientation services. This paper provides an overview of the architecture and discusses the functionalities and potential benefits of hospOS, along with its implementation in healthcare scenarios. To quantify the benefits, we report initial time-savings results for two of the three use cases, derived from data from the two rural hospitals. The deployments show time savings in transportation and orientation, as well as considerable further potential through wider use.

Article
Computer Science and Mathematics
Robotics

Kaiyu Su, Yi Lu, Yiming Fang

Abstract: The A* algorithm is widely used in path planning for Automated Guided Vehicles (AGVs), but it lacks real-time obstacle avoidance capability. To address this limitation, this paper proposes a hybrid algorithm that integrates an improved A* with the Dynamic Window Approach (DWA). First, a global key-point extraction strategy is introduced: redundant and turning points are removed using Bresenham’s line algorithm to enhance path smoothness and continuity, while child nodes of the current position are redefined to improve search efficiency. Second, to strengthen obstacle avoidance in complex environments, turning points in the global path are designated as sub-targets for DWA, enabling path segmentation and local dynamic planning. This integration allows the system to react in real time to dynamic obstacles. Simulation results demonstrate that, compared to the traditional A* algorithm, the proposed method reduces planning time by 24.19%, the number of inflection points by 40.00%, and path length by 1.49%. In environments with random obstacles, the fused approach generates smoother trajectories and significantly enhances local obstacle avoidance and overall path-planning safety.
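The key-point extraction step can be sketched as a line-of-sight shortcut over an occupancy grid, where Bresenham's algorithm enumerates the cells that a straight segment crosses. This is an assumed simplification that covers only that step. It does not redefine child nodes and does not include DWA. The grid convention (`grid[y][x] == 1` marks an obstacle) and the example path are hypothetical.

```python
def bresenham(p, q):
    """Integer grid cells on the line from p to q (all-octant Bresenham)."""
    (x0, y0), (x1, y1) = p, q
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    cells = []
    while True:
        cells.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy
    return cells


def line_of_sight(grid, p, q):
    """True if every cell on the Bresenham line from p to q is free (grid[y][x] == 0)."""
    return all(grid[y][x] == 0 for x, y in bresenham(p, q))


def extract_key_points(grid, path):
    """Keep only the cells needed so that consecutive key points see each other."""
    keys, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not line_of_sight(grid, path[i], path[j]):
            j -= 1
        keys.append(path[j])
        i = j
    return keys
```

Applied to an A* cell path such as `[(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]` on an empty grid, this collapses the path to its two endpoints. Obstacles force intermediate key points to be kept, and those could then serve as sub-targets for a local planner.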

Article
Computer Science and Mathematics
Robotics

Beatriz Marques, Vítor Santos, Filipe Silva

Abstract: This paper presents the development of an anthropomorphic robotic hand designed for compliant grasping and intelligent control. Based on the open-source LEAP Hand, the servomotor actuation is tailored to enable passive compliance, a crucial feature for safe and adaptable interaction. A ROS-based modular control architecture is implemented to facilitate manipulation tasks, complemented by a distributed sensing system using piezoresistive sensors on the fingertips and palm for contact detection. A key challenge addressed is undesired finger overlaps during the hand opening sequence, which could lead to internal collisions and increased mechanical stress. To overcome this, a data acquisition pipeline is designed to collect information on finger configurations, which is used to train a multi-layer perceptron (MLP) classifier. This model is trained to identify three critical scenarios: "no overlap," "overlap with thumb underneath," and "overlap with thumb on top," achieving accuracy, precision, recall, and F1-score all above 97%. This highlights its potential to enhance system robustness and support future predictive control strategies for safe and coordinated finger motion.

Article
Computer Science and Mathematics
Robotics

Hongjun Yu, Alex McBratney, Salah Sukkarieh

Abstract: Field sampling is a critical task in applications such as environmental monitoring and precision agriculture. Completing these tasks efficiently while maintaining the robots' tilt stability is particularly challenging when multiple robots are deployed. In this work, we explore how employing multiple robots can reduce operation time and travel distances during sampling missions. The sample locations are assumed to follow a Gaussian distribution, providing a foundation for planning and evaluation. Robot instability is quantified using the bias angle, representing the front-facing tilt relative to the horizon, while operational efficiency is measured by the total distance traveled to interim targets and sample targets. A cost function, defined as a weighted sum of these metrics, balances stability and distance efficiency. Through extensive simulations, we demonstrate that increasing the number of robots significantly decreases operation time and improves tilt stability as measured by the cost function. These results offer valuable insights into designing multi-robot systems for efficient and stable field sampling.
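The weighted-sum cost can be made concrete with a minimal sketch under assumed definitions. The terrain is a height function z = h(x, y). The bias angle of each leg is the pitch obtained by projecting the terrain gradient at the leg's start onto the heading. The weights are free parameters. The paper's exact definitions may differ.

```python
import math


def bias_angle(h, p, q, eps=1e-4):
    """Front-facing pitch (radians) when driving from p toward q on terrain z = h(x, y).

    Hypothetical model: uses a central finite-difference gradient at p only.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    L = math.hypot(dx, dy)
    ux, uy = dx / L, dy / L
    gx = (h(p[0] + eps, p[1]) - h(p[0] - eps, p[1])) / (2 * eps)
    gy = (h(p[0], p[1] + eps) - h(p[0], p[1] - eps)) / (2 * eps)
    return math.atan(gx * ux + gy * uy)  # slope along the heading -> tilt angle


def route_cost(h, route, w_stab=1.0, w_dist=1.0):
    """Weighted sum of accumulated |bias angle| and total travel distance along a route."""
    legs = list(zip(route, route[1:]))
    dist = sum(math.dist(a, b) for a, b in legs)
    tilt = sum(abs(bias_angle(h, a, b)) for a, b in legs)
    return w_stab * tilt + w_dist * dist
```

On flat ground the cost reduces to the weighted distance. On a plane rising at 45° along x, a 1 m leg in +x costs π/4 + 1 with unit weights.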

Article
Computer Science and Mathematics
Robotics

Kelvin Olaiya, Giovanni Delnevo, Chan-Tong Lam, Giovanni Pau, Paola Salomoni

Abstract: This paper explores the capability of Large Language Models (LLMs) to perform zero-shot planning through multimodal reasoning, with a particular emphasis on applications to Autonomous Mobile Robots (AMR) and unmanned systems. We present a modular system architecture that integrates a general-purpose LLM with visual and spatial inputs for adaptive planning to iteratively guide robot behavior. To assess performance, we employ a continuous evaluation metric that jointly considers distance and orientation, offering a more informative and fine-grained alternative to binary success measures. We evaluate three foundational LLMs (i.e., GPT-4.1-nano, GPT-4o-mini, and Gemini 2.0 Flash) on a suite of zero-shot navigation and exploration tasks in simulated environments. Our findings show that LLMs exhibit encouraging signs of goal-directed spatial planning and partial task completion, even in a zero-shot setting. However, inconsistencies in plan generation across models highlight the need for task-specific adaptation or fine-tuning. The findings support the use of multimodal inputs as key enablers for advancing LLM-based autonomy in AMR and unmanned systems.

Article
Computer Science and Mathematics
Robotics

Aakriti Upadhyay

Abstract: This work presents an analysis of a novel approach to motion planning for serial manipulators, based on an explicit and conservative representation of the configuration space (Cspace). Traditional sampling-based planners typically rely on implicit Cspace representations defined by workspace obstacles, resulting in limitations such as discrete configuration validity checks and the need for dense roadmaps with many samples. We investigate the Free Volume Graph (FVG) planner, which constructs roadmaps using hypercube volumes of verified free Cspace. This method enables a resolution-free certificate of continuous path validity, addressing key challenges in standard planning frameworks. Through a series of case studies involving 6- and 7-DOF manipulators across single- and multi-query planning tasks, we evaluate the performance of FVG against the Probabilistic Roadmap (PRM) method. Our findings indicate that FVG provides significant benefits in memory efficiency and computation time. Additionally, we examine the applicability of this approach to an open motion planning problem, identifying both its advantages and the remaining challenges in extending the method to broader scenarios.
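The core idea of planning over verified free volumes can be sketched with axis-aligned boxes. This is a simplified illustration, not the FVG planner itself, and it makes three assumptions:

- The boxes are already certified collision-free, which is the hard part that FVG addresses.
- Two boxes are adjacent when they overlap.
- Waypoints are placed at the centers of the overlaps.

Consecutive waypoints always lie inside one convex free box, so every straight segment between them is valid without discretized collision checks.

```python
from collections import deque


def intersect(a, b):
    """Overlap of two axis-aligned boxes given as (lo, hi) tuples, or None if disjoint."""
    lo = tuple(max(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(min(p, q) for p, q in zip(a[1], b[1]))
    return (lo, hi) if all(p <= q for p, q in zip(lo, hi)) else None


def contains(box, x):
    return all(lo <= v <= hi for lo, v, hi in zip(box[0], x, box[1]))


def plan_through_boxes(boxes, start, goal):
    """Breadth-first search over overlapping free boxes; returns waypoints or None."""
    s = next((i for i, b in enumerate(boxes) if contains(b, start)), None)
    g = next((i for i, b in enumerate(boxes) if contains(b, goal)), None)
    if s is None or g is None:
        return None
    prev, queue = {s: None}, deque([s])
    while queue:
        i = queue.popleft()
        if i == g:
            break
        for j, b in enumerate(boxes):
            if j not in prev and intersect(boxes[i], b) is not None:
                prev[j] = i
                queue.append(j)
    if g not in prev:
        return None
    chain, k = [], g
    while k is not None:  # recover the box sequence from goal back to start
        chain.append(k)
        k = prev[k]
    chain.reverse()
    path = [tuple(start)]
    for i, j in zip(chain, chain[1:]):
        lo, hi = intersect(boxes[i], boxes[j])
        path.append(tuple((p + q) / 2 for p, q in zip(lo, hi)))  # center of the overlap
    path.append(tuple(goal))
    return path
```

The same code works in any dimension, for example six or seven joint coordinates, because every box operation is applied per axis.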

Review
Computer Science and Mathematics
Robotics

Sandeep Gupta

Abstract: This paper presents a comprehensive review of nonlinear geometric control strategies for unmanned helicopters or aerial vehicles. Unlike traditional control methods relying on local linearization or minimal angle representations, geometric control is directly formulated on manifolds such as SE(3), ensuring global applicability without singularities. We review theoretical foundations, highlight major contributions in robust trajectory tracking, payload transport, and aggressive maneuver execution, and explore advanced integrations with deep learning and event-triggered frameworks. Applications in aerial load transportation, backflips, and resource-constrained operations are discussed. The review identifies key challenges, including computational complexity, real-time implementation, and integration with perception, and outlines promising future research directions in combining geometric control with AI-driven adaptation and autonomy.
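The attitude error commonly used in geometric control on SO(3) is one example of a control law formulated directly on a manifold. It appears, for instance, in the SE(3) quadrotor controller of Lee, Leok, and McClamroch:

e_R = ½ (R_dᵀR − RᵀR_d)^∨

Here ^∨ maps a skew-symmetric matrix to a vector. The sketch below computes e_R with plain Python lists. It works on rotation matrices directly, with no Euler angles involved. The helper names are illustrative.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def vee(S):
    """Map a skew-symmetric 3x3 matrix to its 3-vector (inverse of the hat map)."""
    return [S[2][1], S[0][2], S[1][0]]


def attitude_error(R, Rd):
    """Geometric attitude error e_R = 0.5 * vee(Rd^T R - R^T Rd) on SO(3)."""
    A = matmul(transpose(Rd), R)
    B = matmul(transpose(R), Rd)
    return vee([[0.5 * (A[i][j] - B[i][j]) for j in range(3)] for i in range(3)])


def rot_z(a):
    """Rotation by angle a (radians) about the z axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

For a pure yaw offset θ from the desired attitude, the error is (0, 0, sin θ), and it vanishes when R equals R_d.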

Article
Computer Science and Mathematics
Robotics

Noé Zapata, Gerardo Pérez, Alejandro Torrejón, Pedro Núñez, Pablo Bustos

Abstract: The perception of 3D space by mobile robots is rapidly moving from flat metric grid representations to hybrid metric-semantic graphs built from human-interpretable concepts. While most approaches first build metric maps and then add semantic layers, we explore an alternative concept-first architecture where spatial understanding emerges from asynchronous concept agents that directly instantiate and manage semantic entities. Our robot employs two spatial concepts (room and door) implemented as autonomous processes within a cognitive distributed architecture. These concept agents cooperatively build a shared scene graph representation of indoor layouts through active exploration and incremental validation. The key architectural principle is hierarchical constraint propagation: room instantiation provides geometric and semantic priors that constrain and improve door detection within wall boundaries. The resulting structure is maintained by a complementary functional principle: prediction-matching loops. This approach builds an actionable, human-interpretable spatial representation without relying on a pre-existing global metric map, enabling scalable operation and persistent, task-relevant understanding in structured indoor environments.

Article
Computer Science and Mathematics
Robotics

Zainab Salma, Raquel Hijón-Neira, Celeste Pizarro

Abstract: The rapid integration of generative Artificial Intelligence (AI) into creative workflows is transforming design from a human-driven activity into a synergistic process between humans and AI systems. Yet, most current tools still operate as linear “executors” of user commands, which fundamentally clashes with the non-linear, iterative, and ambiguous nature of human creativity. Addressing this gap, this article introduces a conceptual framework of five irreducible paradoxes—Ambiguity vs. Precision, Control vs. Serendipity, Speed vs. Reflection, Individual vs. Collective, and Originality vs. Remix—as core design tensions that shape Human–AI co-creative systems. Rather than treating these tensions as problems to solve, we argue they should be understood as design drivers that can guide the creation of next-generation co-creative environments. Through a critical synthesis of existing literature, we show how current executor-based AI tools (e.g., Microsoft 365 Copilot, Midjourney) fail to support non-linear exploration, refinement, and human creative agency. This study contributes a theoretical lens for analyzing existing systems and a generative framework for designing Human–AI collaboration environments that augment, rather than replace, human creativity.

Article
Computer Science and Mathematics
Robotics

Zhichao Ma, Zheyu Zhang, Zijun Gao, Aijia Sun, Yinuo Yang, Hao Liu

Abstract: Autonomous robots increasingly operate in complex and dynamic environments where energy resources are limited. Effective motion planning and scheduling strategies are critical to achieving mission objectives while minimizing energy consumption. This paper proposes an energy-constrained motion planning framework that jointly considers path optimization and task scheduling under limited energy budgets. We integrate an energy consumption model with a task-priority-based scheduling algorithm to ensure efficient execution of tasks while maintaining safety and mission performance. The proposed approach is validated in simulated complex environments with varying obstacle distributions and energy constraints, demonstrating improved performance compared to baseline greedy and shortest path-only planners. The results highlight the importance of coupling energy-aware planning with adaptive scheduling to enhance the autonomy of resource-constrained robotic systems.

Article
Computer Science and Mathematics
Robotics

Yuri Tavares dos Passos, Leandro Soriano Marcolino

Abstract: Coordination algorithms are required to minimise congestion when every robot in a robotic swarm has a common target area to visit. Some of these algorithms use artificial potential fields to make path planning distributed and local. One efficiency measure for comparing them is the task completion time as a function of the number of individuals in the swarm. To compare distinct solutions as the swarm grows, experiments with different numbers of robots must be simulated to plot the task completion time against the number of robots or other parameters. However, producing such plots for large swarms through simulation is time-consuming. Additionally, inferring a global swarm behaviour, such as the task completion time, from each robot's local motion controller, based on potential fields and other dynamical variables, is intractable and requires experimental analysis. This paper therefore presents equations for estimating the expected task completion time of state-of-the-art algorithms, of robots using only attractive and repulsive force fields, and of mixed teams for the common target area problem in robotic swarms, and compares them with simulation data. The equations take as input not only the number of robots but also environment- and algorithm-related global variables, such as the size of the common target area and of the working area, the average speed, and the average distance between the robots. This paper is a fundamental first step toward a discussion of how better approximations can be achieved and which mathematical theories of local-to-global analysis are best suited to this problem.

Article
Computer Science and Mathematics
Robotics

Zhian Chen, Yaqi Hu, Yong Liu

Abstract: Existing systems integrating neural representations with visual SLAM excel in static scenes but falter in dynamic environments where moving objects degrade localization and mapping performance. To address this, we propose a robust dynamic SLAM framework that leverages explicit geometric features for localization while learning implicit photometric feature representations to capture the texture of the observed environment. Our method first employs an instance segmentation network and a Kalman filter for multi-object tracking. We then introduce a cascaded, coarse-to-fine strategy for efficient motion analysis. A lightweight, sparse optical flow method along object contours performs an initial coarse screening to identify clearly static or globally moving objects. For ambiguous targets requiring detailed analysis, a fine-grained motion segmentation is then conducted using dense optical flow clustering. By excluding features on identified dynamic regions, our system significantly improves camera pose estimation accuracy, reducing absolute trajectory error by up to 95% on dynamic TUM RGB-D sequences compared to ORB-SLAM3, and generates cleaner dense maps. The mapping backend utilizes a 3D Gaussian Splatting renderer, optimized with a Gaussian pyramid-based training strategy. Validations on diverse datasets demonstrate our system’s superior robustness, achieving accurate localization and high-quality mapping in dynamic scenarios, while the cascaded strategy reduces motion analysis computation time by 91.7% compared to a dense-only approach.
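The coarse-to-fine screening can be sketched as a two-threshold test on contour flow residuals, with the expensive dense analysis invoked only for ambiguous objects. The following are hypothetical placeholders, not the authors' values or interfaces:

- the thresholds;
- the residual definition (optical flow left over after compensating for camera ego-motion);
- the `dense_check` callback.

```python
from statistics import mean


def classify_objects(contour_residuals, dense_check, low=0.5, high=3.0):
    """Coarse-to-fine motion screening of tracked objects (hypothetical thresholds, pixels).

    contour_residuals: {object_id: [ego-motion-compensated flow residuals sampled on
                       the object's contour]} from a sparse, cheap optical flow pass.
    dense_check:       callable(object_id) -> bool, the expensive dense-flow test,
                       called only for objects the coarse pass cannot decide.
    Returns (labels, number_of_dense_calls).
    """
    labels, dense_calls = {}, 0
    for obj, residuals in contour_residuals.items():
        m = mean(residuals)
        if m < low:
            labels[obj] = "static"      # clearly static: keep its features for tracking
        elif m > high:
            labels[obj] = "dynamic"     # clearly moving: exclude its features
        else:
            dense_calls += 1            # ambiguous: fall back to fine-grained analysis
            labels[obj] = "dynamic" if dense_check(obj) else "static"
    return labels, dense_calls
```

The saving comes from the fact that only the ambiguous band between the two thresholds triggers the dense step.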

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated