Preprint
Review

This version is not peer-reviewed.

Deformable and Fragile Object Manipulation: A Review and Prospect

A peer-reviewed article of this preprint also exists.

Submitted: 02 August 2025
Posted: 04 August 2025


Abstract
Deformable object manipulation (DOM) is a primary bottleneck for the real-world application of autonomous robots, requiring advanced frameworks for sensing, perception, modeling, planning, and control. When fragile objects such as soft tissues or fruits are involved, ensuring safety becomes the paramount concern, fundamentally altering the manipulation problem from one of pure trajectory optimization to one of constrained optimization and real-time adaptive control. Existing DOM methodologies, however, often fall short of addressing fragility constraints as a core design feature, leading to significant gaps in real-time adaptiveness and generalization. This review systematically examines the individual components of DOM with a focus on their effectiveness in handling fragile objects. We identify key limitations in current approaches and, based on this analysis, discuss a promising framework that combines low-latency reflexive mechanisms with global optimization to dynamically adapt to specific object instances.

1. Introduction

Deformable object manipulation (DOM) is a pivotal and complex challenge in robotics, with extensive real-world applications in domains such as robotic surgery, food handling, caregiving, and industrial automation[1]. Unlike rigid objects, deformable objects exhibit dynamic and variable behaviors that are influenced by their material properties, applied forces, and environmental constraints [2]. These inherent complexities demand advanced strategies for sensing, perception, modeling, planning, and control to enable precise and adaptive manipulation[3].
An overlooked yet critical aspect of DOM is the safe handling of fragile deformable objects—those highly susceptible to damage or irreversible deformation due to their structural or material characteristics [4]. Examples include soft anatomical tissues, brittle glassware, delicate pastries, and finely woven fabrics. These objects not only elevate the complexity of manipulation but also impose strict requirements for balancing adaptability with safety [5]. Damage caused by factors such as excessive force, inaccurate modeling, or delayed control responses—resulting in tearing, structural failure, or over-deformation—remains a major challenge for current robotic systems [6,7]. Addressing these safety concerns is of critical importance for expanding the applicability and reliability of autonomous robots.
While deformable object manipulation has been the focus of significant research efforts spanning modeling, perception, learning, and control, the safety-aware handling of fragile deformable objects remains insufficiently explored. Existing review works provide comprehensive overviews of technical advances in DOM, yet few systematically examine the unique challenges of ensuring safety during manipulation of fragile objects (Table 1).
To bridge this gap, this work adopts a safety-centric lens to review and analyze state-of-the-art methodologies in DOM. Our goal is to provide a comprehensive foundation for advancing safe and reliable robotic handling of fragile objects. Specifically, we aim to:
  • Identify and highlight the limitations of current DOM approaches in maintaining object integrity and safety during manipulation tasks.
  • Propose a structured taxonomy to categorize DOM methods based on their capacity to address safety concerns in fragile object manipulation.
  • Present a bio-inspired framework for future research, emphasizing reflex-driven safety mechanisms, proprioceptive integration, and predictive modeling for fragility.
In biological systems, safety in the manipulation of fragile objects is achieved through synergistic, multi-level sensorimotor strategies. Fast reflexive responses safeguard against immediate risks, while cognitive planning and predictive modeling ensure long-term adaptability and task-level safety [17]. These systems rely on the integration of proprioception, tactile sensing, and adaptive modeling and control mechanisms[18,19,20,21]. However, current robotic systems often lack these critical capabilities—struggling to achieve the balance between rapid responsiveness and context-aware adaptability required for fragile object manipulation [22].
By systematically analyzing existing research and introducing a safety-oriented perspective, this review seeks to unify efforts toward safer, more adaptive frameworks for DOM. We aim to inspire transformative advancements that will broaden the adoption of autonomous robots in both domestic and industrial settings while enhancing their reliability in delicate and safety-critical applications.

2. Sensing and Perception

Perception and sensing are fundamental to robotics and have been comprehensively reviewed [23,24,25]. Unlike those broader reviews, this work focuses on the specific and often unmet challenges that perception presents in the context of deformable object manipulation (DOM).
In DOM, the core perceptual challenge is to provide the necessary information for a robot to infer an object’s complex shape, material constraints, and state during interaction. This challenge is amplified when manipulating fragile objects, as safety depends entirely on the ability to detect subtle stress distributions and impending damage risks in real time—a task that pushes the limits of current sensing modalities. The following review examines the dominant sensing approaches not as a survey of their capabilities, but to analyze their limitations and the critical gaps that remain in achieving safe and robust perception for fragile DOM.

2.1. Vision Sensing and Perception

Vision-based methods are the predominant approach in perception, valued for their ability to extract global information about an object’s shape and motion. Depending on the application, a range of imaging technologies are employed to capture object geometry and deformation. These include standard monocular cameras [26,27], stereo vision systems [28], and RGB-D cameras that provide direct depth measurements [29]. For high-speed dynamic scenarios, event cameras have also been utilized to track rapid changes [30].
While vision systems excel in geometric reconstruction, they face limitations when interacting with fragile objects[17]. Vision is often insufficient for detecting internal stress distribution or micro-scale deformations, which are critical in fragile contexts[31]. Furthermore, visual occlusion during manipulation, irregular surfaces, or transparent materials (e.g., glass) can hinder performance[32]. Stress prediction based solely on vision is unreliable and can result in oversights during manipulation, leading to damage[33].

2.2. Tactile Sensing and Perception

Tactile sensors serve as the primary modality for acquiring rich, local information through direct physical contact[34]. Unlike remote sensors such as cameras, touch provides high-fidelity data about the interaction between the manipulator and an object’s surface[35]. This information is crucial for DOM, especially when manipulating fragile objects, as it enables the real-time control of forces and the detection of critical events like slip, which are often invisible to vision.
The data provided by tactile sensors can be understood as a hierarchy. At the most fundamental contact level, these sensors measure parameters such as normal and shear forces, the position of contact, and local surface geometry [36]. This raw data can be processed to infer object-level properties, such as texture, compliance, or thermal characteristics. At the highest action level, this information is used to guide manipulation, for example by adjusting grip force in response to incipient slip detected through vibrations, or by confirming that a stable grasp has been achieved.
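As a concrete illustration of this action-level use of tactile data, the sketch below shows a hypothetical slip-reactive grip rule: a burst of high-frequency energy in the tangential-force signal is taken as incipient slip, and the grip force is increased up to a safe ceiling. All thresholds, gains, and the force ceiling are illustrative assumptions, not values from any cited sensor or system.

```python
def vibration_energy(samples):
    """Energy of the high-frequency component (first differences) of a
    short window of tangential-force samples."""
    diffs = [b - a for a, b in zip(samples, samples[1:])]
    return sum(d * d for d in diffs)

def adjust_grip(f_grip, tangential_window,
                slip_threshold=0.5, gain=1.2, f_max=10.0):
    """Increase grip force multiplicatively while slip is suspected,
    saturating at a safe maximum for the (fragile) object."""
    if vibration_energy(tangential_window) > slip_threshold:
        return min(f_grip * gain, f_max)
    return f_grip  # stable contact: hold the current force

# A noisy window (suspected slip) triggers an increase; a quiet one does not.
print(adjust_grip(2.0, [0.1, 0.9, 0.2, 1.1, 0.3]))  # slip -> 2.4
print(adjust_grip(2.0, [0.10, 0.11, 0.10, 0.12]))   # stable -> 2.0
```

In a real system the window would come from a high-rate tactile stream, and the ceiling `f_max` would be derived from the object's fragility limits rather than fixed a priori.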
While a wide variety of tactile sensing technologies exist, a prominent recent trend is the development of vision-based tactile sensors, such as GelSight or TacTip [37,38]. These sensors typically use an internal camera to observe the deformation of a soft, often marker-patterned, skin. This approach provides a high-resolution "tactile image" from which detailed 3D shape, texture, and force distribution can be reconstructed with remarkable precision. For fragile objects, such sensors are particularly beneficial as they can detect subtle force thresholds and surface changes that are critical for preventing damage.
Despite these advancements, tactile sensing in robotics still faces challenges. A key limitation is the gap between the predominantly reactive nature of current robotic systems and the active perception employed by humans, where exploratory actions are proactively used to gather tactile information [34,39]. Integrating this active-sensing paradigm remains a significant frontier for making robotic manipulation more intelligent and adaptive.

2.3. Force/Torque Sensing and Perception

While tactile sensors excel at providing high-resolution local contact information, Force/Torque (F/T) sensors offer a complementary, global perspective on the physical interaction between the manipulator and an object[40]. Typically mounted at the robot’s wrist, an F/T sensor measures the net forces and torques resulting from the entire interaction, providing a direct, physically interpretable measure of the overall load on the end-effector [40,41]. This modality is crucial for tasks requiring precise force control, such as assembly, insertion, or carefully handling fragile objects where the total applied force must be kept below a critical threshold to prevent damage.
In the context of Deformable Object Manipulation (DOM), F/T sensing provides vital feedback for executing contact-rich tasks. For fragile objects, it enables robots to limit exerted forces below breaking or permanent deformation thresholds. However, a primary limitation of F/T sensing is its lack of spatial resolution [42]. Because it measures the aggregate load, it cannot distinguish between different contact points or provide information about the pressure distribution across a surface. This ambiguity can make it challenging to diagnose the cause of unexpected forces, especially in multi-contact scenarios. Therefore, F/T sensing is most powerful when fused with other modalities, such as vision or tactile feedback, to combine global dynamic information with local geometric context.
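The threshold-enforcement role of F/T sensing can be sketched as a simple guarded-move check on the net wrist wrench; the force limit and slow-down margin below are illustrative assumptions, not values from any cited system.

```python
import math

def guarded_step(wrench, f_limit=5.0, margin=0.8):
    """Return 'continue', 'slow', or 'abort' from a 6-DoF wrist wrench
    (fx, fy, fz, tx, ty, tz), based on the net force magnitude."""
    f_mag = math.sqrt(sum(f * f for f in wrench[:3]))
    if f_mag >= f_limit:
        return "abort"            # hard safety stop before damage
    if f_mag >= margin * f_limit:
        return "slow"             # approach the threshold cautiously
    return "continue"

print(guarded_step((1.0, 0.5, 0.2, 0, 0, 0)))   # continue
print(guarded_step((3.0, 3.0, 1.0, 0, 0, 0)))   # slow  (|F| ~ 4.36 N)
print(guarded_step((4.0, 3.0, 1.0, 0, 0, 0)))   # abort (|F| ~ 5.10 N)
```

Note that this check sees only the aggregate load: as discussed above, it cannot tell which contact produced the force, which is exactly why F/T data is usually fused with tactile or visual context.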
Table 2. Comparison of Perception Modalities for Deformable Object Manipulation (DOM).
Vision
  • Advantages: global, non-contact sensing of shape and motion
  • Limitations: highly prone to occlusion; cannot measure contact forces or internal stress; poor with transparent or textureless objects
Tactile
  • Advantages: high-resolution local contact data (force, slip, texture); high-frequency feedback for fine control
  • Limitations: sensing area limited to direct contact; complex or costly to integrate into large arrays
Force/Torque
  • Advantages: measures net interaction force for global control; well suited to enforcing overall force limits
  • Limitations: lacks spatial resolution (cannot localize contact); sensitive to noise from the robot’s own dynamics

2.4. Challenges in Fragile Object Sensing and Perception

Current sensing modalities face several challenges when applied to fragile object manipulation:
  • Stress and Strain Detection: Vision systems struggle with detecting internal stresses, while tactile sensors are limited to surface interactions, leaving blind spots in real-time fragility assessment.
  • Occlusion and Transparency: Vision sensors fail in occluded environments or with transparent objects, negatively impacting safe manipulation tasks.
  • Bandwidth Limitations: High-bandwidth tactile feedback required for fragile object handling introduces complexities in both data acquisition and processing speeds.
  • Sensor Fusion: Effective integration of multiple sensing modalities (vision, tactile, force/torque) remains a challenge, particularly in fragility-aware systems requiring fine-grained real-time feedback.

2.5. Opportunities

To enhance perception for fragile object manipulation, future research must focus not just on improving individual sensors, but on advancing key technologies and integrating them into cohesive systems. The most prominent opportunities include:
  • Vision-Inferred Tactile Sensing: Beyond dedicated hardware like GelSight (which uses an internal camera), a prominent research direction uses external vision to infer tactile properties. By observing an object’s deformation, these methods can estimate contact forces and pressures without direct contact, offering a powerful solution for environments where physical tactile sensors are impractical or infeasible [43].
  • Leverage Proprioceptive Force Estimation: Using the robot’s own dynamic model and motor currents to estimate contact forces offers a low-cost, universally applicable alternative to dedicated sensors[44]. Future work must focus on creating highly accurate models and robust filtering techniques to disentangle delicate contact forces from the robot’s own dynamic noise.
  • Advance Holistic Sensor Fusion: The future of perception lies in methodologies that intelligently fuse the global context from vision with high-frequency local data from tactile sensors and the global interaction dynamics captured by force/torque feedback (either measured or estimated).
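The proprioceptive force-estimation idea above reduces, in its simplest single-joint form, to computing the residual between the torque implied by motor current and the torque the dynamic model predicts for the commanded motion. The sketch below uses purely illustrative constants (torque constant, inertia, damping, gravity torque); a real implementation would need an identified multi-joint model and the filtering mentioned above.

```python
def external_torque(i_motor, k_t, qdd, qd, inertia, damping, gravity_tau):
    """tau_ext = k_t * i  -  (I*qdd + b*qd + g(q)) for a single joint."""
    tau_measured = k_t * i_motor
    tau_model = inertia * qdd + damping * qd + gravity_tau
    return tau_measured - tau_model

# Free motion: the model explains the motor current, so the residual is ~0.
print(external_torque(2.0, 0.5, qdd=1.0, qd=0.5, inertia=0.8,
                      damping=0.2, gravity_tau=0.1))   # ~0 N*m
# Contact: extra current the model cannot explain -> an external load.
print(external_torque(4.0, 0.5, qdd=1.0, qd=0.5, inertia=0.8,
                      damping=0.2, gravity_tau=0.1))   # ~1.0 N*m
```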

3. Modeling Deformable and Fragile Objects

A foundational step in Deformable Object Manipulation (DOM) is choosing an object model that balances representational fidelity with computational speed—a trade-off that is especially critical when handling fragile objects. Successfully modeling a deformable object requires addressing two distinct but related challenges: first, selecting a representation for the object’s geometry, and second, applying a model to predict its physical behavior under applied forces. The following sections discuss these challenges in turn.

3.1. Model Representation

Model representation is a challenging and widely studied problem [45]. In DOM, the choice of geometric representation dictates how an object’s shape is discretized and tracked. This choice is fundamental, as it impacts the performance and complexity of any subsequent physical model. Common representations range from coarse meshes, which are computationally efficient and allow for real-time collision checks [46], to denser point-cloud [47] or signed distance field (SDF) [48] representations that enable more precise tracking of complex deformations. Each representation offers a different balance between geometric fidelity and the computational cost of updating it over time. The trade-offs for these common methods are summarized in Table 3.

3.2. Analytical Models

Analytical models predict object deformation by applying first principles of physics to a chosen geometric representation. These models are often favored when physical accuracy is paramount, but they present a significant trade-off between fidelity and computational speed [49,50].
The most accurate and widely used analytical approach is the Finite Element Method (FEM). By discretizing an object into a mesh of finite elements, FEM can solve complex continuum mechanics equations to compute internal stress and strain distributions with high precision. This capability is invaluable for fragile object manipulation, as it allows for the prediction of potential failure points. However, the high computational cost of FEM is a major barrier, often making it too slow for the real-time feedback required in robotic control loops [50].
To address the speed limitations of FEM, discrete models like Mass-Spring Models (MSM) and, more recently, Position-Based Dynamics (PBD) are commonly used. These methods offer much faster simulation speeds suitable for interactive applications. Their limitation, however, lies in their physical realism. The parameters of an MSM, for example, often do not correspond directly to real-world material properties, leading to simulations that may look plausible but are not physically accurate. This lack of guaranteed fidelity poses a significant risk when manipulating fragile objects, where misjudging material response can lead to damage.
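The structure of an MSM—and why its parameters act as tuning knobs rather than measured material constants—can be seen in a minimal one-dimensional chain integrated with semi-implicit Euler. All stiffness, mass, damping, and time-step values below are illustrative, chosen for stability rather than fidelity to any real material.

```python
def msm_step(x, v, k=50.0, rest=1.0, mass=0.1, damping=0.5, dt=0.001):
    """Advance positions x and velocities v of a 1-D chain by one step.
    Node 0 is pinned; each adjacent pair is linked by a Hookean spring."""
    n = len(x)
    forces = [0.0] * n
    for i in range(n - 1):                    # spring between nodes i, i+1
        stretch = (x[i + 1] - x[i]) - rest
        f = k * stretch                       # Hooke's law
        forces[i] += f
        forces[i + 1] -= f
    x_new, v_new = list(x), list(v)
    for i in range(1, n):                     # node 0 stays fixed
        a = (forces[i] - damping * v[i]) / mass
        v_new[i] = v[i] + a * dt              # semi-implicit Euler
        x_new[i] = x[i] + v_new[i] * dt
    return x_new, v_new

# Stretch the free end of a 3-node chain and watch it relax back.
x, v = [0.0, 1.0, 2.5], [0.0, 0.0, 0.0]       # last spring stretched by 0.5
for _ in range(100):
    x, v = msm_step(x, v)
print(round(x[2], 3))  # pulled back toward 2.0 (below the initial 2.5)
```

Nothing in `k` or `damping` here maps directly onto a Young's modulus or loss factor—which is precisely the fidelity gap the text describes.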
Ultimately, the choice of an analytical model is dictated by this fundamental trade-off. For tasks involving fragile objects, there is a critical need for models that can bridge the gap between the accuracy of FEM and the real-time performance of simpler discrete methods [50].

3.3. Data-driven Models

Methods using implicit neural representations (such as NeRFs), deep signed distance functions (SDFs), and diffusion models can learn to generate high-fidelity 3D models from a collection of 2D images[51,52,53]. For DOM, these techniques are promising because they can reconstruct the complex, non-rigid topology of a deformable object, even from partial or incomplete views. However, a primary challenge is that most of these methods are computationally intensive and optimized for static scenes[54]. Adapting them for the real-time tracking of dynamically deforming objects is a significant and active area of research.
Local tactile and force measurements can be used for object modeling through a process of interactive exploration and data fusion [39]. In this process, the robot interactively probes the object at multiple locations to gather local data, while addressing the ambiguity of data alignment across contacts [55]. At each point of contact, tactile sensors can measure high-resolution data such as pressure distribution, while force sensors measure the interaction force required to cause a given deformation. This local force-deformation data is then used to infer an object-level property, such as the material’s stiffness or compliance, at that specific location.

3.4. Challenges in Fragile Object Modeling

Modeling approaches currently lack comprehensive mechanisms to handle fragility constraints effectively:
  • Stress Threshold Prediction: Fragile objects require precise stress and deformation predictions to avoid local damage, which remains challenging for both analytical and data-driven models.
  • Dynamic Fragility Modeling: Objects often change fragility conditions during manipulation (e.g., brittle transitions in glass or softening in tissues). Neither modeling approach fully accounts for these dynamic states.
  • Computational Trade-Offs: Analytical models are computationally expensive for high-resolution fragility simulations, whereas data-driven approaches struggle with real-time safety guarantees.

3.5. Opportunities

Future research must integrate fragility-specific constraints into modeling frameworks, prioritizing safe predictions without sacrificing adaptability:
  • Develop hybrid models that incorporate analytical accuracy with data-driven flexibility to adapt to unforeseen fragility changes.
  • Implement real-time fragility monitoring through feedback loops, leveraging high-bandwidth proprioceptive sensing.
  • Address computational challenges by optimizing algorithms for fragile object dynamics simulation without compromising safety.
By advancing modeling approaches to account for fragility, DOM systems can become both safer and more reliable, enabling applications in critical fields such as healthcare and delicate manufacturing.

4. Motion Planning for Deformable Object Manipulation

Motion planning for Deformable Object Manipulation (DOM) is a formidable task due to the unique challenges posed by non-rigid materials. Unlike rigid-body planning, DOM planners must contend with infinite degrees of freedom, complex and often unpredictable nonlinear dynamics, and the constant possibility of self-collision[75]. Furthermore, any viable solution must handle uncertainty and operate within the real-time constraints of a robotic system.
The strategies developed to address these challenges can be broadly categorized into two primary philosophies: those that rely on an explicit physics model and those that learn a manipulation policy directly from data. Many modern solutions also create hybrids of the two or use specialized representations and reactive control strategies to simplify the problem.

4.1. Model-Based Planning

Model-based approaches leverage a predictive model of the object, typically based on Finite Element Methods (FEM) or Mass-Spring Models (MSM), to simulate how it will deform[76,77,78]. This simulated behavior is then integrated into a planning framework. Common strategies include using sampling-based planners (e.g., RRT) where each sample is validated through simulation, or employing optimization-based planners (e.g., trajectory optimization) that incorporate soft-body physics as constraints[79]. While these methods can be highly physically realistic, their effectiveness is entirely dependent on the accuracy of the underlying model, and they are often too computationally expensive for real-time control[80].
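The sampling-with-simulation-validation pattern can be sketched schematically. Here `simulate_peak_stress` is a hypothetical stand-in for an FEM or MSM rollout, and the stress limit is an illustrative fragility threshold; the point is only the structure—every sampled motion is accepted or rejected by the predictive model before execution.

```python
import random

def simulate_peak_stress(displacement):
    """Stand-in physics model: predicted peak stress grows with the
    imposed displacement (illustrative units)."""
    return 100.0 * abs(displacement)

def plan_safe_steps(goal, stress_limit=60.0, max_step=1.0, tries=200, seed=0):
    """Greedily sample steps toward `goal`, keeping only those the
    simulation predicts to be stress-safe."""
    rng = random.Random(seed)
    pos, path = 0.0, [0.0]
    for _ in range(tries):
        remaining = goal - pos
        if abs(remaining) < 1e-3:
            break
        step = min(rng.uniform(0.0, max_step), abs(remaining))
        if remaining < 0:
            step = -step
        if simulate_peak_stress(step) <= stress_limit:   # validate first
            pos += step
            path.append(pos)
    return path

path = plan_safe_steps(goal=2.0)
steps = [b - a for a, b in zip(path, path[1:])]
print(all(simulate_peak_stress(s) <= 60.0 + 1e-6 for s in steps))  # True
```

The cost of this pattern in practice is exactly the one named above: each validation call invokes the (expensive) deformation model, which is why real FEM-backed planners struggle to run in real time.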

4.2. Learning-Based Planning

Learning-based approaches bypass the need for an explicit analytical model by learning a control policy from data. This is typically achieved through:
  • Imitation Learning (IL): Where a policy is learned from expert demonstrations[81].
  • Reinforcement Learning (RL): Where a policy is learned through trial-and-error to maximize a reward signal[82].
These methods excel at learning complex policies that can adapt to real-world noise and sensory feedback (e.g., from vision or touch). However, they typically require vast amounts of data to train and may struggle to generalize to novel situations or provide formal safety guarantees.

4.3. Feedback-Based Control and Visual Servoing

Distinct from deliberative planners, feedback-based strategies use continuous sensory input to guide the robot’s motion in real-time based on known policies[83]. A prime example is visual servoing, where features in an image are used to derive control signals that drive the robot, closing the loop through the camera[31]. This approach is highly reactive and less dependent on an accurate predictive model. Its primary drawback, however, is a high sensitivity to perception errors and occlusions, which are common in DOM tasks.

4.4. Challenges in Planning for Fragile Objects

Planning systems for fragile objects face several fundamental challenges:
  • Integration of Fragility Constraints: Existing planning methods rarely embed fragility-related thresholds, such as limits on stress, strain, or applied force, into trajectory generation.
  • Adaptiveness to Uncertainty: Analytical and heuristic methods struggle to adapt when sensory feedback suggests dynamic changes in object fragility during manipulation.
  • Real-Time Decision Making: Learning-based approaches, particularly reinforcement learning, often face computational bottlenecks, making them unsuitable for real-time fragility-aware adjustments.
  • Task-Specific Limitations: Many planning frameworks are designed for specific applications (e.g., garment handling, food preparation) and are not generalizable to objects with diverse fragility profiles.

4.5. Opportunities

To improve planning for fragile object manipulation, future research must address the outlined limitations by focusing on:
  • Fragility-Aware Planning Models: Develop planning frameworks that incorporate safety constraints directly into trajectory generation, using global and local fragility predictions derived from sensing.
  • Hybrid Planning Architectures: Combine heuristic efficiency with learning-based adaptiveness, while embedding fragility rules to achieve both safety and flexibility.
  • Bio-Inspired Predictive Planning: Take inspiration from biological cognitive systems that integrate proprioception, vision, and tactile feedback for predictive adjustments during manipulation.
  • Real-Time Planning Optimization: Enhance computational efficiency for learning-based approaches to enable real-time fragility-aware decision-making.
  • Multi-Object Planning Integration: Expand existing frameworks to handle interactive tasks involving multiple fragile objects, such as simultaneous handling or assembly.
By advancing planning frameworks to account for fragility-specific constraints and safety considerations, DOM systems can achieve optimized trajectories that balance task success and damage prevention. This transformation is crucial for applications requiring safety-critical manipulation tasks, such as surgery, food handling, and glass manufacturing.

5. Control for Deformable Object Manipulation

While planning determines a high-level strategy, the control system is responsible for executing that strategy and making real-time adjustments to safely interact with the object. For deformable object manipulation (DOM), control is especially critical for managing contact forces and reacting to unexpected deformations. The primary control strategies can be categorized by their reliance on a physical model, their use of direct sensor feedback, or their foundation in machine learning.

5.1. Model-Based Control

Model-based control strategies leverage a known or approximated physical model of the object’s deformation to derive control laws.
Model Predictive Control (MPC): This advanced technique uses a predictive model (e.g., based on FEM or mass-springs) to forecast the object’s future states[76,77,84]. At each time step, it calculates an optimal sequence of control inputs to follow a reference trajectory or achieve a desired deformation. While MPC is powerful due to its predictive and optimal nature, its effectiveness depends entirely on the accuracy of the underlying model, and its significant computational expense can be prohibitive for real-time applications[85].
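A toy receding-horizon loop illustrates the MPC pattern on a scalar stand-in deformation model: candidate input sequences are searched exhaustively, sequences violating a force limit are discarded, and only the first input of the best admissible sequence is applied before replanning. The dynamics, candidate set, and limits are illustrative assumptions, not a model of any real object.

```python
import itertools

def predict(x, u_seq, a=0.9, b=0.1):
    """Roll out the stand-in model x[t+1] = a*x[t] + b*u[t]."""
    traj = []
    for u in u_seq:
        x = a * x + b * u
        traj.append(x)
    return traj

def mpc_step(x, target, u_candidates=(0.0, 1.0, 2.0), horizon=3, u_max=1.0):
    """First input of the lowest-cost sequence respecting the force limit."""
    best_u, best_cost = 0.0, float("inf")
    for u_seq in itertools.product(u_candidates, repeat=horizon):
        if max(u_seq) > u_max:        # discard over-force candidates (2.0)
            continue
        cost = sum((xt - target) ** 2 for xt in predict(x, u_seq))
        if cost < best_cost:
            best_cost, best_u = cost, u_seq[0]
    return best_u

# Receding horizon: apply only the first input, re-measure, replan.
x = 0.0
for _ in range(30):
    x = 0.9 * x + 0.1 * mpc_step(x, target=0.5)
print(round(x, 2))  # hovers near the target deformation of 0.5
```

Replacing the exhaustive search with a numerical optimizer and the scalar model with an FEM surrogate recovers the real-time-cost problem discussed above.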

5.1.1. Impedance and Admittance Control

These methods control the interaction by regulating the relationship between force and motion[86]. Instead of commanding a strict trajectory, the robot behaves as a programmable spring-damper system[87]. This approach is excellent for ensuring safe physical interaction by making the robot compliant. However, it offers less direct control over the object’s specific deformation, focusing more on the interaction forces than the resulting shape.
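The programmable spring-damper behavior can be written down directly. The one-dimensional sketch below uses illustrative gains; in practice K and D are chosen (or adapted) to bound contact forces below the object's damage threshold.

```python
def impedance_force(x, xd, x_ref, k=100.0, d=20.0):
    """F = K (x_ref - x) - D xd  (1-D; x in m, xd in m/s, F in N)."""
    return k * (x_ref - x) - d * xd

# Small tracking error at rest -> a gentle corrective force.
print(impedance_force(x=0.01, xd=0.0, x_ref=0.0))   # -1.0 N
# Moving quickly into the surface -> damping opposes the motion.
print(impedance_force(x=0.01, xd=0.2, x_ref=0.0))   # -5.0 N
```

Because force scales with error rather than being commanded directly, an unexpected obstruction produces a bounded push instead of a rigid position fight—the safety property the text highlights.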

5.2. Model-Free Feedback-Based Control

In contrast to model-based methods, feedback-based control avoids explicit predictive models and instead relies on continuous, real-time sensor data to correct the robot’s motion[21].

5.2.1. Visual Servoing

This strategy uses features from an image stream to derive control commands. The goal is typically to drive the robot’s motion to make the current image match a target reference image. While highly reactive and not dependent on a physics model, visual servoing is very sensitive to camera calibration, perception errors, and especially occlusions, which are common in manipulation tasks[88].
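The classic image-based rule drives the feature error e = s - s* to zero with v = -λ L⁺ e, where L is the interaction matrix. The sketch below uses a hypothetical 2×2 interaction matrix for a single planar point feature, so a plain matrix inverse stands in for the pseudo-inverse; all values are illustrative.

```python
def inv2(L):
    """Inverse of a 2x2 matrix (stand-in for the pseudo-inverse L^+)."""
    (a, b), (c, d) = L
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def servo_velocity(feature, target, L, lam=0.5):
    """Image-based servoing law v = -lam * L^-1 * (s - s*)."""
    e = [feature[0] - target[0], feature[1] - target[1]]
    Li = inv2(L)
    return [-lam * (Li[0][0] * e[0] + Li[0][1] * e[1]),
            -lam * (Li[1][0] * e[0] + Li[1][1] * e[1])]

# With an identity interaction matrix, the command points at the target.
print(servo_velocity((2.0, 1.0), (0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]]))
# -> [-1.0, -0.5]
```

The fragility of this loop is visible in the structure: a miscalibrated L or an occluded feature corrupts e directly, and the commanded velocity with it.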

5.2.2. Tactile Feedback Control

This approach closes the control loop using data from tactile or force sensors[20]. By directly measuring contact information, the robot can adjust its grip force or pose to maintain a stable grasp or gently manipulate a surface. This method offers high sensitivity to contact events but is inherently limited by the spatial coverage of the sensor array.
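One common form of such a loop regulates the normal force to stay inside the friction cone with a safety margin, applying the smallest grip force that still prevents slip—gentler on fragile objects than a fixed high squeeze. The friction coefficient, margin, and force floor below are illustrative assumptions.

```python
def required_normal_force(f_tangential, mu=0.6, margin=1.5, f_min=0.5):
    """Smallest normal force such that f_t / f_n <= mu / margin,
    i.e. the contact stays inside the friction cone with headroom."""
    return max(f_min, margin * abs(f_tangential) / mu)

print(required_normal_force(1.2))   # ~3.0 N: just enough to prevent slip
print(required_normal_force(0.1))   # 0.5 N: floor keeps contact stable
```

Run at tactile-sensor rates, this rule tracks the measured tangential load so the grip tightens only when the object actually starts to pull away.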

5.3. Learning-Based Control

The dominant modern paradigm is to learn control policies directly from interaction data, bypassing the need for hand-crafted models or control laws.
Reinforcement Learning (RL): RL enables a robot to learn an optimal control policy through trial-and-error by maximizing a reward signal. It is extremely flexible and can learn to solve highly complex tasks. Its main drawbacks are the need for very large amounts of training data (often gathered in simulation, leading to a "sim-to-real" gap) and challenges in ensuring safety during the learning process.
Imitation Learning (IL): Also known as behavior cloning, IL learns a control policy by mimicking expert demonstrations. This approach is far more sample-efficient and safer to train than RL. However, the resulting policy is fundamentally limited by the quality of the demonstrations and may fail to generalize to states not seen during training. These learned policies are often implemented as neural feedback systems that map sensor observations directly to control commands in real-time.

5.4. Challenges in Control for Fragile Objects

The reviewed control strategies face numerous challenges when considering the safety of fragile objects:
  • Lack of Fragility Constraints: Control strategies, especially in Reinforcement Learning, often optimize for task completion without explicit fragility-aware parameters. Reward functions may not sufficiently penalize actions that cause subtle damage, and policies learned via Imitation Learning can fail when encountering unseen states where the object’s fragility becomes a factor.
  • Computational Bottlenecks: Model-based controllers like MPC, while capable of predictive planning, often cannot meet the real-time computational requirements for safety-critical tasks. The delay in optimizing a new plan can be longer than the time it takes to irreversibly damage a fragile object.
  • Response Latency and Sensor Limitations: The effectiveness of any feedback-based control is limited by sensor and processing latency. For fragile objects, even a small delay in detecting a force spike or slip from visual or tactile data can be the difference between a successful manipulation and a failed one. Furthermore, the limited spatial coverage of tactile sensors means the controller is blind to damaging events happening outside the contact patch.
  • Generalization Gaps: Learning-based methods frequently fail to generalize from simulation to the real world or from training objects to new ones with different fragility properties. A policy trained to handle a firm object may apply excessive force when confronted with a softer, more delicate variant.

5.5. Future Opportunities in Control

To develop fragility-aware control frameworks, future efforts should prioritize the following research directions:
  • Fragility-Aware Learning: A significant opportunity lies in incorporating fragility constraints directly into the learning process. This can be achieved through safety-constrained reward functions, intrinsic penalties for high forces or rapid deformations, or by training a dedicated "safety critic" that evaluates the risk of an action in parallel with the main control policy.
  • Hybrid Control Systems: Future work should explore hybrid frameworks that combine the predictive, optimal nature of model-based controllers with the rapid response of reactive mechanisms. For example, a high-level MPC could plan a safe, long-horizon trajectory, while a low-level impedance controller or a simple reflexive loop provides an instantaneous safety net against unexpected forces.
  • Hierarchical and Bio-Inspired Control: There is great potential in exploring hierarchical architectures that mimic biological systems. These would feature a high-level cognitive layer for strategic planning and a low-level reflexive layer that handles immediate safety based on high-frequency feedback from proprioceptive or tactile sensors, creating a system that is both intelligent and robustly safe.
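The safety-constrained reward idea in the first bullet can take, for instance, the following schematic form: task progress is rewarded, while over-limit forces and rapid deformation are penalized. The weights and limits are illustrative assumptions, not tuned values from any cited system.

```python
def fragility_aware_reward(progress, force, deform_rate,
                           f_limit=5.0, w_force=2.0, w_rate=1.0):
    """Reward = task progress - fragility penalties."""
    penalty = 0.0
    if force > f_limit:
        penalty += w_force * (force - f_limit)   # over-force penalty
    penalty += w_rate * deform_rate ** 2         # smoothness penalty
    return progress - penalty

print(fragility_aware_reward(1.0, force=4.0, deform_rate=0.1))  # ~0.99
print(fragility_aware_reward(1.0, force=7.0, deform_rate=0.1))  # ~-3.01
```

A separate "safety critic", as suggested above, would instead learn to predict the penalty term and veto high-risk actions before they are executed.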

6. Discussion: New Frameworks for Fragile Object Manipulation

6.1. Fragility Constraints

Manipulating fragile objects introduces unique challenges that require systems to consider not only the object’s deformability but also its susceptibility to physical damage, such as tearing, fracturing, or crushing. Fragility constraints relate to physical limits on stress, strain, or applied forces beyond which the object’s structural integrity is compromised. Existing approaches to DOM often address fragility as a secondary or ad hoc consideration, treating object safety as task-specific rather than embedding it as a core feature of control, modeling, and perception systems.

6.2. Global versus Local Fragility Constraints

Fragile objects exhibit distinct characteristics that can be classified into global fragility and local fragility:
  • Global Fragility: Some objects, such as glass rods or thin sheets, exhibit fragility thresholds determined by cumulative stresses from all interactions. Existing approaches that estimate global stresses often focus on force/torque balance but rarely incorporate long-term fatigue or stress accumulation during extended manipulation tasks.
  • Local Fragility: For objects like soft tissues or brittle composites, damage may result from localized forces concentrated at specific points of contact. Current tactile and force/torque sensing systems are limited in detecting and predicting these localized risks, especially without detailed geometry or internal stress models.
The lack of standardized models for integrating both global and local fragility constraints presents a critical gap in current DOM research.
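To make the distinction concrete, a minimal safety check might evaluate both constraint types from a set of per-contact force magnitudes. The sketch below is illustrative only; the thresholds are hypothetical placeholders that would in practice come from material models or experiments.

```python
def violates_fragility(contact_forces, global_limit=8.0, local_limit=3.0):
    """Evaluate both fragility modes for per-contact force magnitudes:
    'global' flags cumulative load across all contacts (e.g., a thin sheet);
    'local' flags any single concentrated contact (e.g., soft tissue).
    Both thresholds are illustrative assumptions."""
    total = sum(contact_forces)
    peak = max(contact_forces) if contact_forces else 0.0
    return {"global": total > global_limit, "local": peak > local_limit}
```

Note that an object can be locally at risk while globally safe, which is why a single aggregate force threshold is insufficient.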

6.3. The Need for Predictive Internal Models

For tasks involving fragile objects like soft tissue, real-time adaptation is not a feature but a necessity. The immense variability of object properties, such as tissue stiffness, together with the constant potential for unexpected events, means that purely model-free learning methods, which rely on extensive trial and error, are insufficient (Table 4). We therefore believe that internal physical models will allow the system to predict the consequences of its actions, enabling it to anticipate and avoid harm rather than merely react after the fact.
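The role of such an internal model can be sketched in a few lines: a forward model predicts the interaction force a candidate action would produce, and actions whose predicted force would violate the fragility limit are rejected before execution. The spring-damper surrogate and all parameters below are deliberately simplified assumptions standing in for a learned or FEM-based model.

```python
def predicted_force(indentation, velocity, stiffness=200.0, damping=5.0):
    """Spring-damper surrogate for an internal model: predicted interaction
    force for a candidate indentation depth. Parameters are illustrative."""
    return stiffness * indentation + damping * velocity

def choose_indentation(candidates, velocity, force_limit=4.0):
    """Return the deepest candidate whose predicted force stays below the
    fragility limit, rejecting harmful actions before they are executed."""
    safe = [d for d in candidates if predicted_force(d, velocity) < force_limit]
    return max(safe) if safe else None
```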

6.4. Sensing for On-the-Fly Model Adaptation

An internal model is only effective if it accurately reflects the current state of the environment. To maintain this accuracy, the system must continuously learn and adapt its model parameters on the fly using sensory feedback. Vision, haptics, and force sensing are critical modalities for perceiving object properties such as tissue stiffness and deformation [35]. However, in constrained environments like surgical settings, the use of physical force or tactile sensors is often impractical due to challenges related to sterilization, miniaturization, and cost [89]. This limitation has led to the development of virtual sensors, such as a visual force proxy, which estimate interaction forces using vision and other data streams in lieu of dedicated physical sensors.
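On-the-fly adaptation of a single model parameter can be illustrated with a scalar gradient update that reconciles the model's predicted force with a measured force, where the measurement may itself come from a physical sensor or a vision-based virtual force proxy. This is a toy sketch, not a full recursive estimator; the gain is an assumed tuning parameter.

```python
def update_stiffness(k_est, displacement, measured_force, gain=0.1):
    """One scalar gradient step nudging the estimated stiffness so that the
    internal model's predicted force tracks the measured (or vision-estimated)
    force. A toy adaptation rule; the gain is an assumption."""
    error = measured_force - k_est * displacement  # model-vs-measurement mismatch
    return k_est + gain * error * displacement
```

Repeated over successive observations, the estimate converges toward the stiffness that best explains the force measurements, keeping the internal model current as the object's properties change.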

6.5. Planning Safety and Control Safety

6.5.1. Cognitive-Level Predictive Planning

Predictive planning relies on higher-level cognitive loop architectures that integrate multi-modal sensory data, object models, and task constraints to anticipate fragility risks and optimize manipulation strategies.
  • Features: Includes trajectory planning, predictive modeling, and task-specific constraint optimization, with explicit attention to fragility limits.
  • Applications: Suitable for complex tasks requiring foresight, such as multi-object assembly or surgical robotics, where precise manipulation is necessary over longer time horizons.
  • Current Limitations: Computational inefficiency and lack of real-time adaptability when fragile objects exhibit changing material properties during tasks.
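The predictive character of this planning layer can be illustrated with a toy finite-horizon search: among candidate force-increment sequences, the planner keeps only those that respect the fragility limit at every step and selects the one ending closest to the target. A real system would use MPC or trajectory optimization; the brute-force enumeration below is purely didactic, and all quantities are assumed.

```python
import itertools

def plan_horizon(start_force, target_force, step_options, horizon, force_limit):
    """Toy finite-horizon predictive planner: enumerate force-increment
    sequences, discard any whose intermediate force violates the fragility
    limit, and keep the sequence ending closest to the target force."""
    best, best_err = None, float("inf")
    for seq in itertools.product(step_options, repeat=horizon):
        force, feasible = start_force, True
        for step in seq:
            force += step
            if force > force_limit:   # fragility constraint checked at every step
                feasible = False
                break
        if feasible and abs(force - target_force) < best_err:
            best, best_err = seq, abs(force - target_force)
    return best
```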

6.5.2. Low-Latency Reflex Responses

Low-latency responses offer intermediate corrections that rely on fused proprioceptive, force, and tactile sensing for adaptive adjustments. These systems mimic skill-tuned reflexes in humans, which operate slightly slower than spinal reflexes but provide greater flexibility.
  • Features: Incorporate feedback loops to moderate applied forces, correct slippage, or redistribute grip dynamically across fragile surfaces.
  • Applications: Effective for tasks involving elastic objects (e.g., soft tissues or rubber) where force modulation must match deformation tolerance.
  • Current Limitations: Reliance on sensory resolution and latency, which can impede safety-critical tasks requiring rapid reactions.
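One cycle of such a reflex loop can be reduced to a few prioritized rules: protect the object first by relaxing when force exceeds the limit, then secure the grasp by tightening on slip. The thresholds and step sizes in this Python sketch are illustrative assumptions.

```python
def reflex_adjust(grip_force, sensed_force, slip_detected,
                  force_limit=3.0, relax_step=0.2, tighten_step=0.1):
    """One cycle of a high-frequency reflex loop (thresholds and step sizes
    are illustrative): protect the object first, then secure the grasp."""
    if sensed_force > force_limit:
        return max(0.0, grip_force - relax_step)  # relax: fragility limit exceeded
    if slip_detected:
        return grip_force + tighten_step          # tighten: correct the slippage
    return grip_force                             # hold: no correction needed
```

Because each cycle is a handful of comparisons, such a loop can run at the sensor's native rate, independently of any slower planning layer.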

6.6. Proprioception: The Synergistic Bridge

The key to unifying these components is proprioception[18]. Defined as the unified perception of the robot’s own kinematic and dynamic state (position, velocity, force), it is the sensory modality that connects all other observations. While vision provides static snapshots of the scene, proprioception continuously and directly reflects the physical state changes that result from tool-tissue interactions[90]. More importantly, it serves as the causal link between robot action and physical consequence. By using proprioceptive feedback to measure the inconsistency between the cognitive model’s predictions and observed reality, the system converts a series of independent sensor observations into a single, synergistic, and dynamic measurement of change, driving both real-time model adaptation and reflexive safety responses[91].
Proprioception, or a subset thereof, has been applied in robotics to improve adaptability to dynamic environments, as summarized in Table 5.
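The role of proprioceptive prediction error can be sketched as a "surprise" signal that routes the system's response: a large mismatch between predicted and measured state triggers an immediate reflex, while a moderate mismatch drives slower model adaptation. The thresholds below are illustrative assumptions.

```python
def proprioceptive_surprise(predicted_state, measured_state):
    """Euclidean mismatch between the internal model's predicted
    proprioceptive state and the measured one (position, velocity, force)."""
    return sum((p - m) ** 2 for p, m in zip(predicted_state, measured_state)) ** 0.5

def route_response(surprise, reflex_threshold=1.0, adapt_threshold=0.1):
    """Route the surprise signal: a large mismatch triggers an immediate
    reflex, a moderate one drives model adaptation (thresholds assumed)."""
    if surprise > reflex_threshold:
        return "reflex"        # act immediately to protect the object
    if surprise > adapt_threshold:
        return "adapt_model"   # update the internal model parameters
    return "continue"          # model and reality agree; proceed
```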

7. Conclusions

Deformable object manipulation (DOM) represents one of the most complex and multidisciplinary challenges in robotics, especially when considering the safe handling of fragile objects. Objects such as soft tissues, fruits, or elastic components introduce unique demands on sensing and perception, modeling, planning, and control frameworks. Ensuring safety and preventing damage to these objects require integrating fragility constraints at every level of robotic system design.
This review has systematically analyzed related works in DOM across key areas, including sensing and perception, modeling, planning, and control, with particular emphasis on safety for fragile object manipulation. Through this analysis, several critical insights have emerged:
  • Existing methods often treat fragility as a secondary or task-specific consideration, leading to gaps in safety and generalization across object types.
  • Reflex-based safety mechanisms remain underutilized, and current systems lack the rapid response capabilities necessary for sub-millisecond corrections during fragile object handling.
  • Sensor fusion across modalities such as vision, tactile feedback, and force/torque sensing is insufficient for real-time fragility evaluation and safety.
  • Planning and control frameworks, although capable of executing complex tasks, lack adequate integration of dynamic fragility constraints, limiting their adaptiveness during highly sensitive interactions.
Based on these insights, we presented a fragility-centered taxonomy of DOM approaches, categorizing them into reflex-based mechanisms, long-latency reflex adaptations, and cognitive-level planning. This taxonomy serves as an organizational framework for understanding and addressing current gaps in fragility-aware manipulation. By mapping existing methodologies onto this taxonomy, we have identified key areas where research efforts should be focused: emphasizing bio-inspired hierarchical control architectures, dynamic fragility modeling, hybrid planning frameworks, and high-resolution multi-modal sensing.
Looking forward, we highlighted open challenges and future research directions that aim to shift paradigms in DOM towards safety, adaptability, and generalization:
  • Develop high-bandwidth, high-resolution sensing frameworks that enable real-time fragility-aware feedback.
  • Integrate reflexive and cognitive systems into dynamic multi-loop architectures to ensure both responsiveness and long-term task planning.
  • Design hybrid planning and control systems that embed global and local fragility constraints while balancing computational efficiency.
  • Establish robust and generalizable models for diverse fragile objects and tasks, drawing inspiration from biological systems and leveraging advancements in dynamic simulation environments.
By addressing these challenges, fragility-aware manipulation can unlock transformative progress in many fields. Applications such as robotic surgery, delicate manufacturing, and caregiving demand systems that mitigate risks while maximizing both functionality and safety. Moving forward, the integration of fragility constraints as a core design principle in DOM systems offers exciting possibilities for creating adaptable, safe, and robust robotic solutions.
This review serves as a stepping stone for advancing fragility-aware research in DOM and aims to unify efforts across modeling, sensing, planning, and control. Incorporating these systems holistically will allow robotic platforms to meet the growing demand for safe manipulation in delicate and safety-critical tasks.

References

  1. Yin, H.; Varava, A.; Kragic, D. Modeling, learning, perception, and control methods for deformable object manipulation. Science Robotics 2021, 6, eabd8803.
  2. Kragic, D.; Björkman, M.; Christensen, H.I.; Eklundh, J.O. Vision for robotic object manipulation in domestic settings. Robotics and Autonomous Systems 2005, 52, 85–100.
  3. Lee, Y.; Virgala, I.; Sadati, S.H.; Falotico, E. Design, modeling and control of kinematically redundant robots. Frontiers in Robotics and AI 2024, 11, 1399217.
  4. Ishikawa, R.; Hamaya, M.; Von Drigalski, F.; Tanaka, K.; Hashimoto, A. Learning by breaking: Food fracture anticipation for robotic food manipulation. IEEE Access 2022, 10, 99321–99329.
  5. Li, Y.; Bly, R.; Akkina, S.; Qin, F.; Saxena, R.C.; Humphreys, I.; Whipple, M.; Moe, K.; Hannaford, B. Learning surgical motion pattern from small data in endoscopic sinus and skull base surgeries. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA); 2021; pp. 7751–7757.
  6. De, S.; Rosen, J.; Dagan, A.; Hannaford, B.; Swanson, P.; Sinanan, M. Assessment of tissue damage due to mechanical stresses. The International Journal of Robotics Research 2007, 26, 1159–1171.
  7. Li, Y.; Konuthula, N.; Humphreys, I.M.; Moe, K.; Hannaford, B.; Bly, R. Real-time virtual intraoperative CT in endoscopic sinus surgery. International Journal of Computer Assisted Radiology and Surgery 2022, 1–12.
  8. Gu, F.; Zhou, Y.; Wang, Z.; Jiang, S.; He, B. A Survey on Robotic Manipulation of Deformable Objects: Recent Advances, Open Challenges and New Frontiers, 2023.
  9. Zhu, J.; Cherubini, A.; Dune, C.; Navarro-Alarcon, D.; Alambeigi, F.; Berenson, D.; Ficuciello, F.; Harada, K.; Kober, J.; Li, X.; et al. Challenges and Outlook in Robotic Manipulation of Deformable Objects, 2021.
  10. Jiménez, P. Survey on model-based manipulation planning of deformable objects. Robotics and Computer-Integrated Manufacturing 2012, 28, 154–163.
  11. Herguedas, R.; López-Nicolás, G.; Aragüés, R.; Sagüés, C. Survey on multi-robot manipulation of deformable objects. In Proceedings of the 2019 24th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA); 2019; pp. 977–984.
  12. Arriola-Rios, V.E.; Guler, P.; Ficuciello, F.; Kragic, D.; Siciliano, B.; Wyatt, J.L. Modeling of Deformable Objects for Robotic Manipulation: A Tutorial and Review. Frontiers in Robotics and AI 2020, 7, 82.
  13. Yin, H.; Varava, A.; Kragic, D. Modeling, learning, perception, and control methods for deformable object manipulation. Science Robotics 2021, 6, eabd8803.
  14. Kadi, H.A.; Terzić, K. Data-Driven Robotic Manipulation of Cloth-like Deformable Objects: The Present, Challenges and Future Prospects. Sensors 2023, 23, 2389.
  15. Blanco-Mulero, D.; Dong, Y.; Borras, J.; Pokorny, F.T.; Torras, C. T-DOM: A Taxonomy for Robotic Manipulation of Deformable Objects, 2024.
  16. Sanchez, J.; Mohy El Dine, K.; Corrales, J.A.; Bouzgarrou, B.C.; Mezouar, Y. Blind Manipulation of Deformable Objects Based on Force Sensing and Finite Element Modeling. Frontiers in Robotics and AI 2020, 7.
  17. Gorniak, S.L.; Zatsiorsky, V.M.; Latash, M.L. Manipulation of a fragile object. Experimental Brain Research 2010, 202, 413–430.
  18. Tuthill, J.C.; Azim, E. Proprioception. Current Biology 2018, 28, R194–R203.
  19. Li, Y. Trends in Control and Decision-Making for Human-Robot Collaboration Systems. IEEE Control Systems Magazine 2019, 39, 101–103.
  20. Hogan, F.R.; Ballester, J.; Dong, S.; Rodriguez, A. Tactile dexterity: Manipulation primitives with tactile feedback. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA); 2020; pp. 8863–8869.
  21. Qi, Y.; Jin, L.; Li, H.; Li, Y.; Liu, M. Discrete Computational Neural Dynamics Models for Solving Time-Dependent Sylvester Equations with Applications to Robotics and MIMO Systems. IEEE Transactions on Industrial Informatics 2020.
  22. Billard, A.; Kragic, D. Trends and challenges in robot manipulation. Science 2019, 364, eaat8414.
  23. Shahian Jahromi, B.; Tulabandhula, T.; Cetin, S. Real-time hybrid multi-sensor fusion framework for perception in autonomous vehicles. Sensors 2019, 19, 4357.
  24. Ferreira, J.F.; Portugal, D.; Andrada, M.E.; Machado, P.; Rocha, R.P.; Peixoto, P. Sensing and artificial perception for robots in precision forestry: a survey. Robotics 2023, 12, 139.
  25. Luo, J.; Zhou, X.; Zeng, C.; Jiang, Y.; Qi, W.; Xiang, K.; Pang, M.; Tang, B. Robotics perception and control: Key technologies and applications. Micromachines 2024, 15, 531.
  26. Gadipudi, N.; Elamvazuthi, I.; Izhar, L.I.; Tiwari, L.; Hebbalaguppe, R.; Lu, C.K.; Doss, A.S.A. A review on monocular tracking and mapping: from model-based to data-driven methods. The Visual Computer 2023, 39, 5897–5924.
  27. Li, Y.; Zhang, J.; Li, S. STMVO: biologically inspired monocular visual odometry. Neural Computing and Applications 2018, 29, 215–225.
  28. Elmquist, A.; Negrut, D. Modeling cameras for autonomous vehicle and robot simulation: An overview. IEEE Sensors Journal 2021, 21, 25547–25560.
  29. Li, J.; Gao, W.; Wu, Y.; Liu, Y.; Shen, Y. High-quality indoor scene 3D reconstruction with RGB-D cameras: A brief review. Computational Visual Media 2022, 8, 369–393.
  30. Chakravarthi, B.; Verma, A.A.; Daniilidis, K.; Fermuller, C.; Yang, Y. Recent event camera innovations: A survey. In Proceedings of the European Conference on Computer Vision. Springer; 2024; pp. 342–376.
  31. Guo, Y.; Jiang, X.; Liu, Y. Deformation control of a deformable object based on visual and tactile feedback. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2021; pp. 675–681.
  32. Lee, Y. Three-Dimensional Dense Reconstruction: A Review of Algorithms and Datasets. Sensors 2024, 24, 5861.
  33. Miyasaka, M.; Haghighipanah, M.; Li, Y.; Hannaford, B. Hysteresis model of longitudinally loaded cable for cable driven robots and identification of the parameters. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA); 2016; pp. 4051–4057.
  34. Mandil, W.; Rajendran, V.; Nazari, K.; Ghalamzan-Esfahani, A. Tactile-sensing technologies: Trends, challenges and outlook in agri-food manipulation. Sensors 2023, 23, 7362.
  35. Shimonomura, K. Tactile image sensors employing camera: A review. Sensors 2019, 19, 3933.
  36. Yousef, H.; Boukallel, M.; Althoefer, K. Tactile sensing for dexterous in-hand manipulation in robotics—A review. Sensors and Actuators A: Physical 2011, 167, 171–187.
  37. Yuan, W.; Dong, S.; Adelson, E.H. GelSight: High-resolution robot tactile sensors for estimating geometry and force. Sensors 2017, 17, 2762.
  38. Ward-Cherrier, B.; Pestell, N.; Cramphorn, L.; Winstone, B.; Giannaccini, M.E.; Rossiter, J.; Lepora, N.F. The TacTip family: Soft optical tactile sensors with 3D-printed biomimetic morphologies. Soft Robotics 2018, 5, 216–227.
  39. Meribout, M.; Takele, N.A.; Derege, O.; Rifiki, N.; El Khalil, M.; Tiwari, V.; Zhong, J. Tactile sensors: A review. Measurement 2024, 238, 115332.
  40. Muscolo, G.G.; Fiorini, P. Force–torque sensors for minimally invasive surgery robotic tools: An overview. IEEE Transactions on Medical Robotics and Bionics 2023, 5, 458–471.
  41. Miyasaka, M.; Haghighipanah, M.; Li, Y.; Matheson, J.; Lewis, A.; Hannaford, B. Modeling Cable-Driven Robot With Hysteresis and Cable–Pulley Network Friction. IEEE/ASME Transactions on Mechatronics 2020, 25, 1095–1104.
  42. Cao, M.Y.; Laws, S.; y Baena, F.R. Six-axis force/torque sensors for robotics applications: A review. IEEE Sensors Journal 2021, 21, 27238–27251.
  43. Yamaguchi, A.; Atkeson, C.G. Recent progress in tactile sensing and sensors for robotic manipulation: can we turn tactile sensing into vision? Advanced Robotics 2019, 33, 661–673.
  44. Li, Y.; Hannaford, B. Gaussian Process Regression for Sensorless Grip Force Estimation of Cable-Driven Elongated Surgical Instruments. IEEE Robotics and Automation Letters 2017, 2, 1312–1319.
  45. Yi, H.C.; You, Z.H.; Huang, D.S.; Guo, Z.H.; Chan, K.C.; Li, Y. Learning Representations to Predict Intermolecular Interactions on Large-Scale Heterogeneous Molecular Association Network. iScience 2020, 23.
  46. Malassiotis, S.; Strintzis, M. Tracking textured deformable objects using a finite-element mesh. IEEE Transactions on Circuits and Systems for Video Technology 1998, 8, 756–774.
  47. Schulman, J.; Lee, A.; Ho, J.; Abbeel, P. Tracking deformable objects with point clouds. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation; 2013; pp. 1130–1137.
  48. Li, H.; Shan, J.; Wang, H. SDFPlane: Explicit Neural Surface Reconstruction of Deformable Tissues. In Proceedings of Medical Image Computing and Computer Assisted Intervention – MICCAI 2024; Linguraru, M.G.; Dou, Q.; Feragen, A.; Giannarou, S.; Glocker, B.; Lekadir, K.; Schnabel, J.A., Eds.; Cham; 2024; pp. 542–552.
  49. Zhu, J.; Cherubini, A.; Dune, C.; Navarro-Alarcon, D.; Alambeigi, F.; Berenson, D.; Ficuciello, F.; Harada, K.; Kober, J.; Li, X.; et al. Challenges and outlook in robotic manipulation of deformable objects. IEEE Robotics & Automation Magazine 2022, 29, 67–77.
  50. Fu, J.; Xiang, C.; Yin, C.; Guo, Y.X.; Yin, Z.Y.; Cheng, H.D.; Sun, X. Basic Principles of Deformed Objects with Methods of Analytical Mechanics. Journal of Nonlinear Mathematical Physics 2024, 31, 57.
  51. Gao, K.; Gao, Y.; He, H.; Lu, D.; Xu, L.; Li, J. NeRF: Neural radiance field in 3D vision, a comprehensive review. arXiv preprint arXiv:2210.00379, 2022.
  52. Zobeidi, E.; Atanasov, N. A deep signed directional distance function for object shape representation. arXiv preprint arXiv:2107.11024, 2021.
  53. Zhou, Y.; Lee, Y. Simultaneous Super-resolution and Depth Estimation for Satellite Images Based on Diffusion Model. In Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2024; pp. 1–8.
  54. Yi, H.C.; You, Z.H.; Huang, D.S.; Guo, Z.H.; Chan, K.C.; Li, Y. Learning representations of molecules to predict intermolecular interactions by constructing a large-scale heterogeneous molecular association network. iScience 2020, 101261.
  55. Li, Y.; Li, S.; Song, Q.; Liu, H.; Meng, M.Q.H. Fast and robust data association using posterior based approximate joint compatibility test. IEEE Transactions on Industrial Informatics 2014, 10, 331–339.
  56. De Luca, A.; Albu-Schaffer, A.; Haddadin, S.; Hirzinger, G. Collision Detection and Safe Reaction with the DLR-III Lightweight Manipulator Arm. In Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems; 2006; pp. 1623–1630.
  57. Du, Z.; Wang, W.; Yan, Z.; Dong, W.; Wang, W. Variable Admittance Control Based on Fuzzy Reinforcement Learning for Minimally Invasive Surgery Manipulator. Sensors 2017, 17, 844.
  58. She, Y.; Wang, S.; Dong, S.; Sunil, N.; Rodriguez, A.; Adelson, E. Cable Manipulation with a Tactile-Reactive Gripper, 2020.
  59. Inceoglu, A.; Ince, G.; Yaslan, Y.; Sariel, S. Failure Detection Using Proprioceptive, Auditory and Visual Modalities. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2018; pp. 2491–2496.
  60. Zhou, P.; Zheng, P.; Qi, J.; Li, C.; Lee, H.Y.; Duan, A.; Lu, L.; Li, Z.; Hu, L.; Navarro-Alarcon, D. Reactive human–robot collaborative manipulation of deformable linear objects using a new topological latent control model. Robotics and Computer-Integrated Manufacturing 2024, 88, 102727.
  61. Patni, S.P.; Stoudek, P.; Chlup, H.; Hoffmann, M. Online elasticity estimation and material sorting using standard robot grippers. The International Journal of Advanced Manufacturing Technology 2024, 132, 6033–6051.
  62. Gutierrez-Giles, A.; Padilla-Castañeda, M.A.; Alvarez-Icaza, L.; Gutierrez-Herrera, E. Force-Sensorless Identification and Classification of Tissue Biomechanical Parameters for Robot-Assisted Palpation. Sensors 2022, 22, 8670.
  63. Chen, P.Y.; Liu, C.; Ma, P.; Eastman, J.; Rus, D.; Randle, D.; Ivanov, Y.; Matusik, W. Learning Object Properties Using Robot Proprioception via Differentiable Robot-Object Interaction, 2025.
  64. Kaboli, M.; Yao, K.; Cheng, G. Tactile-based manipulation of deformable objects with dynamic center of mass. In Proceedings of the 2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids); 2016; pp. 752–757.
  65. Bekiroglu, Y. Learning to Assess Grasp Stability from Vision, Touch and Proprioception, 2012.
  66. Blanco-Mulero, D.; Alcan, G.; Abu-Dakka, F.J.; Kyrki, V. QDP: Learning to Sequentially Optimise Quasi-Static and Dynamic Manipulation Primitives for Robotic Cloth Manipulation. 2023; pp. 984–991.
  67. Hietala, J.; Blanco-Mulero, D.; Alcan, G.; Kyrki, V. Learning Visual Feedback Control for Dynamic Cloth Folding. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2022; pp. 1455–1462.
  68. Elbrechter, C.; Haschke, R.; Ritter, H. Folding paper with anthropomorphic robot hands using real-time physics-based modeling. In Proceedings of the 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012); 2012; pp. 210–215.
  69. Kim, S.C.; Ryu, S. Robotic Kinesthesia: Estimating Object Geometry and Material With Robot's Haptic Senses. IEEE Transactions on Haptics 2024, 17, 998–1005.
  70. Mitsioni, I.; Karayiannidis, Y.; Stork, J.A.; Kragic, D. Data-Driven Model Predictive Control for the Contact-Rich Task of Food Cutting. In Proceedings of the 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids); 2019; pp. 244–250.
  71. Gemici, M.C.; Saxena, A. Learning haptic representation for manipulating deformable food objects. In Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA; 2014; pp. 638–645.
  72. Bednarek, M.; Kicki, P.; Bednarek, J.; Walas, K. Gaining a Sense of Touch. Object Stiffness Estimation Using a Soft Gripper and Neural Networks. Electronics 2021, 10, 96.
  73. Yao, S.; Hauser, K. Estimating Tactile Models of Heterogeneous Deformable Objects in Real Time. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA); 2023; pp. 12583–12589.
  74. Lee, M.A.; Zhu, Y.; Zachares, P.; Tan, M.; Srinivasan, K.; Savarese, S.; Fei-Fei, L.; Garg, A.; Bohg, J. Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks, 2019.
  75. Li, Y.; Li, S.; Hannaford, B. A model based recurrent neural network with randomness for efficient control with applications. IEEE Transactions on Industrial Informatics 2018.
  76. Ficuciello, F.; Migliozzi, A.; Coevoet, E.; Petit, A.; Duriez, C. FEM-based deformation control for dexterous manipulation of 3D soft objects. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2018; pp. 4007–4013.
  77. Makiyeh, F. Vision-based shape servoing of soft objects using the mass-spring model. PhD thesis, Université de Rennes, 2023.
  78. Li, Y.; Li, S.; Hannaford, B. A novel recurrent neural network for improving redundant manipulator motion planning completeness. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA); 2018; pp. 2956–2961.
  79. LaValle, S.M.; Kuffner Jr, J.J. Randomized kinodynamic planning. The International Journal of Robotics Research 2001, 20, 378–400.
  80. Li, Y.; Hannaford, B. Soft-obstacle Avoidance for Redundant Manipulators with Recurrent Neural Network. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2018; pp. 1–6.
  81. Salhotra, G.; Liu, I.C.A.; Dominguez-Kuhne, M.; Sukhatme, G.S. Learning deformable object manipulation from expert demonstrations. IEEE Robotics and Automation Letters 2022, 7, 8775–8782.
  82. Matas, J.; James, S.; Davison, A.J. Sim-to-real reinforcement learning for deformable object manipulation. In Proceedings of the Conference on Robot Learning. PMLR; 2018; pp. 734–743.
  83. Li, Y.; Bly, R.; Whipple, M.; Humphreys, I.; Hannaford, B.; Moe, K. Use endoscope and instrument and pathway relative motion as metric for automated objective surgical skill assessment in skull base and sinus surgery. Georg Thieme Verlag KG, 2018, Vol. 79, p. A194.
  84. Lee, J.H. Model predictive control: Review of the three decades of development. International Journal of Control, Automation and Systems 2011, 9, 415–424.
  85. Li, Y.; Li, S.; Miyasaka, M.; Lewis, A.; Hannaford, B. Improving Control Precision and Motion Adaptiveness for Surgical Robot with Recurrent Neural Network. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2017; pp. 1–6.
  86. Mizanoor Rahman, S.; Ikeura, R. Cognition-based variable admittance control for active compliance in flexible manipulation of heavy objects with a power-assist robotic system. Robotics and Biomimetics 2018, 5, 7.
  87. Li, M.; Yin, H.; Tahara, K.; Billard, A. Learning object-level impedance control for robust grasping and dexterous manipulation. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA); 2014; pp. 6784–6791.
  88. Lagneau, R.; Krupa, A.; Marchal, M. Automatic shape control of deformable wires based on model-free visual servoing. IEEE Robotics and Automation Letters 2020, 5, 5252–5259.
  89. Li, Y.; Miyasaka, M.; Haghighipanah, M.; Cheng, L.; Hannaford, B. Dynamic modeling of cable driven elongated surgical instruments for sensorless grip force estimation. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA); 2016; pp. 4128–4134.
  90. King, D.; Adidharma, L.; Peng, H.; Moe, K.; Li, Y.; Yang, Z.; Young, C.; Ferreria, M.; Humphreys, I.; Abuzeid, W.M.; et al. Automatic summarization of endoscopic skull base surgical videos through object detection and hidden Markov modeling. Computerized Medical Imaging and Graphics 2023, 108, 102248.
  91. Li, Y. Deep Causal Learning for Robotic Intelligence. Frontiers in Neurorobotics.
  92. Khalil, F.; Payeur, P.; Cretu, A.M. Integrated Multisensory Robotic Hand System for Deformable Object Manipulation. In Proceedings of the IASTED Technology Conferences, Cambridge, Massachusetts, USA; 2010.
  93. Caldwell, T.M.; Coleman, D.; Correll, N. Optimal parameter identification for discrete mechanical systems with application to flexible object manipulation. In Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems; 2014; pp. 898–905.
  94. Mazhitov, A.; Adilkhanov, A.; Massalim, Y.; Kappassov, Z.; Varol, H.A. Deformable Object Recognition Using Proprioceptive and Exteroceptive Tactile Sensing. In Proceedings of the 2019 IEEE/SICE International Symposium on System Integration (SII); 2019; pp. 734–739.
  95. Yong, S.; Chapman, J.; Aw, K. Soft and flexible large-strain piezoresistive sensors: On implementing proprioception, object classification and curvature estimation systems in adaptive, human-like robot hands. Sensors and Actuators A: Physical 2022, 341, 113609.
  96. Cretu, A.M.; Payeur, P.; Petriu, E.M. Soft Object Deformation Monitoring and Learning for Model-Based Robotic Hand Manipulation. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 2012, 42, 740–753.
  97. Rostamian, B.; Koolani, M.; Abdollahzade, P.; Lankarany, M.; Falotico, E.; Amiri, M.; Thakor, N.V. Texture recognition based on multi-sensory integration of proprioceptive and tactile signals. Scientific Reports 2022, 12, 21690.
  98. Oller, M.; Planas, M.; Berenson, D.; Fazeli, N. Manipulation via Membranes: High-Resolution and Highly Deformable Tactile Sensing and Control. arXiv, 2022.
  99. Chen, L.; Lu, W.; Zhang, K.; Zhang, Y.; Zhao, L.; Zheng, Y. TossNet: Learning to Accurately Measure and Predict Robot Throwing of Arbitrary Objects in Real Time With Proprioceptive Sensing. IEEE Transactions on Robotics 2024, 40, 3232–3251.
  100. Luo, S.; Mou, W.; Althoefer, K.; Liu, H. iCLAP: shape recognition by combining proprioception and touch sensing. Autonomous Robots 2019, 43, 993–1004.
  101. Sipos, A.; Fazeli, N. MultiSCOPE: Disambiguating In-Hand Object Poses with Proprioception and Tactile Feedback. In Proceedings of Robotics: Science and Systems XIX; 2023.
Table 1. Summary of Existing Surveys on Deformable Object Manipulation.

Reference | Focus Area | Modalities | Noted Limitations
Gu et al. (2023) [8] | General review of DOM; data-driven and hybrid methods | Vision, tactile, force | Limited mention of proprioception; minimal focus on fusion
Zhu et al. (2021) [9] | Challenges and future directions in DOM | Vision, force, tactile | Suggests multi-modal fusion but without deep implementation details
Jiménez (2012) [10] | Model-based manipulation planning | Mostly modeling | Little discussion of sensing modalities
Herguedas et al. (2019) [11] | Multi-robot systems for DOM | Vision, force | Limited on tactile and proprioception; focuses on coordination
Arriola-Rios et al. (2020) [12] | Modeling of deformable objects for robotic manipulation | Vision, force | Focuses on object modeling; less discussion on action planning and multi-modal fusion
Yin et al. (2021) [13] | Modeling, learning, perception and control methods | Vision, tactile | Briefly mentions force; lacks multi-modal integration
Kadi and Terzić (2023) [14] | Data-driven approaches for cloth-like deformables | Vision, tactile | Discusses challenges but does not cover proprioception deeply
Blanco-Mulero et al. (2024) [15] | Proposed taxonomy (T-DOM) for deformable manipulation tasks | Vision, force, tactile | High-level categorization; not focused on sensing strategies
Sanchez et al. (2018) [16] | Robotic manipulation and sensing of deformable objects in domestic and industrial applications | Vision, force, tactile | Broad classification across object types and tasks; limited depth on sensor-fusion strategies and minimal focus on proprioception
Table 3. Comparison of Deformable Object Representation Methods.

| Method | Advantages | Disadvantages |
|---|---|---|
| Mesh-based | Real-time collision checks; straightforward to implement | Limited deformation fidelity; mesh artifacts under large strains |
| SDF | Smooth, continuous geometry; precise deformation recovery | High memory footprint; expensive distance queries |
| Mass–spring | Very fast simulation; intuitive parameter tuning | Oversimplified physics; cannot capture complex material behaviors |
| FEM | High-fidelity modeling; supports nonlinear constitutive laws | Computationally intensive; requires mesh generation and parameter tuning |
| Data-driven | Learns from real examples; often real-time inference | Data-hungry; limited interpretability; risk of overfitting and poor generalization |
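To make the mass–spring entry concrete, the following is a minimal sketch of a one-dimensional mass–spring chain integrated with symplectic Euler — the kind of fast but physically simplified model the table contrasts with FEM. The function and all parameter values are illustrative, not drawn from any surveyed system.

```python
import numpy as np

def simulate_mass_spring_chain(n=5, k=50.0, m=0.1, rest=0.1,
                               damping=0.5, g=9.81, dt=1e-3, steps=2000):
    """Simulate a vertical chain of point masses linked by Hookean springs.

    The top mass is pinned at the origin; the rest hang under gravity and
    settle once spring tension balances weight. Symplectic Euler keeps the
    explicit integration stable at this step size.
    """
    y = -rest * np.arange(n, dtype=float)  # start at the springs' rest lengths
    v = np.zeros(n)
    for _ in range(steps):
        f = np.full(n, -m * g)             # gravity on every mass
        for i in range(n - 1):             # spring between mass i and i+1
            stretch = (y[i] - y[i + 1]) - rest
            f[i] -= k * stretch            # stretched spring pulls i down ...
            f[i + 1] += k * stretch        # ... and i+1 up
        f -= damping * v                   # viscous damping
        v += (f / m) * dt                  # update velocities first ...
        y += v * dt                        # ... then positions (symplectic)
        y[0], v[0] = 0.0, 0.0              # pin the top mass
    return y
```

Even this toy version exposes the table's trade-off: each step is a handful of vector operations, so it runs far faster than FEM, but the linear springs cannot reproduce nonlinear or anisotropic material behavior.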
Table 4. Comparison of Sensing Modality, Control Method, Architecture, and Corresponding Performance.

| Ref. | Sensing Modalities | Control Method | Assigned Loop | Note |
|---|---|---|---|---|
| [56] | Joint torque | Ultra-fast proprioceptive collision detection within the joint servo driver | Spinal-Reflex (<50 ms) | Leverages high-frequency torque error thresholds to instantly halt motor commands at sub-millisecond latencies without higher-level inference. |
| [57] | Joint torque | Hybrid variable-admittance via Fuzzy Sarsa learning | Long-Latency (50–100 ms) | Adapts admittance gains online based on torque feedback, providing skill-tuned compliance in tens of milliseconds. |
| [58] | GelSight | Parallel PD grip control and LQR pose control on a learned linear model | Long-Latency (50–100 ms) | Runs lightweight learned models at ∼60–125 Hz on tactile cues to maintain cable alignment without full planning. |
| [59] | Proprioception, vision, audio | HMM-based multimodal anomaly detection | Long-Latency (50–100 ms) | Fuses proprioceptive residuals with audio/vision in an HMM to quickly flag failures without deliberation. |
| [60] | RGB-D vision | Topological autoencoder + fixed-time sliding-mode controller (∼20 Hz) | Long-Latency (50–100 ms) | Provides reflexive shape corrections using low-dimensional latent models for real-time adaptation. |
| [61] | Wrist force/torque | Real-time elasticity estimation from force–position curves | Long-Latency (50–100 ms) | Infers material properties on-the-fly to adjust grasp strategies within tens of milliseconds. |
| [62] | Joint positions | Observer for force/velocity estimation + Bayesian parameter classifier | Long-Latency (50–100 ms) | Uses a state observer on proprioceptive data to infer forces and classify tissue parameters rapidly. |
| [63] | Joint encoder | Differentiable simulation pipeline for inverse parameter identification | Long-Latency (50–100 ms) | Inverts a differentiable model on high-rate encoder streams to infer mass and elasticity in real time. |
| [64] | Tactile | Slip detection via tangential-force thresholds + immediate position adjustment | Long-Latency (50–100 ms) | Detects slip through fast tactile thresholds and issues corrective motions to prevent object loss. |
| [65] | Vision, tactile, encoder | HMM + kernel logistic regression + Bayesian networks | Cognitive (>100 ms) | Integrates multi-modal cues with probabilistic learning to predict and replan stable grasps. |
| [66] | Vision | Sequential RL for manipulation-primitive parameters | Cognitive (>100 ms) | Learns high-level parameter sequences for long-horizon cloth tasks via deliberative policy optimization. |
| [67] | Vision | RL with dynamics domain randomization (∼25 fps) | Cognitive (>100 ms) | Trains end-to-end visual policies for cloth folding through deliberative RL. |
| [68] | Vision, proprioception, tactile | Predefined folding trajectories + sensory feedback | Cognitive (>100 ms) | Uses physics-based modeling and sensory fusion to plan multi-step folding sequences. |
| [69] | Joint torque | Supervised learning on haptic time-series for classification | Cognitive (>100 ms) | Trains models on torque signatures to classify geometry/material and inform high-level planning. |
| [70] | Force, proprioception | MPC with RNN/LSTM dynamics (∼10 Hz) | Cognitive (>100 ms) | Embeds learned RNN dynamics into MPC for deliberative adaptation to varied food properties. |
| [71] | Proprioception, dynamics | SVR on haptic histograms + Monte Carlo–greedy planning | Cognitive (>100 ms) | Builds latent haptic belief models to guide long-horizon manipulation planning. |
| [72] | IMUs | ConvBiLSTM regression on squeeze–release inertial data | Cognitive (>100 ms) | Learns inertial patterns to predict stiffness, informing subsequent manipulation trajectories. |
| [73] | Joint angles | Projected diagonal Kalman filter on spring-voxel models (∼23 Hz) | Cognitive (>100 ms) | Recursively updates voxel-wise stiffness estimates to support planning over object compliance. |
| [74] | RGB, F/T, joint encoder | Self-supervised latent fusion + deep RL | Cognitive (>100 ms) | Trains compact embeddings to improve sample-efficient, deliberative control in contact-rich scenarios. |
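The Spinal-Reflex row can be illustrated schematically: compare measured joint torques against model-predicted ones and halt the motor command the moment the residual exceeds a per-joint limit, with no higher-level inference in the loop. This is a hypothetical sketch of the torque-residual thresholding idea, not the implementation of [56]; the class name and threshold value are invented for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ReflexMonitor:
    """Reflex-layer collision check on joint-torque residuals."""
    threshold: float  # per-joint residual limit in N·m (illustrative value)

    def tripped(self, tau_measured: List[float],
                tau_predicted: List[float]) -> bool:
        """True if any |measured - predicted| torque exceeds the limit."""
        return max(abs(m - p) for m, p in
                   zip(tau_measured, tau_predicted)) > self.threshold

def control_step(monitor: ReflexMonitor,
                 tau_measured: List[float],
                 tau_predicted: List[float],
                 command: List[float]) -> List[float]:
    """Pass the command through, or zero it instantly when the reflex trips."""
    if monitor.tripped(tau_measured, tau_predicted):
        return [0.0] * len(command)  # immediate halt, no deliberation
    return command
```

Because the check is a single comparison per joint, it can run inside a servo driver at the sub-millisecond latencies the table describes, well below the Long-Latency and Cognitive loops.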
Table 5. Summary of Using Proprioception for Object Manipulation.

| Ref. | Design Philosophy |
|---|---|
| [92] | Real-time fusion for deformable-object modeling and control |
| [93] | Proprioceptive torque/angle-based identification of flexible-loop spring constants via variational integrators |
| [71] | Haptic (encoder + effort/F/T) fusion for deformable-food property estimation and planning (no velocity/IMU/tactile) |
| [94] | Fusion of joint-encoder and torque sensing with a tactile array for rigid vs. deformable classification (97.5% accuracy) |
| [72] | Deep-learning stiffness regression using only IMU-based inertial proprioception (≤8.7% MAPE) |
| [73] | Real-time volumetric stiffness field estimation from joint-torque and optional vision for heterogeneous deformables |
| [63] | Differentiable simulation for mass and elastic-modulus estimation from joint-encoder signals alone |
| [95] | Large-strain piezoresistive proprioceptive sensing for single-grasp object shape classification and curvature estimation |
| [96] | Neural-network–based vision–force fusion for predictive deformable-object modeling (no joint-torque/IMU proprioception) |
| [62] | Sensorless force/velocity estimation from joint positions and commanded torques for biomechanical parameter identification and classification in robotic palpation |
| [61] | Online elasticity/viscoelasticity estimation from gripper position and F/T sensing for real-time material sorting |
| [97] | Neuromorphic fusion for speed-invariant texture discrimination |
| [98] | Learning soft-membrane dynamics from high-res tactile geometry and reaction wrenches for real-time dexterous control |
| [99] | TossNet: real-time trajectory prediction from end-effector pose, velocity, and F/T-based proprioception |
| [100] | 4D ICP–based fusion of encoder positions and tactile codebook labels for high-accuracy shape recognition |
| [101] | Bimanual in-hand object-pose disambiguation via iterative contact probing using only joint-encoder and wrist F/T feedback, refined by dual particle-filter estimation |
| [69] | Joint-torque-driven classification/regression for simultaneous estimation of object geometry and material using kinesthetic sensing |
| [74] | Variational self-supervised fusion of RGB-D, EE pose/velocity, and F/T for RL-based peg-insertion (no IMU/tactile arrays) |
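Several entries above ([61], [93]) estimate elastic parameters from force–position data. As a toy illustration of the underlying idea — not the estimator of any cited work — the following fits a linear-elastic stiffness F = kx to sampled force–displacement pairs by least squares through the origin; the function name and data are invented for the example.

```python
import numpy as np

def estimate_stiffness(displacements, forces):
    """Least-squares fit of a linear-elastic stiffness k in F = k*x.

    For a line constrained through the origin, the closed-form solution
    is k = (x . f) / (x . x), i.e. the single normal equation.
    """
    x = np.asarray(displacements, dtype=float)
    f = np.asarray(forces, dtype=float)
    return float(x @ f / (x @ x))

# Usage: three probe samples consistent with a 50 N/m material
k = estimate_stiffness([0.01, 0.02, 0.03], [0.5, 1.0, 1.5])  # → 50.0 N/m
```

Real systems add viscoelastic terms and recursive updates so the estimate refines within tens of milliseconds, but the core regression is this simple, which is what makes online material identification feasible in the Long-Latency loop.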
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.