Preprint Article (this version is not peer-reviewed)

Design and Implementation of an Autonomous Surgical Robotic Aspirator

Submitted: 01 April 2026
Posted: 02 April 2026
Abstract
Robotic assistance in minimally invasive surgery has significantly improved precision and dexterity; however, many supportive tasks, such as blood aspiration, still rely on manual operation. This work presents the design and implementation of an autonomous robotic aspirator capable of detecting and removing intraoperative bleeding without continuous human intervention. The proposed system integrates a perception module based on a convolutional neural network for real-time blood segmentation, a task planner for the execution of high-level actions, and a control strategy based on artificial potential fields for autonomous navigation. Additionally, a mixed-reality human–robot interaction interface is incorporated to enable system supervision and seamless transition to teleoperation when required. The system was experimentally validated with a set of in-vitro experiments under three representative bleeding scenarios, evaluating four suction strategies that differ in the computation method used for target selection. Results demonstrate fast reaction times (below 0.04 s) and high blood removal rates (above 80% in all cases). The comparative analysis reveals that the performance of the suction strategies is scenario-dependent and highlights a trade-off between suction efficiency and removed area. These findings support the feasibility of autonomous robotic aspiration and provide insights into the design of adaptive strategies for surgical assistance, contributing toward increased autonomy and improved workflow efficiency in minimally invasive procedures.
Subject: Engineering - Bioengineering

1. Introduction

Robotic assistance has become a fundamental component of modern minimally invasive surgery, enabling enhanced precision, tremor filtering, and improved dexterity compared to conventional laparoscopy. Robotic platforms such as the da Vinci Surgical System have achieved widespread clinical adoption, with more than 8,600 systems installed worldwide and over 14 million procedures performed to date. Despite these advances, the predominant paradigm in robotic surgery remains master–slave teleoperation, in which the surgeon directly controls the surgical instruments while repetitive supportive tasks are still performed manually. For this reason, a growing research trend in medical robotics focuses on introducing increasing levels of autonomy into surgical assistants in order to reduce surgeon workload and improve workflow efficiency [1].
Early efforts in surgical automation have primarily addressed supportive tasks that exhibit a high degree of standardization. Among them, endoscope guidance has received considerable attention, as it is a simple task that can reduce surgeon cognitive load while maintaining a stable visual field. Several approaches have been proposed based on explicit surgeon commands, including voice interfaces [2], head-motion tracking [3], and gaze-based control [4]. Although these solutions eliminate the need for a human assistant, they still require continuous attention from the surgeon to guide the camera. More recently, research has explored camera guidance approaches in which the robot directly analyzes the surgical scene. Vision-based strategies that track surgical instruments have demonstrated the potential of this paradigm by endowing the robot with greater autonomy without continuous supervision by the surgeon [5,6,7].
Other surgical tasks that have been subject to automation are those common to a wide range of surgical procedures, and that exhibit a high degree of standardization, such as organ or tissue retraction [8,9,10], suturing [11,12,13], and knot tying [14]. These works demonstrate that automating supportive tasks can significantly streamline the surgical workflow.
Within this context, intraoperative blood aspiration represents a particularly suitable candidate for automation. During minimally invasive procedures, the accumulation of blood can rapidly degrade the visual field and obscure critical anatomical structures, requiring a human assistant to manually operate a suction tool to remove blood. This manual intervention introduces variability and coordination challenges that an autonomous robotic counterpart could mitigate. Previous studies have demonstrated that semi-autonomous suction can improve surgical performance by preventing states of high cognitive demand for the main surgeon [15]. However, achieving autonomous blood aspiration requires addressing several technological challenges: robust real-time blood detection, a safe control strategy, and efficient trajectory generation to optimize suction.
Several works have addressed automated blood removal. Richter et al. proposed an autonomous suction framework for detecting and removing blood using the da Vinci Research Kit, incorporating autonomous trajectory generation based on temporal information (pixel age) to prioritize active bleeding regions [16]. Ou et al. tested a reinforcement learning strategy for rinsing and cleaning surgical fields [17]. Their approach is based on customized simulation environments, where two autonomous agents are trained using robot learning approaches and transferred to the real world. Although the results are very promising, depth information, inherent to real surgical scenarios, is not considered in this work. Barragan et al. studied the surgeon workload with a semi-autonomous suction robotic assistant [15]. They computed the centroids and areas of detected blood blobs using a fully convolutional model. Then, a straight-line trajectory from the current position to the computed target position was executed by the robot. In this work, collision avoidance is not directly addressed; instead, the surgeon is informed of the robot's target position so that collisions can be avoided.
From the perception perspective, automatic bleeding detection in endoscopic imagery has significantly advanced in recent years. Early approaches relied on color filtering and handcrafted features [18], whereas modern methods predominantly employ deep learning models. Convolutional neural networks (CNNs) have demonstrated strong performance in detecting bleeding in laparoscopic videos. Horita et al. developed a YOLOv7-based detector capable of identifying active bleeding in real time during laparoscopic procedures [19]. Similarly, Hua et al. have proposed a hybrid spatiotemporal architecture combining RGB images and optical flow to improve robustness in dynamic surgical scenes [20]. Rabbani et al. have also proposed a ResNet-50-based space-time memory network with positional encoding for video-based bleeding source segmentation via adversarial domain adaptation between real and synthetic data [21].
In addition to perception, autonomous navigation of surgical instruments poses unique challenges due to the constraints of laparoscopic manipulation. Learning-based motion planning strategies, such as Deep Reinforcement Learning (DRL), have been explored for complex robotic manipulation tasks [22,23]. However, these approaches typically require large datasets and extensive training procedures, and they often lack the transparency and predictability required for safety-critical medical applications. Alternatively, deterministic motion planning techniques are attractive alternatives for surgical robotics. Among them, Artificial Potential Fields (APF), originally formulated by Khatib [24], provide a computationally efficient method for reactive navigation by combining attractive forces toward the goal and repulsive forces from obstacles.
The APF algorithm is based on the generation of artificial potential fields in Cartesian space, where the goal location induces an attractive force and obstacles generate repulsive forces that influence the robot motion. This technique is widely applied in robotic applications for obstacle avoidance [25,26], providing real-time low-level path planning and control. APF-based strategies have also demonstrated strong performance in scenarios involving dual-arm manipulators, enabling stabilized motion planning with smoother trajectories and optimized spatial separation [27]. These characteristics make APFs particularly suitable for MIS applications, where instruments operate within a confined workspace and unexpected collisions between instruments or with surrounding anatomical structures may occur. In this context, Tang et al. proposed an APF method for master-slave surgical robots in which virtual fixtures are automatically generated based on marked points by the surgeon before the intervention [28]. Similarly, Hao et al. proposed a novel path planning algorithm for surgical robots based on APFs to guide the tool to the goal position and a primal-dual neural network to minimize the angular velocity [29].
Motivated by these challenges, this work proposes an autonomous surgical robotic aspirator with a mixed-reality HRI interface for human supervision. An overview of this scenario is illustrated in Figure 1, showing a conventional minimally invasive procedure augmented with a robotic aspirator for autonomous blood removal. The system consists of a suction tool attached to an external robot, enabling autonomous control of the instrument. Supervision of the robot’s performance is done through a human-robot interaction (HRI) interface, consisting of a virtual reality (VR) headset for visualizing the intraoperative endoscopic image. Relevant information regarding the intervention and the performance of the robotic aspirator is overlaid onto the endoscopic view within a mixed-reality (MR) environment. This concept is applicable for either manual or robot-assisted management of the surgical tools, controlled by the main surgeon.
Thus, the main contributions of this paper are:
  • The design and implementation of a unified framework for autonomous surgical blood aspiration, integrating perception, a task planner for high-level actions, a navigation controller based on artificial potential fields, together with a mixed-reality human-robot interaction interface for human supervision and teleoperation if required.
  • Analysis of different suction strategies based on centroid-based computation methods for the target selection, including a novel evaluation of their spatial discrepancy and its impact on the robotic navigation behavior.
  • An extensive experimental validation under multiple representative bleeding scenarios, providing a systematic comparison of four centroid-based strategies in terms of reaction time, suction efficiency, and removal performance.
The paper is organized as follows. Section 2 describes the overall system architecture and details the bleeding detection and autonomous navigation algorithms. This section concludes with the software and hardware setup used for the validation of the system, along with the experimental design. Section 3 analyzes the experimental results and a discussion is presented in Section 4. Finally, Section 5 summarizes the conclusions of the work.

2. Material and Methods

Figure 2 shows the overall system architecture and workflow of the autonomous robotic aspirator proposed in this work. First, endoscopic images are processed in the perception layer, which performs blood segmentation using a pretrained convolutional neural network (CNN) that outputs a mask of the detected blood regions. Then, region analysis is performed to compute the following blood descriptors for each region: (1) the area, used to filter small blood regions and to control suction deactivation, (2) the centroid, which is used as a candidate target for tool navigation, and (3) a temporal persistence map, which represents the age of the blood and provides an indication of the blood flow.
Based on this information, the task planner generates the high-level navigation and suction actions. The mode selector module allows switching between autonomous navigation (task planner input) and teleoperation (haptic controller input) in case of emergency. Navigation towards a target position is implemented through a local planner based on an Artificial Potential Field (APF) approach, which generates collision-free trajectories towards the target blood region while avoiding interactions with other surgical instruments and anatomical structures, such as organs or suturing needles. The robot's low-level controller ensures that the tool motion satisfies the remote-center-of-motion (RCM) constraint imposed by the fulcrum point. Once the tool reaches the target position, the suction control module activates the suction tool. The system then monitors the area of the target region and deactivates suction once the blood has been removed.
To ensure safety, human supervision is provided through a human–robot interaction (HRI) interface. In this work, we propose a mixed-reality (MR) supervisory console based on a virtual reality (VR) headset and a haptic controller. Within the headset, the virtual environment is replaced by a real-time video stream captured by the laparoscopic endoscope, allowing the operator to directly visualize the intraoperative scene. This view is augmented with overlaid information about the scene (detected blood regions) and system performance (triggered high-level actions). The interface enables both passive supervision and direct teleoperation. Under normal operation, the robotic aspirator performs the task autonomously while the human assistant supervises its behavior through the HRI interface. If necessary, the operator can take direct control of the instrument by switching to teleoperation mode using a button on the haptic controller. In this mode, instrument motion is controlled through the hand motions of the controller, allowing intuitive manipulation of the tool.

2.1. Blood Segmentation and Region Analysis

The perception pipeline is summarized in Figure 3. For blood segmentation, we employ a CNN based on the U-Net architecture, a well-established model for medical image segmentation. This architecture is particularly suitable due to its ability to capture both global and local features, enabling precise blood segmentation. For each incoming endoscopic frame $I_t$, the CNN model produces a pixel-wise probability map indicating the likelihood of blood presence. By applying a predefined threshold, this probability map is converted into a binary mask $M_t$ representing the segmented blood regions.
This binary mask is then analyzed to extract descriptive features for each detected blood region, which are aggregated into a blood descriptor tuple $D_r$ as:
$$D_r = (r,\, A_r,\, {}^{C}P_r,\, H_r)$$
where $r$ denotes the region identifier, $A_r$ is the area of the region, ${}^{C}P_r$ represents the 3D coordinates of its centroid with respect to the camera, and $H_r$ is the temporal persistence descriptor encoding the age of the region pixels.
First, to improve robustness against noise and small artifacts, area filtering is applied to the mask: regions whose area is smaller than a predefined threshold $A_{th}$ are discarded to avoid spurious detections. In addition to spatial information, the algorithm maintains a temporal persistence map $H_t(i)$, which accumulates the number of consecutive frames in which each pixel $i$ is classified as blood. This allows the system to estimate the temporal age of bleeding regions and distinguish persistent bleeding from transient artifacts. Then, the maximum persistence value within each region is computed as:
$$H_r = \max_{i \in r} H_t(i)$$
This value is used by the task planner to prioritize blood regions in the presence of simultaneous bleeding.
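The region analysis above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a boolean segmentation mask and uses a simple BFS-based connected-component labeling as a stand-in for an OpenCV or SciPy routine; the area threshold is arbitrary.

```python
import numpy as np
from collections import deque

def update_persistence(H_prev, mask):
    # Pixels still classified as blood age by one frame; all others reset to 0.
    return np.where(mask, H_prev + 1, 0)

def label_regions(mask):
    # Minimal 4-connected component labeling (stand-in for cv2/scipy).
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        n += 1
        labels[seed] = n
        q = deque([seed])
        while q:
            y, x = q.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = n
                    q.append((ny, nx))
    return labels, n

def region_descriptors(mask, H, area_th=4):
    # Build D_r = (r, A_r, 2D centroid in pixels, H_r) per region,
    # discarding regions smaller than the area threshold A_th.
    labels, n = label_regions(mask)
    descriptors = []
    for r in range(1, n + 1):
        ys, xs = np.nonzero(labels == r)
        if ys.size < area_th:
            continue
        descriptors.append((r, int(ys.size),
                            (float(xs.mean()), float(ys.mean())),
                            int(H[labels == r].max())))
    return descriptors
```

Running `update_persistence` once per frame keeps the age map current, and `region_descriptors` then yields one tuple per surviving region.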
Finally, the 3D centroid of each blood region, ${}^{C}P_r$, is transformed into the robot frame and used as the target location for the navigation action. As this position evolves with blood flow and progressive removal, it defines the dynamic trajectory of the aspirator. Thus, different centroid definitions can be employed, resulting in different suction strategies: while a geometric centroid provides a purely spatial estimate, a persistence-weighted centroid, computed from pixel age, guides the target either towards the bleeding source (older pixels) or towards the region boundaries. In this work, we explore both approaches (see Section 2.5.2) to evaluate their impact on the suction strategy.
The centroid $p(x_c, y_c)$ is initially computed in the 2D image space, as shown in Figure 4, which illustrates the complete geometric model of the system. The 3D centroid ${}^{C}P_r = (x_r, y_r, z_r)$ is obtained by back-projecting the image coordinates using the pinhole camera model of an RGB-D camera as:
$$x_r = \frac{(x_c - o_x)\,Z}{f_x}, \qquad y_r = \frac{(y_c - o_y)\,Z}{f_y}, \qquad z_r = Z$$
where $Z$ corresponds to the depth value at the centroid location, $(f_x, f_y)$ are the focal lengths in pixels, and $(o_x, o_y)$ denote the optical center of the camera. The transformation from the camera frame $\{C\}$ to the robot base frame $\{R\}$ is detailed in Section 2.2.
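The back-projection above amounts to a one-line pinhole deprojection; a minimal sketch follows (the intrinsic values used in the example are placeholders, not those of the actual camera):

```python
def back_project(xc, yc, Z, fx, fy, ox, oy):
    # Back-project a 2D centroid (pixels) plus depth Z to 3D camera
    # coordinates using the pinhole model.
    xr = (xc - ox) * Z / fx
    yr = (yc - oy) * Z / fy
    return (xr, yr, Z)
```

A centroid at the optical center maps to the camera's optical axis, i.e., $(0, 0, Z)$.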

2.2. Task Planner

The task planner triggers the system's high-level actions, namely navigation toward a target position (NavigateTo(pos)) and suction activation or deactivation (Suction(on/off)). The task planner workflow is described in Algorithm 1. When the perception layer identifies blood regions in the image, whose descriptors are stored in $D_r$, the blood region with the largest area is selected as the target region to be removed ($r^*$), i.e.:
$$r^* = \arg\max_{r} A_r$$
While blood of the target region remains, i.e., $A_{r^*}$ is larger than a certain threshold $A_{stop}$, the system continuously updates the descriptor of the selected region $r^*$, i.e., $D_{r^*} = (r^*,\, A_{r^*},\, {}^{C}P_{r^*},\, H_{r^*})$, and computes the target position used to navigate the robot, $P_r$, as:
$$P_r = {}^{R}T_{C} \begin{bmatrix} {}^{C}P_{r^*} \\ 1 \end{bmatrix}$$
where ${}^{R}T_{C}$ is the homogeneous transformation between the camera reference frame $\{C\}$ and the robot base frame $\{R\}$. A navigation command is then issued to guide the suction tool toward $P_r$ (note that the low-level control is handled by the control layer, see Section 2.3). The system evaluates the distance between the tool tip, $P_{tool}$, and the target position and activates suction when $\|P_{tool} - P_r\| \le \delta$, where $\delta$ is a distance threshold. Once the target blood region has been removed, suction is deactivated. If no blood regions are detected, the robot is commanded to move to its idle position.
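One iteration of this planning loop can be sketched as follows. This is a simplified illustration, not the actual planner: the descriptor layout, threshold values, and action encoding are assumptions, and the 3D centroid ${}^{C}P_{r^*}$ is taken as already computed by the perception layer.

```python
import numpy as np

def target_in_robot_frame(T_RC, cP):
    # P_r = T_RC @ [cP; 1]: homogeneous transform from camera to robot frame.
    return (T_RC @ np.append(cP, 1.0))[:3]

def plan_step(descriptors, T_RC, p_tool, A_stop=20, delta=0.005):
    # descriptors: list of (r, A_r, cP_r, H_r) tuples from the perception layer.
    if not descriptors:
        return ("NavigateTo", "idle", "Suction", "off")
    r, A, cP, H = max(descriptors, key=lambda d: d[1])  # largest-area region
    if A <= A_stop:                                     # target region removed
        return ("NavigateTo", "idle", "Suction", "off")
    P_r = target_in_robot_frame(T_RC, np.asarray(cP))
    # Activate suction only once the tool tip is within delta of the target.
    suction = "on" if np.linalg.norm(p_tool - P_r) <= delta else "off"
    return ("NavigateTo", P_r, "Suction", suction)
```

Calling `plan_step` once per perception update reproduces the select-navigate-suction cycle of Algorithm 1.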
Algorithm 1: Task planner workflow

2.3. Autonomous Navigation Method

Figure 5 illustrates the implementation of the navigation action (NavigateTo(pos)). The local planner takes as input the target location of the blood region to be removed, $P_r$, and generates the desired reference position for the robot, $P_d$. The low-level controller then maps this reference into the corresponding robot end-effector position, $P_{EF}$, while enforcing the remote center of motion (RCM) constraint imposed by the fulcrum point, which is inherent to minimally invasive procedures. This autonomous navigation mode can be switched to teleoperation mode through the haptic controller of the HRI interface.
The local planner guides the suction tool toward the target position using Artificial Potential Fields (APF). This is a motion planning method in which the robot is guided by an artificial potential function defined over the workspace. Thus, the robot moves under the influence of a virtual force derived from this potential, which attracts it toward the goal and repels it away from obstacles.
Let $P(x, y, z)$ denote the current position of the suction tool tip, and $P_r(x_r, y_r, z_r)$ the desired target position. The attractive potential is defined as a quadratic function of the distance to the goal:
$$U_{att}(x, y, z) = \frac{1}{2} K_{att} \|P - P_r\|^2$$
where $K_{att}$ is a positive scalar gain that determines the strength of the attractive potential toward the goal. The repulsive potential associated with the $m$-th obstacle is defined as:
$$U_{rep}^{m}(x, y, z) = \begin{cases} \dfrac{1}{2} K_{rep} \left( \dfrac{1}{\rho_m} - \dfrac{1}{\rho_0} \right)^2, & \rho_m < \rho_0 \\ 0, & \rho_m \ge \rho_0 \end{cases}$$
where $K_{rep}$ is a positive gain that determines how strongly the robot is pushed away from obstacles, $\rho_m$ is the distance between the robot tool tip and the $m$-th obstacle, and $\rho_0$ is the obstacle influence radius. The total potential field is defined as the sum of the attractive and repulsive contributions:
$$U(x, y, z) = U_{att}(x, y, z) + \sum_{m} U_{rep}^{m}(x, y, z)$$
The robot motion is driven by the negative gradient of the potential field. In this work, a kinematic formulation is adopted, where the commanded velocity of the tool tip is defined as:
$$v_c(x, y, z) = -\nabla U(x, y, z)$$
Therefore, the desired position of the tool tip, $P_d$, commanded to the low-level controller of the robot is computed as:
$$P_d(k+1) = P_d(k) + v_c \, \Delta t$$
In practice, several known limitations of classical APF methods are alleviated in the proposed implementation. In particular, oscillatory behavior near the goal and excessive velocities are reduced by introducing a dead zone around the target and saturating the commanded velocity. Additionally, robustness against outdated visual references is improved through a timeout mechanism that prevents the robot from reacting to stale target updates. It should be noted, however, that the present formulation corresponds primarily to an attractive-field implementation, and therefore obstacle-induced local minima are not explicitly addressed through repulsive interactions in the current system.
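A minimal sketch of this velocity law, including the dead-zone and saturation safeguards just described, is given below. The gains and thresholds are illustrative, not the values used in the experiments, and the repulsive term of the general formulation is included for completeness.

```python
import numpy as np

def apf_velocity(P, P_r, obstacles, K_att=1.0, K_rep=1e-4, rho0=0.05,
                 dead_zone=0.002, v_max=0.02):
    # Commanded tool-tip velocity v_c = -grad U (attractive + repulsive).
    e = P_r - P
    if np.linalg.norm(e) < dead_zone:
        return np.zeros(3)                 # dead zone: no chattering at goal
    v = K_att * e                          # -grad of (1/2) K_att ||P - P_r||^2
    for P_m in obstacles:
        d = P - P_m
        rho = np.linalg.norm(d)
        if 0.0 < rho < rho0:               # no repulsion beyond rho0
            # -grad of (1/2) K_rep (1/rho - 1/rho0)^2
            v += K_rep * (1.0 / rho - 1.0 / rho0) / rho**2 * (d / rho)
    n = np.linalg.norm(v)
    return v if n <= v_max else v * (v_max / n)   # velocity saturation
```

Integrating this velocity over the control period reproduces the discrete update $P_d(k+1) = P_d(k) + v_c \, \Delta t$.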
Finally, the desired reference position is transformed into the corresponding command of the robot end-effector, $P_{EF}$, considering the RCM constraints imposed by the fulcrum position, $P_{fulcrum}$. Estimation of this position is implemented following the algorithm described in our previous work [30], which relies on the wrench, $W_{EF}$, measured by a force/torque sensor at the end-effector of the robot.

2.4. Experimental Setup

The experimental setup implemented in this work is illustrated in Figure 6. The robotic aspirator was implemented using a UR3 robot (Universal Robots) together with a conventional surgical aspirator donated by the Hospital Materno-Infantil of Málaga (Spain), which was modified to allow automatic activation and deactivation of the suction process.
To automate the suction process, a solenoid valve (model ST-DA 1/8 brass FKM, JP Fluid Control) controlled by a microcontroller (ESP32) was integrated into the system. The solenoid valve is connected on one side to the laparoscopic suction tool and on the other side to a suction pump (New Aspirate, CA-MI), allowing the suction flow to be automatically activated or deactivated through commands generated by the ESP32. To integrate the suction mechanism with the robotic manipulator, a dedicated tool adapter was designed. This component, shown in Figure 6(a), includes an internal cavity to house the solenoid valve and an outlet port for the suction tool. The rear part of the adapter is designed to be mechanically attached to the robot end effector, enabling the robot to manipulate the suction tool. Although the perception system has been evaluated using real surgical images (see Section 3.1), artificial bleeding was employed for laboratory experimentation. Bleeding was simulated using a custom-made fluid composed of water, milk, glycerin, and red dye, contained in a transparent reservoir.
As the vision system, an Intel RealSense D405 RGB-D camera (Intel Corp., USA) was used. This device provides both RGB images and depth information by combining two infrared sensors for stereo vision with an infrared projector. The camera offers a maximum RGB resolution of 1920 × 1080 pixels and a depth resolution of 1280 × 720 pixels, with a frame rate of up to 30 frames per second. The depth sensing range extends approximately from 0.2 m to 10 m, making it suitable for close-range perception tasks in the experimental surgical setup. The RGB-D camera is mounted on a second robot in order to facilitate the computation of the homogeneous transformation between the camera reference frame and the robot base frame, ${}^{R}T_{C}$, of Equation 5.
As shown in Figure 6(b), the supervisory interface was implemented using the Meta Quest 3 platform. The headset provides a per-eye resolution of 2064 × 1920 pixels and a refresh rate of 90 Hz, enabling real-time visualization of the surgical field. The handheld controllers provide six degrees of freedom and include inertial sensors and programmable buttons, which are used to trigger teleoperation and to enable aspirator suction in this mode. The scene information integrated into the HRI interface includes:
  • Navigation: on/off. This indicator informs the human assistant that a bleeding region has been detected and that the system has initiated tool navigation towards the target position.
  • Suction: on/off. This indicator is set to on when the suction tool is activated, and to off otherwise.
The software integration of the different modules of the system was implemented using the Robot Operating System (ROS) as the common middleware platform. The mixed-reality application that merges the image from the vision system and the scene information for the HRI interface was developed in the Unity environment and deployed on the Meta Quest 3 headset. The integration with the rest of the system was implemented using a ROS bridge server to subscribe to the system topics, and a ROSConnection plugin to publish the information from the haptic controller.

2.5. Evaluation Methodology

Evaluation of the autonomous robotic aspirator proposed in this work was carried out following a twofold approach: (i) assessment of the bleeding segmentation module using real surgical images, and (ii) evaluation of the overall system performance in an in-vitro scenario.

2.5.1. Blood Segmentation Evaluation

The blood segmentation model described in Section 2.1 was evaluated using the dataset presented in [21], which contains 750 images from real gynecological surgeries, annotated with bleeding masks by junior surgeons and reviewed by expert clinicians. The dataset was split into training, validation, and test sets with proportions of 80%, 10%, and 10%, respectively. To improve generalization, data augmentation techniques including horizontal flipping and image resizing were applied. All images and corresponding masks were resized to 256 × 256 pixels. Images were processed in RGB format, while masks were converted to grayscale and binarized. The network was trained using the Binary Cross-Entropy loss function and optimized with Adam. Training was conducted for 45 epochs with a batch size of 16. All experiments were performed on a system equipped with an NVIDIA Tesla V100 GPU (32 GB).
The performance of the segmentation model was evaluated using standard metrics in medical image analysis, including the Dice coefficient, Intersection over Union (IoU), precision, and recall. Since the model outputs a pixel-wise probability map, different binarization thresholds τ were analyzed to determine the optimal operating point.
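These metrics can be computed directly from the probability map and the ground-truth mask; a straightforward sketch follows (the smoothing constant `eps`, which avoids division by zero on empty masks, is an implementation choice of ours):

```python
import numpy as np

def seg_metrics(prob, gt, tau=0.5, eps=1e-8):
    # Binarize the probability map at threshold tau, then compare pixel-wise
    # against the boolean ground-truth mask gt.
    pred = prob >= tau
    tp = np.logical_and(pred, gt).sum()        # true positives
    fp = np.logical_and(pred, ~gt).sum()       # false positives
    fn = np.logical_and(~pred, gt).sum()       # false negatives
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    iou = tp / (tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return dice, iou, precision, recall
```

Sweeping `tau` over a grid and plotting the resulting Dice or IoU curve identifies the optimal operating point.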

2.5.2. Robotic Aspirator Evaluation

The performance and robustness of the autonomous robotic aspirator were evaluated under three distinct bleeding scenarios (see Figure 7). These scenarios, designed to emulate representative intraoperative conditions, were defined based on the relative position between the bleeding source and the resulting blood accumulation, which is influenced by gravity and the surface inclination of the bleeding container:
  • Source-centered accumulation (S1): the inclination of the blood container is such that blood remains localized in close proximity to the bleeding source. This condition represents a flat or near-horizontal surgical field where gravitational effects are minimal, allowing blood to pool around the origin of bleeding.
  • Downstream flow accumulation (S2): the blood container is inclined such that blood flows away from the bleeding source, accumulating at a distal location. This scenario mimics surgical conditions where the patient’s anatomy or positioning introduces a slope, such as in laparoscopic procedures or surgeries involving angled anatomical structures, where fluids tend to migrate due to gravity.
  • Bilateral flow distribution (S3): in this case, the bleeding source is positioned such that blood spreads semi-symmetrically to both sides of the source, creating multiple accumulation regions and a more complex spatial distribution of blood. This situation reflects cases where anatomical features or tissue geometries cause blood to bifurcate, such as around raised structures, cavities, or during procedures with uneven surfaces where fluid disperses in multiple directions.
For each bleeding scenario, four different suction strategies were implemented and compared, namely: geometric centroid (C1), source-oriented centroid (C2), front-oriented centroid (C3), and deepest-point target (C4). These strategies differ in the method used to compute the centroid of the detected blood region ($p(x_c, y_c)$ in Figure 4), which directly determines the target position of the suction tool. Let $B$ denote the set of pixels belonging to the bleeding region, with cardinality $N = |B|$. Let $(x_i, y_i)$ denote the image-plane coordinates of each pixel $i \in B$, and let $p = (x_c, y_c)$ denote the 2D target point. The computation of $(x_c, y_c)$ for each suction strategy is performed as follows:
  • Geometric centroid (C1): This method provides a global estimate of the spatial distribution of the bleeding region. The centroid is computed as:
    $$x_c = \frac{1}{N} \sum_{i \in B} x_i, \qquad y_c = \frac{1}{N} \sum_{i \in B} y_i$$
  • Source-oriented centroid (C2): This strategy exploits the temporal persistence map $H_t(i)$ to prioritize pixels in the vicinity of the bleeding source. Let $B_{core}$ denote the subset of persistent pixels:
    $$B_{core} = \{ i \in B \mid H_t(i) \ge P_p(H_t) \}$$
    where $P_p(H_t)$ denotes the $p$-th percentile of persistence values within the bleeding region ($p = 80$ in our implementation). The centroid is then computed as:
    $$x_c = \frac{1}{|B_{core}|} \sum_{i \in B_{core}} x_i, \qquad y_c = \frac{1}{|B_{core}|} \sum_{i \in B_{core}} y_i$$
  • Front-oriented centroid (C3): In contrast to the previous strategy, this method prioritizes pixels with lower persistence, which are assumed to correspond to the advancing bleeding front. Each pixel is assigned a weight $w(i)$ inversely proportional to its persistence value:
    $$w(i) = \frac{1}{H_t(i) + 1}$$
    The weighted centroid is then computed as:
    $$x_c = \frac{\sum_{i \in B} x_i \, w(i)}{\sum_{i \in B} w(i)}, \qquad y_c = \frac{\sum_{i \in B} y_i \, w(i)}{\sum_{i \in B} w(i)}$$
  • Deepest-point target (C4): This strategy selects the most interior point of the bleeding region by exploiting the distance transform of the segmented mask. Let $D(i)$ denote the distance from pixel $i \in B$ to the nearest boundary of the bleeding region. The target index $i^*$ is defined as:
    $$i^* = \arg\max_{i \in B} D(i)$$
    The corresponding image-plane coordinates are then given by:
    $$x_c = x_{i^*}, \qquad y_c = y_{i^*}$$
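The four target computations above can be sketched jointly in Python. This is a minimal illustration assuming a boolean mask and an integer persistence map; a BFS distance-to-boundary stands in for a proper distance transform (e.g., OpenCV's `distanceTransform`).

```python
import numpy as np
from collections import deque

def deepest_point(mask):
    # Multi-source BFS from the background: gives each blood pixel its
    # (Manhattan) distance to the region boundary, then picks the maximum.
    h, w = mask.shape
    dist = np.full(mask.shape, -1, dtype=int)
    q = deque()
    for y in range(h):
        for x in range(w):
            if not mask[y, x]:
                dist[y, x] = 0
                q.append((y, x))
    while q:
        y, x = q.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and dist[ny, nx] < 0:
                dist[ny, nx] = dist[y, x] + 1
                q.append((ny, nx))
    iy, ix = np.unravel_index(np.argmax(dist), dist.shape)
    return (int(ix), int(iy))

def strategy_targets(mask, H_t, p=80):
    # Compute the C1-C4 target points (x_c, y_c) from a boolean mask and
    # the per-pixel persistence map H_t.
    ys, xs = np.nonzero(mask)                        # pixel set B
    h = H_t[ys, xs].astype(float)
    c1 = (xs.mean(), ys.mean())                      # C1: geometric centroid
    core = h >= np.percentile(h, p)                  # C2: persistent core
    c2 = (xs[core].mean(), ys[core].mean())
    w = 1.0 / (h + 1.0)                              # C3: front-oriented weights
    c3 = ((xs * w).sum() / w.sum(), (ys * w).sum() / w.sum())
    c4 = deepest_point(mask)                         # C4: most interior pixel
    return c1, c2, c3, c4
```

With a uniform persistence map the first three targets coincide, while C4 still snaps to the most interior pixel of the region.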
Figure 8 illustrates the temporal evolution of the target position computation using the four centroid-based methods as the bleeding region progressively expands. This comparison highlights the distinct behaviors of the proposed strategies under dynamic conditions. From left to right, the sequence shows how each method responds to changes in the spatial distribution of blood over time. The geometric centroid (C1) remains centered within the overall bleeding region, providing a global representation of its distribution. The source-oriented centroid (C2) focuses on areas with sustained bleeding, remaining close to the bleeding source. In contrast, the front-oriented centroid (C3) shifts toward newly emerging regions, following the expansion of the bleeding front. Finally, the deepest-point target (C4) consistently identifies the most interior region of the accumulated blood, avoiding situations in which the estimated target falls outside the active bleeding flow, as may occur with other methods.
For each suction strategy, 20 independent trials were conducted in each bleeding scenario, with a trial defined as a single experimental run that begins with the manual injection of approximately 10 ml of artificial blood and ends when the suction process is completed. Considering the four strategies and three scenarios, this resulted in a total of 240 trials, with 80 trials per scenario (see Table 1). This design enables a comprehensive assessment of how centroid estimation methods perform under varying fluid dynamics and spatial distributions, which are critical factors in real surgical environments.
To quantitatively assess the performance of the proposed system, the following metrics were computed for each trial:
  • Reaction time (s): time elapsed between the detection of a blood region and the start of the robot's navigation.
  • Suction time (s): the duration required to complete the removal of the target bleeding region. To ensure comparability across trials with different initial blood distributions, suction time was normalized by multiplying it by the factor $A_{\mathrm{mean}}/A$, where $A_{\mathrm{mean}}$ represents the average blood area across trials within the same bleeding scenario, and $A$ is the maximum area within each trial.
  • Suction efficiency (pixels/s): defined as the rate of blood removal, computed as the maximum blood area within each trial, $A$, divided by the non-normalized suction time.
  • Removed area (%): represents the percentage of blood that was successfully removed upon completion of the suction process.
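The per-trial metrics above can be expressed compactly. The following is a minimal sketch assuming the blood area over the trial is logged as a pixel-count series; the function and argument names are illustrative, not the authors' implementation:

```python
def trial_metrics(reaction_time, suction_time, area_series, a_mean):
    """Compute per-trial performance metrics (illustrative sketch).

    reaction_time: s between blood detection and start of navigation.
    suction_time:  s to complete the removal (non-normalized).
    area_series:   blood area (pixels) sampled over the trial; its max is A.
    a_mean:        average maximum area over trials of the scenario (A_mean).
    """
    a_max = max(area_series)
    return {
        "reaction_time_s": reaction_time,
        # Normalized suction time: t * A_mean / A.
        "suction_time_s": suction_time * a_mean / a_max,
        # Efficiency uses the non-normalized suction time: A / t.
        "suction_efficiency_px_s": a_max / suction_time,
        # Removed area: fraction of the peak area cleared by trial end.
        "removed_area_pct": 100.0 * (1.0 - area_series[-1] / a_max),
    }
```

For example, a trial that peaks at 400 pixels of blood, ends with 50 residual pixels, and takes 2 s of suction yields an efficiency of 200 pixels/s and 87.5% removed area.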
To further analyze the differences between the centroid-based strategies, an additional geometric metric was defined to quantify the spatial discrepancy between target positions. Let $\mathbf{p}_i = (x_c^i, y_c^i)$ denote the 2D coordinates of the centroid obtained with strategy $C_i$. The pairwise distance between centroids $i$ and $j$ is defined as:
$$d_{ij} = \sqrt{(x_c^i - x_c^j)^2 + (y_c^i - y_c^j)^2}$$
This metric provides a quantitative measure of the spatial variability between centroid-based strategies. For each bleeding scenario, the reported distances correspond to the mean values computed over the 80 trials performed under that scenario.
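This pairwise-distance computation can be sketched as follows, assuming the per-strategy targets are available as 2D image coordinates (the function name and key format are illustrative):

```python
import math

def pairwise_centroid_distances(targets):
    """Pairwise Euclidean distances d_ij between strategy targets.

    targets: dict mapping strategy name (e.g. "C1") to an (x_c, y_c) tuple.
    Returns a dict keyed "d_12", "d_13", ... over all unordered pairs.
    """
    names = sorted(targets)
    out = {}
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            (xa, ya), (xb, yb) = targets[names[a]], targets[names[b]]
            # d_ij = sqrt((x_i - x_j)^2 + (y_i - y_j)^2)
            out[f"d_{a + 1}{b + 1}"] = math.hypot(xa - xb, ya - yb)
    return out
```

Averaging these dictionaries over the 80 trials of a scenario produces rows of the form reported in Table 2.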

3. Results

3.1. Blood Segmentation Results

The performance of the bleeding segmentation model was evaluated on an independent test set held out from the gynecological image dataset used to train the model. In terms of pixel-wise prediction accuracy, the network achieved values of 98.9% and 98.1% on the validation and test sets, respectively, indicating a reliable distinction between bleeding and non-bleeding regions.
To further analyze the segmentation performance, the influence of the binarization threshold $\tau$ was evaluated. As shown in Figure 9, the Dice coefficient remains stable within the range $\tau \in [0.24, 0.27]$. Although the maximum Dice value is achieved at $\tau = 0.24$, a threshold of $\tau = 0.27$ was selected as it provides a better balance between precision and recall while maintaining comparable segmentation performance.
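A threshold analysis of this kind reduces to a simple sweep. The sketch below assumes the network output is a per-pixel probability map in [0, 1] and computes the Dice coefficient at each candidate $\tau$; the function names are illustrative, not the authors' code:

```python
import numpy as np

def dice(pred_bin, gt):
    """Dice coefficient between a binary prediction and boolean ground truth."""
    inter = np.logical_and(pred_bin, gt).sum()
    return 2.0 * inter / (pred_bin.sum() + gt.sum())

def sweep_threshold(prob_map, gt, taus):
    """Dice as a function of the binarization threshold tau.

    prob_map: H x W network output in [0, 1]; gt: H x W boolean mask.
    Returns a list of (tau, dice) pairs, as plotted in Figure 9.
    """
    return [(t, dice(prob_map >= t, gt)) for t in taus]
```

Raising $\tau$ trades recall for precision: fewer low-confidence pixels survive binarization, so borderline false positives drop first.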
The quantitative results obtained on the test dataset show that the model achieved a Dice coefficient of 0.686 and an IoU of 0.565, with precision and recall values of 0.72 and 0.73, respectively. These results indicate consistent detection of the main bleeding regions across different surgical scenes. During training, the Binary Cross-Entropy loss decreased steadily, while accuracy converged above 98% for both training and validation sets. The similar behavior observed in both cases indicates stable convergence and no significant overfitting.
Qualitative examples of the segmentation results are shown in Figure 10. The model successfully identifies bleeding regions under varying conditions, including changes in illumination, tissue appearance, and the presence of surgical instruments. Although minor inaccuracies are observed at region boundaries, the detected regions are sufficiently accurate for the extraction of the geometric descriptors required by the robotic control system. In addition to the quantitative evaluation, a representative visualization of the perception layer operating on real surgical images is provided in the accompanying multimedia material. The video illustrates the real-time performance of the blood segmentation module, where the predicted blood mask and the geometric centroid of each region are overlaid onto the endoscopic image.

3.2. Robotic Aspirator Performance

This section presents the experimental results obtained for the evaluation of the autonomous robotic aspirator. First, a geometric analysis of the centroid-based strategies is provided to assess the spatial differences between target positions. Then, the overall system performance is evaluated using quantitative metrics. Finally, the variability of the results is analyzed through distribution-based representations.
A complementary video is provided to qualitatively illustrate the performance of the complete system. The first part of the video presents the system behavior in a single bleeding source scenario, followed by an experiment with multiple bleeding sources, demonstrating the capability of the autonomous robotic aspirator to detect, prioritize, and remove multiple blood regions. For each experiment, both an external view of the full robotic setup and the real-time visualization delivered to the HRI interface are shown, enabling a comprehensive understanding of the system operation from both perspectives. Finally, the temporal evolution of the centroids computed using the four suction strategies is shown for a representative bleeding scenario as the blood flows and accumulates. This visualization highlights the distinct target selection behaviors of each method and their influence on the resulting navigation of the robotic aspirator.

3.2.1. Geometric Analysis of Centroid Strategies

Table 2 reports the average pairwise distances between centroid-based strategies for each bleeding scenario. The results show that the centroid strategies produce distinct target locations, with varying levels of spatial separation depending on the bleeding scenario. In all cases, non-zero distances are observed between strategies, confirming that each method leads to different navigation targets. Nevertheless, strategies C1 and C3 yield very similar target positions in all bleeding scenarios evaluated. Larger distances are observed between certain pairs of strategies, which will be discussed in the following section.

3.2.2. Quantitative Performance Evaluation

Table 3, Table 4 and Table 5 present the average values of the performance metrics, computed over the 80 trials conducted for each bleeding scenario, as defined in Section 2.5.2. The reaction time and removed area provide an overall indication of the system effectiveness, reflecting the responsiveness and the capability to remove the bleeding region. In contrast, suction time and suction efficiency are used to compare the performance of the four suction strategies across different scenarios.
The results show that all strategies achieve consistently low reaction times, with values below 0.04 s in all scenarios, indicating fast system response regardless of the selected method. Additionally, all strategies achieve a high removed area, with values ranging approximately from 80% to 94% across the evaluated scenarios, demonstrating the capability of the system to effectively remove the bleeding regions. It should be noted that full (100%) removal is not achieved, as the suction process is intentionally stopped when the remaining area falls below a predefined threshold ($A_{\mathrm{stop}} = 500$ pixels in the implemented algorithm) in order to avoid unnecessary tool motion and reduce oscillatory behavior around small residual regions.
In the source-centered accumulation scenario (S1), the lowest suction times and highest efficiency values are obtained with strategies C2 and C4, while C1 and C3 exhibit longer execution times and lower efficiency. However, C1 and C3 achieve higher removed area percentages. In the downstream flow accumulation scenario (S2), larger differences between strategies are observed. Strategy C4 achieves the lowest suction time and highest efficiency, whereas C1 and C2 present significantly longer execution times and lower efficiency values. In the bilateral flow distribution scenario (S3), performance variability increases across strategies. Strategy C1 achieves the lowest suction time and highest efficiency, while C2, C3, and C4 exhibit longer execution times. In terms of removed area, all strategies achieve high removal percentages, with C2 and C4 obtaining the highest values.

3.2.3. Distribution Analysis of Performance Metrics

Figure 11 and Figure 12 present the distribution of removed area (%) and suction efficiency (pixels/s), respectively, for the four centroid-based strategies across the three bleeding scenarios. Overall, the boxplots highlight differences in variability between centroid-based strategies, which are not fully captured by the average values reported in Table 3, Table 4 and Table 5.
Regarding the removed area, the distributions show generally high values for all strategies, with median values above 85% in most cases. In scenario S1, strategies C1 and C3 exhibit more concentrated distributions around higher values, whereas C2 and C4 present a wider spread. In scenario S2, all strategies show relatively compact distributions, although some outliers with lower removed area are observed, particularly for C4. In scenario S3, all strategies achieve high median values, with increased variability in C1.
For suction efficiency, greater variability is observed across both strategies and scenarios. In scenario S1, C2 presents the highest median efficiency with relatively low dispersion, while C3 shows lower median values and higher variability. In scenario S2, C4 achieves the highest efficiency values with a broader distribution. In scenario S3, the efficiency distributions are more dispersed across all strategies, with C1 showing higher median values compared to the others.

4. Discussion

The results obtained in this work highlight the impact of the centroid estimation strategy on the performance of the autonomous robotic aspirator. Although all strategies achieve high removed area values and low reaction times, significant differences are observed in suction time and efficiency depending on both the selected strategy and the bleeding scenario.
From the geometric analysis, the pairwise centroid distances reveal that different strategies generate distinct target positions, with consistent similarities observed between certain methods. In particular, strategies C1 and C3 exhibit the smallest spatial discrepancies across all scenarios, indicating similar target selection behavior. This geometric similarity does not directly translate into equivalent performance, as reflected in the quantitative results. This can be explained by the fact that the centroid distance captures a static spatial relationship, whereas the robotic aspirator operates in a dynamic environment.
The quantitative evaluation shows that no single centroid strategy provides optimal performance across all scenarios. In the source-centered accumulation scenario (S1), strategies that prioritize persistent or central regions, such as C2 and C4, achieve higher efficiency and lower suction times. In contrast, in the downstream flow scenario (S2), the deepest-point strategy (C4) clearly outperforms the others, suggesting that targeting interior regions of accumulated blood is advantageous when fluid displacement is significant. For the bilateral flow scenario (S3), the geometric centroid (C1) yields the best efficiency, indicating that a global representation of the bleeding region is more effective under complex spatial distributions.
An important observation is the existence of a trade-off between suction efficiency and removed area. Strategies that achieve lower suction times and higher efficiency tend to remove a smaller percentage of the bleeding region, whereas strategies with longer execution times achieve higher removal percentages. This behavior suggests that faster strategies focus on the most relevant regions of the bleeding, while slower strategies continue operating on smaller residual areas, increasing the overall removal percentage.
The distribution analysis further reveals differences in robustness across strategies. While some methods achieve high average performance, their variability across trials can differ significantly. Strategies such as C2 in S1 and C4 in S2 exhibit both high efficiency and relatively low dispersion, indicating stable behavior. In contrast, increased variability is observed in more complex scenarios such as S3, where blood distribution is less structured and multiple accumulation regions exist. This highlights the importance of considering not only average performance but also consistency when evaluating autonomous robotic systems.
Overall, these results demonstrate that the optimal centroid strategy is scenario-dependent. This suggests that adaptive approaches, capable of selecting or combining centroid strategies based on the characteristics of the bleeding, could further improve system performance. Additionally, the observed relationship between spatial target selection and task efficiency reinforces the importance of integrating perception and control strategies in a unified framework for autonomous surgical assistance.

5. Conclusion

This paper presented the design and experimental validation of an autonomous robotic aspirator for intraoperative blood removal in minimally invasive surgery. The proposed system integrates perception, decision-making, and control modules within a unified framework, enabling autonomous navigation and suction based on visual information extracted from endoscopic images.
The experimental results demonstrate that the system is capable of reliably detecting and removing blood regions, achieving high removed area values and fast reaction times across all evaluated scenarios. The comparison of different suction strategies shows that the choice of the centroid computation method has a significant impact on the efficiency and execution time of the suction process. A key finding of this work is that no single centroid strategy provides optimal performance across all bleeding scenarios. Instead, the results reveal a scenario-dependent behavior, where different strategies perform better depending on the spatial distribution and dynamics of the bleeding. Additionally, a trade-off between suction efficiency and removed area has been identified, highlighting the balance between rapid intervention and complete blood removal.
These findings suggest that future work should focus on adaptive strategies capable of selecting or combining centroid estimation methods based on the current surgical context. Additional research may also explore the integration of more advanced perception models and the extension of the system to in-vivo validation scenarios. Overall, the proposed system represents a step forward toward increasing autonomy in surgical assistance, contributing to the reduction of surgeon workload and the improvement of surgical workflow efficiency.
Despite the promising results, several limitations of the present study should be acknowledged. First, the experimental validation was conducted under controlled in vitro conditions using simulated bleeding, which may not fully capture the complexity and variability of real surgical environments. Factors such as tissue deformation, occlusions, and lighting variability were not explicitly addressed. Second, the perception module relies on a pre-trained segmentation model evaluated on a specific dataset, and its generalization to different surgical procedures or imaging conditions may require further validation. Additionally, the current approach assumes reliable depth estimation, which may be affected by noise or sensor limitations in real scenarios. Finally, the navigation strategy is based on a deterministic centroid selection without incorporating predictive or adaptive mechanisms. As shown in the results, the performance of each centroid strategy is scenario-dependent, which suggests that a fixed strategy may not be optimal under all conditions. Moreover, in the experimental setup we have not considered obstacles during the tool navigation, which may limit its applicability in real surgical environments where interactions with other instruments or anatomical structures are expected.

Author Contributions

Conceptualization, I.R.-B.; methodology, I.R.-B., E.G.-R. and C.L.-C.; software, E.G.-R. and A.G.-C.; validation, E.G.-R.; formal analysis, I.R.-B. and V.M.; writing—original draft preparation, I.R.-B. and E.G.-R.; writing—review and editing, I.R.-B. and I.G.-M.; funding acquisition, I.R.-B. and V.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Spanish Ministry of Science and Innovation under grant numbers PID2021-125050OA-I00 and PID2022-138206OB-C31.

Data Availability Statement

The code and multimedia material supporting the results of this study are publicly available at: https://github.com/SurgicalRoboticsUMA/autonomous_aspirator.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yang, G.Z.; Cambias, J.; Cleary, K.; Daimler, E.; Drake, J.; Dupont, P.E.; Hata, N.; Kazanzides, P.; Martel, S.; Patel, R.V.; et al. Medical robotics-Regulatory, ethical, and legal considerations for increasing levels of autonomy. Science Robotics 2017, 2.
  2. Estebanez, B.; del Saz-Orozco, P.; García-Morales, I.; Muñoz, V.F. Interfaz multimodal para un asistente robótico quirúrgico: uso de reconocimiento de maniobras quirúrgicas. Revista Iberoamericana de Automática e Informática Industrial RIAI 2011, 8, 24–34.
  3. Stolzenburg, J.U.; Franz, T.; Kallidonis, P.; Minh, D.; Dietel, A.; Hicks, J.; Nicolaus, M.; Al-Aown, A.; Liatsikos, E. Comparison of the FreeHand® robotic camera holder with human assistants during endoscopic extraperitoneal radical prostatectomy. BJU International 2011, 107, 970–974.
  4. Noonan, D.; Mylonas, G.; Shang, J.; Payne, C.; Darzi, A.; Yang, G.Z. Gaze contingent control for an articulated mechatronic laparoscope. In Proceedings of the 2010 3rd IEEE RAS & EMBS International Conference on Biomedical Robotics and Biomechatronics; IEEE, 2010; pp. 759–76.
  5. Laina, I.; Rieke, N.; Rupprecht, C.; Vizcaíno, J.P.; Eslami, A.; Tombari, F.; Navab, N. Concurrent segmentation and localization for tracking of surgical instruments. In Lecture Notes in Computer Science; Springer, 2017; Vol. 10434 LNCS, pp. 664–672.
  6. Du, X.; Kurmann, T.; Chang, P.L.; Allan, M.; Ourselin, S.; Sznitman, R.; Kelly, J.D.; Stoyanov, D. Articulated multi-instrument 2-D pose estimation using fully convolutional networks. IEEE Transactions on Medical Imaging 2018, 37, 1276–1287.
  7. Rivas-Blanco, I.; Lopez-Casado, C.; Perez-del Pulgar, C.J.; Garcia-Vacas, F.; Fraile, J.C.; Munoz, V.F. Smart Cable-Driven Camera Robotic Assistant. IEEE Transactions on Human-Machine Systems 2018, 48, 183–196.
  8. Attanasio, A.; Scaglioni, B.; Leonetti, M.; Frangi, A.F.; Cross, W.; Biyani, C.S.; Valdastri, P. Autonomous Tissue Retraction in Robotic Assisted Minimally Invasive Surgery - A Feasibility Study. IEEE Robotics and Automation Letters 2020, 5, 6528–6535.
  9. Nguyen, N.D.; Nguyen, T.; Nahavandi, S.; Bhatti, A.; Guest, G. Manipulating soft tissues by deep reinforcement learning for autonomous robotic surgery. In Proceedings of the 2019 13th Annual IEEE International Systems Conference (SysCon); IEEE, 2019.
  10. Seita, D.; Krishnan, S.; Fox, R.; McKinley, S.; Canny, J.; Goldberg, K. Fast and Reliable Autonomous Surgical Debridement with Cable-Driven Robots Using a Two-Phase Calibration Procedure. In Proceedings of the IEEE International Conference on Robotics and Automation; IEEE, 2018; pp. 6651–6658.
  11. Shademan, A.; Decker, R.S.; Opfermann, J.D.; Leonard, S.; Krieger, A.; Kim, P.C. Supervised autonomous robotic soft tissue surgery. Science Translational Medicine 2016, 8.
  12. Saeidi, H.; Opfermann, J.D.; Kam, M.; Wei, S.; Leonard, S.; Hsieh, M.H.; Kang, J.U.; Krieger, A. Autonomous robotic laparoscopic surgery for intestinal anastomosis. Science Robotics 2022, 7.
  13. Mikada, T.; Kanno, T.; Kawase, T.; Miyazaki, T.; Kawashima, K. Suturing Support by Human Cooperative Robot Control Using Deep Learning. IEEE Access 2020, 8, 167739–167746.
  14. Chow, D.L.; Newman, W. Improved knot-tying methods for autonomous robot surgery. In Proceedings of the 2013 IEEE International Conference on Automation Science and Engineering (CASE); IEEE, 2013; pp. 461–465.
  15. Barragan, J.A.; Yang, J.; Yu, D.; Wachs, J.P. A neurotechnological aid for semi-autonomous suction in robotic-assisted surgery. Scientific Reports 2022, 12, 4504.
  16. Richter, F.; Shen, S.; Liu, F.; Huang, J.; Funk, E.K.; Orosco, R.K.; Yip, M.C. Autonomous Robotic Suction to Clear the Surgical Field for Hemostasis Using Image-Based Blood Flow Detection. IEEE Robotics and Automation Letters 2021, 6, 1383–1390.
  17. Ou, Y.; Tavakoli, M. Learning Autonomous Surgical Irrigation and Suction With the da Vinci Research Kit Using Reinforcement Learning. IEEE Transactions on Automation Science and Engineering 2025, 22, 16753–16767.
  18. Garcia-Martinez, A.; Vicente-Samper, J.M.; Sabater-Navarro, J.M. Automatic detection of surgical haemorrhage using computer vision. Artificial Intelligence in Medicine 2017, 78, 55–60.
  19. Horita, K.; Hida, K.; Itatani, Y.; Fujita, H.; Hidaka, Y.; Yamamoto, G.; Ito, M.; Obama, K. Real-time detection of active bleeding in laparoscopic colectomy using artificial intelligence. Surgical Endoscopy 2024, 38, 3461–3469.
  20. Hua, S.; Gao, J.; Wang, Z.; Yeerkenbieke, P.; Li, J.; Wang, J.; He, G.; Jiang, J.; Lu, Y.; Yu, Q.; et al. Automatic bleeding detection in laparoscopic surgery based on a faster region-based convolutional neural network. Annals of Translational Medicine 2022, 10.
  21. Rabbani, N.; Seve, C.; Bourdel, N.; Bartoli, A. Video-based Computer-aided Laparoscopic Bleeding Management: a Space-time Memory Neural Network with Positional Encoding and Adversarial Domain Adaptation. Proceedings of Machine Learning Research 2022, 172, 1–14.
  22. Richter, F.; Zhang, Y.; Zhi, Y.; Orosco, R.K.; Yip, M.C. Augmented reality predictive displays to help mitigate the effects of delayed telesurgery. In Proceedings of the IEEE International Conference on Robotics and Automation; IEEE, 2019; pp. 444–450.
  23. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
  24. Khatib, O. Real-time obstacle avoidance for manipulators and mobile robots. In Proceedings of the IEEE International Conference on Robotics and Automation; IEEE, 1985; pp. 500–505.
  25. Xia, X.; Li, T.; Sang, S.; Cheng, Y.; Ma, H.; Zhang, Q.; Yang, K. Path Planning for Obstacle Avoidance of Robot Arm Based on Improved Potential Field Method. Sensors 2023, 23.
  26. Chen, Q.; Liu, Y.; Chen, Z.; Zhou, Y. An Autonomous Obstacle Avoidance Path Planning Method Involving PSO for Dual-Arm Surgical Robot. In Proceedings of the 2022 5th International Conference on Mechatronics, Robotics and Automation (ICMRA); 2022; pp. 1–6.
  27. Surya Prakash, S.K.; Prajapati, D.; Narula, B.; Shukla, A. iAPF: an improved artificial potential field framework for asymmetric dual-arm manipulation with real-time inter-arm collision avoidance. Frontiers in Robotics and AI 2025, 12.
  28. Tang, A.; Cao, Q.; Pan, T. Spatial motion constraints for a minimally invasive surgical robot using customizable virtual fixtures. The International Journal of Medical Robotics and Computer Assisted Surgery 2014, 10, 447–460.
  29. Hao, L.; Liu, D.; Du, S.; Wang, Y.; Wu, B.; Wang, Q.; Zhang, N. An improved path planning algorithm based on artificial potential field and primal-dual neural network for surgical robot. Computer Methods and Programs in Biomedicine 2022, 227, 107202.
  30. Galan-Cuenca, A.; De Luis-Moura, D.; Herrera-Lopez, J.M.; Rollon, M.; Garcia-Morales, I.; Muñoz, V.F. Sutura automatizada para una plataforma robotica de asistencia a la cirugia laparoscopica. Revista Iberoamericana de Automatica e Informatica Industrial, 21.
Figure 1. Overview of the surgical scenario proposed in this work. A conventional minimally invasive procedure is augmented with an autonomous robotic aspirator for blood removal without direct human control. Supervision of the system is performed through a human-robot interaction (HRI) interface, consisting of a VR headset to visualize the endoscopic image and a haptic controller for robot teleoperation if necessary.
Figure 2. Overall system architecture and workflow of the autonomous robotic aspirator. The system processes endoscopic images for blood segmentation and region analysis, which are used by the task planner to generate the high-level navigation and suction actions to control the robotic aspirator. Human supervision is provided through a mixed-reality HRI interface, enabling direct robot teleoperation when necessary.
Figure 3. Perception layer pipeline. Endoscopic images $I_t$ are processed by a CNN to obtain a blood mask $M_t$. Then, candidate blood regions are analyzed to extract a descriptive tuple $D_r$ containing their area, a temporal persistence map, and their centroid.
Figure 4. Geometric model of the system. The 2D image centroid $p(x_c, y_c)$ is deprojected into 3D space as $P_r$ using depth information from the RGB-D camera. This position is transformed from the camera reference frame $\{C\}$ into the robot base frame $\{R\}$ to guide the suction tool tip, $P_{\mathrm{tool}}$.
Figure 5. Control architecture of the proposed autonomous navigation method based on artificial potential fields, including low-level remote center of motion (RCM) control of the robot, and integration of the teleoperation mode through the haptic controller of the HRI.
Figure 6. Experimental setup of the proposed robotic aspirator system. (a) The robotic aspirator consists of a conventional surgical aspirator modified to allow automatic activation and deactivation and to be mounted onto the end-effector of a UR3e robotic arm. The vision system is an RGB-D camera mounted onto a second robot. (b) The supervisory interface is implemented using the VR Meta Quest 3 platform, which consists of a VR headset for visualization and a haptic controller to interact with the system.
Figure 7. Bleeding scenarios simulated during the experimentation: (a) Source-centered accumulation (S1); (b) Downstream flow accumulation (S2); and (c) Bilateral flow distribution (S3). The bleeding source is represented with a green circle.
Figure 8. Temporal evolution of the four centroid-based methods to compute the suction target position. From left to right, the images show the expansion of the bleeding region over time.
Figure 9. Dice coefficient versus segmentation threshold τ , showing stable performance in the selected range.
Figure 10. Qualitative bleeding segmentation results. Each pair shows an endoscopic frame (left) and the corresponding predicted blood mask (right). Four representative frames from the test dataset are shown.
Figure 11. Comparative results of the removed area (%) for the four suction strategies (C1, C2, C3, C4) and for the three bleeding scenarios: (a) Bleeding scenario 1, (b) Bleeding scenario 2, and (c) Bleeding scenario 3.
Figure 12. Comparative results of the suction efficiency (pixels/s) for the four suction strategies (C1, C2, C3, C4) and for the three bleeding scenarios: (a) Bleeding scenario 1, (b) Bleeding scenario 2, and (c) Bleeding scenario 3.
Table 1. Summary of experimental trials per bleeding scenario and centroid strategy.

| Bleeding scenario | C1 | C2 | C3 | C4 | Total |
|---|---|---|---|---|---|
| S1 | 20 | 20 | 20 | 20 | 80 |
| S2 | 20 | 20 | 20 | 20 | 80 |
| S3 | 20 | 20 | 20 | 20 | 80 |
| Total | 60 | 60 | 60 | 60 | 240 |
Table 2. Average pairwise centroid distances (pixels) for each bleeding scenario (mean over 80 trials).

| Bleeding scenario | d12 | d13 | d14 | d23 | d24 | d34 |
|---|---|---|---|---|---|---|
| S1 | 22.49 | 1.31 | 17.32 | 23.25 | 13.40 | 18.18 |
| S2 | 48.52 | 4.33 | 40.21 | 51.25 | 38.14 | 41.24 |
| S3 | 30.41 | 2.65 | 47.04 | 31.12 | 51.34 | 47.58 |
| Mean | 33.81 | 2.76 | 34.86 | 35.21 | 34.29 | 35.67 |
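The pairwise distances reported in Table 2 amount to Euclidean distances between the four candidate suction targets computed per frame. A minimal sketch of that computation is shown below; the function name and the `(x, y)` tuple representation for centroids are our own assumptions, not the authors' code.

```python
from itertools import combinations
import math

def pairwise_centroid_distances(centroids):
    """Euclidean distances (pixels) between every pair of centroids.

    `centroids` is a sequence of four (x, y) positions for the
    strategies C1..C4; returns a dict keyed by ('Ci', 'Cj').
    """
    dists = {}
    for (i, p), (j, q) in combinations(enumerate(centroids, start=1), 2):
        dists[(f"C{i}", f"C{j}")] = math.hypot(p[0] - q[0], p[1] - q[1])
    return dists
```

Averaging these per-frame dictionaries over the 80 trials of a scenario yields one row of Table 2.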
Table 3. Comparison of the system performance for the source-centered accumulation scenario (S1).

| Centroid | Reaction time (s) | Removed area (%) | Suction time (s) | Suction efficiency (pixels/s) |
|---|---|---|---|---|
| C1 | 0.0337 | 90.39 | 3.9010 | 1658.5 |
| C2 | 0.0307 | 80.51 | 2.3877 | 2668.9 |
| C3 | 0.0260 | 90.96 | 4.5476 | 1444.8 |
| C4 | 0.0325 | 81.31 | 2.7165 | 2356.8 |
Table 4. Comparison of the system performance for the downstream flow accumulation scenario (S2).

| Centroid | Reaction time (s) | Removed area (%) | Suction time (s) | Suction efficiency (pixels/s) |
|---|---|---|---|---|
| C1 | 0.0318 | 90.82 | 11.2268 | 660.6 |
| C2 | 0.0272 | 89.03 | 11.6913 | 677.8 |
| C3 | 0.0353 | 89.31 | 7.2107 | 1156.0 |
| C4 | 0.0349 | 87.69 | 3.9516 | 1893.8 |
Table 5. Comparison of the system performance for the bilateral flow distribution scenario (S3).

| Centroid | Reaction time (s) | Removed area (%) | Suction time (s) | Suction efficiency (pixels/s) |
|---|---|---|---|---|
| C1 | 0.0309 | 89.51 | 5.5110 | 1915.6 |
| C2 | 0.0283 | 93.82 | 9.9348 | 1235.4 |
| C3 | 0.0316 | 91.86 | 11.9343 | 1282.7 |
| C4 | 0.0294 | 92.86 | 10.3484 | 1057.7 |
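The two derived metrics in Tables 3–5 follow directly from the segmentation masks: removed area is the percentage of the initial blood region eliminated, and suction efficiency is the number of blood pixels removed per second of active suction. The helpers below are a hypothetical sketch of these definitions (the function names and arguments are ours), not the authors' evaluation code.

```python
def removed_area_pct(initial_pixels, remaining_pixels):
    """Removed area (%): fraction of the initial blood region eliminated."""
    if initial_pixels <= 0:
        raise ValueError("initial blood region must be non-empty")
    return 100.0 * (initial_pixels - remaining_pixels) / initial_pixels

def suction_efficiency(removed_pixels, suction_time_s):
    """Suction efficiency (pixels/s): blood pixels removed per second of suction."""
    if suction_time_s <= 0:
        raise ValueError("suction time must be positive")
    return removed_pixels / suction_time_s
```

Under these definitions, the scenario-dependent trade-off in the tables is visible directly: strategies with short suction times score high on efficiency even when their removed-area percentage is lower.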
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.