Reinforcement Learning for Robot Assisted Live Ultrasound Examination

  † These authors contributed equally to this work.


Submitted: 21 August 2025
Posted: 22 August 2025


Abstract
Due to its portability, non-invasiveness, and real-time capabilities, ultrasound imaging has been widely adopted for liver disease detection. However, conventional ultrasound examinations rely heavily on operator expertise, leading to high workload and inconsistent imaging quality. To address these challenges, we propose a Robotic Ultrasound Scanning System (RUSS) based on reinforcement learning to automate the localization of standard liver planes, which can help reduce physician burden while improving scanning efficiency and accuracy. The reinforcement learning agent employs a Deep Q-Network (DQN) integrated with LSTM to control probe movements within a discrete action space, utilizing the cross-sectional area of the abdominal aorta region as the criterion for standard plane determination. Experimental results demonstrate that the system successfully obtained the target plane in three real-world trials. The Peak Signal-to-Noise Ratio (PSNR) is used to evaluate image similarity, with an average value of 21.53 dB, which verifies the effectiveness of the proposed method.

1. Introduction

Liver diseases persistently threaten human health, rendering their diagnosis and treatment a perennial focus of medical research. In recent years, the incidence of hepatic pathologies such as fatty liver disease, cirrhosis, and hepatocellular carcinoma has shown a marked increase worldwide. Medical imaging plays a pivotal role in the diagnostic workflow, with modalities including ultrasonography, computed tomography (CT), and magnetic resonance imaging (MRI). Since its inaugural diagnostic application in the 1940s, ultrasound (US) imaging has emerged as one of the most ubiquitous diagnostic modalities globally [1,2]. Owing to its portability, non-invasiveness, cost-effectiveness, and real-time capabilities compared to alternative imaging techniques, ultrasonography has been extensively adopted across medical disciplines including cardiology [3], urology [4], neurology [5], and obstetrics/gynecology [6], demonstrating substantial clinical utility in disease diagnosis and treatment.
In standardized US examinations, standard planes refer to anatomically defined imaging planes established through expert consensus, which encapsulate essential structural information for diagnostic interpretation, biometric measurement, or interventional guidance [7,8]. The localization of these standardized planes facilitates clinicians’ identification of hepatic anatomical landmarks for diagnostic or surgical purposes [9]. However, current standard plane localization necessitates manual probe navigation based on real-time US image interpretation and anatomical knowledge, demanding sonographers’ extensive training and experience. Consequently, practitioners endure significant physical and cognitive burdens due to excessive workloads [10], while imaging quality remains highly operator-dependent [11]. These challenges underscore the potential of autonomous robotic US scanning systems to mitigate user fatigue and enhance imaging consistency [12].
Robotic Ultrasound Scanning Systems (RUSS) represent sensor-integrated platforms capable of performing optimized US scanning with minimal human intervention. Such systems exhibit adaptive control capabilities through sensor feedback. Nakadate et al. [13,14] developed a hybrid robotic system combining a 6-DoF parallel manipulator with a passive serial arm, integrated with real-time image processing algorithms for automated localization of optimal carotid artery longitudinal views. Mustafa et al. [15,16] employed a commercial 6-DoF robotic manipulator with RGB camera guidance to autonomously screen liver regions, utilizing surface topography derived from abdominal RGB images. Conventional robotic US systems typically employ stereovision sensors for depth perception, implementing point cloud processing algorithms for 3D scene reconstruction and kinematics-based path optimization [17,18,19,20,21]. However, such scene reconstruction approaches face inherent occlusion challenges, particularly in marker-dependent methods, while their accuracy and efficiency remain constrained by 3D imaging hardware limitations. The complex hepatic anatomy, multiplicity of standard scanning planes, and probe-position-dependent image quality render surface-topography-based planning inadequate for comprehensive US path planning.
Reinforcement learning (RL) offers a promising paradigm for autonomous US probe navigation. Given RL’s inherent strengths in sequential decision-making and exploratory tasks, increasing research efforts have focused on RL-based probe navigation. Dou et al. [22] applied this methodology to fetal brain US standard plane detection; however, their approach remains limited to pre-acquired 3D volume analysis rather than real-time probe control. Jarosik et al. [23] implemented RL agents for virtual probe manipulation in simplified static phantom environments. Similarly, Milletari’s work [24] demonstrated RL-based cardiac US navigation in a simulation environment constructed from spatially tracked US frames. Hase et al. [25] employed 2D image grids for RL training in sacrum localization, although the method is restricted to 2-DoF translational movements requiring precise initialization. Li et al. [26] proposed a Deep Q-Learning framework accommodating 6-DoF movements while constraining probe contact, validated in a virtual spine US environment. Bi et al. [27] introduced VesNet-RL, a simulation-based RL framework for vascular standard view localization. However, these studies have not applied RL algorithms to robotic US scanning systems in real-world test environments.
To alleviate sonographers’ workload while improving scanning efficiency and accuracy, this study proposes a novel robotic system for autonomous liver US scanning based on a reinforcement learning algorithm. Section 2 details the robotic system architecture, hepatic structure segmentation algorithms, and RL-based standard plane localization methodology. Experimental validation follows in Section 3, with discussion and conclusions presented in Section 4.

2. Materials and Methods

2.1. Robotic Ultrasound Scanning System Construction

The robotic ultrasound scanning system comprises the following components: a 7-DoF robotic arm (Diania7 Med, Agile Robots, China), an ATI force sensor (Mini 40, ATI Industrial Automation, USA), an ultrasound system (Affiniti 30, Philips, Inc., Holland) with a convex array probe (C6-2), a control computer, and a liver phantom (057A, CIRS, USA). The configuration of the robotic system is illustrated in Figure 1.
The system utilizes the official C++-based API provided for the robotic arm. The upper computer application was developed using the Qt framework (version 5.9.5). This integrated system incorporates multiple functionalities, including force feedback, pose information, and real-time image display. For the reinforcement learning based liver ultrasound standard plane localization algorithm, the computational hardware consists of an NVIDIA RTX 3070Ti GPU, an Intel i7-12700K CPU, and 32 GB RAM, running the Windows 10 operating system. The center point of the ultrasound probe’s scanning array was designated as the origin of the tool coordinate system. The six-axis force sensor and ultrasound probe were mounted at the robotic arm’s flange and constitute the end-effector tool assembly, as shown in Figure 2a.

2.2. Liver Standard Plane Localization

To start the liver standard plane localization, the probe is initially placed perpendicular to the abdominal surface, with its long axis aligned parallel to the body’s longitudinal axis, as illustrated in Figure 2b. The robot, guided by a reinforcement learning agent, then incrementally adjusts the position and orientation of the probe to acquire a clear and accurate image of the subcostal longitudinal section of the aorta. During this process, the contact force is set to 7 N to maintain stable ultrasound image quality, and the ultrasound images provide interactive information that serves as state feedback to the reinforcement learning agent. Based on the current state, the agent generates a sequence of action commands that drive the robot to execute precise movements.

2.3. Construction and Training of the Reinforcement Learning Agent

The training of the reinforcement learning agent requires continuous interaction with the environment and involves lengthy training cycles. Considering the safety concerns associated with training directly in a physical environment, the training is conducted within a simulated environment. Upon completion of training, the agent is then deployed onto the robot. Since the standard liver imaging planes are typically located within a fixed acoustic window, and since occlusion by the ribs makes full three-dimensional reconstruction of the liver both challenging and unnecessary, a local 3D reconstruction is performed solely within the acoustic window containing the standard plane. This reconstructed volume serves as the simulation environment for training the reinforcement learning agent.
The local 3D ultrasound reconstruction of the liver is implemented using the open-source medical software 3D Slicer, which can automatically perform 3D reconstruction and rendering from DICOM slices acquired in parallel at fixed intervals. In this study, the acquisition of parallel images at fixed intervals is performed by the robot under the control of an auxiliary localization system developed with Qt.
This study takes the identification of the standard plane of subcostal longitudinal section of the aorta as an example. In the ultrasound image shown in Figure 3, the completeness and clarity of the abdominal aorta (highlighted by the red rectangular box) serve as critical criteria for identifying the standard plane of subcostal longitudinal section of the aorta. Hence, these key features must be distinctly annotated.
To simplify the training environment and enable the reinforcement learning agent to focus on scanning the liver and key anatomical structures, a deep learning model, Unet-Liver, was trained to segment the liver and abdominal aorta from complex ultrasound images. This model is based on the U-Net architecture [28], and the training workflow is illustrated in Figure 4. To ensure the network focuses on relevant information and meets the input requirements of Unet-Liver, the ultrasound images undergo cropping and padding. The original image size is 1024 × 768 pixels and the processed image size is 560 × 560 pixels. The trained Unet-Liver network outputs a three-class segmentation map, with red indicating the liver, green representing the abdominal aorta, and black representing cavity.
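As a minimal sketch of this crop-and-pad preprocessing, the snippet below resizes a raw grayscale frame (e.g., 1024 × 768) to the 560 × 560 network input; the paper does not specify the crop window, so the center crop used here is an assumption.

```python
import numpy as np

def preprocess_us_image(img: np.ndarray, target: int = 560) -> np.ndarray:
    """Crop/pad a single-channel ultrasound frame to target x target pixels."""
    h, w = img.shape[:2]
    # Center-crop dimensions that exceed the target size (crop window is an assumption).
    top = max((h - target) // 2, 0)
    left = max((w - target) // 2, 0)
    img = img[top:top + target, left:left + target]
    # Zero-pad dimensions that are smaller than the target size.
    pad_h = target - img.shape[0]
    pad_w = target - img.shape[1]
    if pad_h > 0 or pad_w > 0:
        img = np.pad(img, ((0, pad_h), (0, pad_w)), mode="constant")
    return img
```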
After completing the local 3D liver reconstruction and the training of the segmentation network, the reinforcement learning simulation environment can be constructed. This environment facilitates interaction with the agent during training and consists of two main functions: state acquisition and action execution. The state acquisition function performs the following operations: first, within the local 3D liver reconstruction volume, a probe pose is randomly initialized within a given range. The intersection plane between the ultrasound probe at this pose and the 3D reconstructed liver volume is computed to generate the corresponding ultrasound image. Next, this ultrasound image is processed and fed into the Unet-Liver network to produce a three-class segmentation map. Finally, the current state is returned. The action execution function includes the following steps: first, the decision for the next probe movement is obtained by inputting the current state into the DQN network. Then, the chosen movement is executed, resulting in an updated probe pose and a new ultrasound image acquired at this pose. Subsequently, the reward feedback for the current movement is computed based on the data from the current and previous time steps. Finally, relevant data including the updated state are returned, concluding the current action step. This simulation environment provides an interactive platform for training the reinforcement learning agent and lays the foundation for subsequent agent training.
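The state-acquisition and action-execution functions described above map naturally onto a gym-style environment. The sketch below is illustrative only: the callables supplied by the caller stand in for re-slicing the local 3D liver volume, running Unet-Liver, and applying the reward rules of Table 1, and their interfaces are assumptions rather than the authors' implementation.

```python
class LiverUSSimEnv:
    """Gym-style sketch of the simulation environment (interfaces are assumed)."""

    def __init__(self, slice_fn, segment_fn, sample_pose_fn, apply_action_fn,
                 reward_fn, max_steps=50):
        self.slice_fn = slice_fn                # pose -> simulated US image from the 3D volume
        self.segment_fn = segment_fn            # US image -> aorta pixel count via Unet-Liver
        self.sample_pose_fn = sample_pose_fn    # () -> random pose within the acoustic window
        self.apply_action_fn = apply_action_fn  # (pose, action) -> new pose (+/-1 mm, +/-1 deg)
        self.reward_fn = reward_fn              # implements the reward rules of Table 1
        self.max_steps = max_steps
        self.pose, self.steps = None, 0

    def reset(self):
        # State acquisition: random probe pose, re-slice, segment, return the observation.
        self.pose, self.steps = self.sample_pose_fn(), 0
        return self.segment_fn(self.slice_fn(self.pose))

    def step(self, action):
        # Action execution: move the virtual probe, re-observe, and score the move.
        prev_obs = self.segment_fn(self.slice_fn(self.pose))
        self.pose = self.apply_action_fn(self.pose, action)
        self.steps += 1
        obs = self.segment_fn(self.slice_fn(self.pose))
        reward, done = self.reward_fn(self.pose, prev_obs, obs, self.steps)
        return obs, reward, done, {}
```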
The Markov Decision Process (MDP) constitutes the standard framework for reinforcement learning, encompassing the state space, action space, state transition function, reward function, and discount factor. In the task of localizing standard ultrasound planes, the true state of the ultrasound probe is not directly observable, resulting in a partially observable Markov decision process (POMDP).
Within the reinforcement learning framework, the agent’s behavior is determined by the actions it executes, and the set of all possible actions is referred to as the action space. The action space is designed as a discrete set. All translational and rotational movements are performed within the end-effector coordinate frame. Specifically, translational movements occur along the end-effector’s X and Y axes in 1 mm increments, and rotational movements are executed around the end-effector’s Z axis in 1° increments. This action space corresponds to the probe’s three degrees of freedom (translational motion along the X/Y axes and rotational motion around the Z axis).
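One way to encode this six-element discrete action space is shown below; the specific ordering of the actions is an assumption, only the set of motions follows the text (and the action space dimension of 6 reported in Section 3.3).

```python
from enum import IntEnum

class ProbeAction(IntEnum):
    """Discrete probe actions: +/-1 mm along the end-effector X/Y axes
    and +/-1 degree about its Z axis (index ordering is an assumption)."""
    X_POS = 0      # translate +1 mm along X
    X_NEG = 1      # translate -1 mm along X
    Y_POS = 2      # translate +1 mm along Y
    Y_NEG = 3      # translate -1 mm along Y
    ROT_Z_POS = 4  # rotate +1 degree about Z
    ROT_Z_NEG = 5  # rotate -1 degree about Z
```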
The state represents a comprehensive description of the environment during the agent’s interaction, encompassing all information necessary for decision-making. The area of the abdominal aorta region within the ultrasound image is utilized as the observation to estimate the actual state.
Reward is the immediate feedback provided by the environment following the agent’s action execution, serving to quantify the quality of that action. The reward constitutes the core driving force for the agent’s learning, with the ultimate goal of maximizing the cumulative reward, i.e., the long-term return. The design of the reward function directly impacts the agent’s learning effectiveness and policy optimization. The reward is designed to motivate the agent to locate the maximal transverse cross-section of the abdominal aorta, which corresponds to the standard plane of subcostal longitudinal section of the aorta. The reward design incorporates the area of the segmented abdominal aorta region together with the distance to the target. In the local 3D liver reconstruction volume, the position of the probe corresponding to the standard imaging plane is known. The distance to the standard plane posture can be defined as
$d_t = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$ (1)

where $(x_1, y_1)$ denotes the position of the ultrasound probe corresponding to the standard imaging plane, $(x_2, y_2)$ represents the current position of the ultrasound probe, and $d_t$ is the Euclidean distance between the two positions. A distance-related reward for the current state can be defined as

$v_d = (d_t - d_{t+1}) / d_{\max}$ (2)

where $d_t$ denotes the probe-to-target distance at the last time step, $d_{t+1}$ denotes the probe-to-target distance at the current time step, and $d_{\max}$ denotes the maximum distance from any position within the defined acoustic window to the target position. In addition to the distance-based reward, the score of the current state is also influenced by the reward associated with the segmented abdominal aorta region in the current ultrasound frame, which is defined as

$v_s = (g_{t+1} - g_t) / g_{\max}$ (3)

where $g_{\max}$ denotes the number of abdominal aorta pixels in the standard plane, $g_t$ denotes the number of abdominal aorta pixels in the ultrasound image at the last time step, and $g_{t+1}$ represents the number of abdominal aorta pixels at the current time step. The reward function settings are shown in Table 1.
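As a concrete illustration of Equations (2) and (3), the two reward components can be computed as below; the function names are illustrative, and the full piecewise reward of Table 1 would combine these terms with the step and boundary penalties.

```python
import math

def distance_reward(target_xy, prev_xy, curr_xy, d_max):
    """v_d = (d_t - d_{t+1}) / d_max, using the Euclidean distance of Eq. (1)."""
    d_t = math.hypot(prev_xy[0] - target_xy[0], prev_xy[1] - target_xy[1])
    d_t1 = math.hypot(curr_xy[0] - target_xy[0], curr_xy[1] - target_xy[1])
    return (d_t - d_t1) / d_max

def area_reward(g_t, g_t1, g_max):
    """v_s = (g_{t+1} - g_t) / g_max, from the segmented aorta pixel counts."""
    return (g_t1 - g_t) / g_max
```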
Since the reinforcement learning problem in this work involves a continuous state space and a discrete action space, the DQN algorithm [29] is selected for the reinforcement learning agent. As previously described, the process in this study is modeled as a Partially Observable Markov Decision Process (POMDP). To handle the uncertainties introduced by partial observability, LSTM units [30] are incorporated to fully exploit sequential information and to help the agent better understand its environment. The network architecture is illustrated in Figure 5.
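The exact layer arrangement of Figure 5 is not reproduced here, but a plausible LSTM-DQN sketch consistent with the hyperparameters reported in Section 3.3 (state dimension 1, hidden layer 128, LSTM hidden units 64, 6 actions, history length 5) is given below as an assumption-laden illustration.

```python
import torch
import torch.nn as nn

class LSTMDQN(nn.Module):
    """Sketch of an LSTM-DQN; layer ordering is an assumption, dimensions
    follow the hyperparameters reported in Section 3.3."""

    def __init__(self, state_dim=1, hidden_dim=128, lstm_hidden=64, n_actions=6):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())
        self.lstm = nn.LSTM(hidden_dim, lstm_hidden, batch_first=True)
        self.head = nn.Linear(lstm_hidden, n_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, seq_len, state_dim), a short history of observations.
        x = self.encoder(obs_seq)
        x, hidden = self.lstm(x, hidden)
        q_values = self.head(x[:, -1])  # Q-values from the last time step
        return q_values, hidden

# Example: Q-values for a batch of 5-step observation histories.
net = LSTMDQN()
q, _ = net(torch.randn(8, 5, 1))
print(q.shape)  # torch.Size([8, 6])
```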

3. Results

3.1. Three-Dimensional Reconstruction Results of Local Liver Ultrasound

To enable the training interaction of the reinforcement learning agent, a 3D liver reconstruction model was required to simulate the training environment. A Qt-based auxiliary localization system was developed and used in conjunction with the PHILIPS Affiniti 30 ultrasound system to collect ultrasound data that met the reconstruction requirements. In this study, the fixed movement distance was set to 0.26 mm, corresponding to the pixel spacing of the DICOM images. A total of 538 ultrasound images were acquired, as shown in Figure 6a. These images were imported into 3D Slicer and reconstructed along the X-axis, resulting in a local 3D liver reconstruction volume, as shown in Figure 6b. A slice along the Y-axis at the center of the reconstructed volume is shown in Figure 6c, which clearly presents the anatomical structure of the organ.
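The acquisition described above amounts to stepping the probe in fixed 0.26 mm increments and storing one frame per position; the sketch below illustrates such a loop, where `move_probe` and `grab_frame` are hypothetical placeholders for the robot-motion and image-capture interfaces of the Qt auxiliary localization system.

```python
def acquire_parallel_slices(move_probe, grab_frame, n_slices=538, step_mm=0.26):
    """Sketch of the parallel-slice acquisition used for 3D Slicer reconstruction.

    The probe is translated in fixed 0.26 mm increments (matching the DICOM
    pixel spacing) and one frame is stored at each position.
    """
    frames = []
    for i in range(n_slices):
        frames.append(grab_frame())       # store the frame at the current position
        if i < n_slices - 1:
            move_probe(dx_mm=step_mm)     # advance the probe along the scan axis
    return frames
```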
Figure 6. (a) Sample of the acquired ultrasound images, (b) local 3D reconstruction volume of the liver, (c) reconstructed slice along the Y-Axis at the center of the reconstructed volume.
Figure 7. Example of segmentation results. (a) Original image, (b) segmentation result of Unet-Liver, (c) ground truth label.

3.2. Image Segmentation and Recognition Results

To ensure data diversity and enhance the model’s generalization capability, the training dataset for the network was composed of both reconstructed ultrasound slices and real ultrasound slices obtained by scanning the phantom. This approach facilitates the later transfer of the reinforcement learning agent from the virtual environment to the real environment. A total of 310 ultrasound images were collected, including 280 images from the 3D reconstructed volume and 30 images from real phantom scanning. Among the 3D reconstructed volume images, 136 were obtained by translating the probe to different positions, and 144 were obtained by rotating the probe around the Z-axis at varying angles. The images in the dataset were annotated using LabelMe [31] and converted into final three-class label maps, where red indicates the liver, green indicates the abdominal aorta, and black indicates cavity. The 310 image-label pairs were split into training, validation, and test sets in a ratio of 7:2:1, resulting in 217 pairs for training, 62 for validation, and 31 for testing.
The Unet-Liver model was trained on the PyCharm platform with a learning rate set to 0.0001, using the Adam optimizer and cross-entropy loss function. The model was trained for 100 epochs. The performance of Unet-Liver was evaluated using the Intersection over Union (IoU) and Dice metrics. On the test set, it achieved an average IoU of 0.9717 and an average Dice of 0.9838. The predicted segmentation maps generated by the model exhibit high similarity to the ground truth labels, as shown in Figure 7, demonstrating that the model meets the requirements of this study.
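For reference, the IoU and Dice metrics used above can be computed per class as in the sketch below; averaging over the classes present in the image is an assumption about how the reported averages (IoU 0.9717, Dice 0.9838) were obtained.

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, label: np.ndarray, n_classes: int = 3):
    """Per-class IoU and Dice for three-class maps (cavity, liver, aorta), averaged."""
    ious, dices = [], []
    for c in range(n_classes):
        p, g = (pred == c), (label == c)
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both prediction and label
        ious.append(inter / union)
        dices.append(2 * inter / (p.sum() + g.sum()))
    return float(np.mean(ious)), float(np.mean(dices))
```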

3.3. Experiment Results of the Reinforcement Learning Agent

The designed network model described above was implemented and trained based on the open-source deep learning framework PyTorch. The training was conducted in the simulated environment with the following parameters: initial learning rate set to 1e-3, which was decreased by 0.5% after each episode, with a minimum learning rate of 0.00001; initial exploration rate set to 0.9, which was also decreased by 0.5% after each episode, with a minimum exploration rate of 0.02. The total number of training episodes was 1500. Other hyperparameters included a hidden layer dimension of 128, discount factor of 0.98, target network update interval of 10 episodes, LSTM hidden unit dimension of 64, experience replay buffer size of 10,000, minimum buffer size for sampling of 500, batch size of 64, historical sequence length of 5, state dimension of 1, and action space dimension of 6.
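To make the reported schedules concrete, the snippet below interprets "decreased by 0.5% after each episode" as a multiplicative per-episode decay with the stated lower bounds; this interpretation is an assumption.

```python
def decayed(value, episode, rate=0.005, floor=0.0):
    """Per-episode multiplicative decay of 0.5% with a lower bound."""
    return max(value * (1.0 - rate) ** episode, floor)

# Reported schedule: learning rate 1e-3 -> 1e-5 and exploration rate 0.9 -> 0.02,
# each decreased by 0.5% after every episode over 1500 training episodes.
for episode in (0, 500, 1000, 1499):
    lr = decayed(1e-3, episode, floor=1e-5)
    eps = decayed(0.9, episode, floor=0.02)
    print(f"episode {episode}: lr={lr:.2e}, epsilon={eps:.3f}")
```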
To validate the effectiveness of the reinforcement learning agent, experiments were conducted to acquire the standard plane of subcostal longitudinal section of the aorta. The experimental procedure is as follows: First, the ultrasound probe starts from a random posture within the target acoustic window. Then, the system begins using the trained RL agent to make decisions based on real-time ultrasound image inputs, autonomously controlling the robotic arm’s next movement. When the termination condition is met, the scanning ends and the final ultrasound image is saved.
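A sketch of this deployment loop is given below, reusing the LSTMDQN sketch from Section 2.3; `get_us_frame`, `segment_fn`, `send_robot_action`, and `terminate_fn` are hypothetical placeholders for the system's image capture, Unet-Liver segmentation, robot motion command, and termination check.

```python
import torch

@torch.no_grad()
def run_scan(agent, get_us_frame, segment_fn, send_robot_action, terminate_fn,
             history_len=5, max_steps=50):
    """Greedy real-world rollout of the trained LSTM-DQN agent (interfaces assumed)."""
    history = []
    for step in range(max_steps):
        obs = segment_fn(get_us_frame())             # aorta pixel count from the live frame
        history = (history + [float(obs)])[-history_len:]
        x = torch.tensor(history).view(1, -1, 1)     # (batch, seq_len, state_dim)
        q_values, _ = agent(x)                       # Q-values from the observation history
        if terminate_fn(obs, step):                  # e.g., target aorta area reached
            break
        send_robot_action(int(q_values.argmax()))    # +/-1 mm translation or +/-1 deg rotation
    return obs
```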
The system successfully obtained the target plane in three real-world trials, as shown in Figure 8. The target ultrasound image is shown in Figure 9. The similarity between the three successfully acquired standard plane images and the target ultrasound image was evaluated. In this study, the Mean Squared Error (MSE) was used to calculate the difference between each of the three standard plane images and the target ultrasound image. Then, the Peak Signal-to-Noise Ratio (PSNR) was computed using Equation (4) to intuitively quantify the image similarity. The experimental results are shown in Table 2.
$\mathrm{PSNR} = 10 \times \log_{10}\left(\dfrac{255^2}{\mathrm{MSE}}\right)$ (4)
According to Table 2, the PSNR values of the three experiments are all between 20 dB and 25 dB, with an average value of 21.53 dB. This indicates that the differences between the obtained standard plane images and the target ultrasound images fall within an acceptable range.
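For reference, the PSNR values in Table 2 can be reproduced directly from the reported MSE values using Equation (4); the snippet below is an illustrative check, assuming a base-10 logarithm and an 8-bit peak value of 255.

```python
import math

def psnr_from_mse(mse: float, peak: float = 255.0) -> float:
    """PSNR in dB from the mean squared error, per Equation (4)."""
    return 10.0 * math.log10(peak ** 2 / mse)

# Reproducing Table 2 from the reported MSE values:
for mse in (539.5, 351.0, 505.6):
    print(f"MSE={mse}: PSNR={psnr_from_mse(mse):.2f} dB")
# -> approximately 20.81, 22.68, and 21.09 dB
```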

4. Discussion

We developed a robotic system for autonomous liver ultrasound examination, in which a reinforcement learning method drives the robot to obtain the standard liver ultrasound plane, ultimately realizing an autonomous, accurate, and efficient liver scanning task.
The study began with local 3D reconstruction of liver ultrasound data, followed by the development of a segmentation model for the liver and abdominal aorta based on the Unet network. This model achieved an average IoU of 0.9717 on the test set, providing a high-fidelity simulated environment for reinforcement learning training. The designed reinforcement learning agent was then trained interactively with this environment to obtain the liver standard plane. Experimental validation showed that the agent successfully found the standard plane in three real-world trials. The PSNR values for these three experiments were between 20 dB and 25 dB, with an average of 21.53 dB, indicating high similarity between the localized ultrasound images and the standard plane images, demonstrating good performance in autonomous scanning.
Future work could focus on adopting novel methods for liver 3D reconstruction to achieve more accurate and complete modeling. Additionally, future efforts will consider incorporating more image features into the reinforcement learning agent’s reward design to provide better guidance for accurately locating the standard plane. Moreover, enriching the training dataset for the Unet-Liver network will enhance its generalization capability, leading to improved segmentation performance.

Supplementary Materials

The following supporting information can be downloaded at the website of this paper posted on Preprints.org

Author Contributions

Conceptualization, T.Z.; methodology, C.L.; software, Z.Z.; formal analysis, B.Z.; resources, X.Q.; data curation, P.Z.; writing—original draft preparation, T.Z.; writing—review and editing, B.Z.; visualization, Z.Z.; project administration, B.Z.; funding acquisition, X.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Shenzhen Key Technology Research and Development Project (Grant No. JSGG20220831100202004), Shenzhen Fundamental Research Funds (Grant No. JCYJ20241202152803005, KJZD20240903100200002), the National Natural Science Foundation of China (Grant No. 62403450, U23A20391).

Data Availability Statement

Dataset available on request from the authors.

Acknowledgments

During the preparation of this manuscript, the authors used Google Translate for the purposes of translating Chinese into English. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Newman, P. G., & Rozycki, G. S. (1998). The history of ultrasound. Surgical clinics of north America, 78(2), 179-195.
  2. Shung, K. K. (2011). Diagnostic ultrasound: Past, present, and future. J Med Biol Eng, 31(6), 371-4.
  3. Schmailzl, K. J., & Ormerod, O. (1994). Ultrasound in Cardiology. Oxford, U.K.: Blackwell Science.
  4. Peeling, W. B., & Griffiths, G. J. (1984). Imaging of the prostate by ultrasound. The Journal of urology, 132(2), 217-224.
  5. Leinenga, G., Langton, C., Nisbet, R., & Götz, J. (2016). Ultrasound treatment of neurological diseases—current and emerging applications. Nature Reviews Neurology, 12(3), 161-174. [CrossRef]
  6. Callen, P. W. (2011). Ultrasonography in Obstetrics and Gynecology. London, U.K.: Elsevier Health Sciences.
  7. Baumgartner, C. F., Kamnitsas, K., Matthew, J., Fletcher, T. P., Smith, S., Koch, L. M., ... & Rueckert, D. (2017). SonoNet: real-time detection and localisation of fetal standard scan planes in freehand ultrasound. IEEE transactions on medical imaging, 36(11), 2204-2215. [CrossRef]
  8. Chang, K. V., Kara, M., Su, D. C. J., Gürçay, E., Kaymak, B., Wu, W. T., & Özçakar, L. (2019). Sonoanatomy of the spine: a comprehensive scanning protocol from cervical to sacral region. Medical ultrasonography, 21(4), 474-482. [CrossRef]
  9. Karmakar, M. K., & Chin, K. J. (2017). Spinal sonography and applications of ultrasound for central neuraxial blocks. Retrieved from: http://www.nysora.com/techniques/neuraxialand-perineuraxial-techniques/ultrasoundguided/3276-spinal-and-epidural-block.html
  10. Muir, M., Hrynkow, P., Chase, R., Boyce, D., & Mclean, D. (2004). The nature, cause, and extent of occupational musculoskeletal injuries among sonographers: recommendations for treatment and prevention. Journal of Diagnostic Medical Sonography, 20(5), 317-325.
  11. Berg, W. A., Blume, J. D., Cormack, J. B., & Mendelson, E. B. (2006). Operator dependence of physician-performed whole-breast US: lesion detection and characterization. Radiology, 241(2), 355-365. [CrossRef]
  12. Yang, G. Z., Nelson, B. J., Murphy, R. R., Choset, H., Christensen, H., Collins, S. H., ... & McNutt, M. (2020). Combating COVID-19—The role of robotics in managing public health and infectious diseases. Science Robotics, 5(40), eabb5589.
  13. Nakadate, R., Solis, J., Takanishi, A., Minagawa, E., Sugawara, M., & Niki, K. (2010, October). Implementation of an automatic scanning and detection algorithm for the carotid artery by an assisted-robotic measurement system. In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 313-318). IEEE.
  14. Nakadate, R., Uda, H., Hirano, H., Solis, J., Takanishi, A., Minagawa, E., ... & Niki, K. (2009, October). Development of assisted-robotic system designed to measure the wave intensity with an ultrasonic diagnostic device. In 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 510-515). IEEE.
  15. Mustafa, A. S. B., Ishii, T., Matsunaga, Y., Nakadate, R., Ishii, H., Ogawa, K., ... & Takanishi, A. (2013, December). Development of robotic system for autonomous liver screening using ultrasound scanning device. In 2013 IEEE international conference on robotics and biomimetics (ROBIO) (pp. 804-809). IEEE.
  16. Mustafa, A. S. B., Ishii, T., Matsunaga, Y., Nakadate, R., Ishii, H., Ogawa, K., ... & Takanishi, A. (2013, July). Human abdomen recognition using camera and force sensor in medical robot system for automatic ultrasound scan. In 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 4855-4858). IEEE.
  17. Rosen, J. (2013). Surgical robotics. In Medical Devices: Surgical and Image-Guided Technologies (pp. 63-98). Hoboken, NJ: Wiley.
  18. Pan, Z., Tian, S., Guo, M., Zhang, J., Yu, N., & Xin, Y. (2017, August). Comparison of medical image 3D reconstruction rendering methods for robot-assisted surgery. In 2017 2nd International Conference on Advanced Robotics and Mechatronics (ICARM) (pp. 94-99). IEEE.
  19. Sung, G. T., & Gill, I. S. (2001). Robotic laparoscopic surgery: a comparison of the da Vinci and Zeus systems. Urology, 58(6), 893-898. [CrossRef]
  20. Huang, Q., Lan, J., & Li, X. (2018). Robotic arm based automatic ultrasound scanning for three-dimensional imaging. IEEE Transactions on Industrial Informatics, 15(2), 1173-1182. [CrossRef]
  21. Merouche, S., Allard, L., Montagnon, E., Soulez, G., Bigras, P., & Cloutier, G. (2015). A robotic ultrasound scanner for automatic vessel tracking and three-dimensional reconstruction of b-mode images. IEEE transactions on ultrasonics, ferroelectrics, and frequency control, 63(1), 35-46. [CrossRef]
  22. Dou, H., Yang, X., Qian, J., Xue, W., Qin, H., Wang, X., ... & Ni, D. (2019, October). Agent with warm start and active termination for plane localization in 3D ultrasound. In International conference on medical image computing and computer-assisted intervention (pp. 290-298). Cham: Springer International Publishing.
  23. Jarosik, P., & Lewandowski, M. (2019, October). Automatic ultrasound guidance based on deep reinforcement learning. In 2019 IEEE International Ultrasonics Symposium (IUS) (pp. 475-478). IEEE.
  24. Milletari, F., Birodkar, V., & Sofka, M. (2019). Straight to the point: Reinforcement learning for user guidance in ultrasound. In Smart Ultrasound Imaging and Perinatal, Preterm and Paediatric Image Analysis: First International Workshop, SUSI 2019, and 4th International Workshop, PIPPI 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, October 13 and 17, 2019, Proceedings 4 (pp. 3-10). Springer International Publishing.
  25. Hase, H., Azampour, M. F., Tirindelli, M., Paschali, M., Simson, W., Fatemizadeh, E., & Navab, N. (2020, October). Ultrasound-guided robotic navigation with deep reinforcement learning. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 5534-5541). IEEE.
  26. Li, K., Wang, J., Xu, Y., Qin, H., Liu, D., Liu, L., & Meng, M. Q. H. (2021, May). Autonomous navigation of an ultrasound probe towards standard scan planes with deep reinforcement learning. In 2021 IEEE International Conference on Robotics and Automation (ICRA) (pp. 8302-8308). IEEE.
  27. Bi, Y., Jiang, Z., Gao, Y., Wendler, T., Karlas, A., & Navab, N. (2022). VesNet-RL: Simulation-based reinforcement learning for real-world US probe navigation. IEEE Robotics and Automation Letters, 7(3), 6638-6645. [CrossRef]
  28. Al Qurri, A., & Almekkawy, M. (2023). Improved UNet with attention for medical image segmentation. Sensors, 23(20), 8589. [CrossRef]
  29. Jain, G., Kumar, A., & Bhat, S. A. (2024). Recent developments of game theory and reinforcement learning approaches: A systematic review. IEEE Access, 12, 9999-10011. [CrossRef]
  30. Wen, X., & Li, W. (2023). Time series prediction based on LSTM-attention-LSTM model. IEEE access, 11, 48322-48331. [CrossRef]
  31. Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2008). LabelMe: a database and web-based tool for image annotation. International journal of computer vision, 77, 157-173. [CrossRef]
Figure 1. Configuration of the robotic ultrasound scanning system.
Figure 2. (a) End-effector tool assembly model, (b) Probe coordinate system.
Figure 3. The standard plane of subcostal longitudinal section of the aorta.
Figure 4. Training workflow of the Unet-Liver model.
Figure 5. LSTM-DQN network.
Figure 8. The ultrasound images of the standard plane obtained in three experiments.
Figure 9. The target ultrasound image of standard plane.
Table 1. Reward Function.

| Reward r | Condition |
| --- | --- |
| -1 | Out of bounds ¹ |
| $0.5 + v_s + 6v_d - 0.04 - 0.01$ | Outside the alert bounds & $g_{t+1} < 1000$ & step < 50 |
| $0.5 + v_s + 6v_d - 0.01$ | Within the alert bounds & $g_{t+1} < 1000$ & step < 50 |
| $1 + 10(g_{t+1}/g_{\max} - 0.8) + v_s + 6v_d - 0.04 - 0.01$ | Outside the alert bounds & $0.8 \le g_{t+1}/g_{\max} < 0.85$ & step < 50 |
| $1 + 10(g_{t+1}/g_{\max} - 0.8) + v_s + 6v_d - 0.01$ | Within the alert bounds & $0.8 \le g_{t+1}/g_{\max} < 0.85$ & step < 50 |
| $v_s + 6v_d - 0.04 - 0.01$ | Outside the alert bounds & $1000/g_{\max} \le g_{t+1}/g_{\max} < 0.8$ & step < 50 |
| $v_s + 6v_d - 0.01$ | Within the alert bounds & $1000/g_{\max} \le g_{t+1}/g_{\max} < 0.8$ & step < 50 |
| 5 | $g_{t+1}/g_{\max} \ge 0.85$ |
| 1 | step = 50 |

¹ Out-of-bounds refers to exceeding the predefined rectangular acoustic window specified in this study. The alert bounds are defined as a new rectangular region obtained by inwardly contracting the original acoustic window by 1 mm. The term step denotes the cumulative number of movements within a single episode.
Table 2. Experiment results summary.

| Metric | First Test | Second Test | Third Test | Average |
| --- | --- | --- | --- | --- |
| MSE | 539.5 | 351.0 | 505.6 | 465.37 |
| PSNR | 20.8 dB | 22.7 dB | 21.09 dB | 21.53 dB |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.