Submitted:
20 March 2025
Posted:
21 March 2025
Abstract
Keywords:
1. Introduction
- design of a VS framework that uses an early-fusion CNN-based method, employing feature points to access low-level pixel information; this design ensures that both the initial and desired scenes are comprehensively described in all neural inputs
- construction of a large dataset for end-to-end VS control using a Universal Robot manipulator (UR5) and implementation of the network in a real robot as an extension of [24]
- implementation of the Deep Learning-based Visual Servoing control law in a real system using a UR5 robot
2. Hybrid Deep Learning Visual Servoing
2.1. Feature Points Maps
- feature points can be considered as low-level information, as they pinpoint exact regions in the image where one or more objects of interest appear
- segmented regions can be considered as mid-level information, offering a broader understanding of an object’s location and attributes by partitioning the image into distinct areas
- image moments can be considered high-level information, as they summarize the distribution of pixel intensities within an image; this summary can be used to estimate the object’s pose and to decouple the linear from the angular camera velocities.
Algorithm 1: Generation of Feature Point Maps Using SURF
Require: An RGB image I
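The paper's Algorithm 1 builds feature point maps with SURF; in OpenCV, SURF lives in the patented `xfeatures2d` contrib module (`cv2.xfeatures2d.SURF_create`), so as a minimal, dependency-free sketch, assume the keypoints are already detected and only rasterize them into a binary map:

```python
def feature_point_map(keypoints, height, width, radius=2):
    """Rasterize detected keypoint coordinates (x, y) into a binary
    feature point map of the same spatial size as the image.

    In the full pipeline the keypoints would come from a detector such
    as SURF; here they are assumed to be given.
    """
    fmap = [[0] * width for _ in range(height)]
    for x, y in keypoints:
        cx, cy = int(round(x)), int(round(y))
        # Mark a small neighbourhood around each keypoint so the map
        # survives downsampling inside the CNN.
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                px, py = cx + dx, cy + dy
                if 0 <= px < width and 0 <= py < height:
                    fmap[py][px] = 1
    return fmap
```

The `radius` parameter is an illustrative choice, not taken from the paper; dilating each point makes the sparse map more informative for convolutional filters.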
2.2. CNN-Based Visual Servoing Control
- I, an RGB image representing the initial scene configuration.
- I*, an RGB image representing the desired scene configuration.
- M and M*, additional feature maps derived from I and I*, respectively, each of the same spatial size as the input images. These maps provide supplementary information that can enhance the learning process.
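Early fusion means the four inputs are concatenated channel-wise into a single tensor before the first convolution. A minimal sketch with plain nested lists (a real implementation would use a tensor library), assuming RGB images of shape (3, H, W) and single-channel feature maps of shape (H, W):

```python
def early_fusion_input(img, img_des, fmap, fmap_des):
    """Stack the current image, desired image, and their feature point
    maps channel-wise into one (3 + 3 + 1 + 1, H, W) = (8, H, W) input.
    """
    channels = list(img) + list(img_des) + [fmap] + [fmap_des]
    h, w = len(fmap), len(fmap[0])
    # All channels must share the same spatial size for fusion.
    assert all(len(c) == h and len(c[0]) == w for c in channels), \
        "all inputs must share the same spatial size"
    return channels
```

Fusing before the first layer lets the earliest filters correlate the initial and desired scenes directly at the pixel level, which is the motivation for the early-fusion design.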
3. Dataset Configuration
Algorithm 2: Synthetic Dataset Generation

Algorithm 3: Pose-to-Velocity Computation
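Algorithm 3's exact steps are not reproduced in the extracted text; the following is a sketch of the standard proportional pose-based law it presumably implements, where the translational velocity is proportional to the position error and the angular velocity to the axis-angle (theta·u) rotation error:

```python
import math

def mat_T(R):
    """Transpose of a 3x3 rotation matrix."""
    return [[R[j][i] for j in range(3)] for i in range(3)]

def mat_mul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def axis_angle(R):
    """Convert a rotation matrix to its theta*u axis-angle vector."""
    cos_t = max(-1.0, min(1.0, (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0))
    theta = math.acos(cos_t)
    if theta < 1e-9:
        return [0.0, 0.0, 0.0]
    s = 2.0 * math.sin(theta)
    return [theta * (R[2][1] - R[1][2]) / s,
            theta * (R[0][2] - R[2][0]) / s,
            theta * (R[1][0] - R[0][1]) / s]

def pose_to_velocity(t, R, t_des, R_des, lam=0.5):
    """Proportional control law driving pose (t, R) toward (t_des, R_des);
    returns the 6-vector [vx, vy, vz, wx, wy, wz]. The gain lam is an
    illustrative value, not taken from the paper."""
    v = [-lam * (t[i] - t_des[i]) for i in range(3)]
    R_err = mat_mul(mat_T(R_des), R)  # rotation still to be undone
    w = [-lam * c for c in axis_angle(R_err)]
    return v + w
```

Such pose pairs and their resulting velocity labels are what an end-to-end VS network is trained to reproduce from images alone.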
4. Experimental Results
4.1. Training Setup
4.2. Offline Results
4.3. Online Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Hutchinson, S.; Hager, G.D.; Corke, P.I. A tutorial on visual servo control. IEEE Transactions on Robotics and Automation 1996, 12, 651–670. [Google Scholar] [CrossRef]
- Chaumette, F.; Hutchinson, S. Visual servo control, Part I: Basic approaches. IEEE Robotics & Automation Magazine 2006, 13, 82–90. [Google Scholar]
- Chaumette, F.; Hutchinson, S. Visual servo control, Part II: Advanced approaches. IEEE Robotics & Automation Magazine 2007, 14, 109–118. [Google Scholar]
- Chaumette, F.; Hutchinson, S.; Corke, P. Visual Servoing. In Handbook of Robotics; Springer; pp. 841–867.
- Wilson, W.J.; Hulls, C.W.; Bell, G.S. Relative end-effector control using Cartesian position based visual servoing. IEEE Transactions on Robotics and Automation 1996, 12, 684–696. [Google Scholar] [CrossRef]
- Kelly, R. Robust asymptotically stable visual servoing of planar robots. IEEE Transactions on Robotics and Automation 1996, 12, 759–766. [Google Scholar] [CrossRef]
- Haviland, J.; Dayoub, F.; Corke, P. Control of the final-phase of closed-loop visual grasping using image-based visual servoing. arXiv 2020, arXiv:2001.05650. [Google Scholar]
- Saxena, A.; Pandya, H.; Kumar, G.; Gaud, A. Exploring convolutional networks for end-to-end visual servoing. In Proceedings of the IEEE International Conference on Robotics and Automation, Singapore, 2017; pp. 3817–3823. [Google Scholar]
- Bateux, Q.; Marchand, E.; Leitner, J.; Chaumette, F.; Corke, P. Training deep neural networks for visual servoing. In Proceedings of the IEEE International Conference on Robotics and Automation, Brisbane, 2018; pp. 1–8. [Google Scholar]
- Tokuda, F.; Arai, S.; Kosuge, K. Convolutional neural network based visual servoing for eye-to-hand manipulator. IEEE Access 2021, 9, 91820–91835. [Google Scholar] [CrossRef]
- Ribeiro, E.; Mendes, R.; Grassi, V. Real-time deep learning approach to visual servo control and grasp detection for autonomous robotic manipulation. Robotics and Autonomous Systems 2021, 139, 103757. [Google Scholar] [CrossRef]
- Mateus, A.; Tahri, O.; Miraldo, P. Active structure-from-motion for 3d straight lines. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2018; pp. 5819–5825. [Google Scholar]
- Bista, S.R.; Giordano, P.R.; Chaumette, F. Combining line segments and points for appearance-based indoor navigation by image based visual servoing. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2017; pp. 2960–2967. [Google Scholar]
- Azizian, M.; Khoshnam, M.; Najmaei, N.; Patel, R.V. Visual servoing in medical robotics: a survey. Part I: endoscopic and direct vision imaging—techniques and applications. The International Journal of Medical Robotics and Computer Assisted Surgery 2014, 10, 263–274. [Google Scholar] [CrossRef] [PubMed]
- Mathiassen, K.; Glette, K.; Elle, O.J. Visual servoing of a medical ultrasound probe for needle insertion. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE; 2016; pp. 3426–3433. [Google Scholar]
- Zettinig, O.; Frisch, B.; Virga, S.; Esposito, M.; Rienmüller, A.; Meyer, B.; Hennersperger, C.; Ryang, Y.M.; Navab, N. 3D ultrasound registration-based visual servoing for neurosurgical navigation. International Journal of Computer Assisted Radiology and Surgery 2017, 12, 1607–1619. [Google Scholar] [CrossRef] [PubMed]
- Hashimoto, K.; Kimoto, T.; Ebine, T.; Kimura, H. Manipulator control with image-based visual servo. In Proceedings of the 1991 IEEE International Conference on Robotics and Automation; 1991; pp. 2267–2268. [Google Scholar]
- Thuilot, B.; Martinet, P.; Cordesses, L.; Gallice, J. Position based visual servoing: keeping the object in the field of vision. In Proceedings of the 2002 IEEE International Conference on Robotics and Automation, Vol. 2; 2002; pp. 1624–1629. [Google Scholar]
- Chen, S.; Wen, J.T. Industrial robot trajectory tracking control using multi-layer neural networks trained by iterative learning control. Robotics 2021, 10, 50. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 2012, 25, 1097–1105. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations 2015. [Google Scholar]
- Dosovitskiy, A.; Fischer, P.; Ilg, E.; Hazirbas, C.; Golkov, V.; van der Smagt, P.; Cremers, D.; Brox, T. FlowNet: Learning optical flow with convolutional networks. International Conference on Computer Vision 2015, 2758–2766. [Google Scholar]
- He, Y.; Gao, J.; Chen, Y. Deep learning-based pose prediction for visual servoing of robotic manipulators using image similarity. Neurocomputing 2022, 491, 343–352. [Google Scholar] [CrossRef]
- Botezatu, A.P.; Ferariu, L.E.; Burlacu, A. Enhancing Visual Feedback Control through Early Fusion Deep Learning. Entropy 2023, 25, 1378. [Google Scholar] [CrossRef] [PubMed]
- Shademan, A.; Janabi-Sharifi, F. Using scale-invariant feature points in visual servoing. In Proceedings of the Machine Vision and its Optomechatronic Applications. SPIE, 2004, Vol. 5603, pp. 63–70.
- La Anh, T.; Song, J.B. Robotic grasping based on efficient tracking and visual servoing using local feature descriptors. International Journal of Precision Engineering and Manufacturing 2012, 13, 387–393. [Google Scholar] [CrossRef]
- Bay, H.; Ess, A.; Tuytelaars, T.; Gool, L.V. SURF: Speeded Up Robust Features. Computer Vision and Image Understanding 2008, 110, 346–359. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016; pp. 770–778. [Google Scholar]
- Dubrofsky, E. Homography estimation. Master’s thesis, University of British Columbia, Vancouver, 2009. [Google Scholar]
- Agarwal, A.; Jawahar, C.; Narayanan, P. A survey of planar homography estimation techniques. Centre for Visual Information Technology, Tech. Rep. IIIT/TR/2005/12 2005. [Google Scholar]
- DeTone, D.; Malisiewicz, T.; Rabinovich, A. Deep image homography estimation. arXiv 2016, arXiv:1606.03798. [Google Scholar]

| Parameter | Value and Information |
|---|---|
| Training optimizer | Adam |
| Mini-Batch Size | 256 × number of GPUs |
| GPU Setup | 3 × NVIDIA A100 Tensor Core GPU with 40 GB |
| Number of Epochs | 100 |
| Initial Learning Rate | |
| L2 Regularization | |
| Learning Rate Schedule | Piecewise |
| Learning Rate Drop Factor | 0.1 |
| Learning Rate Drop Period | 30 |
| Validation Frequency | |
| Validation Patience | 10 |
| Loss function | Root Mean Squared Error |
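The training setup above can be collected into a configuration mapping; the initial learning rate, L2 regularization factor, and validation frequency are missing from the extracted table, so they are left as `None` placeholders rather than guessed:

```python
# Training hyperparameters from the table above; values missing from
# the source are deliberately left as None.
train_options = {
    "optimizer": "adam",
    "mini_batch_size": 256 * 3,      # 256 per GPU x 3 GPUs
    "max_epochs": 100,
    "initial_learning_rate": None,   # not recoverable from the source
    "l2_regularization": None,       # not recoverable from the source
    "lr_schedule": "piecewise",
    "lr_drop_factor": 0.1,           # multiply the rate by 0.1 ...
    "lr_drop_period": 30,            # ... every 30 epochs
    "validation_frequency": None,    # not recoverable from the source
    "validation_patience": 10,       # stop after 10 non-improving checks
    "loss": "rmse",
}
```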
| CNN | v_x [m/s] | v_y [m/s] | v_z [m/s] | ω_x [rad/s] | ω_y [rad/s] | ω_z [rad/s] |
|---|---|---|---|---|---|---|
| Testing | | | | | | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
