Submitted:
20 February 2025
Posted:
20 February 2025
Abstract
In robot navigation and manipulation, accurately determining the camera’s pose relative to the environment is crucial for effective task execution. In this paper, we systematically prove that this problem corresponds to the Perspective-3-Point (P3P) formulation, where exactly three known 3D points and their corresponding 2D image projections are used to estimate the pose of a stereo camera. In image-based visual servoing (IBVS) control, the system becomes overdetermined, as the 6 degrees of freedom (DoF) of the stereo camera must align with 9 observed 2D features in the scene. When more constraints are imposed than available DoFs, global stability cannot be guaranteed, as the camera may become trapped in a local minimum far from the desired configuration during servoing. To address this issue, we propose a novel control strategy for accurately positioning a calibrated stereo camera. Our approach integrates a feedforward controller with a Youla parameterization-based feedback controller, ensuring robust servoing performance. Through simulations, we demonstrate that our method effectively avoids local minima and enables the camera to reach the desired pose accurately and efficiently.
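The dimension count behind the overdetermination argument can be illustrated with the classical proportional IBVS law from the visual servoing literature. This is only a sketch for intuition and not the controller proposed in this paper; the symbols $s$, $s^{*}$, $L$, $v_c$, and $\lambda$ are notation assumed here.

```latex
% Assumed notation: s = stacked image features, s^* = desired features,
% L = interaction (image Jacobian) matrix, v_c = camera velocity twist,
% \lambda > 0 = scalar gain.
\begin{aligned}
e &= s - s^{*} \in \mathbb{R}^{9}, \qquad \dot{s} = L\, v_c, \qquad L \in \mathbb{R}^{9 \times 6}, \\
v_c &= -\lambda\, L^{+} e, \\
\operatorname{rank}(L) &\le 6 < 9
\;\Longrightarrow\;
\dim \ker\!\left(L^{+}\right) = 9 - \operatorname{rank}(L) \ge 3 .
\end{aligned}
```

Because $\ker(L^{+})$ is nontrivial, there exist feature errors $e \neq 0$ for which the commanded velocity $v_c$ is zero, so a controller of this classical form can stall away from the desired configuration.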
Keywords:
1. Introduction
2. System Configuration
3. Proof of P3P for the Stereo Camera System


Proof concluded.
4. Model Development
4.1. Stereo Camera Model
4.2. Robot Manipulator Kinematic Model
4.3. Eye-in-Hand Kinematic Model
4.4. Robot Inverse Kinematic Model
4.5. Robot Dynamic Model
5. Control Policy Diagram
6. Controller Designs
6.1. Inner Joint Angle Control Loop
6.2. Feedforward Control Loop
6.3. Outer Feedback Control Loop
7. Simulation Results
- Scenario 1: Without input disturbances.
- Scenario 2: With a 1° step input disturbance applied to each joint of the robot arm for the entire simulation time.
8. Conclusions
Appendix A


| Parameters | Values |
|---|---|
| Length of Link 1: | 495 mm |
| Length of Link 2: | 900 mm |
| Length of Link 3: | 175 mm |
| Length of Link 4: | 960 mm |
| Length of Link 1 offset: | 175 mm |
| Length of Spherical wrist: | 135 mm |
| Tool length (screwdriver): | 127 mm |

| Axis Movement | Working range |
|---|---|
| Axis 1 rotation | +180° to -180° |
| Axis 2 arm | +150° to -90° |
| Axis 3 arm | +75° to -180° |
| Axis 4 wrist | +400° to -400° |
| Axis 5 bend | +120° to -125° |
| Axis 6 turn | +400° to -400° |

| Parameters | Values |
|---|---|
| Focal length: f | 2.8 mm |
| Baseline: B | 120 mm |
| Weight: W | 170 g |
| Depth range: | 0.5 m to 25 m |
| Diagonal Sensor Size: | 6 mm |
| Sensor Format: | 16:9 |
| Sensor Size: W × H | 5.23 mm × 2.94 mm |
| Angle of view in width: | 86.09° |
| Angle of view in height: | 55.35° |

| Parameters | Values |
|---|---|
| DC Motor | |
| Armature Resistance: | 0.03 Ω |
| Armature Inductance: | 0.1 mH |
| Back EMF Constant: | 7 mV/rpm |
| Torque Constant: | 0.0674 N·m/A |
| Armature Moment of Inertia: | 0.09847 kg·m² |
| Gear | |
| Gear ratio: | 200:1 |
| Moment of Inertia: | 0.05 kg·m² |
| Damping ratio: | 0.06 |
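
Some of the tabulated camera and motor parameters are related to one another, and a short script can cross-check them. The sketch below is illustrative only and rests on assumptions that are not stated in the tables: a pinhole camera model for the angle of view, a back-EMF constant given in millivolts per rpm, and a torque constant given in N·m/A. The function names are hypothetical helpers introduced for this example.

```python
import math

# Values copied from the Appendix A tables.
FOCAL_LENGTH_MM = 2.8
SENSOR_WIDTH_MM = 5.23
SENSOR_HEIGHT_MM = 2.94
BACK_EMF_MV_PER_RPM = 7.0   # assumed to mean millivolts per rpm
TORQUE_CONSTANT = 0.0674    # assumed to be in N*m/A


def angle_of_view_deg(sensor_dim_mm: float, focal_length_mm: float) -> float:
    """Pinhole-model angle of view, 2 * atan(d / (2 f)), in degrees."""
    return math.degrees(2.0 * math.atan(sensor_dim_mm / (2.0 * focal_length_mm)))


def back_emf_si(mv_per_rpm: float) -> float:
    """Convert a back-EMF constant from mV/rpm to V*s/rad."""
    volts_per_rpm = mv_per_rpm * 1e-3
    rad_per_s_per_rpm = 2.0 * math.pi / 60.0
    return volts_per_rpm / rad_per_s_per_rpm


aov_width = angle_of_view_deg(SENSOR_WIDTH_MM, FOCAL_LENGTH_MM)
aov_height = angle_of_view_deg(SENSOR_HEIGHT_MM, FOCAL_LENGTH_MM)
kb_si = back_emf_si(BACK_EMF_MV_PER_RPM)

print(f"angle of view (width):  {aov_width:.2f} deg")
print(f"angle of view (height): {aov_height:.2f} deg")
print(f"back-EMF constant: {kb_si:.4f} V*s/rad, torque constant: {TORQUE_CONSTANT} N*m/A")
```

Under these assumptions the computed width angle agrees with the tabulated 86.09° to two decimals, and the height angle lands within about 0.1° of the tabulated value. For an ideal DC motor the torque constant in N·m/A equals the back-EMF constant in V·s/rad, so the two motor entries should be close once both are expressed in SI units.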
Appendix B














| Format | Camera Pose | Robot Joint Angles |
|---|---|---|
| | Yaw, pitch, and roll angles and position (in meters), measured in the inertial frame {O}. | Robot joint angles (in degrees). |
| Initial State | | |
| Final State | | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).