Submitted: 07 March 2025
Posted: 10 March 2025
Abstract
Keywords:
1. Introduction
2. Literature Review
2.1. Principle of PET Scan Imaging
2.2. Image Information Acquisition
2.3. Detection and Recognition of Head Feature Points
2.4. Spatial Motion Monitoring of Quasi-Rigid Objects
3. Research Methods
3.1. Camera Registration and Image Acquisition
3.2. Head Feature Point Recognition
- Use the HOG-based cascade classifier to compute HOG feature vectors from the patient images;
- Feed the extracted feature vectors into the SVM model provided by the C++ DLIB library to classify the facial region, identifying and annotating the location of the face in the image;
- Pass the annotated facial region as input to the DLIB 68-point alignment model to detect 68 facial landmarks in real time. The alignment standard, shown in Figure 3, covers key regions such as the facial contour, eyes, eyebrows, nasal triangle, and mouth;
- Record the landmark information of each frame of the video stream in real time, and apply a filtering algorithm to select 12 robust landmarks, typically located in regions such as the nasal triangle and the eye corners. These selected landmarks serve as input to the spatial pose estimation algorithm for precise motion state evaluation. A minimal sketch of this detection pipeline follows the list.
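The sketch below shows how such a pipeline can be realized with DLIB's Python bindings, assuming the standard shape_predictor_68_face_landmarks.dat model file. The 12-landmark subset in RIGID_IDS is an illustrative assumption, since the paper does not list the exact indices it selects.

```python
import cv2
import dlib

# HOG + linear-SVM face detector and the 68-point alignment model that
# DLIB distributes as "shape_predictor_68_face_landmarks.dat".
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# Twelve comparatively rigid landmark indices (nose bridge, nostril region,
# eye corners). The paper does not specify its exact subset, so this choice
# is illustrative only.
RIGID_IDS = [27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 39, 45]

def detect_rigid_landmarks(frame_bgr):
    """Return (x, y) pixel coordinates of the selected landmarks, or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)                 # upsample once for small faces
    if not faces:
        return None
    shape = predictor(gray, faces[0])         # 68-point facial alignment
    return [(shape.part(i).x, shape.part(i).y) for i in RIGID_IDS]
```

Calling detect_rigid_landmarks on each frame of the video stream yields the per-frame 2D landmark set that the spatial pose estimation stage consumes.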
3.3. Fusion Registration of RGB Images and Depth Images
3.4. Calculation of Head Spatial Pose and Motion Intensity Estimation
- Feature Point Occlusion: During head rotation, when the yaw angle becomes large, some facial feature points may move out of the depth camera’s view. To address this, a feature point compensation method is proposed: when a feature point (such as the outer corner of the eye) shows significant fluctuations in its spatial position across consecutive frames, it is judged no longer suitable as input to the spatial pose estimation algorithm, and the remaining stable feature points are used instead, ensuring the continuity and accuracy of the spatial pose estimation.
- Outliers in Motion Synthesis: When the motion intensity curve is synthesized from consecutive frames, outliers can cause abnormal fluctuations in the curve. To address this, a low-pass filter is applied to smooth the calculated motion intensity curve and suppress noise, yielding a stable, continuous curve that is then used for motion pattern classification (a filtering sketch is given after this list).
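The paper calls for low-pass filtering but does not name a particular filter design. A zero-phase Butterworth filter is one common choice, so the following is a minimal sketch under that assumption; the fps, cutoff_hz, and order parameters are illustrative, not values from the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def smooth_motion_intensity(intensity, fps=30.0, cutoff_hz=2.0, order=4):
    """Low-pass filter a per-frame motion intensity curve."""
    nyquist = fps / 2.0
    b, a = butter(order, cutoff_hz / nyquist)   # normalized cutoff in (0, 1)
    return filtfilt(b, a, intensity)            # zero-phase: no time lag

# Usage: `intensity` holds one value per frame, e.g. the per-frame pose change.
# smoothed = smooth_motion_intensity(np.asarray(raw_intensity))
```

filtfilt runs the filter forward and backward, so the smoothed curve stays time-aligned with the raw one, which matters when classifying motion patterns against frame timestamps.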
3.5. Interaction Software Design
- Image Registration Module: This module receives the RGB and depth images, fuses them, and outputs the combined image to the visualization module. The depth image is stored on the local server, while the RGB image is sent to the feature point recognition module, which returns the 2D coordinates of the facial feature points. These coordinates are combined with the depth data for 3D reconstruction, and the resulting 3D information is passed to the pose estimation module.
- Feature Point Recognition Module: This module processes the RGB images provided by the image registration module with deep learning networks, detecting facial feature points and returning their 2D coordinates.
- Pose Estimation Module: This module receives the 3D feature point data, applies rigid body motion algorithms to calculate the head movement amplitude, and outputs the result to the local server. It also plots the corresponding movement amplitude curve for display in the visualization module (see the sketch after this list).
- Visualization Module: This module receives the RGB and fused images from the image registration module and the movement amplitude curve from the pose estimation module, displays them on the interface, and accepts user input to control the operation of the system.
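To make the 2D-to-3D reconstruction and rigid-body pose steps concrete, here is a minimal sketch assuming a pinhole camera model with known intrinsics (fx, fy, cx, cy) and a Kabsch/SVD rigid alignment. The paper does not name its exact solver or intrinsics, so the function names and parameters below are illustrative, not the authors' implementation.

```python
import numpy as np

def backproject(points_2d, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel landmarks into camera coordinates."""
    pts = []
    for (u, v) in points_2d:
        z = depth[int(v), int(u)]            # metric depth at the landmark
        pts.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return np.array(pts)

def rigid_transform(P, Q):
    """Kabsch algorithm: rotation R and translation t with Q ~ R @ P + t."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

# Per-frame movement amplitude can then be summarized from (R, t), e.g. the
# rotation angle theta = arccos((trace(R) - 1) / 2) plus the norm of t.
```

The rotation angle and translation norm extracted from (R, t) give one motion value per frame, which is the quantity smoothed by the low-pass filter of Section 3.4 before display in the visualization module.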
4. Experiments
4.1. Experimental Setup
4.1.1. Datasets
4.1.2. Motion Indicators
4.2. Performance
4.2.1. Phantom Experiments
4.2.2. Volunteer Experiments
4.3. Cost Analysis
5. Discussion
6. Conclusions
Author Contributions
Funding
Informed Consent Statement
Conflicts of Interest
References
| System Mode | Module | Mean (ms) | Standard Deviation (ms) |
|---|---|---|---|
| Real-time Mode | Image Acquisition Module | 21.3082 | 6.6335 |
| | Depth Registration Module | 32.0709 | 2.0099 |
| | Feature Point Recognition Module | 0.0171 | 0.0074 |
| | Pose Calculation Module | 0.0274 | 0.0177 |
| | Total Time | 53.4236 | 6.9313 |
| Image-read Mode | Total Time | 35.1104 | 3.0520 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).