Submitted:
15 January 2025
Posted:
15 January 2025
Abstract
Keywords:
1. Introduction
- Lack of real-time performance: Dedicated personnel must still carry out measurement work at regular intervals. Measurement frequency is low and processing time is long, so the results cannot reflect dynamic changes on the construction site in real time. In complex site environments, these methods often struggle to support efficient and precise management.
- High cost: Some solutions require expensive equipment. In particular, a terrestrial laser scanner (TLS) requires a capital investment of more than $50,000, and professional mapping drones (PMDs) cost at least $10,000 [21]. This remains a large investment for most projects and limits the adoption of 3D data collection on construction sites.
- Scale problem: For monocular depth estimation, the scale ambiguity of monocular vision [36] means that relative depth can be estimated well after extensive training, but absolute (metric) depth remains difficult to estimate. In binocular 3D reconstruction, the scale reference is the distance between the two cameras, known as the baseline [33]. For commonly used binocular cameras, the baseline typically ranges from a few centimeters to 1 m, which restricts them to indoor and short-distance scene reconstruction and prevents direct use on large-scale construction sites. In addition, binocular setups generally require the two cameras to face the same direction and be horizontally aligned so that disparity can be calculated, which complicates the layout of surveillance cameras on site.
- Coordinate problem: The results of image-based 3D reconstruction still need to be aligned to real space. Coordinate transformation is usually carried out with the calibrated intrinsic and extrinsic camera parameters, which places very high demands on calibration accuracy and is difficult to apply to non-fixed cameras, such as surveillance cameras that can be rotated and zoomed. Moreover, existing research on image-based 3D reconstruction in the construction field rarely verifies results in real-space coordinates. For example, depth estimation studies mainly report accuracy on depth datasets, and binocular studies report accuracy within the reconstruction range. A general method for transforming reconstruction results into real-space coordinates is lacking.
- Difficult to deploy directly: Existing 3D reconstruction methods place high demands on the deployment of acquisition equipment and on parameter acquisition. For example, accurate calibration of the intrinsic and extrinsic camera parameters is a necessary step in converting depth-estimation results into 3D data. Binocular cameras must additionally be adjusted so that their views are as parallel as possible and face the same direction. For surveillance camera systems that are already deployed, obtaining accurate data of this kind still requires systematic measurement and corresponding adjustments.
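The baseline limitation noted in the scale problem above follows from the standard rectified-stereo relation Z = f·B/d: a disparity error of δd pixels produces a depth error of roughly Z²/(f·B)·δd. The sketch below illustrates this relation; the focal length, baseline, and distances are assumed values chosen for illustration and are not taken from this study.

```python
def stereo_depth_error(depth_m, focal_px, baseline_m, disparity_err_px=1.0):
    """Approximate depth error (m) of a rectified stereo pair.

    From Z = f * B / d, a small disparity error dd maps to
    dZ ~= Z**2 / (f * B) * dd.
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


# Illustrative values only: 1000 px focal length, 0.1 m baseline.
for depth in (5.0, 50.0):
    err = stereo_depth_error(depth, focal_px=1000.0, baseline_m=0.1)
    print(f"depth {depth:5.1f} m -> ~{err:.2f} m error per pixel of disparity error")
```

Under these assumed values, the error per pixel of disparity error grows from about 0.25 m at 5 m to about 25 m at 50 m, which is consistent with short-baseline rigs being suited to indoor and short-range scenes.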
- A hierarchical, learning-based framework for point cloud estimation and point cloud registration is introduced for 3D reconstruction of construction sites. Compared with LiDAR-based and SfM-based methods, it costs less, is more automated, supports continuous reconstruction, is easier to implement, and does not require the intrinsic and extrinsic parameters of the monitoring equipment to be annotated. Moreover, it needs no new data for transfer learning and can be deployed directly on site.
- A point cloud registration method based on 2D feature matching results is proposed. It performs camera registration in real space, point cloud dimension (scale) restoration, and coordinate system registration. Its overall accuracy is higher than that of conventional point cloud registration methods, and it is more robust and adaptable than mainstream point cloud registration methods.
- The proposed method was evaluated in extensive experiments on construction site datasets. The results show that the proposed 3D reconstruction framework is computationally more efficient than mainstream reconstruction methods based on SfM-MVS and that its point cloud registration accuracy is higher than that of other point cloud registration methods, so its output can be used for downstream tasks.
2. Methods
2.1. The Structure of the Framework
- Sparse point cloud reconstruction from unmanned aerial vehicle (UAV) images: The SfM method is used to reconstruct a sparse point cloud and obtain the spatial information of the scene;
- Initial point cloud estimation from surveillance images: The DUSt3R [37] model takes the images collected by surveillance cameras as input and outputs an initial point cloud. At this stage, the point cloud is still in local coordinates and has only estimated dimensions;
- Dimension restoration and coordinate restoration of the point cloud: The goal is a 3D point cloud with real dimensions, unified into a single coordinate system. 2D-2D matching is performed between the sequence images and the UAV images. Using the 2D-3D correspondences of each image, the 2D matches are mapped to matches between the two 3D point clouds. The scaling coefficient is calculated from the matched points, and the rotation and translation matrices are estimated with the RANSAC (Random Sample Consensus) [38] algorithm to register the point clouds.
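The mapping from 2D matches to 3D correspondences described in the last step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the surveillance image comes with a dense per-pixel point map (the kind of output DUSt3R-style models produce), that each SfM point has a known pixel observation in the UAV image, and that a match is kept only if an SfM observation lies within a small pixel tolerance. All function names, argument names, and the tolerance are hypothetical.

```python
import numpy as np


def lift_matches_to_3d(kpts_cam, kpts_uav, pointmap_cam,
                       uav_obs_xy, uav_obs_xyz, max_px=2.0):
    """Turn 2D-2D matches into 3D-3D point pairs.

    kpts_cam, kpts_uav : (M, 2) matched pixel coordinates (x, y).
    pointmap_cam       : (H, W, 3) per-pixel 3D points, local frame (assumed).
    uav_obs_xy         : (K, 2) pixel positions of SfM points in the UAV image.
    uav_obs_xyz        : (K, 3) 3D positions of those SfM points.
    """
    h, w, _ = pointmap_cam.shape
    src, dst = [], []
    for (xc, yc), p_uav in zip(kpts_cam, kpts_uav):
        col, row = int(round(xc)), int(round(yc))
        if not (0 <= row < h and 0 <= col < w):
            continue  # match falls outside the point map
        # Nearest SfM observation to the matched UAV keypoint.
        d = np.linalg.norm(uav_obs_xy - p_uav, axis=1)
        j = int(np.argmin(d))
        if d[j] > max_px:
            continue  # no sparse 3D point close enough to this keypoint
        src.append(pointmap_cam[row, col])
        dst.append(uav_obs_xyz[j])
    src = np.asarray(src, dtype=float).reshape(-1, 3)
    dst = np.asarray(dst, dtype=float).reshape(-1, 3)
    return src, dst
```

The two returned arrays are row-aligned, so row i of the first array and row i of the second describe the same physical point in the local and the SfM coordinate systems respectively.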
2.2. Sparse Point Cloud Reconstruction
2.3. Initial Point Cloud Estimation
2.4. 3D Point Cloud Alignment Based on 2D Features (2DFM-RANSAC)
2.4.1. 2D Feature Point Matching
2.4.2. 3D Feature Point Matching
2.4.3. Point Cloud Scaling and Alignment
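As outlined in Section 2.1, the scaling coefficient is computed from the matched 3D points, and the rotation and translation are then estimated with RANSAC. The sketch below shows one plausible way to do this and should not be read as the authors' implementation. It assumes the scale is taken as the median ratio of pairwise distances between matched points and uses a Kabsch (SVD) fit inside a basic RANSAC loop. The function names, inlier threshold, and iteration count are hypothetical.

```python
import numpy as np


def estimate_scale(src, dst):
    """Median ratio of pairwise distances (dst / src) over matched points."""
    i, j = np.triu_indices(len(src), k=1)
    d_src = np.linalg.norm(src[i] - src[j], axis=1)
    d_dst = np.linalg.norm(dst[i] - dst[j], axis=1)
    valid = d_src > 1e-9
    return float(np.median(d_dst[valid] / d_src[valid]))


def kabsch(src, dst):
    """Least-squares rotation R and translation t with dst ~= R @ src + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s


def ransac_rigid(src, dst, thresh=0.5, iters=500, seed=0):
    """RANSAC over 3-point samples; refit on the largest inlier set."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("RANSAC found no consistent model")
    R, t = kabsch(src[best], dst[best])
    return R, t, best


def register(src, dst, thresh=0.5):
    """Return s, R, t, inliers such that dst ~= s * R @ src + t."""
    s = estimate_scale(src, dst)
    R, t, inliers = ransac_rigid(s * src, dst, thresh=thresh)
    return s, R, t, inliers
```

Estimating the scale before the rigid fit keeps the RANSAC model to rotation and translation only, matching the order of operations described in Section 2.1. A joint similarity-transform fit would be an alternative design.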
3. Experiment
3.1. Experimental Setup
Evaluation Metrics
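The results table reports the mean, standard deviation, median, and RMSE of the registration error in metres. The exact error definition is not given here, so the sketch below assumes the error is the Euclidean distance between each aligned point and its known reference point. The function name and the use of the population standard deviation are also assumptions.

```python
import numpy as np


def registration_error_stats(aligned, reference):
    """Summary statistics of per-point registration error.

    aligned, reference : (N, 3) arrays of corresponding points, in metres.
    """
    d = np.linalg.norm(aligned - reference, axis=1)
    return {
        "mean": float(d.mean()),
        "std": float(d.std()),          # population standard deviation (assumed)
        "median": float(np.median(d)),
        "rmse": float(np.sqrt(np.mean(d ** 2))),
    }
```

RMSE weights large errors more heavily than the mean does, so a few badly registered points raise it faster than they raise the mean or the median.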
4. Results and Discussion
4.1. Performance of 2DFM-RANSAC
4.2. Computational Efficiency
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Amiri, A.F.; Bausman, D. The Internationalization of Construction Industry-A Global Perspective. International Journal of Engineering Science Invention (IJESI) 2018, 7, 59–68.
- Rao, A.S.; Radanovic, M.; Liu, Y.; Hu, S.; Fang, Y.; Khoshelham, K.; Palaniswami, M.; Ngo, T. Real-Time Monitoring of Construction Sites: Sensors, Methods, and Applications. Automation in Construction 2022, 136, 104099.
- Keyvanfar, A.; Shafaghat, A.; Rosley, M.S. Performance Comparison Analysis of 3D Reconstruction Modeling Software in Construction Site Visualization and Mapping. International Journal of Architectural Computing 2022, 20, 453–475.
- Corradetti, A.; Seers, T.; Mercuri, M.; Calligaris, C.; Busetti, A.; Zini, L. Benchmarking Different SfM-MVS Photogrammetric and iOS LiDAR Acquisition Methods for the Digital Preservation of a Short-Lived Excavation: A Case Study from an Area of Sinkhole Related Subsidence. Remote Sensing 2022, 14, 5187.
- Zhao, S.; Kang, F.; Li, J.; Ma, C. Structural Health Monitoring and Inspection of Dams Based on UAV Photogrammetry with Image 3D Reconstruction. Automation in Construction 2021, 130, 103832.
- Braun, A.; Tuttas, S.; Borrmann, A.; Stilla, U. Improving Progress Monitoring by Fusing Point Clouds, Semantic Data and Computer Vision. Automation in Construction 2020, 116, 103210.
- Zhao, S.; Kang, F.; Li, J. Concrete Dam Damage Detection and Localisation Based on YOLOv5s-HSC and Photogrammetric 3D Reconstruction. Automation in Construction 2022, 143, 104555.
- Xiao, J.-L.; Fan, J.-S.; Liu, Y.-F.; Li, B.-L.; Nie, J.-G. Region of Interest (ROI) Extraction and Crack Detection for UAV-Based Bridge Inspection Using Point Cloud Segmentation and 3D-to-2D Projection. Automation in Construction 2024, 158, 105226.
- El-Omari, S.; Moselhi, O. Integrating 3D Laser Scanning and Photogrammetry for Progress Measurement of Construction Work. Automation in Construction 2008, 18, 1–9.
- El-Omari, S.; Moselhi, O. Integrating Automated Data Acquisition Technologies for Progress Reporting of Construction Projects. Automation in Construction 2011, 20, 699–705.
- Xiong, X.; Adan, A.; Akinci, B.; Huber, D. Automatic Creation of Semantically Rich 3D Building Models from Laser Scanner Data. Automation in Construction 2013, 31, 325–337.
- Liu, T.; Wang, N.; Fu, Q.; Zhang, Y.; Wang, M. Research on 3D Reconstruction Method Based on Laser Rotation Scanning. In Proceedings of the 2019 IEEE International Conference on Mechatronics and Automation (ICMA); August 2019; pp. 1600–1604.
- Hosamo, H.H.; Hosamo, M.H. Digital Twin Technology for Bridge Maintenance Using 3D Laser Scanning: A Review. Advances in Civil Engineering 2022, 2022, 2194949.
- Li, J.; Peng, Y.; Tang, Z.; Li, Z. Three-Dimensional Reconstruction of Railway Bridges Based on Unmanned Aerial Vehicle–Terrestrial Laser Scanner Point Cloud Fusion. Buildings 2023, 13, 2841.
- Piekarczuk, A.; Mazurek, A.; Szer, J.; Szer, I. A Case Study of 3D Scanning Techniques in Civil Engineering Using the Terrestrial Laser Scanning Technique. Buildings 2024, 14, 3703.
- Lagüela, S.; Dorado, I.; Gesto, M.; Arias, P.; González-Aguilera, D.; Lorenzo, H. Behavior Analysis of Novel Wearable Indoor Mapping System Based on 3D-SLAM. Sensors 2018, 18, 766.
- Xu, L.; Feng, C.; Kamat, V.R.; Menassa, C.C. A Scene-Adaptive Descriptor for Visual SLAM-Based Locating Applications in Built Environments. Automation in Construction 2020, 112, 103067.
- Lu, T.; Tervola, S.; Lü, X.; Kibert, C.J.; Zhang, Q.; Li, T.; Yao, Z. A Novel Methodology for the Path Alignment of Visual SLAM in Indoor Construction Inspection. Automation in Construction 2021, 127, 103723.
- Feng, C.-Q.; Li, B.-L.; Liu, Y.-F.; Zhang, F.; Yue, Y.; Fan, J.-S. Crack Assessment Using Multi-Sensor Fusion Simultaneous Localization and Mapping (SLAM) and Image Super-Resolution for Bridge Inspection. Automation in Construction 2023, 155, 105047.
- Yarovoi, A.; Cho, Y.K. Review of Simultaneous Localization and Mapping (SLAM) for Construction Robotics Applications. Automation in Construction 2024, 162, 105344.
- Zhang, N.; Lan, X. Everyday-Carry Equipment Mapping: A Portable and Low-Cost Method for 3D Digital Documentation of Architectural Heritage by Integrated iPhone and Microdrone. Buildings 2025, 15, 89.
- Li, Z.; Chen, Z.; Liu, X.; Jiang, J. DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation. Mach. Intell. Res. 2023.
- Bhat, S.F.; Birkl, R.; Wofk, D.; Wonka, P.; Müller, M. ZoeDepth: Zero-Shot Transfer by Combining Relative and Metric Depth. arXiv 2023.
- Yang, L.; Kang, B.; Huang, Z.; Xu, X.; Feng, J.; Zhao, H. Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); June 2024; pp. 10371–10381.
- Shen, J.; Yan, W.; Qin, S.; Zheng, X. A Self-Supervised Monocular Depth Estimation Model with Scale Recovery and Transfer Learning for Construction Scene Analysis. Computer-Aided Civil and Infrastructure Engineering 2022, 38, 1142–1161.
- Shen, J.; Jiao, L.; Zhang, C.; Peng, K. Monocular 3D Object Detection for Construction Scene Analysis. Computer-Aided Civil and Infrastructure Engineering 2024, 39, 1370–1389.
- Kim, D.; Liu, M.; Lee, S.; Kamat, V.R. Remote Proximity Monitoring between Mobile Construction Resources Using Camera-Mounted UAVs. Automation in Construction 2019, 99, 168–182.
- Xu, B.; Liu, C. A 3D Reconstruction Method for Buildings Based on Monocular Vision. Computer-Aided Civil and Infrastructure Engineering 2022, 37, 354–369.
- Qiu, W.-X.; Han, J.-Y.; Chen, A.Y. Measuring In-Building Spatial-Temporal Human Distribution through Monocular Image Data Considering Deep Learning–Based Image Depth Estimation. Journal of Computing in Civil Engineering 2021, 35, 04021014.
- Pfitzner, F.; Braun, A.; Borrmann, A. From Data to Knowledge: Construction Process Analysis through Continuous Image Capturing, Object Detection, and Knowledge Graph Creation. Automation in Construction 2024, 164, 105451.
- Zhong, W.; Dong, X. Camera Calibration Method of Binocular Stereo Vision Based on OpenCV. In Proceedings of the AOPC 2015: Image Processing and Analysis; SPIE, October 8 2015; Vol. 9675; pp. 571–576.
- Liu, Y.; Li, C.; Gong, J. An Object Reconstruction Method Based on Binocular Stereo Vision. In Proceedings of the Intelligent Robotics and Applications; Springer International Publishing: Cham, 2017; pp. 486–495.
- Lin, X.; Wang, J.; Lin, C. Research on 3D Reconstruction in Binocular Stereo Vision Based on Feature Point Matching Method. In Proceedings of the 2020 IEEE 3rd International Conference on Information Systems and Computer Aided Education (ICISCAE); September 2020; pp. 551–556.
- Sung, C.; Kim, P.Y. 3D Terrain Reconstruction of Construction Sites Using a Stereo Camera. Automation in Construction 2016, 64, 65–77.
- He, J.; Li, P.; An, X.; Wang, C. A Reconstruction Methodology of Dynamic Construction Site Activities in 3D Digital Twin Models Based on Camera Information. Buildings 2024, 14, 2113.
- Yin, W.; Zhang, C.; Chen, H.; Cai, Z.; Yu, G.; Wang, K.; Chen, X.; Shen, C. Metric3D: Towards Zero-Shot Metric 3D Prediction from A Single Image. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV); October 2023; pp. 9009–9019.
- Wang, S.; Leroy, V.; Cabon, Y.; Chidlovskii, B.; Revaud, J. DUSt3R: Geometric 3D Vision Made Easy. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); June 2024; pp. 20697–20709.
- Yang, S.-W.; Wang, C.-C.; Chang, C.-H. RANSAC Matching: Simultaneous Registration and Segmentation. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation; May 2010; pp. 1905–1912.
- Moulon, P.; Monasse, P.; Perrot, R.; Marlet, R. OpenMVG: Open Multiple View Geometry. In Proceedings of the Reproducible Research in Pattern Recognition; Kerautret, B., Colom, M., Monasse, P., Eds.; Springer International Publishing: Cham, 2017; pp. 60–74.
- Schönberger, J.L.; Frahm, J.-M. Structure-from-Motion Revisited. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); June 2016; pp. 4104–4113.
- Ng, P.C.; Henikoff, S. SIFT: Predicting Amino Acid Changes That Affect Protein Function. Nucleic Acids Research 2003, 31, 3812–3814.
- Yi, K.M.; Trulls, E.; Lepetit, V.; Fua, P. LIFT: Learned Invariant Feature Transform. In Proceedings of the Computer Vision – ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, 2016; pp. 467–483.
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An Efficient Alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision; November 2011; pp. 2564–2571.
- Arandjelović, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 2018, 40, 1437–1451.
- Rusu, R.B.; Blodow, N.; Beetz, M. Fast Point Feature Histograms (FPFH) for 3D Registration. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation; May 2009; pp. 3212–3217.
- Zhou, Q.-Y.; Park, J.; Koltun, V. Fast Global Registration. In Proceedings of the Computer Vision – ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, 2016; pp. 766–782.
- Besl, P.J.; McKay, N.D. Method for Registration of 3-D Shapes. In Proceedings of the Sensor Fusion IV: Control Paradigms and Data Structures; SPIE, April 30 1992; Vol. 1611, pp. 586–606.
- Rusinkiewicz, S.; Levoy, M. Efficient Variants of the ICP Algorithm. In Proceedings of the Third International Conference on 3-D Digital Imaging and Modeling; May 2001; pp. 145–152.
- ICP Registration - Open3D 0.19.0 Documentation. Available online: https://www.open3d.org/docs/release/tutorial/t_pipelines/t_icp_registration.html (accessed on 14 January 2025).













| Method | Needs initial alignment | Time (s) | Mean (m) | Std (m) | Median (m) | RMSE (m) |
|---|---|---|---|---|---|---|
| FPFH-RANSAC | ✗ | 25.929 | 1.568 | 1.037 | 1.400 | 1.915 |
| FPFH-FGR | ✗ | 26.457 | 3.052 | 2.008 | 2.863 | 3.663 |
| ICP-point to plane (0.05) | ✓ | 4.163 | 0.837 | 0.408 | 0.849 | 0.933 |
| ICP-point to plane (0.25) | ✓ | 4.198 | 0.313 | 0.257 | 0.251 | 0.408 |
| ICP-point to point (0.05) | ✓ | 4.192 | 0.844 | 0.409 | 0.857 | 0.940 |
| ICP-point to point (0.25) | ✓ | 4.182 | 0.361 | 0.282 | 0.298 | 0.460 |
| 2DFM-RANSAC (ours) | ✗ | 2.474 | 0.268 | 0.230 | 0.220 | 0.358 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).