Submitted: 25 May 2026
Posted: 27 May 2026
Abstract
Keywords:
1. Introduction
2. Related Works
2.1. Learning-Based Methods for 3D-PCPR
2.2. Foundation Models for 2D and 3D Understanding
2.3. Point Cloud Sampling
2.4. Heterogeneous Point Cloud
3. Problem Definition
4. Method
Grouping
Encoding
Optimal Transport
4.1. Combined Rank-Triplet Loss
5. Experiments
5.1. Datasets and Evaluation Metric
5.2. Knowledge Transferring
5.3. Density Variation
5.4. Uni-PCPR
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
- Schubert, S.; Neubert, P.; Garg, S.; Milford, M.; Fischer, T. Visual Place Recognition: A Tutorial [Tutorial]. IEEE Robot. Autom. Mag. 2023, 31, 139–153.
- Luo, K.; Yu, H.; Chen, X.; Yang, Z.; Wang, J.; Cheng, P.; Mian, A. 3D point cloud-based place recognition: A survey. Artif. Intell. Rev. 2024, 57, 83.
- Yang, F.; Ismail, N.A.; Pang, Y.Y.; Kebande, V.R.; Al-Dhaqm, A.; Koh, T.W. A systematic literature review of deep learning approaches for sketch-based image retrieval: Datasets, metrics, and future directions. IEEE Access 2024, 12, 14847–14869.
- Wu, S.; Xiong, Y.; Cui, Y.; Wu, H.; Chen, C.; Yuan, Y.; Huang, L.; Liu, X.; Kuo, T.W.; Guan, N.; et al. Retrieval-augmented generation for natural language processing: A survey. arXiv 2024, arXiv:2407.13193.
- Komorowski, J. MinkLoc3D: Point cloud based large-scale place recognition. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021; pp. 1790–1799.
- Tang, H.; Liu, Z.; Li, X.; Lin, Y.; Han, S. TorchSparse: Efficient point cloud inference engine. Proc. Mach. Learn. Syst. 2022, 4, 302–315.
- Graham, B.; Engelcke, M.; Van Der Maaten, L. 3D semantic segmentation with submanifold sparse convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018; pp. 9224–9232.
- Yan, Y.; Mao, Y.; Li, B. SECOND: Sparsely embedded convolutional detection. Sensors 2018, 18, 3337.
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 30.
- Yang, Z.; Sun, Y.; Liu, S.; Jia, J. 3DSSD: Point-based 3D single stage object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020; pp. 11040–11048.
- Dey, T.K.; Wang, Y. Computational Topology for Data Analysis; Cambridge University Press, 2022.
- Carlsson, G.; Vejdemo-Johansson, M. Topological Data Analysis with Applications; Cambridge University Press, 2021.
- Wu, C.; Wan, Y.; Fu, H.; Pfrommer, J.; Zhong, Z.; Zheng, J.; Zhang, J.; Beyerer, J. SAMBLE: Shape-Specific Point Cloud Sampling for an Optimal Trade-Off Between Local Detail and Global Uniformity. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025; pp. 1342–1352.
- Wu, C.; Zheng, J.; Pfrommer, J.; Beyerer, J. Attention-based point cloud edge sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023; pp. 5333–5343.
- Arief, H.A.; Arief, M.; Bhat, M.; Indahl, U.G.; Tveite, H.; Zhao, D. Density-Adaptive Sampling for Heterogeneous Point Cloud Object Segmentation in Autonomous Vehicle Applications. In Proceedings of the CVPR Workshops, 2019; pp. 26–33.
- Jung, M.; Yang, W.; Lee, D.; Gil, H.; Kim, G.; Kim, A. HeLiPR: Heterogeneous LiDAR dataset for inter-LiDAR place recognition under spatiotemporal variations. Int. J. Robot. Res. 2024, 43, 1867–1883.
- Maddern, W.; Pascoe, G.; Linegar, C.; Newman, P. 1 year, 1000 km: The Oxford RobotCar dataset. Int. J. Robot. Res. 2017, 36, 3–15.
- Guan, T.; Muthuselvam, A.; Hoover, M.; Wang, X.; Liang, J.; Sathyamoorthy, A.J.; Conover, D.; Manocha, D. CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023; pp. 11335–11344.
- Uy, M.A.; Lee, G.H. PointNetVLAD: Deep point cloud based retrieval for large-scale place recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018; pp. 4470–4479.
- Arandjelovic, R.; Zisserman, A. All about VLAD. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013; pp. 1578–1585.
- Radenović, F.; Tolias, G.; Chum, O. Fine-tuning CNN image retrieval with no human annotation. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 1655–1668.
- Jung, M.; Jung, S.; Gil, H.; Kim, A. HeLiOS: Heterogeneous LiDAR Place Recognition via Overlap-based Learning and Local Spherical Transformer. arXiv 2025, arXiv:2501.18943.
- Lai, X.; Chen, Y.; Lu, F.; Liu, J.; Jia, J. Spherical Transformer for LiDAR-based 3D Recognition. In Proceedings of the CVPR, 2023.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning. PMLR, 2021; pp. 8748–8763.
- Keetha, N.; Mishra, A.; Karhade, J.; Jatavallabhula, K.M.; Scherer, S.; Krishna, M.; Garg, S. AnyLoc: Towards universal visual place recognition. IEEE Robot. Autom. Lett. 2023.
- Zhu, X.; Zhang, R.; He, B.; Guo, Z.; Zeng, Z.; Qin, Z.; Zhang, S.; Gao, P. PointCLIP V2: Prompting CLIP and GPT for powerful 3D open-world learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023; pp. 2639–2650.
- Hess, G.; Tonderski, A.; Petersson, C.; Åström, K.; Svensson, L. LidarCLIP or: How I learned to talk to point clouds. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024; pp. 7438–7447.
- Zeng, Y.; Jiang, C.; Mao, J.; Han, J.; Ye, C.; Huang, Q.; Yeung, D.Y.; Yang, Z.; Liang, X.; Xu, H. CLIP2: Contrastive language-image-point pretraining from real-world point cloud data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023; pp. 15244–15253.
- Liu, M.; Shi, R.; Kuang, K.; Zhu, Y.; Li, X.; Han, S.; Cai, H.; Porikli, F.; Su, H. OpenShape: Scaling up 3D shape representation towards open-world understanding. Adv. Neural Inf. Process. Syst. 2024, 36.
- Xue, L.; Yu, N.; Zhang, S.; Panagopoulou, A.; Li, J.; Martín-Martín, R.; Wu, J.; Xiong, C.; Xu, R.; Niebles, J.C.; et al. ULIP-2: Towards scalable multimodal pre-training for 3D understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024; pp. 27091–27101.
- Zhou, J.; Wang, J.; Ma, B.; Liu, Y.S.; Huang, T.; Wang, X. Uni3D: Exploring unified 3D representation at scale. arXiv 2023, arXiv:2310.06773.
- Deitke, M.; Schwenk, D.; Salvador, J.; Weihs, L.; Michel, O.; VanderBilt, E.; Schmidt, L.; Ehsani, K.; Kembhavi, A.; Farhadi, A. Objaverse: A universe of annotated 3D objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023; pp. 13142–13153.
- Chang, A.X.; Funkhouser, T.; Guibas, L.; Hanrahan, P.; Huang, Q.; Li, Z.; Savarese, S.; Savva, M.; Song, S.; Su, H.; et al. ShapeNet: An information-rich 3D model repository. arXiv 2015, arXiv:1512.03012.
- Fu, H.; Jia, R.; Gao, L.; Gong, M.; Zhao, B.; Maybank, S.; Tao, D. 3D-FUTURE: 3D furniture shape with texture. Int. J. Comput. Vis. 2021, 129, 3313–3337.
- Collins, J.; Goel, S.; Deng, K.; Luthra, A.; Xu, L.; Gundogdu, E.; Zhang, X.; Vicente, T.F.Y.; Dideriksen, T.; Arora, H.; et al. ABO: Dataset and benchmarks for real-world 3D object understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022; pp. 21126–21136.
- Moenning, C.; Dodgson, N.A. Fast marching farthest point sampling. Technical report, 2003.
- Yu, X.; Tang, L.; Rao, Y.; Huang, T.; Zhou, J.; Lu, J. Point-BERT: Pre-training 3D point cloud transformers with masked point modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022; pp. 19313–19322.
- Cavanna, N.J.; Jahanseir, M.; Sheehy, D.R. A geometric perspective on sparse filtrations. arXiv 2015, arXiv:1506.03797.
- Lang, I.; Manor, A.; Avidan, S. SampleNet: Differentiable Point Cloud Sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020; pp. 7578–7588.
- Qian, Y.; Hou, J.; Zhang, Q.; Zeng, Y.; Kwong, S.; He, Y. MOPS-Net: A matrix optimization-driven network for task-oriented 3D point cloud downsampling. arXiv 2020, arXiv:2005.00383.
- Wang, P.S. OctFormer: Octree-based transformers for 3D point clouds. ACM Trans. Graph. (TOG) 2023, 42, 1–11.
- Jack, D.; Maire, F.; Denman, S.; Eriksson, A. Sparse convolutions on continuous domains for point cloud and event stream networks. In Proceedings of the Asian Conference on Computer Vision, 2020.
- Liu, S.; Zhang, M.; Kadam, P.; Kuo, C. 3D Point Cloud Analysis; Springer, 2021.
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017; pp. 652–660.
- Xue, L.; Gao, M.; Xing, C.; Martín-Martín, R.; Wu, J.; Xiong, C.; Xu, R.; Niebles, J.C.; Savarese, S. ULIP: Learning a unified representation of language, images, and point clouds for 3D understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023; pp. 1179–1189.
- Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. Adv. Neural Inf. Process. Syst. 2013, 26.
- Izquierdo, S.; Civera, J. Optimal transport aggregation for visual place recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024; pp. 17658–17668.
- Komorowski, J. Improving point cloud based place recognition with ranking-based loss and large batch training. In Proceedings of the 2022 26th International Conference on Pattern Recognition (ICPR); IEEE, 2022; pp. 3699–3705.
- Hermans, A.; Beyer, L.; Leibe, B. In defense of the triplet loss for person re-identification. arXiv 2017, arXiv:1703.07737.
- Yin, H.; Xu, X.; Lu, S.; Chen, X.; Xiong, R.; Shen, S.; Stachniss, C.; Wang, Y. A survey on global LiDAR localization: Challenges, advances and open problems. Int. J. Comput. Vis. 2024, 1–33.
- Zhang, H.; Lyu, M.; He, C.; Ao, Y.; Lin, Y. TrimTokenator: Towards adaptive visual token pruning for large multimodal models. arXiv 2025, arXiv:2509.00320.
- Wen, C.; Yu, B.; Tao, D. Learnable skeleton-aware 3D point cloud sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023; pp. 17671–17681.
- Zhang, Y.; Hu, Q.; Xu, G.; Ma, Y.; Wan, J.; Guo, Y. Not all points are equal: Learning highly efficient point-based detectors for 3D LiDAR point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022; pp. 18953–18962.




| Method | Oxford AR@1% | Oxford AR@1 | R.A. AR@1% | R.A. AR@1 | B.D. AR@1% | B.D. AR@1 |
|---|---|---|---|---|---|---|
| SOLID | 21.4 | 11.4 | 33.2 | 21.8 | 28.3 | 22.4 |
| PointNetVLAD | 46.4 | 31.6 | 61.6 | 51.0 | 60.9 | 53.8 |
| MinkLoc3Dv2 | 78.0 | 62.3 | 90.7 | 85.7 | 89.5 | 85.9 |
| CASSPR | 44.3 | 30.2 | 49.3 | 40.9 | 48.7 | 43.0 |
| CrossLoc3D | 74.8 | 55.7 | 94.0 | 65.6 | 91.5 | 65.6 |
| HeLiOS | 80.8 | 65.9 | 97.5 | 93.9 | 96.2 | 92.7 |
| Uni-PCPR | 95.7 | 89.0 | 95.5 | 90.3 | 92.7 | 88.4 |
| Uni3D-base | 84.6 | 72.1 | 91.3 | 84.3 | 87.8 | 82.4 |
| Uni3D-tuned | 91.8 | 81.7 | 90.3 | 83.1 | 86.6 | 81.9 |

| Method | U.S. AR@1% | U.S. AR@1 | CS3D AR@1% | CS3D AR@1 |
|---|---|---|---|---|
| SOLID | 44.2 | 26.5 | 56.7 | 36.4 |
| PointNetVLAD | 69.4 | 55.9 | 68.8 | 44.6 |
| MinkLoc3Dv2 | 92.4 | 85.0 | 65.7 | 38.2 |
| CASSPR | 55.5 | 44.8 | 58.3 | 37.2 |
| CrossLoc3D | 96.9 | 66.8 | 58.0 | 45.8 |
| HeLiOS | 98.2 | 92.2 | 65.2 | 51.8 |
| Uni-PCPR | 98.3 | 92.4 | 63.6 | 50.2 |
| Uni3D-base | 95.6 | 86.4 | 57.8 | 45.0 |
| Uni3D-tuned | 94.7 | 87.9 | 59.1 | 48.2 |
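Both tables report AR@1 and AR@1% (average recall at the top-1 and top-1% retrieved candidates). The evaluation-protocol text is not part of this excerpt, so the following is only a minimal sketch assuming the protocol commonly used in 3D place recognition since PointNetVLAD: a query counts as correctly localized if at least one of its top-N database entries, ranked by Euclidean distance between global descriptors, is a true positive; the 1% variant uses N = max(round(M/100), 1) for a database of M entries; queries without any true positive are skipped; and recall is then averaged over query/database pairs. The function names, the precomputed `positives` sets (typically database entries within a fixed distance of the query pose), and the brute-force ranking are illustrative assumptions, not the authors' evaluation code.

```python
import math
from typing import List, Sequence, Set, Tuple

Descriptor = Sequence[float]


def recall_at_n_and_one_percent(
    queries: Sequence[Descriptor],
    database: Sequence[Descriptor],
    positives: Sequence[Set[int]],
    n: int = 1,
) -> Tuple[float, float]:
    """Return (recall@n, recall@1%) in percent for one query/database pair.

    positives[i] holds the database indices that count as true matches for
    query i (assumed precomputed, e.g. from a distance threshold on poses).
    """
    # Top-1% cut-off: 1% of the database size, but never fewer than one candidate.
    k_pct = max(int(round(len(database) / 100.0)), 1)
    hits_n = hits_pct = evaluated = 0
    for query, pos in zip(queries, positives):
        if not pos:
            continue  # queries with no true match are excluded from the average
        evaluated += 1
        # Rank all database entries by Euclidean distance in descriptor space.
        ranked = sorted(range(len(database)), key=lambda j: math.dist(query, database[j]))
        hits_n += any(j in pos for j in ranked[:n])
        hits_pct += any(j in pos for j in ranked[:k_pct])
    if evaluated == 0:
        raise ValueError("no query has a true positive in the database")
    return 100.0 * hits_n / evaluated, 100.0 * hits_pct / evaluated


def average_recall(
    pairs: List[Tuple[Sequence[Descriptor], Sequence[Descriptor], Sequence[Set[int]]]],
    n: int = 1,
) -> Tuple[float, float]:
    """Average recall@n and recall@1% over several (queries, database, positives) pairs."""
    results = [recall_at_n_and_one_percent(q, db, pos, n) for q, db, pos in pairs]
    ar_n = sum(r[0] for r in results) / len(results)
    ar_pct = sum(r[1] for r in results) / len(results)
    return ar_n, ar_pct
```

Sorting the whole database per query is only practical for small examples; benchmark-scale evaluation would normally use a KD-tree or another nearest-neighbour index over the database descriptors instead.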
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).