Preprint
Article

This version is not peer-reviewed.

AttnLink: Enhancing Cross-Modal Fusion for Robust Image-to-PointCloud Place Recognition

Submitted: 07 January 2026

Posted: 14 January 2026


Abstract
Image-to-PointCloud (I2P) place recognition is crucial for autonomous systems but faces challenges from modality discrepancies and environmental variations. Existing feature fusion strategies often fall short in complex real-world scenarios. We propose AttnLink, a novel framework that significantly enhances I2P place recognition through an attention-guided cross-modal feature fusion mechanism. AttnLink integrates an Adaptive Depth Completion Network to generate dense depth maps and an Attention-Guided Cross-Modal Feature Encoder that applies lightweight spatial attention to local features and a context-gating mechanism for robust semantic clustering. Its core innovation is a Multi-Head Attention Fusion Network, which adaptively weights and fuses multi-modal, multi-level descriptors into a highly discriminative global feature vector. Trained end-to-end, AttnLink demonstrates superior performance on the KITTI and HAOMO datasets, outperforming state-of-the-art methods in retrieval accuracy, efficiency, and robustness to varying input quality. Detailed ablation studies confirm the effectiveness of its components, supporting AttnLink's reliable deployment in real-time autonomous driving applications.
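The abstract gives only a high-level description of the fusion step. As a rough, hypothetical illustration, the PyTorch sketch below shows one common way a multi-head attention module could adaptively weight and fuse per-branch descriptors into a single global vector; the class name, branch choices, and dimensions are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Hypothetical sketch: fuse a stack of per-branch descriptors
    (e.g. image-local, point-cloud-local, semantic-cluster) into one
    global descriptor via multi-head attention."""
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        # Learned query token that attends over the branch descriptors.
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, descriptors):
        # descriptors: (B, num_branches, dim) stack of branch descriptors
        q = self.query.expand(descriptors.size(0), -1, -1)
        fused, weights = self.attn(q, descriptors, descriptors)
        # `weights` has shape (B, 1, num_branches) and exposes how
        # strongly each branch contributes to the fused descriptor.
        return self.norm(fused.squeeze(1)), weights

# Example: fuse three hypothetical 256-d branch descriptors.
fusion = AttentionFusion()
branches = torch.randn(8, 3, 256)       # batch of 8 samples, 3 branches
global_desc, attn_w = fusion(branches)
print(global_desc.shape, attn_w.shape)  # (8, 256) and (8, 1, 3)
```

A learned query token is one standard way to pool a variable-size set of descriptors with attention, and the returned attention weights double as a per-sample diagnostic of which modality or level dominates the fused representation.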
Keywords: 
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.