Preprint Article · Version 1 · Preserved in Portico · This version is not peer-reviewed

Sparse Mix-Attention Transformer for Multispectral Image and Hyperspectral Image Fusion

Version 1: Received: 3 November 2023 / Approved: 3 November 2023 / Online: 6 November 2023 (11:30:19 CET)

A peer-reviewed article of this Preprint also exists.

Yu, S.; Zhang, X.; Song, H. Sparse Mix-Attention Transformer for Multispectral Image and Hyperspectral Image Fusion. Remote Sens. 2024, 16, 144.

Abstract

Multispectral image (MSI) and hyperspectral image (HSI) fusion (MHIF) aims to address the challenge of acquiring high-resolution (HR) HSIs. The task fuses a low-resolution (LR) HSI with an HR-MSI to reconstruct an HR-HSI. Existing methods directly employ transformers to perform feature extraction and fusion. Despite their demonstrated success, two limitations remain: 1) Employing the entire transformer model for feature extraction and fusion fails to fully harness the transformer's potential for integrating the spectral information of the HSI and the spatial information of the MSI. 2) HSIs exhibit strong spectral correlation and sparsity in the spatial domain; existing transformer-based models do not exploit this physical property, which makes them prone to spectral distortion. To address these issues, this paper introduces a novel framework for MHIF called the Sparse Mix-Attention Transformer (SMAformer). Specifically, to fully exploit the advantages of the transformer architecture, we propose a Spectral Mix Attention Block (SMAB), which concatenates the keys and values extracted from the LR-HSI and the HR-MSI to create a new multi-head attention module. This design facilitates the extraction of detailed long-range information across the spatial and spectral dimensions. In addition, to address the spatial sparsity inherent in HSIs, we incorporate a sparse mechanism into the core of the SMAB, yielding the Sparse Spectral Mix Attention Block (SSMAB). In the SSMAB, we compute attention maps from queries and keys and select the K most highly correlated values to form a sparse attention map. This enables a sparse representation of spatial information while suppressing spatially disruptive noise. Extensive experiments conducted on three benchmark datasets, namely CAVE, Harvard, and Pavia Center, demonstrate that the proposed SMAformer outperforms state-of-the-art methods.
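The abstract describes the SSMAB as concatenating keys and values from the LR-HSI and HR-MSI branches and keeping only the K most correlated attention entries per query. The snippet below is a minimal, illustrative PyTorch-style sketch of that idea, not the authors' implementation; all module and variable names, tensor shapes, and hyperparameter values (e.g. num_heads, top_k) are assumptions made for illustration only.

```python
# Illustrative sketch (not the authors' code) of sparse spectral mix attention:
# keys/values from LR-HSI and HR-MSI features are concatenated, and only the
# top-K most correlated entries per query are kept in the attention map.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseSpectralMixAttention(nn.Module):
    def __init__(self, dim, num_heads=4, top_k=16):
        super().__init__()
        self.num_heads = num_heads
        self.top_k = top_k
        self.scale = (dim // num_heads) ** -0.5
        # Assumed design: queries from the HSI branch, keys/values from both branches.
        self.q_hsi = nn.Linear(dim, dim)
        self.kv_hsi = nn.Linear(dim, 2 * dim)
        self.kv_msi = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, hsi_tokens, msi_tokens):
        # hsi_tokens, msi_tokens: (batch, tokens, dim)
        b, n, d = hsi_tokens.shape
        h = self.num_heads

        q = self.q_hsi(hsi_tokens)
        k_h, v_h = self.kv_hsi(hsi_tokens).chunk(2, dim=-1)
        k_m, v_m = self.kv_msi(msi_tokens).chunk(2, dim=-1)

        # Mix the two modalities by concatenating keys and values along the token axis.
        k = torch.cat([k_h, k_m], dim=1)
        v = torch.cat([v_h, v_m], dim=1)

        # Reshape to (batch, heads, tokens, head_dim).
        def split_heads(x):
            return x.view(b, -1, h, d // h).transpose(1, 2)

        q, k, v = map(split_heads, (q, k, v))

        # Dense attention scores, then keep only the top-K entries per query (sparsity).
        attn = (q @ k.transpose(-2, -1)) * self.scale
        topk = min(self.top_k, attn.shape[-1])
        idx = attn.topk(topk, dim=-1).indices
        mask = torch.full_like(attn, float("-inf")).scatter_(-1, idx, 0.0)
        attn = F.softmax(attn + mask, dim=-1)

        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)


if __name__ == "__main__":
    x_hsi = torch.randn(2, 64, 32)   # e.g. 8x8 spatial tokens, 32-dim features
    x_msi = torch.randn(2, 64, 32)
    block = SparseSpectralMixAttention(dim=32, num_heads=4, top_k=8)
    print(block(x_hsi, x_msi).shape)  # torch.Size([2, 64, 32])
```

Masking the non-selected scores with negative infinity before the softmax zeroes out their weights, which is one common way to realize a top-K sparse attention map without changing the overall attention interface.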

Keywords

hyperspectral imaging super-resolution; image fusion; transformer; remote sensing

Subject

Computer Science and Mathematics, Computer Vision and Graphics
