Preprint Article · Version 1 · Preserved in Portico · This version is not peer-reviewed

Stereo Matching Method for Remote Sensing Images Based on Attention and Scale Fusion

Version 1 : Received: 14 November 2023 / Approved: 15 November 2023 / Online: 15 November 2023 (09:24:21 CET)

A peer-reviewed version of this preprint has also been published.

Wei, K.; Huang, X.; Li, H. Stereo Matching Method for Remote Sensing Images Based on Attention and Scale Fusion. Remote Sens. 2024, 16, 387.

Abstract

With the development of remote sensing satellite technology for Earth observation, remote sensing stereo images have been used for three-dimensional reconstruction in various fields, such as urban planning and construction. However, remote sensing images often contain noise, occluded regions, weakly textured areas, and repeated textures, which reduce stereo matching accuracy and degrade the quality of the 3D reconstruction results. To reduce the impact of complex scenes in remote sensing images on stereo matching while maintaining both speed and accuracy, we propose a new end-to-end stereo matching network based on convolutional neural networks (CNNs). The proposed network learns features at different scales from the original images and constructs cost volumes at varying scales to capture richer scale information. When constructing the cost volume, we introduce negative disparities to accommodate the common occurrence of both negative and nonnegative disparities in remote sensing stereo image pairs. For cost aggregation, we employ a 3D convolution-based encoder-decoder structure that lets the network aggregate information adaptively. Before feature aggregation, we also introduce an attention module to retain more valuable feature information, enhance feature representation, and obtain a higher-quality disparity map. Trained on the publicly available US3D dataset, the network achieves an end-point error (EPE) of 1.115 pixels and an error pixel ratio (D1) of 5.32% on the test set, with an inference time of 92 ms. Compared with existing state-of-the-art models, our model achieves higher accuracy, and the network benefits the three-dimensional reconstruction of remote sensing images.
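The abstract notes that the cost volume is built over a disparity range that includes negative values, since satellite stereo pairs commonly exhibit both negative and nonnegative disparities. As an illustrative sketch only (not the paper's network, which uses learned CNN features and 3D-convolution aggregation), the NumPy code below builds a plain absolute-difference cost volume over a signed disparity range; the names `build_cost_volume`, `wta_disparity`, `min_disp`, and `max_disp` are introduced here for illustration.

```python
import numpy as np

def build_cost_volume(left, right, min_disp=-4, max_disp=4):
    """Cost volume over a signed disparity range [min_disp, max_disp].

    left, right: 2-D intensity images of shape (H, W).
    Returns an array of shape (D, H, W), with D = max_disp - min_disp + 1,
    where entry (i, y, x) is the cost of assigning disparity min_disp + i
    to left pixel (y, x). Out-of-bounds shifts get +inf.
    """
    H, W = left.shape
    disps = list(range(min_disp, max_disp + 1))
    volume = np.full((len(disps), H, W), np.inf)
    for i, d in enumerate(disps):
        for x in range(W):
            xr = x - d  # left(x) is compared against right(x - d)
            if 0 <= xr < W:
                # Simple photometric cost; the paper instead matches
                # learned multi-scale CNN features.
                volume[i, :, x] = np.abs(left[:, x] - right[:, xr])
    return volume

def wta_disparity(volume, min_disp=-4):
    """Winner-take-all disparity: argmin over the disparity axis,
    mapped back to the signed range."""
    return min_disp + np.argmin(volume, axis=0)
```

A negative lower bound simply extends the search window to the other side of the reference column, which is the situation the abstract describes for remote sensing stereo pairs; a learned network replaces the hand-crafted cost and argmin with feature matching, 3D-convolution aggregation, and soft disparity regression.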

Keywords

stereo matching; remote sensing image; deep learning; multiscale; attention

Subject

Environmental and Earth Sciences, Remote Sensing
