Interpreting interferometric synthetic aperture radar (InSAR) imagery is a critical task in monitoring volcanic and seismic activity, yet the process typically requires expert knowledge and manual analysis. As the volume of satellite observations continues to grow, automated methods capable of describing and interpreting these images become increasingly important for supporting geophysical monitoring efforts. In this work, we investigate the feasibility of automated image captioning for InSAR data using modern vision-language models. We use the Hephaestus dataset, a large collection of annotated interferograms focused on volcanic deformation, and apply a series of preprocessing steps to curate a balanced set of deforming and non-deforming images. Two generative image captioning architectures, the Generative Image-to-Text Transformer (GIT) and Bootstrapping Language-Image Pre-training (BLIP), are fine-tuned to generate natural language descriptions of the InSAR images. In addition, we implement a retrieval-based model that aligns image and text representations in a shared embedding space and retrieves the most semantically similar caption for a given image. The performance of these approaches is evaluated using standard captioning metrics and qualitative inspection of the generated descriptions. Our results suggest that pre-trained vision-language models can adapt to specialized scientific imagery despite being trained primarily on natural images. This study represents an initial step towards automated interpretation systems that assist researchers in large-scale InSAR monitoring applications.
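To make the generative fine-tuning setup concrete, the following is a minimal sketch of one training step for a BLIP-style captioner using the Hugging Face transformers API. The checkpoint name, image path, example caption, and learning rate are illustrative assumptions, not the exact configuration used in this work; in practice the (interferogram, caption) pairs come from the curated Hephaestus-derived dataset.

```python
# Hedged sketch: one BLIP fine-tuning step on an (interferogram, caption) pair.
# Checkpoint, paths, caption text, and hyperparameters are placeholders.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

image = Image.open("interferogram.png").convert("RGB")                     # placeholder path
caption = "No deformation fringes are visible in the interferogram."       # placeholder caption

inputs = processor(images=image, text=caption, return_tensors="pt")

model.train()
outputs = model(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    labels=inputs["input_ids"],   # language-modeling loss over the caption tokens
)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()

# After fine-tuning, captions are generated autoregressively from the image alone.
model.eval()
generated_ids = model.generate(pixel_values=inputs["pixel_values"], max_new_tokens=40)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

Fine-tuning GIT follows the same pattern with its corresponding processor and causal language-modeling head.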
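The retrieval-based approach can be sketched in the same spirit: embed the query interferogram and a pool of candidate captions in a shared space, then return the caption with the highest cosine similarity to the image. The CLIP checkpoint and candidate captions below are illustrative assumptions; the retrieval model described in this work is adapted to the InSAR data rather than used off the shelf.

```python
# Hedged sketch: caption retrieval via a shared image-text embedding space.
# Checkpoint, image path, and candidate captions are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

candidate_captions = [  # placeholder caption pool
    "Concentric fringes indicate ground deformation near the volcano summit.",
    "The interferogram shows atmospheric artifacts but no deformation signal.",
]
image = Image.open("interferogram.png").convert("RGB")  # placeholder path

with torch.no_grad():
    text_inputs = processor(text=candidate_captions, return_tensors="pt", padding=True)
    image_inputs = processor(images=image, return_tensors="pt")
    text_emb = model.get_text_features(**text_inputs)
    image_emb = model.get_image_features(**image_inputs)

# Normalize and score every candidate caption by cosine similarity to the image.
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
similarity = image_emb @ text_emb.T
best = similarity.argmax(dim=-1).item()
print(candidate_captions[best])
```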