Preprint
Article

This version is not peer-reviewed.

A SAM2-Driven RGB-T Annotation Pipeline with Thermal-Guided Refinement for Semantic Segmentation in Search-and-Rescue Scenes

Submitted: 05 January 2026

Posted: 06 January 2026

Abstract
High-quality RGB–thermal infrared (RGB-T) semantic segmentation datasets are crucial for search-and-rescue (SAR) applications, yet their development is hindered by the scarcity of annotated ground truth and by the challenges of thermal-camera calibration, which typically depends on heated targets with limited geometric definition. Recent approaches, such as MATT, focus on transferring SAM-based RGB masks to multispectral data, but they do not fully address the need for robust cross-modal alignment, quality control, or human-in-the-loop reliability assessment in RGB-T segmentation. To fill this gap, we propose a general annotation methodology that geometrically aligns RGB-T pairs, combines model-based proposals with interactive refinement, and measures annotation cost alongside systematic quality checks based on inter-annotator agreement. In this methodology, multimodal alignment is ensured through feature-based matching and homography estimation. Annotation integrates automatic proposals and guided refinement, and the final masks undergo quantitative cost and quality control before being used in downstream model training. The proposed methodology was evaluated on a SAR-oriented RGB-T dataset comprising 306 image pairs. Consistent cross-modal alignment was achieved via SuperGlue-based matching and homography estimation, enabling the implementation of a SAM2-based semi-automatic annotation pipeline in Label Studio. Results across two annotators show that the proposed approach reduces annotation time by 21% while achieving high annotation quality (mean IoU = 74.9%) and high inter-annotator agreement (mean pixel accuracy = 88.4%, Cohen's kappa = 83%). The curated labels were then used to benchmark two representative RGB-T segmentation models.
These findings demonstrate the practical value of the proposed methodology and establish a reproducible framework for generating reliable RGB-T semantic segmentation datasets, complementing and extending recent multispectral auto-labeling approaches.
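The agreement figures quoted in the abstract (mean IoU, pixel accuracy, Cohen's kappa) can all be derived from a single confusion matrix between two label masks. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names, the toy 2×4 masks, and the choice to average IoU only over classes that occur in at least one mask are illustrative assumptions, since the abstract does not specify these details.

```python
from itertools import chain


def confusion_matrix(mask_a, mask_b, num_classes):
    """Count pixels labelled i in mask_a and j in mask_b (2-D lists of ints)."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    pixels_a = chain.from_iterable(mask_a)
    pixels_b = chain.from_iterable(mask_b)
    for label_a, label_b in zip(pixels_a, pixels_b):
        cm[label_a][label_b] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of pixels on which both masks agree."""
    total = sum(map(sum, cm))
    agree = sum(cm[i][i] for i in range(len(cm)))
    return agree / total


def mean_iou(cm):
    """Mean per-class IoU; classes absent from both masks are skipped (assumption)."""
    ious = []
    for i in range(len(cm)):
        row = sum(cm[i])                 # pixels of class i in mask_a
        col = sum(r[i] for r in cm)      # pixels of class i in mask_b
        union = row + col - cm[i][i]
        if union > 0:
            ious.append(cm[i][i] / union)
    return sum(ious) / len(ious)


def cohens_kappa(cm):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    total = sum(map(sum, cm))
    p_o = pixel_accuracy(cm)
    p_e = sum(
        sum(cm[i]) * sum(r[i] for r in cm) for i in range(len(cm))
    ) / total ** 2
    if p_e == 1:                         # both masks use one identical class
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy masks (hypothetical): the two annotators disagree on a single pixel.
annotator_1 = [[0, 0, 1, 1],
               [0, 2, 2, 1]]
annotator_2 = [[0, 0, 1, 1],
               [0, 2, 1, 1]]

cm = confusion_matrix(annotator_1, annotator_2, num_classes=3)
print(pixel_accuracy(cm), mean_iou(cm), cohens_kappa(cm))
```

Cohen's kappa is lower than raw pixel accuracy on the same masks because it discounts the agreement expected by chance from the class frequencies alone, which is why the abstract reports both. For full-resolution images, the same formulas would normally be vectorised with an array library rather than looped over in pure Python.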
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.


© 2026 MDPI (Basel, Switzerland) unless otherwise stated