5. Methodology
This section outlines the proposed methodology for generating a thermally enriched three-dimensional point cloud using a dual-sensor acquisition system. The workflow is structured into seven consecutive stages that enable, in an automated and reproducible manner, the capture of data, reconstruction of 3D geometry, extraction of calibrated thermal information, and its projection onto the 3D model. Each step was designed to minimise manual intervention and to ensure traceability between the captured images, camera parameters, and the thermal values assigned to each point. The following subsections describe each of these stages in detail.
5.1. Optical System Calibration
The first step of the proposed workflow consists of the geometric calibration of the RGB and thermal sensors, with the aim of correcting lens-induced distortions and estimating the intrinsic parameters required for the subsequent stages of image undistortion, spectral registration, and thermal data fusion. This step is critical to ensure accurate correspondence between RGB and thermal images, and to guarantee that thermal information can be coherently projected onto three-dimensional models generated via photogrammetry.
The need for multispectral calibration has been widely discussed in the literature. To achieve precise fusion across different spectral domains, it is essential to have well-defined intrinsic parameters for each optical system, and these parameters must be derived from patterns visible in both spectra [5]. In this context, several authors have proposed the use of heated checkerboards or materials with differential emissivity as practical solutions for thermal calibration targets [35,36].
In this work, a calibration pattern composed of black and white squares arranged as a 9×6 inner-corner checkerboard, with square cells of 40 mm, was designed. The pattern was printed on a rigid surface and placed on a flat base in a controlled temperature environment. It was heated using an infrared lamp, producing sufficient thermal contrast to make it visible in both the RGB and thermal domains. This strategy, recommended in prior cross-calibration studies [37], enabled detection of the same pattern in both modalities without the need for special sensors or active markers.
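As an illustration of how the pattern geometry enters the calibration, the following minimal sketch generates the 3D reference coordinates of the inner corners for the 9×6, 40 mm board described above. The function name and the row-by-row ordering are assumptions made for this example, not part of the published scripts.

```python
from typing import List, Tuple


def checkerboard_object_points(
    cols: int = 9, rows: int = 6, square_mm: float = 40.0
) -> List[Tuple[float, float, float]]:
    """Return inner-corner coordinates in millimetres on the Z = 0 plane.

    Corners are listed row by row, starting at the board origin.
    """
    return [
        (c * square_mm, r * square_mm, 0.0)
        for r in range(rows)
        for c in range(cols)
    ]


points = checkerboard_object_points()
print(len(points), points[0], points[-1])
```

The same list of planar points would be paired with the corners detected in every view, since the physical board does not change between captures.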
Image acquisition was performed using the DJI Mavic 3T UAV system, keeping the pattern centred within the frame and at a fixed distance from the sensors. Multiple photographs were taken from different perspectives to ensure sufficient coverage in terms of orientation and angle of view [38], following the recommendations of the pinhole camera model implemented in OpenCV. In the case of the thermal camera, it was necessary to invert the grayscale levels of the image, as heating reverses the expected contrast: black squares tend to absorb more radiation and appear hotter (and therefore brighter), while white squares reflect more and remain cooler.
Once the images were loaded into Agisoft Metashape within a single chunk configured as a Multi-Camera System, the software automatically identified the RGB–Thermal image pairs, assigning the sensors based on import order and dataset structure. The user can access the first step of the workflow through the Metashape interface by navigating to Scripts, 3D Thermal Mapping, STEP 1 - Calibrate Sensors, which opens an interactive window allowing confirmation of the primary (RGB) and secondary (Thermal) cameras, as well as the definition of the calibration pattern dimensions. By default, a 9×6 checkerboard with 40 mm square cells is used, although these parameters can be adjusted depending on the experimental setup. The system automatically stores the calibration results within the active project. Agisoft Metashape allows the export of the intrinsic matrices and distortion coefficients to external files for further use or independent validation.
Prior to the calibration computation, specific preprocessing techniques were applied to maximise pattern detectability. In RGB images, contrast normalisation and contrast-limited adaptive histogram equalisation (CLAHE) were applied, while thermal images underwent greyscale inversion, automatic binary segmentation using Otsu’s method, and local thermal contrast enhancement. These operations proved particularly beneficial for improving pattern visibility in low-resolution thermal sensors.
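The thermal preprocessing can be sketched as follows. This is a pure-Python illustration of greyscale inversion followed by Otsu thresholding, assuming the thermal frame is available as an 8-bit greyscale matrix; the function names are hypothetical, and a production script would more likely rely on the equivalent OpenCV routines.

```python
from typing import List

Image = List[List[int]]


def invert_8bit(image: Image) -> Image:
    """Invert grey levels so heated (bright) black squares become dark again."""
    return [[255 - v for v in row] for row in image]


def otsu_threshold(image: Image) -> int:
    """Return the grey level t that maximises the between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * 256
    for row in image:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is already in the lower class
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy thermal patch: heated black squares (200) next to cooler white squares (50).
thermal = [[200, 50, 200, 50], [50, 200, 50, 200]]
inverted = invert_8bit(thermal)
threshold = otsu_threshold(inverted)
binary = [[1 if v > threshold else 0 for v in row] for row in inverted]
```

After inversion the squares regain the dark/bright arrangement a checkerboard detector expects, and the binary mask separates the two classes without a hand-tuned threshold.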
The pattern detection process in each image involved a geometric analysis to locate the internal corners of the checkerboard and refine their positions using sub-pixel adjustment techniques. This was performed independently for both RGB and thermal images, ensuring high accuracy in both spectral domains. Once multiple pattern-image correspondences were identified across the dataset, individual calibration was carried out for each sensor, estimating the intrinsic matrix, optical distortion coefficients, and the rotation and translation vectors describing the pattern’s position relative to the camera. The entire procedure was implemented using OpenCV tools, which are widely validated for computer vision and multiview calibration tasks.
The results were integrated into the Metashape chunk to be used in subsequent stages of the pipeline. Additionally, Agisoft Metashape allows the export of the calibration parameters in .xml format for external traceability or for reuse in future projects using the same dual-sensor system.
The full calibration routine was automated through Python scripts, directly integrated into the Metashape API, enabling a repeatable, traceable, and replicable execution of the process, from image loading to parameter export.
Figure 2 shows the flowchart corresponding to this procedure, as well as the Metashape interface used for its execution.
Reprojection errors were also computed individually per image, allowing for a quantitative assessment of the geometric calibration quality and enabling the selection of the most accurate views for the following stages of the workflow. When both cameras successfully detected the pattern in each capture, a joint calibration was performed, generating a stereoscopic configuration that includes the intrinsic parameters of each sensor and their relative spatial relationship.
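The per-image quality check can be illustrated with the sketch below, which computes a root-mean-square reprojection error for one view and keeps only the views under a chosen limit. The threshold value and the function names are assumptions for the example; the source does not state the acceptance criterion used.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def rms_reprojection_error(
    detected: Sequence[Point], reprojected: Sequence[Point]
) -> float:
    """RMS pixel distance between detected corners and their reprojections."""
    if not detected or len(detected) != len(reprojected):
        raise ValueError("point lists must be non-empty and of equal length")
    squared = sum(
        (dx - px) ** 2 + (dy - py) ** 2
        for (dx, dy), (px, py) in zip(detected, reprojected)
    )
    return math.sqrt(squared / len(detected))


def select_views(errors: Dict[str, float], max_error_px: float) -> List[str]:
    """Return view names with error <= max_error_px, most accurate first."""
    kept = [name for name, err in errors.items() if err <= max_error_px]
    return sorted(kept, key=lambda name: errors[name])


# Hypothetical example: two corners, each reprojected 5 px away.
error = rms_reprojection_error([(0.0, 0.0), (10.0, 0.0)], [(3.0, 4.0), (13.0, 4.0)])
views = select_views({"view_a": 0.8, "view_b": 0.3, "view_c": 1.5}, max_error_px=1.0)
```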
Figure 3 illustrates the results of this process, showing the original image of each sensor and the detected pattern in both modalities.
5.2. Image Undistortion
Once both sensors have been calibrated and their respective intrinsic parameters, including lens distortion coefficients, have been obtained, the next step involves undistorting all captured images. This procedure aims to remove lens-induced deformations and produce images that conform to the pinhole projection model, which is essential for accurate geometric computations across views. In particular, prior undistortion improves both the numerical stability and accuracy of homography estimation, as point correspondences are established using distortion-free coordinates, a principle widely demonstrated in foundational computer vision studies [37].
The undistortion process is based directly on the camera matrices and distortion coefficients obtained in Section 5.1 (Optical System Calibration). These matrices enable the reconstruction of idealised image representations in which pixels correspond to linear projections on the image plane. The procedure was implemented through a Python script that automates the processing of RGB–Thermal image pairs. This script can be executed independently or integrated as an auxiliary module within Agisoft Metashape, allowing seamless incorporation into existing workflows.
Within the Metashape environment, this step can be initiated by navigating to Scripts, 3D Thermal Mapping, STEP 2 - Undistort Cameras. This action opens an interactive dialog that allows the user to select and launch the undistortion process. Internally, the script retrieves the calibration data stored in the current project and applies the transformation to each RGB–Thermal image pair. The resulting undistorted images are stored in a temporary chunk, which preserves the camera orientation and projection parameters, and can be directly used in the subsequent stage for homography computation.
Figure 4 presents a flowchart of this process alongside its implementation within the Metashape environment.
The process leverages functions from the OpenCV library to generate undistorted images by correcting both radial and tangential distortion. Input and output resolutions are preserved to ensure consistency in scale and field of view.
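To make the radial and tangential terms concrete, the sketch below implements the Brown-Conrady distortion model in the form used by OpenCV, together with a simple fixed-point inversion. It works on normalised image coordinates (pixel coordinates with the principal point subtracted and divided by the focal length); the coefficient values are invented for the example and do not correspond to the calibrated sensors.

```python
from typing import Tuple

# (k1, k2, p1, p2, k3): radial terms k1..k3 and tangential terms p1, p2.
Coeffs = Tuple[float, float, float, float, float]


def _terms(x: float, y: float, c: Coeffs) -> Tuple[float, float, float]:
    """Return the radial factor and the tangential offsets at (x, y)."""
    k1, k2, p1, p2, k3 = c
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return radial, dx, dy


def distort(x: float, y: float, c: Coeffs) -> Tuple[float, float]:
    """Map an ideal (pinhole) normalised point to its distorted position."""
    radial, dx, dy = _terms(x, y, c)
    return x * radial + dx, y * radial + dy


def undistort(xd: float, yd: float, c: Coeffs, iterations: int = 20) -> Tuple[float, float]:
    """Invert the model by fixed-point iteration, adequate for moderate distortion."""
    x, y = xd, yd
    for _ in range(iterations):
        radial, dx, dy = _terms(x, y, c)
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y


coeffs: Coeffs = (-0.1, 0.01, 0.001, -0.0005, 0.0)  # hypothetical values
xd, yd = distort(0.3, -0.2, coeffs)
x_rec, y_rec = undistort(xd, yd, coeffs)
```

The forward mapping is also what is needed later when a point in an undistorted image has to be located in the original thermal frame.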
Figure 5 presents visual examples comparing original and undistorted images from both the RGB and thermal channels. Notable improvements include the correction of peripheral curvature and the straightening of linear structures, which significantly enhance the reliability of feature matching in subsequent stages of the pipeline.
Although technical in nature, this step has a direct impact on the accuracy of downstream computations, particularly in multispectral contexts where geometric discrepancies between sensors tend to amplify registration errors. Ensuring that all images conform to the same projection model provides a reliable foundation for homography computation and the RGB-thermal registration described in the following section.
5.3. Cross-Spectral Feature Matching and Homography Computation
Precise registration between the RGB and thermal sensors is essential for the accurate transfer of radiometric information onto the 3D model. Since these sensors operate in distinct spectral domains, registering their respective images poses specific challenges, particularly in the detection and matching of keypoints. To address this issue, a two-stage strategy was implemented: an initial benchmarking phase involving feature matching algorithms, followed by the computation of a single homography matrix for the entire system.
During the evaluation phase, a representative set of ten RGB–Thermal image pairs, captured during the calibration process (see Section 5.1), was selected based on clear visibility of the calibration pattern in both channels. For each image pair, a series of manually annotated control points was recorded to serve as reference data.
Each image pair was processed using twelve state-of-the-art feature matching algorithms, selected for their proven performance in multispectral or multimodal contexts: DUSt3R, MASt3R, EfficientLoFTR, XoFTR, LiftFeat, RDD, LightGlue, DeDoDe, GIM, RoMa, OmniGlue and SuperGlue. These methods were assessed based on their ability to detect robust keypoint correspondences, generate a sufficient number of matches, and enable stable projective transformation computation.
For each algorithm, a homography matrix was computed from the detected correspondences and applied to project thermal image points onto the RGB image plane. The projected points were then compared to the manually annotated control points, and the average Euclidean distance per image pair was calculated. A global mean error was subsequently computed across the ten pairs.
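The error metric described above can be sketched as follows: thermal points are mapped through a 3×3 homography, compared with the annotated RGB control points, and the per-pair means are averaged. The matrix and point values are made up for the example, and the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def apply_homography(h: Matrix3, point: Point) -> Point:
    """Project a 2D point with a 3x3 homography (homogeneous division included)."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v


def mean_alignment_error(
    h: Matrix3, thermal_points: Sequence[Point], rgb_control_points: Sequence[Point]
) -> float:
    """Average Euclidean distance (pixels) for one image pair."""
    distances = []
    for thermal, reference in zip(thermal_points, rgb_control_points):
        u, v = apply_homography(h, thermal)
        distances.append(math.hypot(u - reference[0], v - reference[1]))
    return sum(distances) / len(distances)


def global_mean_error(per_pair_errors: List[float]) -> float:
    """Mean of the per-pair errors across the evaluation set."""
    return sum(per_pair_errors) / len(per_pair_errors)


# Hypothetical homography: scale by 2 and shift by (10, 5).
H = [[2.0, 0.0, 10.0], [0.0, 2.0, 5.0], [0.0, 0.0, 1.0]]
pair_error = mean_alignment_error(
    H, [(100.0, 50.0), (0.0, 0.0)], [(210.0, 105.0), (13.0, 9.0)]
)
```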
The algorithm yielding the lowest alignment error was selected as the optimal method and adopted as the default throughout the remainder of the pipeline, due to its robustness, accuracy, and efficiency under real-world acquisition conditions.
In the second stage, once the optimal method had been defined, it was applied to the newly acquired dataset. For each undistorted RGB-Thermal image pair, the selected algorithm was used to detect and match features, and a single homography matrix was computed between the sensors. This matrix encapsulates the geometric transformation between both sensors and is used in subsequent stages to project thermal data onto the 3D model.
The use of a single homography matrix, rather than one per image pair, is justified by the controlled acquisition conditions, fixed sensor mounting, consistent flight altitude, and the relatively planar nature of the observed scenes. These conditions make it possible to model the acquisition system as if all images were projected onto a shared planar surface, validating the use of a single global projective transformation [36]. This strategy significantly simplifies subsequent processing and avoids cumulative errors resulting from heterogeneous local estimations, as demonstrated in recent studies on multispectral fusion with co-mounted sensors [37].
To initiate the homography estimation step within Agisoft Metashape, the user must navigate to Scripts, 3D Thermal Mapping, STEP 3 - Calculate Homography. This action opens a dedicated dialog that guides the configuration and execution of the homography computation process. The interface allows the user to select the chunk containing the undistorted images generated in the previous step. By default, the script automatically selects the temporary chunk produced during the undistortion process; however, this can be manually modified to point to any compatible chunk within the current project.
Additionally, the user can choose the feature matching algorithm to be used for establishing point correspondences between the RGB and thermal images. By default, the system uses LightGlue, but alternative algorithms can be selected as needed. The dialog also exposes a set of algorithm-specific parameters that can be tuned to optimise detection thresholds, descriptor matching strategies, or filtering criteria, depending on the selected method.
Figure 6 presents a flowchart outlining this procedure, along with a screenshot of the interface implemented in Metashape for user interaction.
A quantitative comparison of the accuracy achieved by each algorithm is presented in Section 7, along with an analysis of thermal accuracy and the final enriched 3D model.
5.4. Thermal Data Extraction
The next step in the pipeline involves extracting calibrated thermal information from each thermal image captured by the UAV system. This process aims to retrieve, for every pixel in the thermal image, a radiometrically corrected temperature value, adjusted according to ambient conditions and the physical properties of the observed surface.
To achieve this, a Python script was developed using DJI’s official SDK, with ExifTool employed as an auxiliary utility for accessing embedded image metadata. The script can be executed independently or integrated directly within the Agisoft Metashape environment, allowing seamless incorporation into photogrammetric workflows.
The procedure begins by reading the original thermal images, which are then processed through radiometric decoding using the parse_dirp2 function provided by the DJI SDK. This function outputs a temperature matrix with the same dimensions as the input image, where each cell represents the apparent surface temperature at the corresponding pixel. The resulting data is saved in compressed .npz format using the NumPy library, ensuring efficient storage and rapid access in downstream stages.
To improve thermal accuracy, the script incorporates several physical and environmental parameters that influence thermal data interpretation:
ambient temperature, measured in situ during the field campaign.
relative humidity, recorded under the same environmental conditions.
material emissivity, defined according to standard reference tables.
sensor-to-object distance, estimated from flight metadata.
These factors allow for correction of reflected temperature and compensate for systematic deviations due to external conditions, following established calibration principles for uncooled thermal sensors. The tool currently supports thermal images at 640×512 resolution and produces floating-point matrices ready for 3D mapping in the subsequent projection step.
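To show why emissivity and reflected temperature matter, the sketch below applies a simplified broadband grey-body relation, in which the apparent radiance is the sum of the emitted part and the reflected part. This is an illustrative textbook approximation only: it ignores atmospheric transmission (and therefore humidity and distance), it is not the internal model of the DJI SDK, and the function name is hypothetical.

```python
def emissivity_corrected_temperature(
    apparent_c: float, emissivity: float, reflected_c: float
) -> float:
    """Estimate object temperature (deg C) from apparent temperature.

    Assumes sigma*T_app^4 = eps*sigma*T_obj^4 + (1 - eps)*sigma*T_refl^4,
    with all temperatures converted to kelvin before applying the relation.
    """
    if not 0.0 < emissivity <= 1.0:
        raise ValueError("emissivity must be in (0, 1]")
    t_app = apparent_c + 273.15
    t_refl = reflected_c + 273.15
    t_obj4 = (t_app ** 4 - (1.0 - emissivity) * t_refl ** 4) / emissivity
    if t_obj4 <= 0.0:
        raise ValueError("inputs are physically inconsistent")
    return t_obj4 ** 0.25 - 273.15


# A surface reading 30 deg C with emissivity 0.9 in 20 deg C surroundings
# is slightly warmer than it appears under this approximation.
corrected = emissivity_corrected_temperature(30.0, 0.9, 20.0)
```

With an emissivity of 1 the relation returns the apparent temperature unchanged, which is a quick sanity check on any implementation.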
To initiate this process within Metashape, the user must navigate to Scripts, 3D Thermal Mapping, STEP 4 - Extract Thermal Information. This action opens a dialog that allows the user to select the thermal sensor, which is automatically assigned by Metashape based on the project configuration but can be manually adjusted if necessary. The interface also enables the user to input the measured field conditions, including ambient temperature, relative humidity, material emissivity, and sensor-to-object distance, previously described in detail.
These values are then used to generate a temperature matrix for each thermal image, ensuring that the radiometric data are physically consistent with the acquisition context. The thermal outputs are stored and linked to the corresponding undistorted images, preparing them for integration with the RGB model in the final stage of the pipeline.
Figure 7 presents a flowchart describing this procedure, along with a screenshot of the script interface implemented in Agisoft Metashape.
This process ensures that each thermal image is associated with a matrix of physically meaningful data, prepared for integration with the RGB point cloud. In doing so, it enables precise projection of temperature values onto the reconstructed 3D model during the final stage of the workflow.
5.5. Photogrammetric Generation of High-Resolution RGB Point Cloud
Data acquisition for point cloud generation was carried out using a DJI Mavic 3T UAV platform, equipped with both RGB and thermal sensors. For the photogrammetric capture, the system’s wide-angle optical camera was used exclusively, as it offers higher spatial resolution compared to the telephoto camera, an essential factor for achieving a dense and accurate three-dimensional reconstruction. The selection of this camera maximises the geometric quality of the point cloud, leveraging its wider field of view, which is better suited to SfM and MVS reconstruction tasks.
The data acquisition flight was planned using an automated path divided into three grids: one for nadir images and two for oblique shots at 45 degrees, flown in orthogonal directions. To ensure robust coverage and reliable reconstruction, the RGB images were captured with 85% forward overlap and 80% side overlap. These values also provide good coverage for the thermal dataset, which features a narrower field of view compared to the RGB sensor. The flight speed was limited to 2 m/s to prevent an excessive shooting frequency that could overwhelm the RGB camera processor, potentially forcing it to reduce image resolution. This configuration ensures sufficient redundancy for multi-view triangulation and robust keypoint detection.
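The relationship between overlap, flight speed, and shooting frequency can be made explicit with a short calculation. The sketch below derives the distance between consecutive exposures and the resulting trigger interval; the footprint values are placeholders, since the flight height and field of view are not stated in this section.

```python
import math


def ground_footprint(height_m: float, fov_deg: float) -> float:
    """Ground coverage (m) along one image axis for a nadir view over flat terrain."""
    return 2.0 * height_m * math.tan(math.radians(fov_deg) / 2.0)


def shot_spacing(footprint_m: float, overlap: float) -> float:
    """Distance (m) between exposures, or between flight lines, for a given overlap."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    return footprint_m * (1.0 - overlap)


def trigger_interval(spacing_m: float, speed_m_s: float) -> float:
    """Time (s) between exposures at a constant ground speed."""
    if speed_m_s <= 0.0:
        raise ValueError("speed must be positive")
    return spacing_m / speed_m_s


# Hypothetical footprints: 40 m along track, 60 m across track.
forward_spacing = shot_spacing(40.0, 0.85)   # 85% forward overlap
line_spacing = shot_spacing(60.0, 0.80)      # 80% side overlap
interval = trigger_interval(forward_spacing, 2.0)  # 2 m/s as in the survey
```

Under these placeholder numbers the camera fires roughly every three seconds, which illustrates how lowering the speed relaxes the shooting frequency for a fixed overlap.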
Figure 8 shows the flight planning interface used during this phase.
Throughout the survey, a constant flight altitude was maintained to minimise parallax errors between cameras and to ensure that the homography conditions between image planes remained stable across the dataset.
This homogeneous acquisition condition is fundamental for later applying a single transformation matrix between sensors, without needing to recalculate it for each image pair. To this end, the flight height was referenced to a digital terrain model (DTM) with a resolution of 5 m, derived from LiDAR data.
A Real-Time Kinematic (RTK) module was integrated into the drone in network mode to receive corrections from nearby permanent stations, thereby enhancing the positional accuracy of the captured images. This differential correction helps to reduce systematic geolocation errors and enhances geometric consistency among RGB images. While not essential to the proposed pipeline, the inclusion of accurate positioning supports more efficient photogrammetric orientation and can be relevant in applications requiring absolute georeferencing. To ensure maximum accuracy in the survey operations, 15 artificial checkerboard targets were uniformly distributed across the area of interest, ten of which were used as Ground Control Points (GCPs) and five as Check Points (CPs). The coordinates of these points were acquired using a GNSS antenna operating in network RTK (nRTK) mode. In accordance with the European INSPIRE directive (Technical Guidelines Annex I - D2.8.I.1), the geodetic reference system used for these operations was ETRS89, based on the GRS80 ellipsoid, which in Italy corresponds to the ETRF2000 realization (epoch 2008.0).
During the image acquisition, key environmental variables were also recorded for subsequent thermal processing. Ambient temperature and relative humidity were measured on site, as these parameters directly influence the calibration of thermal data. The estimated emissivity of the predominant material in the scene was also identified, based on reference tables, along with the average distance between the sensor and the object. These factors influence the apparent temperature measured by the thermal sensor. All this information was integrated into subsequent stages to refine radiometric interpretation and ensure that thermal values were physically meaningful.
The set of RGB images was processed using Agisoft Metashape Professional, following a standard photogrammetric workflow that included image orientation, structure optimisation, and dense image matching. The intrinsic and distortion matrices previously obtained during sensor calibration were incorporated into this process to correct optical deformations before reconstruction.
Figure 9 presents a flowchart summarising the overall procedure followed during this stage.
Only the images from the wide-angle RGB camera were used in this stage. The resulting point cloud provides a high-resolution three-dimensional representation, which will later be enriched with thermal information. This reconstruction also allows for the extraction of the camera projections needed to determine which images observed each point in the cloud, which is essential for thermal value assignment in the following steps. An example of the outcome of this stage is shown in Figure 10.
5.6. Exporting Point Cloud Reprojections
Following the generation of the dense point cloud described in the previous section, the next step involves exporting the image-space reprojections of each 3D point onto the set of calibrated RGB images. This process is essential for the subsequent thermal mapping stage, as it establishes a geometric correspondence between each 3D point in the model and its 2D reprojections in the original undistorted images.
To execute this step within Agisoft Metashape, the user must navigate to Scripts, 3D Thermal Mapping, STEP 6 - Export Reprojections. This opens a dedicated dialog where the user can choose whether to include all cameras in the export or restrict the output to a subset of selected cameras. Although the original dataset contains both RGB and thermal imagery, this step exclusively considers the RGB cameras, as they define the reference coordinate space for projection and colour mapping.
For each 3D point in the dense cloud, the script computes the corresponding (x,y) coordinates in each image where the point is visible. Additionally, the script records two quality metrics for each point-camera pair:
the reprojection error, which quantifies the discrepancy between the observed and projected point positions, providing a measure of orientation reliability.
the visibility state, which indicates whether the point is occluded or lies within the image frame and field of view.
These two metrics are retained for use in the next step of the pipeline, where they inform the selection of valid image observations for assigning thermal values to each 3D point.
The results are exported in a structured HDF5 file, saved automatically within the project directory. This file encodes the point IDs, projection coordinates, camera references, occlusion, and reprojection errors in an efficient format that supports fast access and large-scale processing.
Figure 11 presents a schematic of this step, showing the mapping between the 3D dense cloud and the 2D reprojections across selected RGB images, along with a screenshot of the interface in Metashape.
5.7. Thermal Enrichment of Point Cloud
The final stage of the proposed workflow consists of enriching the three-dimensional point cloud with calibrated thermal information, assigning a temperature value to each 3D point previously reconstructed via multi-view photogrammetry. This step synthesises and connects all the previous components: geometric calibration, image undistortion, spectral matching, and thermal data extraction.
The procedure was fully implemented in Python and designed to run either as a standalone application or embedded within the Agisoft Metashape environment via its scripting interface. This allows automation of the entire process, minimising manual errors and maintaining traceability between geometry and radiometric data. To launch the final stage of the workflow within Metashape, the user must navigate to Scripts, 3D Thermal Mapping, STEP 7 - Thermal Enrich Point Cloud. This action opens a simple dialog that allows the user to select the temperature value smoothing technique and initiate the thermal mapping process. No additional input is required at this stage, as all necessary data (undistorted imagery, thermal matrices, point reprojections, homography matrix, and camera parameters) have been prepared in previous steps.
Figure 12 shows a flowchart of the implemented algorithm along with a partial view of its integration within the Agisoft interface.
The process relies on the camera projection data (calculated in the previous step, Section 5.6) stored in Agisoft Metashape, which makes it possible to determine, for each point in the cloud, which RGB images observed it and what its corresponding image-plane coordinates are. Using the RGB intrinsic and distortion matrices obtained during calibration (Section 5.1), the projected coordinates are corrected, yielding their exact position in the undistorted RGB image.
Next, the homography matrix between RGB and thermal (calculated in Section 5.3) is applied to project the RGB coordinates into the thermal image system. If the transformed point lies within the valid boundaries of the thermal image, it is first reprojected from the undistorted coordinate system back into the original, distorted one (using the thermal camera’s intrinsic and distortion matrices calculated in Section 5.1). Only then is the corresponding value in the calibrated temperature matrix (Section 5.4) retrieved and assigned to the 3D point as an additional scalar attribute.
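The lookup chain for a single observation can be summarised in the sketch below: homography, bounds check, return to the original thermal geometry, and nearest-pixel read from the temperature matrix. The re-distortion step is passed in as a function because its exact form depends on the calibrated thermal model; all names and values here are illustrative assumptions.

```python
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, float]


def thermal_value_for_rgb_pixel(
    rgb_xy: Point,
    homography: Sequence[Sequence[float]],
    redistort: Callable[[float, float], Point],
    temperature: Sequence[Sequence[float]],
) -> Optional[float]:
    """Return the temperature seen at an undistorted RGB pixel, or None if outside."""
    x, y = rgb_xy
    h = homography
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        return None
    # 1. Undistorted RGB pixel -> undistorted thermal pixel.
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w

    height, width = len(temperature), len(temperature[0])
    if not (0 <= u < width and 0 <= v < height):
        return None
    # 2. Undistorted thermal pixel -> pixel in the original thermal frame.
    ud, vd = redistort(u, v)
    col, row = round(ud), round(vd)
    if not (0 <= col < width and 0 <= row < height):
        return None
    # 3. Read the calibrated temperature (row index = y, column index = x).
    return temperature[row][col]


# Toy data: a 2x3 temperature matrix and a homography that halves coordinates.
temps = [[20.0, 21.0, 22.0], [23.0, 24.0, 25.0]]
H = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 1.0]]
identity = lambda u, v: (u, v)  # stand-in for the real re-distortion
inside = thermal_value_for_rgb_pixel((4.0, 2.0), H, identity, temps)
outside = thermal_value_for_rgb_pixel((100.0, 100.0), H, identity, temps)
```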
It is important to note that a single 3D point in the dense cloud may be projected onto multiple RGB images and, by extension, onto several corresponding thermal images via homography. To resolve this ambiguity and ensure the consistency of the thermal attribution, a visibility filtering and weighting strategy was implemented. This mechanism leverages the occlusion state and reprojection error values computed during the reprojection export stage (see Section 5.6) to determine whether a given point should be considered visible and geometrically reliable in a particular view. Reprojections flagged as occluded or exhibiting high reprojection error are automatically excluded from the fusion process to avoid erroneous temperature assignments.
For the remaining valid reprojections, where multiple views are available for a single point, a weighted aggregation of temperature values is performed. Each temperature is weighted based on the Euclidean distance between the image centre and the projected pixel coordinates (x,y). This approach favours pixels near the optical axis, where lens distortion is minimal and radiometric accuracy is typically higher. By applying this distance-based weighting, the process reduces peripheral artefacts and improves the spatial coherence of the thermal enrichment.
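A minimal sketch of the filtering and weighting logic is given below. The source does not specify the weighting function or the error threshold, so a linear fall-off with distance from the image centre and a 1 px limit are assumed purely for illustration; the 640×512 image size matches the thermal resolution mentioned earlier.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Observation:
    x: float                      # projected pixel column
    y: float                      # projected pixel row
    temperature_c: float          # temperature read for this view
    occluded: bool                # visibility flag from the export step
    reprojection_error_px: float  # geometric reliability of the view


def fuse_temperature(
    observations: Iterable[Observation],
    image_size: Tuple[int, int] = (640, 512),
    max_error_px: float = 1.0,  # assumed threshold
) -> Optional[float]:
    """Weighted mean temperature over valid views, or None if none is valid."""
    cx, cy = image_size[0] / 2.0, image_size[1] / 2.0
    d_max = math.hypot(cx, cy)  # centre-to-corner distance
    weighted_sum = 0.0
    weight_total = 0.0
    for obs in observations:
        if obs.occluded or obs.reprojection_error_px > max_error_px:
            continue  # drop occluded or geometrically unreliable views
        distance = math.hypot(obs.x - cx, obs.y - cy)
        weight = max(1.0 - distance / d_max, 1e-6)  # assumed linear fall-off
        weighted_sum += weight * obs.temperature_c
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0.0 else None


views = [
    Observation(320.0, 256.0, 30.0, False, 0.2),   # centre of the frame
    Observation(0.0, 256.0, 20.0, False, 0.4),     # left edge, low weight
    Observation(320.0, 256.0, 100.0, True, 0.1),   # occluded, ignored
    Observation(320.0, 256.0, 100.0, False, 5.0),  # high error, ignored
]
fused = fuse_temperature(views)
```

In this toy case the fused value stays close to the central observation, which is the intended effect of favouring pixels near the optical axis.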
This strategy enables robust fusion of thermal data across multiple views, ensuring that each point in the cloud receives a reliable and physically meaningful temperature value, even in scenarios where redundancy or partial occlusion is present.
Furthermore, due to the difference in resolution between cameras, there may be significant divergence in the spatial density of pixels. That is, several RGB pixels might project onto the same thermal cell. To address this discrepancy, various smoothing and thermal interpolation strategies were evaluated, including:
no interpolation: assigns the nearest thermal pixel value directly.
linear interpolation: averages values from nearby pixels.
bilinear interpolation: estimates the local plane using the four closest neighbours.
trilinear interpolation: considers a constrained 3D neighbourhood.
adaptive contour-preserving smoothing: applies a local mask that respects thermal discontinuities.
Each of these strategies offers advantages depending on the required level of accuracy and the regularity of the thermal pattern. In this work, bilinear interpolation was used by default, as it provides a good trade-off between radiometric fidelity and visual smoothness without excessive computational cost.
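The default option can be illustrated with a short sketch of bilinear sampling on a temperature matrix at a sub-pixel position, shown next to the nearest-pixel alternative. The function names are chosen for the example, and coordinates follow the usual convention of x as column and y as row.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def nearest_sample(matrix: Matrix, x: float, y: float) -> float:
    """No interpolation: read the closest thermal pixel."""
    return matrix[round(y)][round(x)]


def bilinear_sample(matrix: Matrix, x: float, y: float) -> float:
    """Blend the four pixels surrounding the sub-pixel position (x, y)."""
    height, width = len(matrix), len(matrix[0])
    if not (0.0 <= x <= width - 1 and 0.0 <= y <= height - 1):
        raise ValueError("coordinates fall outside the matrix")
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = matrix[y0][x0] * (1.0 - fx) + matrix[y0][x1] * fx
    bottom = matrix[y1][x0] * (1.0 - fx) + matrix[y1][x1] * fx
    return top * (1.0 - fy) + bottom * fy


temps = [[10.0, 20.0], [30.0, 40.0]]
centre = bilinear_sample(temps, 0.5, 0.5)
corner = bilinear_sample(temps, 1.0, 1.0)
```

Because several 3D points can fall inside the same thermal cell, sampling at sub-pixel positions gives neighbouring points smoothly varying values instead of identical blocks.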
The outcome of this stage is a dense, thermally enriched point cloud in which each point contains not only RGB colour, 3D position, and normal vector, but also a scalar temperature value. This multispectral 3D model serves as a foundation for inspection tasks, structural pathology monitoring, or energy documentation of built heritage assets.