2. Data and Methods
A. Chandrayaan-2 IIRS level-1 data
The input data consists of IIRS level-1 calibrated radiance products typically provided in the PDS4-compliant QUB format, which is a Band-Sequential (BSQ) multi-band image. This data represents the calibrated radiance values acquired by IIRS. Accompanying each QUB file is an XML metadata file, which contains supporting information like image corner coordinates, which have been used in our algorithm.
B. LRO-WAC global mosaic
The basemap employed for reference in this study is the LRO-WAC global lunar mosaic. This is a publicly available seleno-referenced dataset, provided in GeoTIFF format. For our purpose, the LRO-WAC mosaic serves as the geometric reference standard. The specific Coordinate Reference System (CRS) of the WAC mosaic (i.e., Simple Cylindrical Moon) is extracted directly from the file and used as the target CRS for the seleno-referenced IIRS output.
The core objective of this study is to achieve a significantly high geometric accuracy by directly registering the IIRS image to the corresponding LRO-WAC image using the image-to-image feature-matching technique. It is hypothesized that this feature-based registration will effectively compensate for the accumulated errors in spacecraft pointing, orbit, and sensor modelling, yielding a seleno-referenced IIRS product that exhibits a substantially improved spatial alignment with the LRO-WAC standard.
C. Region of Interest (ROI) definition and pre-processing
A single band is selected from the IIRS hyperspectral data cube for processing. In this case, an ideal band should have a good signal-to-noise ratio (SNR) and surface features that are easily visible. Moreover, the IIRS level-1 datasets have an incorrect spatial orientation of the image array relative to the seleno-graphic area it represents, as defined by the metadata. Prior to feature-matching against the WAC mosaic, the algorithm performs an orientation-check on the IIRS strips using the XML metadata. Wherever necessary, an IIRS strip automatically undergoes an orientation correction, such that misalignments, if any, could be identified and removed by the algorithm.
The extracted IIRS seleno-graphic corner coordinates (longitude and latitude) are then used to extract the pixel coordinates of the corresponding WAC ROI pixel coordinates (column, row). Further, a minimum bounding box (in WAC pixel coordinates) is determined that encompasses all transformed IIRS corners. Padding is then applied to this bounding box, to ensure that the extracted WAC region sufficiently overlaps the IIRS strip. Then, the padded pixel coordinates are clipped to the valid dimensions of the WAC mosaic (0 to RasterXSize-1, 0 to RasterYSize-1) to prevent out-of-bounds requests. The final ROI is defined as a tuple (, , , ) representing the rows and columns to extract from the WAC mosaic.
Both the IIRS band array and the extracted WAC ROI array are divided into smaller, overlapping square patches or tiles. A grid size and an overlap percentage are defined. The coordinates (, , , ) for each tile within its respective full array (rescaled IIRS or WAC ROI) are stored. Tiles < 1/2(grid size) are discarded.
Each IIRS and WAC patch is individually normalized. Invalid pixels are masked out. The 2
nd and 98
th percentiles of the valid pixel values are calculated. These percentiles define the robust minimum (
) and maximum (
) intensity range, mitigating the effects of extreme outliers. Valid pixels are then clipped to this range and then linearly scaled to the floating-point range [0.0, 1.0] using the following equation:
The means and standard deviations of these 0-1 normalized valid pixels are calculated and stored along with the percentile values (stats). Patches with insufficient valid pixels or near-constant values (low percentile range) are skipped.
The initially normalized IIRS patch (in its 0-1 float representation) is adjusted to match the statistics of the normalized WAC patch. Using the calculated means (
,
) and standard deviations (
,
) from the initial normalization step, the value of each valid normalized IIRS pixel (
) is adjusted using the following equation:
If of the IIRS patch ~ 0, the adjustment is skipped, and the IIRS patch that was initially normalized is used.
Both the potentially adjusted and normalized IIRS float patch and the normalized WAC float patch are clipped to the [0.0, 1.0] range and then scaled to [0, 255], converting them to the uint8 format, suitable for many OpenCV feature detectors.
D. The Scale-Invariant Feature Transform (SIFT) algorithm for automated feature-matching
The SIFT algorithm is selected as the primary feature detection and matching algorithm due to its demonstrated robustness in prominent computer vision and remote sensing applications [
6], [
7]. SIFT’s particular invariance to scale, rotation, and illumination changes makes it suitable for matching features between hyperspectral imagery acquired under varying geometric and illumination conditions. Thus, the algorithm’s ability to detect distinctive features across different spatial resolutions is crucial for this application, given the resolution difference between IIRS and LRO-WAC datasets.
Previous studies, such as, [
8], have demonstrated SIFT’s effectiveness in planetary surface feature-matching applications, particularly for lunar terrain where traditional correlation-based methods often fail due to the uniform albedo and subtle topographic variations of the lunar surface. The multi-scale nature of SIFT detection ensures a robust feature identification across the scale differences inherent in multi-sensor lunar datasets [
7].
The implementation employs SIFT’s standard two-stage approach – the initial keypoint detection followed by descriptor computation. Keypoint detection identifies candidate interest points based on scale-space extrema using the Difference of Gaussians (DoG) approach across multiple octaves. The descriptor computation stage calculates 128-dimensional feature vectors for each detected keypoint, potentially refining the keypoint locations and rejecting unstable features. This separation enables the differentiation between detection failures (insufficient features in the imagery) and descriptor computation failures (unstable feature characteristics), facilitating quality assessment and troubleshooting.
Feature correspondence is established using a brute-force matcher with L2 norm and k-nearest neighbour search (k=2) to enable ratio testing for ambiguous match filtering. A Lowe’s ratio test with a 0.75 threshold retains matches only when the nearest-neighbour distance is less than 75% of the second nearest neighbour distance, effectively removing correspondences in repetitive lunar terrain patterns like crater fields. Also, a geometric verification using a RANSAC-based homography estimation is done, which requires a minimum of 8-point correspondences with a 3.0-pixel re-projection threshold to account for sensor resolution and terrain characteristics [
8].
Additionally, the homography quality is assessed to reject degenerate transformations that indicate false correspondences or inappropriate geometric models. This multi-stage filtering approach ensures a robust feature-matching despite the challenging visual characteristics of lunar surface imagery.
SIFT-derived correspondences serve as the foundation for GCP generation. The robustness of the SIFT matching process directly impacts the overall seleno-referencing quality, as the spatial distribution and accuracy of the extracted GCPs determine the fidelity of the final geometric correction applied to IIRS hyperspectral data.
For a given IIRS tile, the WAC tile that yields the highest number of inliers after the filtering steps is considered as the "best-match." If a best match with inliers > 0 is found for an IIRS tile, for each inlier match in the IIRS source coordinate, we get the keypoint location (, ) within the rescaled IIRS tile. We then add the tile’s top-left offset to get the coordinate (, ) within the fully rescaled IIRS array. Subsequently, we invert the initial flip operations. In the case of LRO-WAC, for each inlier match in the WAC target coordinate, we obtain keypoint locations , within the WAC tile, then add the WAC tile’s top-left offset within the ROI to get the coordinate within the extracted WAC ROI array. Then, we store the pair (, , longitude, latitude) as a potential GCP.
E. Output
All the generated GCPs (, , longitude, latitude) from the successful tile matches are aggregated into a list. This list is written to a CSV file in the output directory. The CSV file contains the header (, , longitude, latitude) matching the format expected by the subsequent seleno-referencing script.
F. Georeferencing transformation
A minimum distance of 15-20 pixels is specified, which acts as a filter to ensure a more even spatial distribution of the GCPs across the IIRS image extent. The minimum and maximum ( and ) coordinates are determined from the input GCPs. A grid is conceptually overlaid on the IIRS pixel-space, with cell sizes equal to minimum distance. For each grid cell containing one or more GCPs, the Euclidean distance of each point within the cell to the cell’s centre is calculated. Only the GCP closest to the centre of each occupied cell is retained. This effectively removes the clustered points while preserving spatial coverage.
Then we establish the number of GCPs that are available for each image. We then apply the condition:
and check if a sufficient number of GCPs are available (> 20). This statistical filter is then applied to remove those points that deviate significantly from the general geometric trend. The Euclidean distance (residual error) between the actual target coordinates and the predicted target coordinates are calculated for each GCP. The means and the standard deviations of these residual errors are then computed. The Z-score for each GCP’s residual error is also calculated from (3), such that:
GCPs with a Z-score exceeding a defined threshold are considered as outliers and are removed from the list. A geometric transformation model, such as a polynomial, is selected for warping. Higher-order polynomials can model more complex distortions but require more well-distributed GCPs, which risk overfitting.
The gdal.warp operation creates the final georeferenced GeoTIFF file directly. The final output file contains the IIRS data warped to the WAC mosaic’s CRS at the spatial resolution of IIRS. Using the algorithm described above, the time required to process a single IIRS strip can vary between ~two to ~five minutes, depending on the length of the strip. Additionally, the output generates a GeoTIFF image containing 250 hyperspectral bands. The methodology described to implement the seleno-referencing process has been summarized in a flowchart (Figure 1).
Figure 1.
A flowchart depicting the methodology applied to seleno-reference an IIRS image using the SIFT algorithm.
Figure 1.
A flowchart depicting the methodology applied to seleno-reference an IIRS image using the SIFT algorithm.
G. Validation
After the application of the steps mentioned above, the resultant seleno-referenced IIRS GeoTIFF file is overlaid onto the LRO-WAC mosaic within a GIS environment. A careful visual inspection of the spatial alignment of prominent, unambiguous features such as crater rims, central peaks, rilles, and ridges across the entire extent of the IIRS strip is carried out. Care is taken to check for any systematic offsets, rotations, or local distortions.
A set of well-defined Independent Check Points (ICPs) that are clearly visible and precisely locatable are manually identified in both the original IIRS image and the WAC mosaic. A good distribution of ICPs across the IIRS strip is crucial.
3. Results and Discussion
A total of ~200 IIRS strips were processed using a SIFT-based algorithm, which is described in Section II (E). The large pool of IIRS strips allowed us to create a database using which we were able to quantitatively assess the accuracy of the seleno-referenced dataset and estimate the associated errors. Our analysis of the processed IIRS dataset reveals that the algorithm has worked well on the IIRS images and has taken care of the different solar incidence angles that each IIRS strip possesses. We chose only those IIRS strips whose acquisition times ranged between the lunar sunrise (~07:00 hours local lunar time) and the lunar sunset (~19:00 hours local lunar time), as well as between the latitudes of ±55°. The 12-hour time-window has allowed us to analyze IIRS strips across different times of a lunar day and assess whether the time of acquisition (and consequently, the solar incidence angles) has any impact on the performance of the algorithm.
Figure 2 shows an example of a sample of four separate and processed IIRS strips that pass primarily over the lunar maria (a-d). RMS errors (RMSE_total) calculated (in metres) for each strip have been shown in
Table 1. The estimated RMS errors for the strips passing over the maria range between ~67.5 m to ~82.5 m, indicating that for three of the strips (Figure 2 a, b, d), the RMS errors are within the resolution of a single IIRS pixel (~80 m), which further highlights the efficacy of the algorithm. Similarly, Figure 3 shows four processed IIRS strips that pass primarily over parts of the lunar highlands (a-d). For the highlands example, the estimated RMS errors were found to range between ~64 m to 66.6 m, two of which are from those IIRS strips which are less than the resolution of a single IIRS pixel (a, b), while the other two (c, d) have values such as 83.6 m and 84.5 m, respectively (
Table 1). The mean RMS error for the eight strips (~73 m) was also found to be below the resolution of an IIRS pixel. The details regarding the RMS errors estimated for the eight IIRS strips are provided in
Table 1 and Figure 4.
A specific cause that could be attributed to the range of the measured RMS errors could not be found. However, one possible cause seems to be more relevant than the other possible causes – a difference in the solar incidence angles. The LRO-WAC dataset (which has been used a reference image in this study) has a range of solar incidence angles (between 48°-72° at the equator) [
10], which are relatively higher than IIRS and hence, generally do not match the IIRS values (~20° to ~43° between the latitudes of 5° S to 28° N). It is important to note that IIRS essentially has comparatively lower incidence angles across most of the ~200 strips that we have analyzed in this study. Consequently, based on our analysis, it was found that the differences create potential conditions for misidentification of the features on the lunar surface for the SIFT algorithm. Such occurrences could lead to an error equivalent to approximately one IIRS pixel (60 m < error < 90 m) near the equator, potentially increasing to up to three IIRS pixels (240 m < error < 300 m) upwards of ±40° latitude. However, in spite of the large differences in the incidence angles of IIRS and WAC, the SIFT algorithm is able to identify surface features from WAC and match them accurately for a majority of the IIRS images.
Figure 2.
Four examples of IIRS strips seleno-referenced using the SIFT algorithm. (a) to (d) show the IIRS strips that are from the lunar maria. Extents of the IIRS strips are indicated by white rectangles. Basemap is LRO-WAC. For the corresponding IIRS strip IDs (a-d), please refer to
Table 1.
Figure 2.
Four examples of IIRS strips seleno-referenced using the SIFT algorithm. (a) to (d) show the IIRS strips that are from the lunar maria. Extents of the IIRS strips are indicated by white rectangles. Basemap is LRO-WAC. For the corresponding IIRS strip IDs (a-d), please refer to
Table 1.
Figure 3.
Four examples of IIRS strips seleno-referenced using the SIFT algorithm. (a) to (d) show the IIRS strips that are from the lunar highlands. Extents of the IIRS strips are shown in white for (a-c), and in black for (d). Basemap is LRO-WAC. For the corresponding IIRS strip IDs, please refer to
Table 1.
Figure 3.
Four examples of IIRS strips seleno-referenced using the SIFT algorithm. (a) to (d) show the IIRS strips that are from the lunar highlands. Extents of the IIRS strips are shown in white for (a-c), and in black for (d). Basemap is LRO-WAC. For the corresponding IIRS strip IDs, please refer to
Table 1.
In order to understand whether latitude plays a role in the seleno-referencing process, three separate IIRS strips were scrutinized in detail (Figure 5). One of the images (CH2_IIR_NCI_20240201T1026227387_D_IMG_D18) extends from ~60° S up to 20° S. RMS errors have been computed at every few degrees in latitude along the extent of the image. The estimated RMSE values ranged between ~1,040 m at 60° S to ~70 m at 47° S and ~65 m at 22° S. This indicates that with increasing latitude (beyond ±55°), RMS errors increase exponentially. This is due to the fact that beyond ±55°, the curvature of the Moon tends to become more prominent, which affects the images captured by sensors like IIRS as well as those captured by LRO, and is therefore a limitation of this study. Similarly, the second image (CH2_IIR_NCI_20240422T1522156952_D_IMG_D18), which spans from 40° N up to 70° N, has similar RMSE values, such that at lower latitudes the value is less than IIRS’s spatial resolution, whereas at its maximum extent, it was found to be ~2,000 m. Therefore, it is evident that at higher latitudes (beyond ±55°), RMS errors are exceptionally higher as compared to the lower latitudes. The third image (CH2_IIR_NCI_20231117T1110268671_D_IMG_D18) extends from ~8° S to ~32° N. The corresponding RMS errors estimated from this image are <80 m for all instances. Hence, according to the values in Figure 5, it was found that latitude does play a role for the current seleno-referencing process which impacts the derived accuracy of the IIRS strips according to their latitudinal extents.
Figure 4.
A plot of RMS errors computed from the eight IIRS strips as shown in
Table 1. For IIRS figure IDs mentioned in the figure above, please refer to the ‘Remarks’ column in
Table 1.
Figure 4.
A plot of RMS errors computed from the eight IIRS strips as shown in
Table 1. For IIRS figure IDs mentioned in the figure above, please refer to the ‘Remarks’ column in
Table 1.
Figure 5.
A plot of RMS errors plotted against different latitudes (60° S to 70° N) using three IIRS strips.
Figure 5.
A plot of RMS errors plotted against different latitudes (60° S to 70° N) using three IIRS strips.