Figure 1.
Comparison of point clouds with 10 m × 10 m footprints from USGS 3DEP aerial LiDAR (2015–2018, orange) and UAV LiDAR (2023–2024, green) over the same locations. UAV LiDAR captures significantly greater structural detail, especially in fine-scale canopy features. The bottom-left example shows clear canopy loss between surveys due to recent disturbance. The top-left shows clear growth
Figure 1.
Comparison of point clouds with 10 m × 10 m footprints from USGS 3DEP aerial LiDAR (2015–2018, orange) and UAV LiDAR (2023–2024, green) over the same locations. UAV LiDAR captures significantly greater structural detail, especially in fine-scale canopy features. The bottom-left example shows clear canopy loss between surveys due to recent disturbance. The top-left shows clear growth
Figure 2.
Locations of the Southern California UAV LiDAR surveys—Sedgwick Reserve–Midland School in the Santa Ynez Valley and Volcan Mountain Wilderness Preserve in the Peninsular Range.
Figure 2.
Locations of the Southern California UAV LiDAR surveys—Sedgwick Reserve–Midland School in the Santa Ynez Valley and Volcan Mountain Wilderness Preserve in the Peninsular Range.
Figure 3.
The first study area combines parts of the Sedgwick Reserve and Midland School property in the San Rafael Mountain foothills, Santa Barbara County. It features diverse vegetation such as oak woodlands, chaparral, grasslands, and coastal sage scrub. Within this area, four UAV LiDAR sites were surveyed, covering a total of 70 hectares.
Figure 3.
The first study area combines parts of the Sedgwick Reserve and Midland School property in the San Rafael Mountain foothills, Santa Barbara County. It features diverse vegetation such as oak woodlands, chaparral, grasslands, and coastal sage scrub. Within this area, four UAV LiDAR sites were surveyed, covering a total of 70 hectares.
Figure 4.
Volcan Mountain study area (197 hectares) in the Peninsular Range of Southern California, encompassing diverse vegetation types across an elevation range of 1220 to 1675 meters. Thirty percent of the site (58 hectares) was set aside for testing/validation in three zones: post-fire chaparral in the north, mixed conifer and riparian vegetation with oak and chaparral in the center, and oak woodland with semi-continuous canopy in the south.
Figure 4.
Volcan Mountain study area (197 hectares) in the Peninsular Range of Southern California, encompassing diverse vegetation types across an elevation range of 1220 to 1675 meters. Thirty percent of the site (58 hectares) was set aside for testing/validation in three zones: post-fire chaparral in the north, mixed conifer and riparian vegetation with oak and chaparral in the center, and oak woodland with semi-continuous canopy in the south.
Figure 5.
The overall multimodal upsampling architecture. Key components include Local–Global Point Attention Blocks, modality-specific imagery encoders for processing optical (NAIP) and radar (UAVSAR) data, and a cross-attention fusion module for combining imagery and LiDAR features.
Figure 5.
The overall multimodal upsampling architecture. Key components include Local–Global Point Attention Blocks, modality-specific imagery encoders for processing optical (NAIP) and radar (UAVSAR) data, and a cross-attention fusion module for combining imagery and LiDAR features.
Figure 6.
Flowchart of the Local–Global Point Attention Block (LG-PAB), the core computational unit used across the feature extraction, expansion, and refinement stages. The block applies local attention via a multi-head Point Transformer, optional upsampling with learned position offsets, and global multi-head Flash Attention for broader spatial context. A feed-forward MLP follows both the local and global attention blocks.
Figure 6.
Flowchart of the Local–Global Point Attention Block (LG-PAB), the core computational unit used across the feature extraction, expansion, and refinement stages. The block applies local attention via a multi-head Point Transformer, optional upsampling with learned position offsets, and global multi-head Flash Attention for broader spatial context. A feed-forward MLP follows both the local and global attention blocks.
Figure 7.
Architecture diagram of the imagery encoders. Image stacks from NAIP (optical) and UAVSAR (radar) pass sequentially through Patch Tokenization, Transformer Encoder blocks for spatial context modeling, and a Temporal GRU-Attention Head for temporal aggregation.
Figure 7.
Architecture diagram of the imagery encoders. Image stacks from NAIP (optical) and UAVSAR (radar) pass sequentially through Patch Tokenization, Transformer Encoder blocks for spatial context modeling, and a Temporal GRU-Attention Head for temporal aggregation.
Figure 8.
Distribution of Chamfer distance reconstruction errors comparing raw input data to 3DEP LiDAR-only baseline model. Horizontal boxplots show median (red line), interquartile range (colored boxes), and 10th-90th percentile whiskers. The baseline model demonstrates substantial improvement, reducing median error from 0.858 m to 0.340 m with a significantly tighter distribution. Outliers beyond the 90th percentile are excluded for clarity.
Figure 8.
Distribution of Chamfer distance reconstruction errors comparing raw input data to 3DEP LiDAR-only baseline model. Horizontal boxplots show median (red line), interquartile range (colored boxes), and 10th-90th percentile whiskers. The baseline model demonstrates substantial improvement, reducing median error from 0.858 m to 0.340 m with a significantly tighter distribution. Outliers beyond the 90th percentile are excluded for clarity.
Figure 9.
A comparison of baseline (3DEP LiDAR only) model output vs input
Figure 9.
A comparison of baseline (3DEP LiDAR only) model output vs input
Figure 10.
Distribution of Chamfer distance reconstruction errors across point cloud upsampling models. Horizontal boxplots show median (red line), interquartile range (colored boxes), and 10th-90th percentile whiskers for models trained with different input modalities. All models demonstrate substantial improvement over the LiDAR-only baseline, with the fused model achieving the lowest median error (0.298 m) and tightest distribution. Outliers beyond the 90th percentile are excluded for clarity.
Figure 10.
Distribution of Chamfer distance reconstruction errors across point cloud upsampling models. Horizontal boxplots show median (red line), interquartile range (colored boxes), and 10th-90th percentile whiskers for models trained with different input modalities. All models demonstrate substantial improvement over the LiDAR-only baseline, with the fused model achieving the lowest median error (0.298 m) and tightest distribution. Outliers beyond the 90th percentile are excluded for clarity.
Figure 11.
Example tile where no major vegetation change occurred between the 3DEP aerial LiDAR (2015–2018) and UAV LiDAR (2023–2024). The optical+SAR fusion model more accurately recovers fine-scale canopy structure compared to the LiDAR-only model, producing a point cloud that more closely matches the UAV LiDAR reference (lower Chamfer Distance).
Figure 11.
Example tile where no major vegetation change occurred between the 3DEP aerial LiDAR (2015–2018) and UAV LiDAR (2023–2024). The optical+SAR fusion model more accurately recovers fine-scale canopy structure compared to the LiDAR-only model, producing a point cloud that more closely matches the UAV LiDAR reference (lower Chamfer Distance).
Figure 12.
Example tile illustrating vegetation structure change between legacy aerial LiDAR (2015–2018) and recent UAV LiDAR (2023–2024). The optical+SAR fusion model accurately reconstructs the canopy loss visible in the UAV LiDAR reference, whereas the LiDAR-only model retains outdated structure from the earlier survey. This highlights the value of multi-modal imagery in correcting legacy LiDAR and detecting structural change.
Figure 12.
Example tile illustrating vegetation structure change between legacy aerial LiDAR (2015–2018) and recent UAV LiDAR (2023–2024). The optical+SAR fusion model accurately reconstructs the canopy loss visible in the UAV LiDAR reference, whereas the LiDAR-only model retains outdated structure from the earlier survey. This highlights the value of multi-modal imagery in correcting legacy LiDAR and detecting structural change.
Figure 13.
Relationship between point cloud reconstruction error (Chamfer distance) and net canopy height change since the original LiDAR survey. Scatter points show individual sample tiles (N=5,687) with LOWESS trend lines for each model; outliers above the 99.5th percentile for both error and height change metrics were excluded for visualization clarity. Negative values represent canopy losses while positive values represent gains. The baseline model (blue) shows substantially higher error rates for large canopy losses compared to models incorporating additional remote sensing data (NAIP optical, UAVSAR radar, and fused approaches), while all models perform similarly for canopy gains.
Figure 13.
Relationship between point cloud reconstruction error (Chamfer distance) and net canopy height change since the original LiDAR survey. Scatter points show individual sample tiles (N=5,687) with LOWESS trend lines for each model; outliers above the 99.5th percentile for both error and height change metrics were excluded for visualization clarity. Negative values represent canopy losses while positive values represent gains. The baseline model (blue) shows substantially higher error rates for large canopy losses compared to models incorporating additional remote sensing data (NAIP optical, UAVSAR radar, and fused approaches), while all models perform similarly for canopy gains.
Figure 14.
Comparison of the standard 2x upsampling model (optical+SAR, 6.8 million parameters) output versus a preliminary high-density 8x model (optical+SAR, 125 million parameters)
Figure 14.
Comparison of the standard 2x upsampling model (optical+SAR, 6.8 million parameters) output versus a preliminary high-density 8x model (optical+SAR, 125 million parameters)
Figure 15.
Example of vegetation growth reconstruction (using the experimental 8x upsampling model for better visual clarity). The model successfully infers new canopy structure (center) that is absent in the input 3DEP LiDAR but present in the recent UAV LiDAR reference.
Figure 15.
Example of vegetation growth reconstruction (using the experimental 8x upsampling model for better visual clarity). The model successfully infers new canopy structure (center) that is absent in the input 3DEP LiDAR but present in the recent UAV LiDAR reference.
Figure 16.
A second example of vegetation growth reconstruction with the 8x model.
Figure 16.
A second example of vegetation growth reconstruction with the 8x model.
Figure 17.
Example of the standard 2x fusion model correctly identifying vegetation loss. The model removes vegetation present in the outdated input LiDAR (left) to match the state shown in the recent UAV LiDAR reference (right), highlighting the value of multi-modal fusion for change detection.
Figure 17.
Example of the standard 2x fusion model correctly identifying vegetation loss. The model removes vegetation present in the outdated input LiDAR (left) to match the state shown in the recent UAV LiDAR reference (right), highlighting the value of multi-modal fusion for change detection.
Table 1.
UAV LiDAR sites used as ground truth for model development
Table 1.
UAV LiDAR sites used as ground truth for model development
| Site |
Hectares |
Location |
Collection Date |
Model Use |
| 1 |
38 |
Sedgwick Reserve |
30 Jun 2023 |
Training |
| 2 |
12 |
Midland School |
23 Oct 2023 |
Training |
| 3 |
9 |
Midland School |
23 Oct 2023 |
Test/Validation |
| 4 |
11 |
Midland School |
28 Sep 2023 |
Test/Validation |
| 5 |
197 |
Volcan Mountain |
25 Oct 2024 |
70% Training
30% Test/Validation |
Table 2.
Summary of remote sensing datasets used in the data stack.
Table 2.
Summary of remote sensing datasets used in the data stack.
| Data Source |
Role in Study |
Details |
Revisit Rate |
| UAV LiDAR |
Reference Data |
>300 pts/m2Acquired: 2023–2024
|
N/A |
| 3DEP Airborne LiDAR |
Sparse Input |
Sedgwick: ∼22 pts/m2 (2018) Volcan: ∼6 pts/m2 (2015–16) |
N/A |
| UAVSAR (L-band) |
Ancillary Input |
6.17 m GSD Coverage: 2014–2024
|
∼2-3 years |
| NAIP (Optical) |
Ancillary Input |
0.6–1 m GSD Coverage: 2014–2022
|
∼2 years |
Table 3.
Timeline of NAIP and UAVSAR acquisitions for both study sites.
Table 3.
Timeline of NAIP and UAVSAR acquisitions for both study sites.
| |
Volcan Mountain |
Sedgwick Reserve |
| Year |
NAIP (GSD) |
UAVSAR (# Looks) |
NAIP (GSD) |
UAVSAR (# Looks) |
| 2014 |
May (1 m) |
Jun (3), Oct (2) |
Jun (1 m) |
Jun (8) |
| 2016 |
Apr (60 cm) |
— |
Jun (60 cm) |
Apr (6) |
| 2018 |
Aug (60 cm) |
Oct (2) |
Jul (60 cm) |
— |
| 2020 |
Apr (60 cm) |
— |
May (60 cm) |
— |
| 2021 |
— |
Nov (2) |
— |
— |
| 2022 |
Apr (60 cm) |
— |
May (60 cm) |
— |
| 2023 |
— |
— |
— |
Sep (6)a
|
| 2024 |
— |
— |
— |
Oct (2) |
|
aPart of the NASA FireSense initiative. |
Table 4.
Dataset preparation
Table 4.
Dataset preparation
| Subset |
Tiles |
| Training |
24 000 original + 16 000 augmented = 40 000 |
| Validation |
3 792 |
| Test |
5 688 |
Table 5.
Model variants evaluated
Table 5.
Model variants evaluated
| Variant |
Active encoders |
Cross-attention on |
Parameters |
| Baseline |
LiDAR only |
None |
4.7M |
| + NAIP |
LiDAR, optical |
NAIP tokens |
5.9M |
| + UAVSAR |
LiDAR, radar |
UAVSAR tokens |
5.8M |
| Fused |
LiDAR, optical, radar |
Concatenated token |
6.8M |
Table 6.
Model-configuration parameters
Table 6.
Model-configuration parameters
| Parameter |
Value |
Notes |
| Core geometry |
| Point-feature dimension |
256 |
— |
| KNN neighbours (k) |
16 |
Used in local attention graph |
| Upsampling ratio () |
2 |
Doubles point density per LG-PAB expansion |
| Point-attention dropout |
0.02 |
Dropout inside global attention heads |
| Attention-head counts |
| Extractor — local / global |
8 / 4 |
Extra local heads help expand feature set |
| Expansion — local / global |
8 / 4 |
Extra local heads aid point upsampling |
| Refinement — local / global |
4 / 4 |
— |
| Imagery encoders |
| Image-token dimension |
128 |
Patch embeddings for NAIP & UAVSAR encoders |
| Cross-modality fusion |
| Fusion heads |
4 |
— |
| Fusion dropout |
0.02 |
— |
| Positional-encoding dimension |
36 |
— |
Table 7.
Training protocol and hardware
Table 7.
Training protocol and hardware
| Setting |
Value |
| Hardware |
4 × NVIDIA L40 (48 GB) GPUs under PyTorch DDP |
| Optimizer |
ScheduleFreeAdamW [54]; base LR , weight-decay , ; no external LR schedule |
| Loss function |
Density-aware Chamfer distance (Eq. 9 in [55]),
|
| Batch size |
15 tiles per GPU |
| Epochs |
100 |
| Gradient clip |
|
| Training time |
≈ 7 h per model variant |
| Model selection |
Epoch with lowest validation loss |
Table 8.
Descriptive Statistics for Chamfer Distance Across All Model Variants (see
Table 5)
Table 8.
Descriptive Statistics for Chamfer Distance Across All Model Variants (see
Table 5)
| Model |
Mean CD (m) |
Median CD (m) |
Std Dev (m) |
IQR (m) |
| Input |
2.568 |
0.858 |
6.852 |
1.540 |
| Baseline |
1.043 |
0.340 |
5.717 |
0.465 |
| NAIP |
0.993 |
0.316 |
5.542 |
0.421 |
| UAVSAR |
0.924 |
0.331 |
5.505 |
0.437 |
| Fused |
0.965 |
0.298 |
5.753 |
0.393 |
Table 9.
RQ1 & RQ2: Impact of Single and Fused Modalities on Reconstruction Error
Table 9.
RQ1 & RQ2: Impact of Single and Fused Modalities on Reconstruction Error
| Comparison |
Median Change (%) |
Effect Size |
| RQ1: Single Modality vs. Baseline |
| NAIP vs Baseline |
0.5 (p<0.001) |
0.088 |
| UAVSAR vs Baseline |
0.3 (p<0.001) |
0.062 |
| RQ2: Fused Modality vs. Best Single Modality |
| Fused vs NAIP |
0.7 (p<0.001) |
0.133 |
Table 10.
RQ3: Correlation Between Reconstruction Error and Canopy Height Change
Table 10.
RQ3: Correlation Between Reconstruction Error and Canopy Height Change
| Model |
Spearman
|
p-value |
| Baseline |
0.650 |
p<0.001 |
| NAIP |
0.612 |
p<0.001 |
| UAVSAR |
0.628 |
p<0.001 |
| Fused |
0.582 |
p<0.001 |
| Baseline vs NAIP (z) |
3.377 |
p<0.001 |
| Baseline vs UAVSAR (z) |
1.999 |
p=0.046 |
| Baseline vs Fused (z) |
5.868 |
p<0.001 |
Table 11.
RQ3 Extended: Correlation Between Reconstruction Error and Canopy Height Changes (Gains vs. Losses)
Table 11.
RQ3 Extended: Correlation Between Reconstruction Error and Canopy Height Changes (Gains vs. Losses)
| Model |
Canopy Gains (N=2423) |
Canopy Losses (N=3264) |
| |
Spearman |
p-value |
Spearman |
p-value |
| Baseline |
0.601 |
p<0.001 |
0.671 |
p<0.001 |
| Naip |
0.586 |
p<0.001 |
0.621 |
p<0.001 |
| Uavsar |
0.597 |
p<0.001 |
0.637 |
p<0.001 |
| Fused |
0.587 |
p<0.001 |
0.580 |
p<0.001 |
| Baseline vs Naip (z) |
0.825 |
p=0.409 |
3.440 |
p<0.001 |
| Baseline vs Uavsar (z) |
0.233 |
p=0.816 |
2.406 |
p=0.016 |
| Baseline vs Fused (z) |
0.768 |
p=0.442 |
6.074 |
p<0.001 |