2. Materials and Methods
2.1. Description of the Study Area
Two experimental sites were selected in Shanxi Province, China: the Yuci Lifang Experimental Station (37°51′N, 112°45′E) and the Shanxi Agricultural University Paotuan Experimental Station (37°25′N, 112°36′E), hereafter referred to as LF and PT, respectively. Located in a temperate continental semiarid climate zone, the two sites lie approximately 60 km apart. The soils are classified as cinnamon soils (Calcaric Fluvisols), with an organic matter content of 1.4–1.6%. The region has an annual precipitation of 400–500 mm, an annual mean temperature of 9.5–10.8 °C, an annual sunshine duration of 2,000–3,000 hours, and an annual evaporation of about 1,500–2,300 mm. The experimental fields lie at elevations of 800–900 m above sea level, with a frost-free period of 120–220 days and moderate to relatively high soil fertility. Maize was planted as the previous crop at both stations, creating favorable residual conditions for foxtail millet cultivation.
A single-year field trial (May–October 2023) was conducted at the PT station, covering an area of 3,100 m². Meanwhile, two consecutive years of field trials (May–October 2023 and May–October 2024) were carried out at the LF station, with a trial area of 2,800 m². The two-year dataset from LF provided critical information for cross-year model validation, while the combined trials at both stations supported the construction and evaluation of cross-regional canopy monitoring models.
2.2. Field Experiment Design
The foxtail millet cultivar “Jingu 21” was selected for this study. Planting was carried out with row spacing of 25 cm and plant spacing of 10 cm, in accordance with local standard production practices. Water and fertilizer management, as well as pest and disease control, followed standard agronomic protocols to ensure normal crop growth.
Observations covered key growth stages, including seedling emergence, jointing, heading, grain filling, and maturity. During each growing season in 2023 and 2024, measurements were conducted approximately eight times at regular intervals. For each measurement, six representative quadrats (each 50 cm × 50 cm) were randomly chosen in the field. Within each quadrat, 6–9 millet plants were selected, and their positions were recorded using a high-precision M9 GPS (manufactured by Shanghai Huace Navigation Technology Co., Ltd., Shanghai, China) to ensure accurate correspondence between the spectral data and the phenotypic measurements. For each selected plant, leaf moisture content, SPAD chlorophyll index, and leaf area index (LAI) were measured. At the end of the experiment, 200 valid datasets had been obtained from each of PT 2023, LF 2023, and LF 2024, giving a total of 600 valid spectral datasets paired with manually measured phenotypic data of millet plants.
2.3. UAV-Based Multispectral Data Acquisitions
A UAV platform (DJI Mavic 3 Multispectral, manufactured by Shenzhen DJI Technology Co., Ltd., Shenzhen, China) equipped with a 4/3-inch visible CMOS sensor and four multispectral CMOS sensors was employed to acquire imagery in four key bands: red (650 nm, 16 nm bandwidth), green (560 nm, 16 nm bandwidth), red-edge (730 nm, 16 nm bandwidth), and near-infrared (860 nm, 26 nm bandwidth). The flight altitude was set at 65 m, with forward and side overlaps of 70% and 80%, respectively, to ensure comprehensive field coverage and high-resolution data acquisition. All flights were conducted between 9:00 AM and 11:00 AM under clear, low-wind conditions to minimize variations in illumination.
Before and after each flight, images of a gray reference board (approximately 0.3 reflectance) and a white reference board (approximately 0.5 reflectance) were captured under similar lighting conditions to determine the reference reflectance for each spectral band. The gain and offset for each band were then calculated based on these calibration images, and pixel-wise radiometric corrections were applied to align the raw images with the reference reflectance. By comparing calibration data collected from multiple flights on the same day and on different dates, consistency was maintained across diverse regions and years.
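The gain and offset step can be illustrated with a two-point empirical line fit. The following is a minimal sketch under stated assumptions, not the processing code used in this study: the function names, the panel digital numbers (DNs), and the clipping to [0, 1] are hypothetical, and only the nominal panel reflectances (0.3 and 0.5) come from the text.

```python
from statistics import mean


def fit_gain_offset(dn_gray, dn_white, refl_gray=0.3, refl_white=0.5):
    """Two-point empirical line for one band: reflectance = gain * DN + offset."""
    if dn_white == dn_gray:
        raise ValueError("panel DNs must differ to define a line")
    gain = (refl_white - refl_gray) / (dn_white - dn_gray)
    offset = refl_gray - gain * dn_gray
    return gain, offset


def calibrate_band(dn_pixels, gain, offset):
    """Apply the correction pixel-wise; clipping to [0, 1] is an assumption."""
    return [[min(1.0, max(0.0, gain * dn + offset)) for dn in row]
            for row in dn_pixels]


# Hypothetical mean panel DNs, averaged over the pre- and post-flight images.
gray_dn = mean([12010.0, 11990.0])
white_dn = mean([20020.0, 19980.0])

gain, offset = fit_gain_offset(gray_dn, white_dn)

# Hypothetical 2 x 2 patch of raw near-infrared DNs.
nir_reflectance = calibrate_band([[12000.0, 16000.0], [20000.0, 8000.0]],
                                 gain, offset)
```

Fitting one gain and offset pair per band and per flight keeps the correction tied to the illumination at acquisition time, which is what makes reflectance from different dates and sites comparable.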
To further reduce the impact of environmental light fluctuations, cloud interference, and sensor parameter drift, the raw multispectral images underwent radiometric calibration and Z-score normalization. This process yielded calibrated reflectance data that more closely represent the crop’s intrinsic (i.e., “true”) spectral characteristics, thereby improving the accuracy with which subsequent models capture the crop’s physiological status and ensuring a reliable basis for comparison with ground-based measurements.
Finally, the raw multispectral images were processed in DJI Terra (developed by Shenzhen DJI Technology Co., Ltd., Shenzhen, China) to perform image mosaicking, geometric distortion correction, and orthorectification. By incorporating ground control points (GCPs) or using RTK-GPS assistance, the planar positioning error of the orthomosaic was limited to within 1–2 pixels.
2.4. Ground Truthing and Phenotyping
To obtain accurate phenotypic data for millet plants during the growth period and to align these measurements with UAV remote sensing information, the following major canopy parameters were measured in the field:
LAI was measured using an LAI-2200C canopy analyzer (LI-COR, Inc., Lincoln, Nebraska, USA) or a comparable scanning method. Plant density or ground cover was taken into account when calculating LAI per unit area, which reflects both the crop’s growth status and its photosynthetic potential.
3) Chlorophyll Content (SPAD)
A portable chlorophyll meter (CM 1000 Chlorophyll Meter, Spectrum Technologies, Inc., Aurora, Illinois, USA) was used to measure the top four functional leaves from each selected millet plant. Each measurement was repeated 3–5 times, and the average value was recorded. The SPAD readings indicate the chlorophyll content of the leaves and can be used to assess the plant’s photosynthetic capacity.
4) Canopy Leaf Moisture Content (CLMC)
Simultaneously, the top four functional leaves from each selected millet plant were sampled and immediately sealed in plastic bags. In the laboratory, the fresh weight (Wf) was measured, after which the leaves were placed in an oven at 105 °C for 30 minutes, then dried at 80 °C until a constant weight (Wd) was achieved. Leaf moisture content was calculated using Equation (1):
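Assuming the conventional fresh-weight basis (an assumption consistent with the definitions of Wf and Wd above; a dry-weight basis would divide by Wd instead), Equation (1) can be written as:

```latex
\begin{equation}
\mathrm{CLMC}\,(\%) = \frac{W_f - W_d}{W_f} \times 100
\tag{1}
\end{equation}
```

where $W_f$ is the fresh weight and $W_d$ is the constant dry weight of the sampled leaves.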
2.5. Data Preprocessing and Vegetation Indices
After radiometric calibration and orthorectification, pixel-level reflectance values were extracted from the four original bands (green, red, red-edge, and near-infrared). Eleven common vegetation indices (Table 1) were then calculated to capture variations in crop chlorophyll content, nitrogen status, and canopy structure.
A total of 15 input variables—including the four multispectral bands plus 11 vegetation indices—were ultimately compiled. Each variable was standardized using the Z-score method to reduce dimensional disparities and improve model stability.
Table 1 presents the formulas and references for the 11 vegetation indices employed in this study.
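As an illustration of how band reflectances become standardized model inputs, the sketch below computes three widely used normalized-difference indices and applies a Z-score transform. It is a hedged example: NDVI, GNDVI, and NDRE are chosen here only because they are common, and they are not claimed to match the 11 indices in Table 1. The reflectance values and function names are hypothetical.

```python
from statistics import mean, pstdev


def vegetation_indices(green, red, red_edge, nir):
    """Three common normalized-difference indices from band reflectances."""
    return {
        "NDVI": (nir - red) / (nir + red),
        "GNDVI": (nir - green) / (nir + green),
        "NDRE": (nir - red_edge) / (nir + red_edge),
    }


def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(v - mu) / sigma for v in values]


# Hypothetical calibrated reflectances: (green, red, red-edge, NIR) per sample.
samples = [
    (0.08, 0.05, 0.20, 0.45),
    (0.10, 0.08, 0.22, 0.40),
    (0.12, 0.12, 0.25, 0.36),
]

features = [vegetation_indices(*s) for s in samples]
ndvi_standardized = zscore([f["NDVI"] for f in features])
```

When models are transferred between sites or years, a common practice is to compute the mean and standard deviation on the training subset only and reuse them for the test subset, so that no information from the test data leaks into the inputs.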
2.6. Model Construction and Evaluation Metrics
In this study, three types of models were selected: linear and regularized regression, tree-based models, and neural networks. Linear and regularized regression included Lasso regression and Ridge regression, both of which have low computational cost and are straightforward to interpret [20,36]. To determine the optimal regularization parameters (e.g., α for Ridge and Lasso), we performed a grid search over a predefined set of values (e.g., 0.01, 0.1, and 1.0) combined with 5-fold cross-validation, selecting the setting that minimized the validation RMSE. The tree-based models included Decision Tree, Random Forest, XGBoost, and LightGBM, which can capture nonlinear features and are easily parallelized [16,31]. For these algorithms, key hyperparameters such as maximum tree depth, number of trees, and learning rate (for boosting models) were tuned via grid search and cross-validation. For instance, we tested max_depth from 4 to 10 (in increments of 2), learning_rate values of {0.01, 0.05, 0.1}, and n_estimators of {100, 300, 500}. We then selected the final configuration based on minimizing RMSE and MRE on the validation set. Neural networks primarily used a Multilayer Perceptron (MLP) architecture. In this study, we adopted two hidden layers, each with 64 neurons, using the ReLU activation function and an Adam optimizer [19,50]. The batch size (32 or 64) and dropout rate (0.2 or 0.5) were chosen by comparing validation errors under multiple runs, ensuring that the model avoided overfitting in smaller datasets.
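The grid search with 5-fold cross-validation described above can be sketched for the Ridge case as follows. This is an illustrative NumPy implementation under stated assumptions: the text does not name the software used, the data are synthetic stand-ins for the 15 standardized predictors, and the helper names (`ridge_fit`, `cv_rmse`, `grid_search`) are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form Ridge on centered data; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def cv_rmse(X, y, alpha, n_splits=5, seed=0):
    """Mean validation RMSE over k shuffled folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    scores = []
    for fold in np.array_split(idx, n_splits):
        train = np.setdiff1d(idx, fold)  # every index not in this fold
        coef, intercept = ridge_fit(X[train], y[train], alpha)
        pred = X[fold] @ coef + intercept
        scores.append(np.sqrt(np.mean((pred - y[fold]) ** 2)))
    return float(np.mean(scores))


def grid_search(X, y, alphas=(0.01, 0.1, 1.0)):
    """Return the alpha with the lowest cross-validated RMSE and all scores."""
    results = {a: cv_rmse(X, y, a) for a in alphas}
    best_alpha = min(results, key=results.get)
    return best_alpha, results


# Synthetic data: 200 samples, 15 predictors, one canopy trait (assumed).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 15))
y = 2.0 * X[:, 0] - X[:, 3] + rng.normal(scale=0.3, size=200)

best_alpha, results = grid_search(X, y)
```

The same loop structure extends to the tree-based and MLP models by replacing `ridge_fit` with the corresponding estimator and iterating over the Cartesian product of the hyperparameter values listed above.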
The coefficient of determination (R²) quantifies how well the model fits observed data, with values approaching 1 indicating stronger explanatory power. Mean Relative Error (MRE) and Maximum Relative Error (MaxRE) represent the average and maximum relative deviation between predicted and observed values, respectively. The Root Mean Square Error (RMSE) measures how closely predictions conform to actual values (lower RMSE indicates higher predictive accuracy). Additionally, 1:1 scatter plots provide a direct comparison between predicted and observed outcomes, while cumulative error distribution plots illustrate the distribution of errors over a range of values. Using these metrics, we systematically assessed both the accuracy and robustness of the models for canopy traits such as CLMC, SPAD, and LAI across diverse environments and growing seasons, addressing the need for broad spatial and temporal extrapolation.
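The four numerical metrics can be computed as in the sketch below. The formulas are the standard definitions. The relative error is taken here as |predicted − observed| / |observed|, which is an assumption because the text does not state the exact form, and the sample values are hypothetical.

```python
from math import sqrt


def regression_metrics(observed, predicted):
    """Return R2, RMSE, MRE and MaxRE for paired observations and predictions."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [p - o for o, p in zip(observed, predicted)]
    obs_mean = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    # Assumed definition: absolute error relative to the observed value.
    rel_errors = [abs(r) / abs(o) for r, o in zip(residuals, observed)]
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": sqrt(ss_res / n),
        "MRE": sum(rel_errors) / n,
        "MaxRE": max(rel_errors),
    }


# Hypothetical SPAD-like readings and model predictions.
observed = [40.0, 50.0, 60.0]
predicted = [42.0, 50.0, 57.0]
metrics = regression_metrics(observed, predicted)
```

Because MRE and MaxRE divide by the observed value, they are undefined when an observation is zero; this is not a concern for strictly positive traits such as SPAD, LAI, and moisture content.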
2.7. Cross-Location and Cross-Year Experimental Scheme
To thoroughly evaluate the model’s spatial extrapolation capability and temporal robustness, the following multi-level experiments and validation strategies were adopted:
1) Single-Location Modeling
Models were independently trained and evaluated using data from Yuci Lifang (2023), Paotuan (2023), and Yuci Lifang (2024), respectively, to assess performance under site-specific conditions.
2) Cross-Location Extrapolation
A model trained on the 2023 data from the Yuci Lifang site was validated on the 2023 data from the Paotuan site (or vice versa), to evaluate the model’s transferability between different geographic locations.
3) Cross-Year Extrapolation
The 2023 data from the Yuci Lifang site were used for training, and the resulting model was validated on the 2024 data from the same site to assess robustness across years within one region. Alternatively, the combined data from Yuci 2023 and Paotuan 2023 were used to train the model, which was then validated on Yuci 2024, allowing the predictive improvement gained by data fusion to be compared.
4) Multi-Location and Multi-Year Fusion Modeling
Data from multiple sites and different years (e.g., Paotuan 2023 + Yuci 2023 + Yuci 2024) were merged and uniformly radiometrically corrected to build a “universal model.” Independent tests or cross-validation on each subset were then conducted to examine improvements in model generality and stability contributed by data fusion.
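The four levels of the scheme above can be expressed as a table of training and test subsets. The sketch below is one possible encoding, not the code used in this study: the subset labels, scheme names, and record layout are hypothetical, while the 200 samples per subset follow Section 2.2.

```python
SUBSETS = ("LF2023", "PT2023", "LF2024")


def evaluation_schemes():
    """Return {scheme name: (training subsets, test subsets)}."""
    schemes = {}
    # 1) Single-location: train and evaluate within one subset
    #    (an internal hold-out or cross-validation split is still required).
    for name in SUBSETS:
        schemes[f"single/{name}"] = ((name,), (name,))
    # 2) Cross-location, same year, in both directions.
    schemes["cross-location/LF2023->PT2023"] = (("LF2023",), ("PT2023",))
    schemes["cross-location/PT2023->LF2023"] = (("PT2023",), ("LF2023",))
    # 3) Cross-year at LF, without and with the PT 2023 data added.
    schemes["cross-year/LF2023->LF2024"] = (("LF2023",), ("LF2024",))
    schemes["cross-year/LF2023+PT2023->LF2024"] = (("LF2023", "PT2023"),
                                                   ("LF2024",))
    # 4) Fusion: pool all subsets; evaluate by cross-validation or per subset.
    schemes["fusion/all"] = (SUBSETS, SUBSETS)
    return schemes


def select(records, subsets):
    """Keep the records whose 'subset' label is in the requested group."""
    return [r for r in records if r["subset"] in subsets]


# Hypothetical records: 200 samples per subset.
records = [{"subset": s, "sample_id": i} for s in SUBSETS for i in range(200)]

schemes = evaluation_schemes()
train_keys, test_keys = schemes["cross-year/LF2023+PT2023->LF2024"]
train, test = select(records, train_keys), select(records, test_keys)
```

Keeping the scheme definitions separate from the model code makes it straightforward to run every model under every scheme and to confirm that training and test subsets never overlap in the extrapolation experiments.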