Cross-Domain Transferability of Foliar Nitrogen Prediction in Sugarcane (<em>Saccharum officinarum</em>) Through the Integration of UAV and Simulated Spectral Data

Izabelle de Lima e Lima; Marta Laura de Souza Alexandre; Ana Karla da Silva Oliveira; Rodnei Rizzo; Carlos Augusto Alves Cardoso Silva; Peterson Ricardo Fiorio

doi:10.20944/preprints202604.0003.v1

2.7. Methodological Sequence of the Simulation and Modeling Technique

Figure 4. Flowchart of the spectral calibration process between hyperspectral (FieldSpec) and multispectral (Phantom 4) data. The hyperspectral data undergo preprocessing with Savitzky-Golay (SG) filtering and Multiplicative Signal Correction (MSC), followed by spectral convolution using Gaussian curves. DS and PDS are then applied, together with cross-sensor calibration and scalar correction to the master, and the result is evaluated with performance metrics (R², RMSE, MAE, RPIQ).

2.7.1. Spectral Convolution

For each plot, reflectance was represented by the median of the readings, generating a representative hyperspectral signature for each plot. Next, the spectra were cropped to the 400–900 nm range, with a continuous spectral resolution of 1.4 nm, in order to reduce interference outside the region of interest, standardize the analyzed spectral domain, and enable more consistent simulations across sensors. Subsequently, spectral preprocessing procedures were applied to minimize instrumental noise and light interference, using Multiplicative Signal Correction (MSC) and the Savitzky-Golay (SG) filter [27], implemented in RStudio [28] (version 4.5.0). MSC was employed to correct additive and multiplicative variations associated with light scattering among the samples, while the SG filter was used to smooth the spectral signatures, preserving the overall behavior of the reflectance curve. The MSC formulas are presented in Equations (1), (2), and (3).

{Spectra}_{avg} = \frac{\sum_{i = 1}^{n} {Spectra}_{i}}{n}

(1)

{Spectra}_{i} = k_{i} \times {Spectra}_{avg} + b_{i}

(2)

{Spectra}_{MSC, i} = \frac{{Spectra}_{i} - b_{i}}{k_{i}}

(3)

Where:

Spectra_avg corresponds to the average of all leaf spectra;

n represents the total number of leaf spectra averaged;

Spectra_i refers to each individual leaf spectrum;

k_i and b_i are the correction coefficients (slope and intercept) obtained by linear regression of Spectra_i on Spectra_avg;

Spectra_MSC,i corresponds to the spectrum corrected using the MSC method.
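Equations (1) to (3) can be illustrated with a short sketch. The code below is only an assumption-laden illustration in pure Python, not the R implementation used in this study; the function name `msc` and the list-of-lists input layout are hypothetical.

```python
from statistics import fmean


def msc(spectra):
    """Apply Equations (1)-(3) to a list of spectra (one list of reflectances per sample).

    Hypothetical helper for illustration; not the authors' R code.
    """
    n_bands = len(spectra[0])
    # Eq. (1): average spectrum, computed wavelength by wavelength
    avg = [fmean(s[j] for s in spectra) for j in range(n_bands)]
    avg_mean = fmean(avg)
    sxx = sum((a - avg_mean) ** 2 for a in avg)

    corrected = []
    for s in spectra:
        s_mean = fmean(s)
        # Eq. (2): least-squares fit of the sample spectrum on the average spectrum
        k = sum((a - avg_mean) * (v - s_mean) for a, v in zip(avg, s)) / sxx
        b = s_mean - k * avg_mean
        # Eq. (3): remove the additive (b) and multiplicative (k) effects
        corrected.append([(v - b) / k for v in s])
    return corrected
```

When every input spectrum is an exact scaled and shifted copy of a common shape, the corrected spectra collapse onto the average spectrum, which is the behavior MSC is designed to approximate for real scatter effects.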

Next, spectral convolution was applied to reproduce the response of a multispectral sensor from the hyperspectral data, a procedure widely used in cross-validation and time-series integration [29]. For the DJI Phantom 4 Multispectral (P4M), the simulated reflectance of each band was calculated as a weighted average of the FieldSpec® 3 reflectance using spectral response functions (Equation 4), approximated by Gaussian curves, parameterized by the central wavelengths and full width at half maximum (FWHM) values provided by the manufacturer (Blue 450 ± 16 nm; Green 560 ± 16 nm; Red 650 ± 16 nm; RedEdge 730 ± 16 nm; NIR 840 ± 26 nm).

It should be noted that the manufacturer does not publish the spectral response functions (SRFs) of the P4M bands as detailed normalized curves; the Gaussian approximation therefore constitutes a methodological limitation. To mitigate potential biases resulting from this simplification and to align the instruments, we applied Direct Standardization (DS) and Piecewise Direct Standardization (PDS) after the simulation.

R_{b} = \frac{\sum_{i = 1}^{n} R(\lambda_{i}) \, {SRF}_{b}(\lambda_{i}) \, \Delta\lambda_{i}}{\sum_{i = 1}^{n} {SRF}_{b}(\lambda_{i}) \, \Delta\lambda_{i}}

(4)

Where:

R_b is the simulated reflectance of band b;

R(λ_i) is the FieldSpec reflectance at wavelength λ_i;

λ_i are the wavelengths measured by the FieldSpec;

Δλ_i is the spectral resolution (1.4 nm);

SRF_b(λ_i) corresponds to the spectral response function of band b; and

the denominator ensures normalization.
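A minimal sketch of Equation (4) with Gaussian SRFs is given below for illustration. It assumes that the ± value quoted for each band is half of the FWHM (so FWHM = 32 nm, or 52 nm for NIR); this interpretation, the function names, and the dictionary `P4M_BANDS` are assumptions introduced here and are not taken from the authors' code.

```python
import math

# Centre wavelength and quoted +/- value (nm) for each P4M band, as listed in the text.
P4M_BANDS = {
    "Blue": (450, 16),
    "Green": (560, 16),
    "Red": (650, 16),
    "RedEdge": (730, 16),
    "NIR": (840, 26),
}


def gaussian_srf(wavelength, center, fwhm):
    """Gaussian response with unit peak, parameterized by its FWHM."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * ((wavelength - center) / sigma) ** 2)


def simulate_bands(wavelengths, reflectance, step=1.4):
    """Equation (4): SRF-weighted average of a hyperspectral reflectance curve."""
    simulated = {}
    for name, (center, half_width) in P4M_BANDS.items():
        fwhm = 2.0 * half_width  # ASSUMPTION: the +/- value is half of the FWHM
        weights = [gaussian_srf(w, center, fwhm) * step for w in wavelengths]
        weighted_sum = sum(r * w for r, w in zip(reflectance, weights))
        simulated[name] = weighted_sum / sum(weights)  # denominator normalizes the SRF
    return simulated
```

Because the denominator normalizes the weights, a spectrally flat input returns the same reflectance in every simulated band, which is a convenient sanity check before applying the function to measured spectra.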

2.7.2. Calibration Transfer Using DS and PDS

Cross-calibration between spectral domains was performed using Direct Standardization (DS) and Piecewise Direct Standardization (PDS), two classical calibration transfer techniques [30,31]. The DS method establishes a global spectral transfer matrix from the mathematical relationship between the spectra of a set of standard samples measured by the master instrument and the spectra of the same samples measured by the target instrument.

PDS, in contrast, estimates the transfer coefficients locally, within wavelength windows, and is therefore better at capturing wavelength-dependent variations. From these local coefficients, a transfer matrix is constructed and applied to the spectra of the target instrument in order to maximize similarity with the spectra of the reference (master) instrument.

In Piecewise Direct Standardization (PDS), the window width (ω) is a central hyperparameter that balances the local accuracy associated with narrow windows against the numerical robustness favored by wider windows; it is therefore defined empirically [32]. In this work, the methodological strategy consisted of integrating DS and PDS, adopting a window width of five wavelengths, aligned with the number of evaluated bands.

This choice allowed for consistent local corrections, reduced the dimensionality in each window, and contributed to greater numerical stability. Thus, the spectral segmentation of the PDS maintained the flexibility inherent in the DS while minimizing the number of hyperparameters to be adjusted [33].

In this study, the multispectral sensor mounted on the UAV was defined as the master, since it serves as the spectral reference and the target calibration scale. The actual bands captured by the UAV represent the comparison standard, such that all transformations aim to align the simulated spectra with this domain. Thus, the quality of the standardization process was evaluated based on the correspondence between the corrected data and the original measurements of the master.

The hyperspectral spectrum obtained in the laboratory, after convolution with the estimated spectral response functions (SRFs), was defined as the target, as it represents the source to be adjusted to the master. The sequential application of DS, PDS, and scalar adjustment aims to transform the simulated spectra, reducing instrumental discrepancies and maximizing similarity with the actual bands of the UAV. Thus, the (hyperspectral) target corresponds to the corrected spectral domain, whose performance is continuously evaluated relative to the master (UAV).
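The master/target relationship described above can be sketched as follows. This is a simplified illustration that assumes NumPy is available and uses ordinary least squares for both the global (DS) and windowed (PDS) fits; classical PDS implementations often use PCR or PLS within each window, and the function names here are hypothetical rather than the authors' code.

```python
import numpy as np


def _fit_transfer(target, master, columns_for_band):
    """Fit master ~ target @ F + offset, one master band at a time.

    columns_for_band(j) returns the slice of target bands allowed to
    explain master band j (all bands for DS, a local window for PDS).
    """
    t_mean, m_mean = target.mean(axis=0), master.mean(axis=0)
    t_c, m_c = target - t_mean, master - m_mean
    F = np.zeros((target.shape[1], master.shape[1]))
    for j in range(master.shape[1]):
        cols = columns_for_band(j)
        coef, *_ = np.linalg.lstsq(t_c[:, cols], m_c[:, j], rcond=None)
        F[cols, j] = coef
    offset = m_mean - t_mean @ F
    return F, offset


def fit_ds(target, master):
    """Direct Standardization: every target band may contribute to every master band."""
    return _fit_transfer(target, master, lambda j: slice(None))


def fit_pds(target, master, half_window=2):
    """Piecewise variant: only bands inside a window centred on band j contribute."""
    n_bands = target.shape[1]
    return _fit_transfer(
        target,
        master,
        lambda j: slice(max(0, j - half_window), min(n_bands, j + half_window + 1)),
    )


def apply_transfer(spectra, F, offset):
    """Map target-domain spectra (rows = samples) into the master domain."""
    return spectra @ F + offset
```

Here `target` would hold the simulated bands and `master` the UAV bands for the same anchor plots, with rows as plots and columns as bands. A `half_window` of 2 gives a window of five bands at the centre of the spectrum, narrowing at the edges.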

2.7.3. Sensor-to-Sensor Calibration

After applying the standardization techniques, sensor-to-sensor calibration was performed as a complementary step to adjust individual scales for each band, using the same anchor samples (plots) employed in the calibration process. This procedure aimed to definitively align the simulated values with the scale of the UAV multispectral sensor, correcting residual differences in both offset and scale. As a result, the transformed spectra showed greater quantitative agreement with the actual UAV measurements.
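The band-wise correction of offset and scale can be pictured as a simple linear fit per band. The sketch below assumes an ordinary least-squares gain and offset estimated from the anchor plots; the exact estimator used in the study is not stated, and the function names are hypothetical.

```python
def fit_gain_offset(simulated, observed):
    """Least-squares gain and offset so that gain * simulated + offset ~ observed."""
    n = len(simulated)
    mean_x = sum(simulated) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in simulated)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(simulated, observed))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset


def apply_gain_offset(values, gain, offset):
    """Rescale simulated values of one band to the UAV (master) scale."""
    return [gain * v + offset for v in values]
```

The fit would be repeated independently for each of the five bands, using the simulated and UAV reflectances of the same anchor plots.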

2.7.4. Nitrogen-Sensitive Vegetation Indices

The spectral bands extracted from the sugarcane orthomosaic were used to calculate spectral indices (SI), defined as transformations obtained by combining two or more multispectral bands. In this study, the selection of these indices was guided by their higher correlation with foliar nitrogen content (FNC), as they can capture variations related to pigmentation, photosynthetic activity, and canopy structure [34,35].

To ensure comparability between the datasets, the spectral indices were calculated from the spectral bands extracted from the buffer of each experimental plot. For the simulated data, the same spatial and spectral representation was adopted; that is, the values were organized to correspond to the same plots and the same set of bands considered in the observed data. This standardized the sampling unit and allowed for consistent comparisons. Furthermore, the same indices were calculated for both domains (Table 2), which ensured methodological uniformity and greater robustness in the evaluation of FNC.
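For illustration, the four indices discussed in this study can be computed from the five bands as sketched below. Table 2 is not reproduced in this section, so the formulas are commonly cited definitions and are an assumption here; in particular, ChlRe is taken to be the red-edge chlorophyll index (NIR / RedEdge − 1). The function name is hypothetical.

```python
def spectral_indices(blue, green, red, rededge, nir):
    """Return NDVI, VARI, ChlRe and ENDVI for one plot.

    ASSUMPTION: commonly cited formulations; they may differ from Table 2.
    """
    return {
        "NDVI": (nir - red) / (nir + red),
        "VARI": (green - red) / (green + red - blue),
        "ChlRe": nir / rededge - 1.0,
        "ENDVI": ((nir + green) - 2.0 * blue) / ((nir + green) + 2.0 * blue),
    }
```

Applying the same function to the UAV bands and to the simulated bands of each plot keeps the index definitions identical across the two domains.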

2.8. Spearman Correlation

To investigate the relationship between the bands obtained by UAV, the simulated bands, and the spectral indices derived from both sources, we used Spearman’s correlation coefficient, a widely used nonparametric technique for measuring the strength and direction of monotonic associations between variables. This coefficient is particularly suitable when the data do not meet the assumptions of normality or when the relationship between variables is not strictly linear, and it is calculated based on the ranking of the observed values. Its values range from –1 to 1, where positive coefficients indicate direct monotonic associations, negative coefficients indicate inverse associations, and absolute values closer to 1 represent stronger relationships between the analyzed variables [40,41].
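Because the coefficient is computed from ranks, it can be sketched as a Pearson correlation applied to rank-transformed data. The following pure-Python illustration assigns average ranks to ties; the function names are hypothetical and are not the routines used in the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A strictly increasing but nonlinear relationship, such as y = x³, yields a coefficient of 1, which shows why the measure suits monotonic rather than strictly linear associations.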

2.9. Modeling of Nitrogen Content Using Machine Learning

2.9.1. Linear Regression

Linear regression is one of the simplest and most widely used statistical techniques, designed to model the relationship between a dependent variable (here, FNC) and an independent variable (here, an SI). The technique assumes a linear relationship between the variables, which is commonly tested using the p-value associated with the model’s slope coefficient; values of p < 0.001 indicate that the relationship is statistically significant [42]. Linear regression provides a simple and effective basis for estimating foliar nutrients from spectral data, allowing for the quantification of the strength and direction of the association between the SIs and FNC.

2.9.2. Partial Least Squares Regression (PLSR)

The Partial Least Squares Regression (PLSR) machine learning technique was used with the NIPALS algorithm. This technique was used to estimate nutritional content based on the reflectance of multispectral (canopy) and simulated (foliar) data. PLSR is widely used in predictive analyses with spectral data, as it projects the original variables into a reduced-dimensional space composed of latent variables (or factors). During the calibration process, the model simultaneously integrated information from correlated independent variables (spectral bands and indices) and dependent variables (N content), seeking to identify the minimum number of factors necessary to explain the variability in the responses. Furthermore, PLSR stands out for its ability to mitigate multicollinearity among independent variables. This characteristic is especially relevant in hyperspectral data, where the strong correlation between adjacent bands compromises the application of conventional methods. In addition to overcoming this challenge, PLSR extracts latent components that condense the variability of the spectra and simultaneously explain multiple variables of interest, resulting in robust and stable models, as has already been demonstrated in various studies [4,43].
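The NIPALS procedure for a single response variable (PLS1) can be sketched as below. This is an illustrative NumPy implementation written for this text under the assumption of mean-centered, unscaled predictors; it is not the software routine used in the study, and the function name is hypothetical.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """PLS1 regression by NIPALS; returns (coefficients, intercept).

    X: (n_samples, n_predictors) array, y: (n_samples,) array.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # centred working copies
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                         # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                            # scores of the latent variable
        tt = t @ t
        p = Xr.T @ t / tt                     # X loadings
        c = yr @ t / tt                       # y loading
        Xr = Xr - np.outer(t, p)              # deflate X
        yr = yr - c * t                       # deflate y
        weights.append(w)
        loadings.append(p)
        y_loadings.append(c)
    W = np.array(weights).T
    P = np.array(loadings).T
    q = np.array(y_loadings)
    coef = W @ np.linalg.solve(P.T @ W, q)    # regression vector in original X space
    intercept = y_mean - x_mean @ coef
    return coef, intercept
```

Predictions are obtained as `X_new @ coef + intercept`. When the number of components equals the number of (linearly independent) predictors, the result coincides with ordinary least squares; using fewer components is what gives PLSR its regularizing effect under multicollinearity.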

Excessive use of latent components may lead the model to interpret noise as relevant signal, resulting in inferior predictive performance, with increasing overfitting effects as more factors are included than necessary [44]. Thus, a well-balanced model should combine quality and predictive power, making it suitable for use in remote sensing studies focused on plant nutrition. To determine the optimal number of factors, K-fold cross-validation was applied with the aim of minimizing the “Root Mean PRESS” statistic. In addition, a variable importance in projection (VIP) analysis was performed, calculated based on the loading weights and the variance explained by the selected components, allowing for the identification of the variables that contributed most to nitrogen prediction in the model’s latent space.

2.9.3. Random Forest (RF)

The Random Forest (RF) model was also applied; it is widely used in high-dimensional data processing and is one of the most commonly adopted methods for handling hyperspectral data [35]. RF is an algorithm developed by Breiman [45] and is frequently used for both regression and classification problems. The method employs bootstrap aggregation (bagging), in which each tree is trained on a bootstrap sample of the initial dataset and contributes an individual prediction. These predictions are then combined into the final estimate [46]: the average of the trees for regression, or the majority vote for classification. This aggregation balances the errors of the individual trees and improves the model’s generalization ability [47].

The RF model operates based on the definition of several main parameters: the number of trees (ntree), the number of predictor variables considered at each split (mtry), and the minimum size of terminal nodes (nodesize) [48,49]. In this study, some of these parameters were manually defined as follows: number of trees (ntree = 1000), number of variables considered at each split (mtry = 11), minimum terminal node size (min.node.size = 20), and split criterion based on variance reduction (splitrule = "variance"). These hyperparameters were defined according to the criteria suggested by Chen et al. [44] in order to avoid unsatisfactory model performance.

2.9.4. Performance Indicators

The quality of the models for predicting FNC in sugarcane was assessed by comparing the observed values with those predicted by the models. Four validation metrics were used: the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and the ratio of performance to interquartile range (RPIQ), whose mathematical expressions are presented in Equations (5), (6), (7), and (8). According to Rodrigues et al. [50], R² classifies prediction models as: poor (≤0.50), moderate (0.50–0.65), good (0.65–0.80), very good (0.80–0.90), and excellent (≥0.90).

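R^{2} = \frac{\left[\sum_{i = 1}^{n} (\hat{y}_{i} - \bar{\hat{y}}) (y_{i} - \bar{y})\right]^{2}}{\sum_{i = 1}^{n} (\hat{y}_{i} - \bar{\hat{y}})^{2} \sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(5)

RMSE = \sqrt{\frac{\sum_{i = 1}^{n} (\hat{y}_{i} - y_{i})^{2}}{n}}

(6)

MAE = \frac{1}{n} \sum_{i = 1}^{n} |\hat{y}_{i} - y_{i}|

(7)

RPIQ = \frac{{IQR}_{obs}}{RMSE}

(8)

Where ŷ_i and y_i are the predicted and observed values, the barred symbols are their respective means, n is the number of observations, and IQR_obs is the interquartile range of the observed values.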
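The four indicators can be computed as in the sketch below, given for illustration only. It assumes that R² is the squared correlation between predicted and observed values, as written in Equation (5), and that quartiles follow the "inclusive" interpolation rule of Python's `statistics.quantiles`; other quartile conventions would give slightly different RPIQ values. The function names are hypothetical.

```python
import math
from statistics import fmean, quantiles


def r_squared(observed, predicted):
    """Equation (5): squared Pearson correlation between predictions and observations."""
    mo, mp = fmean(observed), fmean(predicted)
    cov = sum((p - mp) * (o - mo) for o, p in zip(observed, predicted))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov ** 2 / (var_p * var_o)


def rmse(observed, predicted):
    """Equation (6): root mean square error."""
    return math.sqrt(fmean((p - o) ** 2 for o, p in zip(observed, predicted)))


def mae(observed, predicted):
    """Equation (7): mean absolute error."""
    return fmean(abs(p - o) for o, p in zip(observed, predicted))


def rpiq(observed, predicted):
    """Equation (8): interquartile range of the observations divided by RMSE."""
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")  # ASSUMPTION: quartile rule
    return (q3 - q1) / rmse(observed, predicted)
```

Note that a prediction with a constant bias still reaches R² = 1 under Equation (5), because the squared correlation ignores offsets; RMSE, MAE, and RPIQ are what reveal such a bias, which is one reason the metrics are reported together.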

To ensure the robustness and generalization capability of the models, two complementary validation strategies were adopted. The first employed k-fold cross-validation (k = 10), in which the dataset of 240 observations was divided into ten balanced subsets; in each iteration, nine subsets were used for training and one for validation, so that every observation was used for validation exactly once. The second strategy applied the hold-out method, splitting the data into 70% for training and 30% for independent testing. This combined approach enables a comprehensive evaluation of predictive performance while mitigating the risk of bias associated with data partitioning.
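The two partitioning schemes can be sketched as index splits. The example below is a generic illustration that assumes simple random shuffling with a fixed seed; whether the study stratified the folds (for example by sampling date) is not stated, and the function names are hypothetical.

```python
import random


def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of (nearly) equal size
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def holdout_indices(n, train_fraction=0.7, seed=0):
    """Return (train, test) index lists for a single hold-out split."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]
```

With 240 observations, each of the ten folds holds 24 validation samples and 216 training samples, and the 70/30 hold-out split gives 168 training and 72 test samples.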

3. Results

3.1. Leaf Nitrogen Content (LNC) and Precipitation Throughout the Season

Figure 5 shows precipitation throughout the 2024/2025 sugarcane growing season, along with nitrogen data for the respective samples collected at 220, 290, and 350 days after planting. The phenological cycle of sugarcane consists of four stages: (i) pre-emergence from 0 to 90 days, (ii) vegetative development from 90 to 180 days, (iii) vegetative growth from 180 to 270 days, and (iv) maturation from 270 to 360 days. Among these stages, vegetative development is the one with the highest nutrient concentrations in the plantation [24]. A water deficit was observed during the initial months of the 2024 harvest season, from June to October, corresponding to a phase of extreme importance for the crop’s vegetative development.

Precipitation patterns throughout the sugarcane growing cycle were consistent with the crop’s physiological demands at each phenological stage. Although a slight water deficit was observed at the beginning of the growing season, the crop showed satisfactory development in later stages. This behavior is consistent with the variation in foliar nitrogen content across sampling periods, since, during vegetative growth, increased water availability promotes photosynthetic activity, canopy expansion, and N uptake [17].

In contrast, the reduction in leaf N content at 290 and 350 DAC, at the onset of maturation, may reflect both physiological changes inherent to the progression of the growth cycle and the internal redistribution of the nutrient, with a decrease in its concentration in the leaves and greater allocation of assimilates to the stems [51].

3.2. Spearman Correlation Between Original and Simulated Bands and Indices in Relation to LNC

In this study, Spearman’s correlation coefficient (r) was used to assess the association between N content and the reflectance values of the original and simulated bands, as well as the spectral indices (SI) calculated from these bands. The analysis focused on the Blue, Green, Red, RedEdge, and NIR spectral regions due to their greater interaction with chlorophyll and pigments directly involved in the photosynthetic process and, consequently, closely related to FNC. Thus, the DAC dataset was considered, which enabled a more robust assessment of the relationship between the variables under different conditions, as illustrated in Figure 6.

The spectral bands and SIs showed consistent associations with foliar nitrogen (N) content; among the bands, correlations were negative in the visible spectrum and RedEdge and positive in the near-infrared. In the bands obtained by UAV, for example, the r values were −0.32, −0.49, −0.54, and −0.22 for Blue, Green, Red, and RedEdge, respectively, while in the NIR, r = 0.63.

The simulated bands reproduced a similar correlation pattern, but with more pronounced magnitudes (B1SIM r = −0.56; B2SIM r = −0.58; B3SIM r = −0.58; B4SIM r = −0.55; B5SIM r = 0.63), indicating consistency between the simulation and the original data. This result indicates a stronger association between chlorophyll content and higher photosynthetic activity, which increases absorption in the VIS region, particularly in the Blue, Green, and RedEdge regions. In contrast, the signal in the NIR is more influenced by multiple scattering linked to the internal structure of the leaf and canopy architecture, which may support the positive correlation observed in this spectral region.

The SIs showed similar correlations with FNC in both domains: for NDVI, r = 0.72 (UAV) and r = 0.79 (simulated); for VARI, r = 0.77 and r = 0.78; for ChlRe, r = 0.73 and r = 0.84; and for ENDVI, r = 0.62 and r = 0.61. The indices derived from the simulated bands generally showed slightly higher correlations than those obtained with the original data, which is consistent with the origin of the bands: since they are constructed from narrow-band spectral information, they tend to better preserve subtle spectral features and, consequently, respond to small variations related to chlorophyll and N.

Finally, the comparison between indices calculated from actual UAV data and those derived from simulated data revealed the following correlations: NDVI vs. simulated (r = 0.78), VARI vs. simulated (r = 0.84), ChlRe vs. simulated (r = 0.82), and ENDVI vs. simulated (r = 0.85). Taken together, these results indicate high agreement between the indices obtained from the bands measured by the UAV and those generated from the simulated bands, suggesting that the simulation adequately preserved the spectral behavior necessary to reproduce the patterns captured by the indices and, consequently, reinforcing the consistency and usefulness of both datasets.

3.3. Exploratory Estimation of Nitrogen Concentration in Sugarcane Using Vegetation Indices from UAV Data and Derived from the Simulated Dataset

Figure 7 shows the linear regression between N concentration (%) and the spectral indices (SI) used in this study (NDVI, VARI, ChlRe, and ENDVI), calculated from actual UAV data and their corresponding simulated (SIM) values in relation to their phenological stage at the DACs. These SIs summarize differences in reflectance in specific regions of the spectrum and, therefore, serve as compact descriptors of the biophysical and biochemical state of the vegetation.

The indices calculated from the canopy’s spectral response were statistically significant (p < 0.001) for FNC estimation, with performance ranging from “moderate” to “good.” The NDVI, for example, showed R² = 0.68 and RMSE = 1.04, followed by the other indices (VARI, R² = 0.67, RMSE = 1.06; ChlRe, R² = 0.65, RMSE = 1.08; ENDVI, R² = 0.51, RMSE = 1.27). The SIs obtained from the simulated data reproduced the trend observed in the SIs derived from UAV data, but with more clearly defined associations with leaf N content across the different DACs. This behavior can be explained by the nature of the simulated data, whose origin favors a more sensitive spectral response that is less subject to canopy interference, allowing variations related to the leaf’s nutritional status to be captured with greater discrimination. In this dataset, NDVI (R² = 0.71 and RMSE = 0.98), VARI (R² = 0.72 and RMSE = 0.97), and ChlRe (R² = 0.74 and RMSE = 0.92) performed satisfactorily, while ENDVI showed lower performance (R² = 0.61; RMSE = 1.14).

3.4. Modeling of Nitrogen Content Using UAV-Derived Spectra and Simulated Data

The sugarcane FNC prediction model was developed using original multispectral data and simulated spectral data obtained during the stump development phase. For modeling, two algorithms widely established in spectral analysis were employed: Partial Least Squares Regression (PLSR) [52] and Random Forest (RF) [53].

In this context, both models demonstrated good predictive capability for the UAV data (Figure 8B). The PLSR model showed “good” performance, with an R² of 0.75, an RMSE of 0.92 g kg⁻¹, an MAE of 0.73, and an RPIQ of 3.15. Similarly, the RF model also showed satisfactory performance, with an R² of 0.76, an RMSE of 0.89 g kg⁻¹, an MAE of 0.70, and an RPIQ of 3.26.

Similar results were also observed for the models calibrated using the simulated data, where PLSR yielded R² = 0.75, RMSE = 0.90 g kg⁻¹, MAE = 0.72, and RPIQ = 3.21, while RF generated R² = 0.74, RMSE = 0.92 g kg⁻¹, MAE = 0.75, and RPIQ = 3.14 (Figure 9C and 9B).

Overall, the performance of the models calibrated using data collected from the crop canopy via UAV and data simulated from leaf spectral curves showed similar values across all indicators and performed well, indicating that the methodology employed was able to satisfactorily reproduce the spectral behavior observed in the real data.

3.5. Variable Importance in Projection

Analysis of the Variable Importance in Projection (VIP) values for the PLSR model (Figure 10A and B) revealed the variables that contributed most to the prediction of FNC. In the original canopy data, the Red, Blue, and RedEdge bands stood out in that order of importance, in addition to the ChlRe and VARI indices. For the simulated data, the most relevant variables were the Blue, RedEdge, and NIR bands, as well as the VARI index, in that order. In both datasets, these variables exceeded the heuristic threshold of 0.8, indicating an above-average contribution to explaining FNC variability.

In contrast, permutation importance was estimated by randomizing the values of each predictor and then assessing the impact of this change on the model’s overall performance (Figure 11A and B). In the Random Forest (RF) model, this approach revealed a more concentrated distribution of predictive relevance across a smaller number of variables. In the UAV dataset, the NDVI, ChlRe, and VARI indices stood out, in addition to the Red band. In the simulated dataset, however, the greatest contributions were observed for the ENDVI_SIM and NDVI_SIM indices, as well as for the RedEdge and Green bands (B4 and B3_SIM), as shown in Figure 9C and 9D. This behavior is consistent with the structural differences between the evaluated models.

3.6. Independent Validation

In this step, the hold-out validation technique was employed, in which 70% of the data were used for training and 30% for validation on an independent dataset. This procedure was used to analyze the transferability of the models between the UAV (original) and simulated domains in predicting FNC (Figure 12), comparing the PLSR and Random Forest (RF) algorithms.

Panels A and B represent the models calibrated with the UAV bands and spectral indices and subsequently applied to the simulated bands and spectral indices. In turn, panels C and D present the reverse arrangement, in which the models were trained in the simulated domain and evaluated in the UAV domain, while maintaining the validation criterion.

Transferring the FNC models between domains (UAV and SIM) with PLSR and Random Forest in both directions yielded “moderate” results in scenarios A, B, and C, with R² > 0.64. This indicates a stable relationship between the domains, though with a loss of accuracy associated with the change in spectral representation. Notably, the RF trained on SIM data and tested on UAV data showed the best performance in terms of explained variance and error (R² = 0.70; RMSE = 0.99 g kg⁻¹; MAE = 0.78; RPIQ = 2.85), demonstrating greater robustness of the nonlinear model.

3.7. Spatialization and Mapping of Leaf Nitrogen Content

When comparing different modeling strategies for estimating FNC, it was found that the validated Random Forest (RF) model exhibited the best predictive performance, even when applied to a reduced dataset. Due to its robustness and flexibility in handling high-dimensional spectral data and spatial variability, the RF model was selected for the spatialization stage. Previous studies show that RF provides more accurate predictions than traditional linear methods, in addition to allowing the integration of multiple spectral bands and environmental variables [54].

In addition to demonstrating statistical stability, the model proved capable of spatializing foliar N content, which strengthens its applicability in pixel-by-pixel prediction of multispectral images. Using the global model, applied to each scenario (220, 290, and 350 DAC), it was possible to generate a map of the spatial distribution of N content in the experimental area, considering different phenological stages, as illustrated in Figure 13.

As shown in Figure 13, the estimated FNC is represented by a color gradient, in which red tones indicate lower concentrations and green tones indicate higher concentrations, following the concentration scale. N values decrease gradually across the growth, stem elongation, and maturation stages; for example, at 220 DAC, the leaf N content ranges from 11.07 to 15.41 (%), while at 350 DAC, it ranges from 9.01 to 12.94 (%). Furthermore, the spatial pattern obtained indicates good model performance, with consistent predictions even when using simulated data, suggesting the ability to generalize across domains. This behavior is particularly relevant when the model is applied to high-resolution images, in which intra-plot variability tends to be more evident and, therefore, requires greater stability in the estimates.

4. Discussion

4.1. Analysis of the Correlations Between Original and Simulated Spectral Bands and Indices, and of the Linear Regression Applied to Spectral Indices for Predicting FNC

The relationships between spectral bands and spectral indices obtained at the canopy scale using a multispectral sensor mounted on a UAV, and those generated from simulated leaf data derived from hyperspectral data, were analyzed across different phenological stages. This approach is relevant, since spectral bands form the basis for the construction of mathematical indices, and their relevance stems from their association with specific absorption characteristics of photosynthetic pigments, especially chlorophyll [55].

The negative correlations observed for the Blue, Green, Red, and RedEdge bands, as well as the positive correlation for the NIR band (Figure 6), were verified in both datasets and reflect the spectral behavior of the sugarcane canopy, in which higher chlorophyll and foliar nitrogen contents are associated with greater absorption in the visible region and higher reflectance in the near-infrared [35]. However, since the sensor used is multispectral and operates with relatively broad bands (466, 576, 666, and 746 nm), the spectral response obtained by UAV tends to integrate and smooth out narrower spectral features. In studies using hyperspectral data on sugarcane, by contrast, the regions most sensitive to nitrogen are typically identified more specifically in the Green band, near 550 nm, and in the RedEdge region, between 680 and 750 nm [15,17].

Linear regression of spectral indices derived from the multispectral sensor showed similar performance in estimating FNC. Among these, the NDVI stood out, with the highest coefficient of determination (R² = 0.68) and the lowest error (RMSE = 1.04 g kg⁻¹). NDVI combines the red and near-infrared (Red and NIR) bands and is sensitive to “green” intensity, canopy vigor, and leaf nitrogen status [56,57,58]. Kumarasiri et al. [18] also reported good performance of the NDVI in predicting FNC (R² = 0.77), corroborating this finding, and Li et al. [59] showed that incorporating SIs from multispectral UAV data improves predictive models. Taken together, these results reinforce the potential of SIs for estimating N in the canopy.

Among the simulated indices, ChlRe showed the best fit for FNC (R² = 0.74; RMSE = 0.92 g kg⁻¹), followed by VARI (R² = 0.72; RMSE = 0.97 g kg⁻¹) and NDVI (R² = 0.71; RMSE = 0.98 g kg⁻¹). These findings corroborate those of Martins et al. [60], who highlight the high sensitivity of indices derived from hyperspectral data for predicting FNC in sugarcane. Complementarily, in a study by Hassani et al. [61] evaluating corn and sorghum with re-calibrated spectral data, the NDVI derived from simulated UAV data yielded R² = 0.72 and RMSE = 0.22 g kg⁻¹ for sorghum, while for corn, ChlRe derived from ASD data yielded R² = 0.85 and RMSE = 0.18 g kg⁻¹, and NDVI derived from Landsat 8 OLI data yielded R² = 0.81 and RMSE = 0.21 g kg⁻¹. Taken together, these results reinforce that indices such as ChlRe, VARI, and NDVI are robust alternatives for nutritional prediction, especially when applied to simulated data.

4.2. Model Performance in Nitrogen Prediction Using UAV and Simulated Data (PLSR and Random Forest)

The leaf is the primary light-absorbing organ and plays a central role in nitrogen assimilation; thus, its composition and structure directly influence the crop’s spectral response [62,63]. This influence is most evident in the bands associated with chlorophyll absorption, which tend to respond to variations in photosynthetic activity and, consequently, in the plant’s nutritional status [56]. In the present study, this relationship was initially observed in the correlation analysis (Figure 6) and subsequently confirmed by the VIP values of the PLSR model (Figure 10). In the original canopy data, the Red (r = −0.54) and Blue (r = −0.32) bands stood out both for the magnitude of the correlation and for their contribution to the model. Similarly, in the simulated data, this pattern was most evident for the Blue (r = −0.56) and RedEdge (r = −0.55) bands. This convergence between correlation and VIP reinforces the relevance of the visible and RedEdge regions for predicting foliar nitrogen content, in addition to indicating that, even under different acquisition conditions, the N-related spectral response was preserved [17,64].

This behavior helps explain the similar performance obtained by the models in both domains. PLSR yielded R² = 0.75, RMSE of 0.92 g kg⁻¹, MAE of 0.73, and RPIQ of 3.15 for the UAV data (Figure 8A), and R² = 0.75, RMSE of 0.90 g kg⁻¹, MAE of 0.72, and RPIQ of 3.21 for the simulated data (Figure 9C). The similarity between these metrics indicates that spectral simulation preserved the information necessary for nitrogen prediction without significant loss of performance [65]. These results corroborate previous studies that also reported good performance in estimating foliar nitrogen from spectral data. Li et al. [59], for example, obtained R² = 0.79 and RMSE = 0.11 g kg⁻¹ using PLSR on sugarcane based on UAV multispectral images. Lu et al. [66] observed similar results in winter wheat, with R² = 0.65 and RMSE = 0.13 g kg⁻¹. Hassani et al. [61], however, reported even better performance using simulated UAV data derived from a field hyperspectral sensor, with R² = 0.81 and RMSE = 0.21 g kg⁻¹. Taken together, these findings reinforce the potential of integrating different platforms and sensors for estimating foliar nitrogen in the same crop.
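For reference, the four performance metrics used throughout this section can be computed as in the minimal sketch below. It assumes R² is the coefficient of determination (1 − SS_res/SS_tot) and RPIQ is the interquartile range of the observed values divided by the RMSE; the quartile convention (Python's "inclusive" method) is an assumption, and the observed and predicted values are hypothetical.

```python
import statistics

def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean squared error, in the units of the target."""
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5

def mae(observed, predicted):
    """Mean absolute error, in the units of the target."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def rpiq(observed, predicted):
    """Ratio of performance to interquartile range: IQR(observed) / RMSE."""
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return (q3 - q1) / rmse(observed, predicted)

# Hypothetical foliar N values (g kg^-1), not taken from the study.
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 12.0, 13.0, 16.0, 19.0]

print(f"R2 = {r2(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"MAE = {mae(observed, predicted):.3f}")
print(f"RPIQ = {rpiq(observed, predicted):.3f}")
```

Because RPIQ scales the error by the spread of the observed values, it allows comparison between datasets whose nitrogen ranges differ, which RMSE alone does not.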

The Random Forest model showed similar predictive performance across both datasets, with slightly better results for the UAV data, where it achieved R² = 0.76, RMSE = 0.89 g kg⁻¹, MAE = 0.70, and RPIQ = 3.26. For the simulated data, the model achieved R² = 0.74, RMSE = 0.92 g kg⁻¹, MAE = 0.75, and RPIQ = 3.14. This slight advantage of the UAV data may be related to the RF’s ability to model nonlinear relationships between spectral variables and the plants’ nutritional status, more efficiently capturing the variability observed directly in the field [67]. The variable importance analysis reinforces this interpretation by indicating that the model concentrated its predictive power on a reduced set of key predictors. In the RF-UAV (Figure 11A), the NDVI and ChlRe indices stood out most prominently, followed by the Red band and the VARI index, highlighting the relevance of regions associated with chlorophyll, vegetative vigor, and visible light absorption for estimating foliar nitrogen content [17]. In RF-SIM (Figure 11B), the greatest contribution was observed for ENDVI_SIMULATED and NDVI_SIMULATED, followed by the B4_SIMULATED, B2_SIMULATED, and B3_SIMULATED bands, indicating that, even after spectral simulation, the model retained sensitivity to the spectral bands most closely related to the crop’s nutritional status.
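The variable importance reported for the RF models is permutation-based (Figure 11). The idea can be sketched in a few lines: each predictor is shuffled in turn, and the resulting increase in RMSE is taken as its importance. In the sketch below, the `toy_model` function is a hypothetical stand-in for a fitted model, not the Random Forest used in the study, and the data are illustrative.

```python
import random

def rmse(observed, predicted):
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5

def permutation_importance(predict, rows, target, n_repeats=10, seed=0):
    """Mean increase in RMSE after shuffling each predictor column."""
    rng = random.Random(seed)
    baseline = rmse(target, [predict(row) for row in rows])
    importances = []
    for f in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[f] for row in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column f permuted.
            permuted = [row[:f] + [v] + row[f + 1:] for row, v in zip(rows, column)]
            permuted_error = rmse(target, [predict(row) for row in permuted])
            increases.append(permuted_error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical stand-in model: uses only the first predictor (e.g., NDVI).
def toy_model(row):
    return 10.0 + 8.0 * row[0]

# Hypothetical predictors: [NDVI-like index, unrelated variable].
rows = [[0.55, 0.3], [0.60, 0.9], [0.65, 0.1], [0.70, 0.7], [0.75, 0.5], [0.80, 0.2]]
target = [toy_model(row) for row in rows]

importances = permutation_importance(toy_model, rows, target)
print(importances)
```

A predictor the model does not use shows zero importance, whereas shuffling an informative predictor degrades the predictions; this is why the RF importance concentrates on a reduced set of variables.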

These findings are consistent with recent literature on sugarcane. Li et al. [59] demonstrated the feasibility of estimating canopy N concentration using multispectral UAV imagery, finding a strong correlation between observed and predicted values in the best-performing model (R² = 0.79; RMSE = 0.11). Furthermore, Picado et al. [24], although using multiple linear regression, also confirmed the potential of remote sensing with the MicaSense RedEdge-P camera for mapping foliar nutrients in sugarcane, achieving 75% accuracy for nitrogen. In contrast, Soltanikazemi et al. [68] reported more moderate RF performance (R² = 0.59; RMSE = 0.08 g kg⁻¹), while Hassani et al. [61], working with simulated UAS data for corn, observed lower agreement in validation (R² = 0.38 ± 0.18; RMSE = 0.40 ± 0.08). This contrast reinforces the consistency of the results obtained in the present study and highlights the potential of spectral simulation as a promising strategy for nitrogen prediction, especially given the still limited number of studies using simulated data applied to sugarcane.

4.3. Validation

When comparing the performance of the models using hold-out validation (Figure 12), it was found that cross-domain transfer depended on the direction of training and testing. In PLSR modeling, scenario (A), calibrated with UAV data and validated in the simulated domain, yielded R² = 0.64, RMSE = 1.10 g kg⁻¹, MAE = 0.85, and RPIQ = 2.58, while scenario (C), fitted with simulated data and evaluated in the UAV domain, resulted in R² = 0.65, RMSE = 1.18 g kg⁻¹, MAE = 0.94, and RPIQ = 2.40. Although performance was similar between the two transfer directions, there was variation in the error metrics, indicating that the model’s generalization is sensitive to the domain used in calibration. A possible explanation is that PLSR tends to require large calibration datasets and may generalize poorly to smaller or previously unseen datasets, in which case it functions as a “black box” [69].

This finding is consistent with recent studies that highlight limitations in the transferability of PLSR models across different contexts, including spatial, temporal, and phenological variations [70,71]. Furthermore, studies using imaging spectroscopy have also shown that phenology and differences between domains can critically affect the robustness and applicability of these models outside the conditions under which they were calibrated [72,73].

In RF modeling, scenario (B), calibrated with UAV data and validated in the simulated domain, yielded R² = 0.64, RMSE = 1.09 g kg⁻¹, MAE = 0.85, and RPIQ = 2.59, while scenario (D), calibrated with simulated data and validated in the UAV domain, resulted in R² = 0.70, RMSE = 0.99 g kg⁻¹, MAE = 0.78, and RPIQ = 2.85. The better result in scenario (D) suggests that calibrating the model on an alternative representation of the dataset, obtained by simulating the UAV bands, enhanced its capacity to generalize to real data.
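The transfer scenarios (A) to (D) share the same protocol: a model is calibrated in one domain and evaluated, without refitting, in the other. The sketch below illustrates this protocol only. It uses a single-predictor least-squares line as a deliberately simple stand-in for PLSR and RF, and the NDVI-like and nitrogen values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(observed, predicted):
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5

def transfer_rmse(train_domain, test_domain):
    """Calibrate on one domain, then predict the other without refitting."""
    slope, intercept = fit_line(*train_domain)
    x_test, y_test = test_domain
    return rmse(y_test, [slope * x + intercept for x in x_test])

# Hypothetical data: the same plots observed in two spectral domains.
foliar_n = [12.0, 13.0, 14.0, 15.0, 16.0]    # g kg^-1
uav_index = [0.60, 0.65, 0.70, 0.75, 0.80]   # index from original UAV bands
sim_index = [0.62, 0.66, 0.71, 0.77, 0.81]   # index from simulated bands

uav = (uav_index, foliar_n)
sim = (sim_index, foliar_n)

uav_to_sim = transfer_rmse(uav, sim)  # analogous to scenarios (A) and (B)
sim_to_uav = transfer_rmse(sim, uav)  # analogous to scenarios (C) and (D)
print(f"UAV -> SIM RMSE = {uav_to_sim:.3f}")
print(f"SIM -> UAV RMSE = {sim_to_uav:.3f}")
```

Even in this toy case the two directions give different errors, because the calibration domain determines the fitted coefficients; this is the asymmetry examined in Figure 12.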

Pan et al. [74] reinforce this view, showing that combining simulated data derived from hyperspectral reflectance with data actually acquired by UAVs can yield more stable models with better performance across different data sources. Similarly, Chen et al. [7], when comparing hyperspectral data obtained by UAVs with simulated multispectral data, including sensors onboard UAVs and satellites, for TNF estimation, observed that simulation constitutes a promising strategy for expanding the operational feasibility of applications, although it also influences predictive performance depending on the configuration of the training and test domains.

In comparison, the models trained on the simulated dataset exhibited better data distribution and, consequently, greater predictive power for TNF, possibly due to the higher sensitivity of the source dataset, as previously reported in studies using hyperspectral sensors [15,17,75]. Furthermore, it is estimated that approximately 75% of foliar nitrogen is allocated to chloroplasts [76,77]. In this context, measurements obtained with fewer external interferences, such as variations in lighting, shading, and leaf angle [78], tend to represent the relationship between the nutrient, the spectral response, and the aerial image more consistently.

4.4. Spatial Distribution of Nitrogen Content in Sugarcane

The correlation between the simulated spectral responses and TNF demonstrates that the simulated data preserve the spectral sensitivity inherited from the source dataset (hyperspectral). According to Silva et al. [15], hyperspectral data exhibit greater subtlety in the spectral response associated with N. This preserved spectral sensitivity not only enhances the predictive performance of the models but also expands their potential for spatial-scale application, enabling a shift from point-specific estimates of nitrogen content to its spatial distribution across the cultivated area [79,80,81].

The spatial distribution of nutrients is essential for guiding the application of variable-rate fertilizers [68]. Among nutrients, nitrogen monitoring is particularly challenging due to its high dynamics in the soil and rapid losses through volatilization and leaching [82]. In sugarcane, this relationship has been extensively explored through indirect measures of chlorophyll, such as SPAD readings and optical sensors, since chlorophyll tends to track variations in the plants’ nitrogen nutritional status [17,83].

The spatial analysis in this study showed that the behavior of the sugarcane canopy throughout the phenological stages is consistent with the so-called “nitrogen dilution effect,” according to which the concentration of N in tissues tends to decrease as the plant accumulates dry matter and progresses toward senescence [84].

In this context, assessments conducted at 220 DAC tend to represent a physiologically more active canopy, whereas those at 290 and 350 DAC reflect a reduction in TNF associated with physiological changes in the plant. Thus, this relationship depends not only on N content but also on how canopy reflectance is determined by the interaction between the vegetation’s biochemical and structural attributes, such as pigments, leaf moisture, leaf angle, and internal leaf structure, which can control a significant portion of radiation absorption and the processes of light transmission and scattering [85].

5. Conclusions

This study provided valuable information for simulating data derived from hyperspectral spectra, to support the calibration and application of multispectral sensors mounted on UAVs for estimating TNF in sugarcane. The EIs derived from the simulated bands maintained an association with TNF for the respective samples, differing slightly from the EIs calculated directly from the UAV, in addition to showing high agreement among themselves, indicating that the simulation procedure preserves the spectral behavior necessary to reproduce patterns observed in flight.

The application of machine learning techniques, particularly RF, demonstrated predictive capability in both domains (UAV: R² = 0.76; Simulated: R² = 0.74) and consistent performance in independent validation, notably in the transfer scenario where the RF trained on the simulated data and tested on the UAV achieved the best result (R² = 0.70). Taken together, these findings indicate sufficient compatibility between spectral representations from different sources to support robust models, reinforcing the potential of integrating spectral simulation and machine learning as a strategy to expand the applicability of multispectral sensors in the field and guide future research involving multispectral and hyperspectral sensors coupled with UAVs.

Author Contributions

Conceptualization, I.d.L.e.L., M.L.d.S.A., A.K.d.S.O., R.R., C.A.A.C.S., and P.R.F.; methodology, I.d.L.e.L., M.L.d.S.A., A.K.d.S.O., and C.A.A.C.S.; software, I.d.L.e.L. and M.L.d.S.A.; validation, I.d.L.e.L., M.L.d.S.A., C.A.A.C.S., and R.R.; formal analysis, I.d.L.e.L., A.K.d.S.O., and M.L.d.S.A.; investigation, R.R.; resources, P.R.F.; data curation, M.L.d.S.A., R.R., and P.R.F.; writing – original draft preparation, I.d.L.e.L.; writing – review and editing, I.d.L.e.L. and M.L.d.S.A.; visualization, R.R.; supervision, P.R.F.; project administration, P.R.F.; funding acquisition, P.R.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Luiz de Queiroz Agricultural Studies Foundation (FEALQ), Brazil. Additional support was provided by the São Paulo Research Foundation (FAPESP, grant 2024/10366-7) and by the Coordination for the Improvement of Higher Education Personnel (CAPES; Proc. 88887.027867/2024-00, 88887.146948/2025-00, 88887.993154/2024-00, and 88887.993148/2024-00).

Data Availability Statement

The data will be made available upon request.

Acknowledgments

The authors thank the Luiz de Queiroz Agricultural Studies Foundation (FEALQ) for funding the publication of this work, DMLAB (Dinardo Miranda Agricultural Analysis Laboratory) for performing the chemical analyses, and the São Paulo Research Foundation (FAPESP) for its support.

Conflicts of Interest

The authors declare they are not aware of any conflicts of financial interest or personal relationships that could have influenced the work reported in this article.

References

1. Barros, P.P.d.S.; Fiorio, P.R.; Demattê, J.A.d.M.; Martins, J.A.; Montezano, Z.F.; Dias, F.L.F. Estimation of leaf nitrogen levels in sugarcane using hyperspectral models. Cienc. Rural 2022, 52.
2. Giordano, M.; Petropoulos, S.A.; Rouphael, Y. The Fate of Nitrogen from Soil to Plants: Influence of Agricultural Practices in Modern Agriculture. Agriculture 2021, 11, 944.
3. Li, X.; Ba, Y.; Zhang, M.; Nong, M.; Yang, C.; Zhang, S. Sugarcane Nitrogen Concentration and Irrigation Level Prediction Based on UAV Multispectral Imagery. Sensors 2022, 22, 2711.
4. Reyes-Trujillo, A.; Daza-Torres, M.C.; Galindez-Jamioy, C.A.; Rosero-García, E.E.; Muñoz-Arboleda, F.; Solarte-Rodriguez, E. Estimating canopy nitrogen concentration of sugarcane crop using in situ spectroscopy. Heliyon 2021, 7, e06566.
5. Barros, P.P.d.S.; Fiorio, P.R.; Demattê, J.A.d.M.; Martins, J.A.; Montezano, Z.F.; Dias, F.L.F. Estimation of leaf nitrogen levels in sugarcane using hyperspectral models. Cienc. Rural 2022, 52.
6. Segl, K.; Richter, R.; Küster, T.; Kaufmann, H. End-to-end sensor simulation for spectral band selection and optimization with application to the Sentinel-2 mission. Appl. Opt. 2012, 51, 439–449.
7. Chen, X.; Miao, Y.; Kusnierek, K.; Li, F.; Wang, C.; Shi, B.; Wu, F.; Chang, Q.; Yu, K. Potential of Multi-Source Multispectral vs. Hyperspectral Remote Sensing for Winter Wheat Nitrogen Monitoring. Remote Sens. 2025, 17, 2666.
8. Rehman, T.U.; Zhang, L.; Ma, D.; Wang, L.; Jin, J. Calibration transfer across multiple hyperspectral imaging-based plant phenotyping systems: I – Spectral space adjustment. Comput. Electron. Agric. 2020, 176, 105685.
9. Matese, A.; Czarnecki, J.M.P.; Samiappan, S.; Moorhead, R. Are unmanned aerial vehicle-based hyperspectral imaging and machine learning advancing crop science? Trends Plant Sci. 2023, 29, 196–209.
10. Atherton, J.; Zhang, C.; Oivukkamäki, J.; Kulmala, L.; Xu, S.; Hakala, T.; Honkavaara, E.; MacArthur, A.; Porcar-Castell.
11. Zhang, L.; Wang, A.; Zhang, H.; Zhu, Q.; Zhang, H.; Sun, W.; Niu, Y. Estimating Leaf Chlorophyll Content of Winter Wheat from UAV Multispectral Images Using Machine Learning Algorithms under Different Species, Growth Stages, and Nitrogen Stress Conditions. Agriculture 2024, 14, 1064.
12. Zou, M.; Liu, Y.; Fu, M.; Li, C.; Zhou, Z.; Meng, H.; Xing, E.; Ren, Y. Combining spectral and texture feature of UAV image with plant height to improve LAI estimation of winter wheat at jointing stage. Front. Plant Sci. 2024, 14, 1272049.
13. Yang, X.; Deng, X.; Li, Z.; Xu, J.; Zhou, Y. Monitoring sugarcane growth and nitrogen status using UAV-based multispectral imagery and machine learning. UAVs 2022, 6(9), 230.
14. Li, X.; Ba, Y.; Zhang, M.; Nong, M.; Yang, C.; Zhang, S. Sugarcane Nitrogen Concentration and Irrigation Level Prediction Based on UAV Multispectral Imagery. Sensors 2022, 22, 2711.
15. Silva, C.A.A.C.; Rizzo, R.; Oliveira, A.K.d.S.; Castro, M.P.P.; Alexandre, M.L.d.S.; Lima, I.d.L.e.; Demattê, A.M.; Fiorio, P.R. Interspecies Prediction of Nitrogen Content in Processed Plant Samples Using Spectroscopic Modeling and Transfer Learning. Food Energy Secur. 2026, 15.
16. Martins, J.A.; Fiorio, P.R.; Silva, C.A.A.C.; Demattê, J.A.M.; Barros, P.P.d.S. Application of Vegetative Indices for Leaf Nitrogen Estimation in Sugarcane Using Hyperspectral Data. Sugar Tech 2023, 26, 160–170.
17. Fiorio, P.R.; Silva, C.A.A.C.; Rizzo, R.; Demattê, J.A.M.; Luciano, A.C.d.S.; da Silva, M.A. Prediction of leaf nitrogen in sugarcane (Saccharum spp.) by Vis-NIR-SWIR spectroradiometry. Heliyon 2024, 10, e26819.
18. Kumarasiri, D.S.; Wijetunga, C.; Gunathilaka, L.D. Use of UAV imagery to predict leaf nitrogen content of sugarcane cultivated under organic fertilizer application. Tropical Agricultural Research 2024, 35(1), 67–79. Available online: https://tar.sljol.info/articles/8700.
19. Alvares, C.A.; et al. Köppen’s climate classification map for Brazil. Meteorologische Zeitschrift 2013, 22(6), 711–728.
20. Lee, M.A.; Huang, Y.; Yao, H.; Thomson, S.J.; Bruce, L.M. Determining the Effects of Storage on Cotton and Soybean Leaf Samples for Hyperspectral Analysis. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2562–2570.
21. Tavares, T.R.; Fiorio, P.R.; Seixas, H.T.; Garcia, A.C.; Barros, P.P.d.S. Effects of storage on vis-NIR-SWIR reflectance spectra of Mombasa grass leaf samples. Cienc. Rural 2020, 50.
22. ASD - Analytical Spectral Devices FieldSpec® 3 User Manual. 2010. Available online: http://www.asdi.com/.
23. AGISOFT LLC. Agisoft Metashape Professional Edition, version 2.0.1; Agisoft LLC: St. Petersburg, Russia, 2022.
24. Picado, E.F.; Romero, K.F.; Heenkenda, M.K. Mapping Spatial Variability of Sugarcane Foliar Nitrogen, Phosphorus, Potassium and Chlorophyll Concentrations Using Remote Sensing. Geomatics 2025, 5, 3.
25. QGIS DEVELOPMENT TEAM. QGIS Geographic Information System. Version 3.40. [S.l.]: QGIS Association, 2024. Available online: qgis.org.
26. Li, W.; Zhu, X.; Yu, X.; Li, M.; Tang, X.; Zhang, J.; Xue, Y.; Zhang, C.; Jiang, Y. Inversion of Nitrogen Concentration in Apple Canopy Based on UAV Hyperspectral Images. Sensors 2022, 22, 3503.
27. Savitzky, A.; Golay, M.J.E. Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Anal. Chem. 1964, 36, 1627–1639.
28. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. 2024. Available online: https://www.r-project.org/.
29. Burggraaff, O. Biases from incorrect reflectance convolution. Opt. Express 2020, 28, 13801–13816.
30. Wang, Y.; Kowalski, B.R. Calibration Transfer and Measurement Stability of Near-Infrared Spectrometers. Appl. Spectrosc. 1992, 46, 764–771.
31. Wang, Y.; Veltkamp, D.J.; Kowalski, B.R. Multivariate instrument standardization. Anal. Chem. 1991, 63, 2750–2756.
32. Mishra, P.; Nikzad-Langerodi, R.; Marini, F.; Roger, J.M.; Biancolillo, A.; Rutledge, D.N.; Lohumi, S. Are standard sample measurements still needed to transfer multivariate calibration models between near-infrared spectrometers? The answer is not always. TrAC Trends Anal. Chem. 2021, 143.
33. Ji, W.; Viscarra Rossel, R.A.; Shi, Z. Improved estimates of organic carbon using proximally sensed vis-NIR spectra corrected by piecewise direct standardization. Eur. J. Soil Sci. 2015, 66, 670–678.
34. Xue, J.; Su, B. Significant remote sensing vegetation indices: A review of developments and applications. J. Sens. 2017, 2017, 1353691.
35. Chen, X.; Miao, Y.; Kusnierek, K.; Li, F.; Wang, C.; Shi, B.; Wu, F.; Chang, Q.; Yu, K. Potential of Multi-Source Multispectral vs. Hyperspectral Remote Sensing for Winter Wheat Nitrogen Monitoring. Remote Sens. 2025, 17, 2666.
36. Rouse, J.; Haas, R.; Schell, J.; Deering, D.; Harlan, J. Monitoring vegetation systems in the Great Plains with ERTS. In Third Earth Resources Technology Satellite-1 Symposium; [S.l.]: [s.n.], 1973; pp. 309–317.
37. Gitelson, A.A.; Kaufman, Y.J.; Stark, R.; Rundquist, D. Novel algorithms for remote estimation of vegetation fraction. Remote Sens. Environ. 2002, 80, 76–87.
38. Gitelson, A.A.; Gritz, Y.; Merzlyak, M.N. Relationships between leaf chlorophyll content and spectral reflectance and algorithms for non-destructive chlorophyll assessment in higher plant leaves. J. Plant Physiol. 2003, 160, 271–282.
39. Maxmax. ENDVI. 2015. Available online: http://www.maxmax.com/endvi.htm.
40. Schober, P.; Boer, C.; Schwarte, L.A. Correlation Coefficients: Appropriate Use and Interpretation. Anesth. Analg. 2018, 126, 1763–1768.
41. Wei, H.-E.; Grafton, M.; Bretherton, M.; Irwin, M.; Sandoval, E. Evaluation of Point Hyperspectral Reflectance and Multivariate Regression Models for Grapevine Water Status Estimation. Remote Sens. 2021, 13, 3198.
42. Anku, K.E.; Percival, D.C.; Lada, R.; Heung, B.; Vankoughnett, M. Remote estimation of leaf nitrogen content, leaf area, and berry yield in wild blueberries. Front. Remote Sens. 2024, 5, 1414540.
43. Yin, C.; Lv, X.; Zhang, L.; Ma, L.; Wang, H.; Zhang, L.; Zhang, Z. Hyperspectral UAV Images at Different Altitudes for Monitoring the Leaf Nitrogen Content in Cotton Crops. Remote Sens. 2022, 14, 2576.
44. Chen, H.; Tan, C.; Lin, Z.; Wu, T. Classification and quantitation of milk powder by near-infrared spectroscopy and mutual information-based variable selection and partial least squares. Spectrochim. Acta Part A: Mol. Biomol. Spectrosc. 2018, 189, 183–189.
45. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
46. Liu, Y.; Wang, Y.; Zhang, J. New Machine Learning Algorithm: Random Forest. In Information Computing and Applications; Springer: Berlin/Heidelberg, Germany, 2012; pp. 246–252.
47. Tang, K.; Qin, M.; Han, B.; Shao, D.; Xu, Z.; Sun, H.; Wu, Y. Identifying the influencing factors of soil nitrous acid emissions using random forest model. Atmospheric Environ. 2024, 339.
48. Liu, M.; Liu, X.; Liu, D.; Ding, C.; Jiang, J. Multivariable integration method for estimating sea surface salinity in coastal waters from in situ data and remotely sensed data using random forest algorithm. Comput. Geosci. 2015, 75, 44–56.
49. Friedman, J.H.; Meulman, J.J. Multiple additive regression trees with application in epidemiology. Stat. Med. 2003, 22, 1365–1381.
50. Rodrigues, M.; Nanni, M.R.; Cezar, E.; dos Santos, G.L.A.A.; Reis, A.S.; de Oliveira, K.M.; de Oliveira, R.B. Vis–NIR spectroscopy: from leaf dry mass production estimate to the prediction of macro- and micronutrients in soybean crops. J. Appl. Remote Sens. 2020, 14, 044505.
51. Leite, J.M.; Ciampitti, I.A.; Mariano, E.; Vieira-Megda, M.X.; Trivelin, P.C.O. Nutrient Partitioning and Stoichiometry in Unburnt Sugarcane Ratoon at Varying Yield Levels. Front. Plant Sci. 2016, 7, 466.
52. Hassan, M.A.; Yang, M.; Rasheed, A.; Yang, G.; Reynolds, M.; Xia, X.; Xiao, Y.; He, Z. A rapid monitoring of NDVI across the wheat growth cycle for grain yield prediction using a multi-spectral UAV platform. 4th IPPN International Plant Phenotyping Symposium; pp. 95–103.
53. Zhang, C.; Yang, G.; Li, H.; Tang, F.; Liu, C.; Zhang, Y. Remote sensing inversion of leaf area index of winter wheat based on random forest algorithm. Scientia Agricultura Sin. 2018, 51(5), 855–867.
54. Osco, L.P.; Ramos, A.P.M.; Pereira, D.R.; Moriya, É.A.S.; Imai, N.N.; Matsubara, E.T.; Estrabis, N.; de Souza, M.; Junior, J.M.; Gonçalves, W.N.; et al. Predicting Canopy Nitrogen Content in Citrus-Trees Using Random Forest Algorithm Associated to Spectral Vegetation Indices from UAV-Imagery. Remote Sens. 2019, 11, 2925.
55. Imran, H.A.; Gianelle, D.; Rocchini, D.; Dalponte, M.; Martín, M.P.; Sakowska, K.; Wohlfahrt, G.; Vescovo, L. VIS-NIR, Red-Edge and NIR-Shoulder Based Normalized Vegetation Indices Response to Co-Varying Leaf and Canopy Structural Traits in Heterogeneous Grasslands. Remote Sens. 2020, 12, 2254.
56. Narmilan, A.; Gonzalez, F.; Salgadoe, A.S.A.; Kumarasiri, U.W.L.M.; Weerasinghe, H.A.S.; Kulasekara, B.R. Predicting Canopy Chlorophyll Content in Sugarcane Crops Using Machine Learning Algorithms and Spectral Vegetation Indices Derived from UAV Multispectral Imagery. Remote Sens. 2022, 14, 1140.
57. Gutman, G.; Skakun, S.; Gitelson, A. Revisiting the use of red and near-infrared reflectances in vegetation studies and numerical climate models. Sci. Remote Sens. 2021, 4.
58. Miphokasap, P.; Honda, K.; Vaiphasa, C.; Souris, M.; Nagai, M. Estimating Canopy Nitrogen Concentration in Sugarcane Using Field Imaging Spectroscopy. Remote Sens. 2012, 4, 1651–1670.
59. Li, X.; Ba, Y.; Zhang, M.; Nong, M.; Yang, C.; Zhang, S. Sugarcane Nitrogen Concentration and Irrigation Level Prediction Based on UAV Multispectral Imagery. Sensors 2022, 22, 2711.
60. Martins, J.A.; Fiorio, P.R.; Silva, C.A.A.C.; Demattê, J.A.M.; Barros, P.P.d.S. Application of Vegetative Indices for Leaf Nitrogen Estimation in Sugarcane Using Hyperspectral Data. Sugar Tech 2023, 26, 160–170.
61. Hassani, K.; Gholizadeh, H.; Taghvaeian, S.; Natalie, V.; Carpenter, J.; Jacob, J. Assessing the impact of spatial resolution of UAS-based remote sensing and spectral resolution of proximal sensing on crop nitrogen retrieval accuracy. Int. J. Remote Sens. 2023, 44, 4441–4464.
62. Lv, Z.; Zhao, W.; Kong, S.; Li, L.; Lin, S. Overview of molecular mechanisms of plant leaf development: a systematic review. Front. Plant Sci. 2023, 14, 1293424.
63. Yoneyama, T.; Suzuki, A. Light-Independent Nitrogen Assimilation in Plant Leaves: Nitrate Incorporation into Glutamine, Glutamate, Aspartate, and Asparagine Traced by ¹⁵N. Plants 2020, 9, 1303.
64. Miphokasap, P.; Wannasiri, W. Estimations of Nitrogen Concentration in Sugarcane Using Hyperspectral Imagery. Sustainability 2018, 10, 1266.
65. Biriukova, K.; Celesti, M.; Evdokimov, A.; Pacheco-Labrador, J.; Julitta, T.; Migliavacca, M.; Giardino, C.; Miglietta, F.; Colombo, R.; Panigada, C.; et al. Effects of varying solar-view geometry and canopy structure on solar-induced chlorophyll fluorescence and PRI. Int. J. Appl. Earth Obs. Geoinformation 2020, 89.
66. Lu, N.; Wang, W.; Zhang, Q.; Li, D.; Yao, X.; Tian, Y.; Zhu, Y.; Cao, W.; Baret, F.; Liu, S.; et al. Estimation of Nitrogen Nutrition Status in Winter Wheat From Unmanned Aerial Vehicle Based Multi-Angular Multispectral Imagery. Front. Plant Sci. 2019, 10, 1601.
67. Azizabadi, E.C.; El-Shetehy, M.; Cheng, X.; Youssef, A.; Badreldin, N. In-Season Potato Nitrogen Prediction Using Multispectral Drone Data and Machine Learning. Remote Sens. 2025, 17, 1860.
68. Soltanikazemi, M.; Minaei, S.; Shafizadeh-Moghadam, H.; Mahdavian, A. Field-scale estimation of sugarcane leaf nitrogen content using vegetation indices and spectral bands of Sentinel-2: Application of random forest and support vector regression. Comput. Electron. Agric. 2022, 200.
69. Huang, Y.; Chen, W.; Tan, W.; Deng, Y.; Yang, C.; Zhu, X.; Shen, J.; Liu, N. Transfer learning for enhancing the generality of leaf spectroscopic models in estimating crop foliar nutrients across growth stages. Int. J. Appl. Earth Obs. Geoinformation 2025, 139.
70. Ji, F.; Li, F.; Hao, D.; Shiklomanov, A.N.; Yang, X.; Townsend, P.A.; Dashti, H.; Nakaji, T.; Kovach, K.R.; Liu, H.; et al. Unveiling the transferability of PLSR models for leaf trait estimation: lessons from a comprehensive analysis with a novel global dataset. New Phytol. 2024, 243, 111–131.
71. Helsen, K.; Bassi, L.; Feilhauer, H.; Kattenborn, T.; Matsushima, H.; Van Cleemput, E.; Somers, B.; Honnay, O. Evaluating different methods for retrieving intraspecific leaf trait variation from hyperspectral leaf reflectance. Ecol. Indic. 2021, 130.
72. Schiefer, F.; Schmidtlein, S.; Kattenborn, T. The retrieval of plant functional traits from canopy spectra through RTM-inversions and statistical models are both critically affected by plant phenology. Ecol. Indic. 2021, 121.
73. Chlus, A.; Townsend, P.A. Characterizing seasonal variation in foliar biochemistry with airborne imaging spectroscopy. Remote Sens. Environ. 2022, 275.
74. Pan, Y.; Li, J.; Zhang, J.; He, J.; Zhang, Z.; Yao, X.; Cheng, T.; Zhu, Y.; Cao, W.; Tian, Y. Estimating Leaf Nitrogen Accumulation Considering Vertical Heterogeneity Using Multiangular Unmanned Aerial Vehicle Remote Sensing in Wheat. Plant Phenomics 2024, 6, 0276.
75. Silva, C.A.A.C.; Rizzo, R.; Oliveira, A.K.d.S.; Castro, M.P.P.; Alexandre, M.L.d.S.; Lima, I.d.L.e.; Demattê, A.M.; Fiorio, P.R. Interspecies Prediction of Nitrogen Content in Processed Plant Samples Using Spectroscopic Modeling and Transfer Learning. Food Energy Secur. 2026, 15.
76. Alharbi, K.; Haroun, S.A.; Kazamel, A.M.; Abbas, M.A.; Ahmaida, S.M.; AlKahtani, M.; AlHusnain, L.; Attia, K.A.; Abdelaal, K.; Gamel, R.M.E. Physiological Studies and Ultrastructure of Vigna sinensis L. and Helianthus annuus L. under Varying Levels of Nitrogen Supply. Plants 2022, 11, 1884.
77. Mu, X.; Chen, Y. The physiological response of photosynthesis to nitrogen deficiency. Plant Physiol. Biochem. 2021, 158, 76–82.
78. Alexandre, M.L.d.S.; Lima, I.d.L.e.; Nilsson, M.S.; Rizzo, R.; Silva, C.A.A.C.; Fiorio, P.R. Sugarcane (Saccharum officinarum) Productivity Estimation Using Multispectral Sensors in RPAs, Biometric Variables, and Vegetation Indices. Agronomy 2025, 15, 2149.
79. Yu, K.-Q.; Zhao, Y.-R.; Li, X.-L.; Shao, Y.-N.; Liu, F.; He, Y. Hyperspectral Imaging for Mapping of Total Nitrogen Spatial Distribution in Pepper Plant. PLOS ONE 2014, 9, e116205.
80. Pechanec, V.; Mráz, A.; Rozkošný, L.; Vyvlečka, P. Usage of Airborne Hyperspectral Imaging Data for Identifying Spatial Variability of Soil Nitrogen Content. ISPRS Int. J. Geo-Information 2021, 10, 355.
81. Verrelst, J.; Rivera-Caicedo, J.P.; Reyes-Muñoz, P.; Morata, M.; Amin, E.; Tagliabue, G.; Panigada, C.; Hank, T.; Berger, K. Mapping landscape canopy nitrogen content from space using PRISMA data. ISPRS J. Photogramm. Remote Sens. 2021, 178, 382–395.
82. Bassi, D.; Menossi, M.; Mattiello, L. Nitrogen supply influences photosynthesis establishment along the sugarcane leaf. Sci. Rep. 2018, 8, 2327.
83. Do Amaral, L.R.; Molin, J.P. Sensor óptico no auxílio à recomendação de adubação nitrogenada em cana-de-açúcar. Pesquisa Agropecuária Brasileira 2011, 46(12), 1633–1642.
84. Santana, A.C.d.A.; de Oliveira, E.C.A.; da Silva, V.S.G.; dos Santos, R.L.; da Silva, M.A.; Freire, F.J. Critical nitrogen dilution curves and productivity assessments for plant cane. Rev. Bras. De Eng. Agricola E Ambient. 2020, 24, 244–251.
85. Ollinger, S.V. Sources of variability in canopy reflectance and the convergent properties of plants. New Phytol. 2010, 189, 375–394.

Figure 1. Methodological Workflow.

Figure 2. Location map of the experimental area, situated in the municipality of Piracicaba, São Paulo, Brazil, within the research facilities of the CTC.

Figure 3. Methodological schedule for data extraction.

Figure 5. Dynamics of leaf nitrogen content and precipitation (expected vs. actual) in sugarcane throughout the crop cycle, evaluated at 220, 290, and 350 DAC.

Figure 6. Spearman correlation matrix between nitrogen content and original and simulated vegetation indices. “SIM” refers to simulated data, indices, and bands.

Figure 7. Linear regressions between nitrogen content (N, %) and the NDVI, VARI, ChlRe, and ENDVI indices, comparing values obtained directly by UAV with their corresponding simulated (SIM) values. Each panel (a–h) shows the linear fit with its R², RMSE, and significance (P).

Figure 8. Performance of predictive nitrogen models using UAV data. (A) PLSR model (UAV), (B) RF model (UAV).

Figure 9. Performance of predictive nitrogen models using simulated data. (C) PLSR model (SIM), (D) RF model (SIM).

Figure 10. Variable importance analysis (VIPs) for the PLSR models in predicting nitrogen content. (A) PLSR model using UAV data. (B) PLSR model using simulated (SIM) data.

Figure 11. Permutation-based variable importance in the RF model for nitrogen prediction. (A) RF model (UAV), (B) RF model (SIM).

Figure 12. Independent validation across domains (original UAV and simulated bands) for predicting leaf nitrogen content (PLSR and RF).

Figure 13. Spatial distribution of nitrogen content predicted by the Random Forest model, validated with simulated data.

Table 1. Characteristics of the multispectral sensor mounted on the UAV, including central wavelength (nm) and bandwidth (nm).

Band (λ)	Central Wavelength (nm)	Bandwidth (nm)
Blue (B)	450	±16
Green (G)	560	±16
Red (R)	650	±16
Red-edge (RE)	730	±16
Near-infrared (NIR)	840	±26
RGB	20 megapixels
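
The band centres and bandwidths in Table 1 are the inputs needed to simulate the multispectral bands from a hyperspectral signature by Gaussian convolution. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each band's spectral response is a Gaussian whose full width at half maximum equals twice the tabulated ± value, and that the spectrum is sampled on the 400–900 nm grid at 1.4 nm steps. The names `BANDS`, `gaussian_weight`, and `simulate_bands` are hypothetical.

```python
import math

# Band centres and +/- bandwidths (nm) copied from Table 1.
BANDS = {
    "B": (450.0, 16.0),
    "G": (560.0, 16.0),
    "R": (650.0, 16.0),
    "RE": (730.0, 16.0),
    "NIR": (840.0, 26.0),
}


def gaussian_weight(wavelength, center, half_width):
    """Gaussian response at one wavelength.

    Assumption: FWHM = 2 * half_width, i.e. the +/- value in Table 1
    is read as the half width at half maximum.
    """
    fwhm = 2.0 * half_width
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * ((wavelength - center) / sigma) ** 2)


def simulate_bands(wavelengths, reflectance):
    """Return the response-weighted mean reflectance for each band."""
    simulated = {}
    for name, (center, half_width) in BANDS.items():
        weights = [gaussian_weight(w, center, half_width) for w in wavelengths]
        weighted_sum = sum(wt * r for wt, r in zip(weights, reflectance))
        simulated[name] = weighted_sum / sum(weights)
    return simulated


# Usage: a synthetic spectrum on a 400-900 nm grid with 1.4 nm spacing.
wavelengths = [400.0 + 1.4 * i for i in range(358)]
spectrum = [w / 1000.0 for w in wavelengths]  # illustrative values only
simulated = simulate_bands(wavelengths, spectrum)
```

Normalizing by the sum of the weights keeps each simulated band on the reflectance scale of the input spectrum, so a flat spectrum returns the same value in every band.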

Table 2. List of vegetation indices used in the modeling, along with their respective equations and references. The blue, green, red, red-edge, and near-infrared bands are represented by B, G, R, RE, and NIR, respectively.

ID	Vegetation Index	Formula	References
1	Normalized Difference Vegetation Index (NDVI)	(NIR - R) / (NIR + R)	[36]
2	Visible Atmospherically Resistant Index (VARI)	(G - R) / (G + R - B)	[37]
3	Chlorophyll Index - Red-Edge (ChlRe)	(NIR / RE) - 1	[38]
4	Enhanced Normalized Difference Vegetation Index (ENDVI)	[(NIR + G) - (2 x B)] / [(NIR + G) + (2 x B)]	[39]
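
As an illustration of how the indices in Table 2 are computed from band reflectances, the sketch below evaluates all four for a single pixel or plot. It rests on two assumptions that should be checked against references [38] and [39]: ChlRe is written with the red-edge band in the denominator, and ENDVI uses its commonly published form, ((NIR + G) - 2B) / ((NIR + G) + 2B). The function name `vegetation_indices` and the input values are hypothetical.

```python
def vegetation_indices(b, g, r, re, nir):
    """Compute the Table 2 indices from band reflectances (0-1 scale).

    Assumptions: ChlRe = NIR / RE - 1 and
    ENDVI = ((NIR + G) - 2B) / ((NIR + G) + 2B).
    """
    return {
        "NDVI": (nir - r) / (nir + r),
        "VARI": (g - r) / (g + r - b),
        "ChlRe": nir / re - 1.0,
        "ENDVI": ((nir + g) - 2.0 * b) / ((nir + g) + 2.0 * b),
    }


# Usage with illustrative reflectances for a green canopy.
indices = vegetation_indices(b=0.04, g=0.08, r=0.05, re=0.25, nir=0.50)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

The same function can be applied to UAV bands and to simulated (SIM) bands, which keeps the index definitions identical across the two domains being compared.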

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
