ARTICLE | doi:10.20944/preprints202112.0138.v2
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: Yield mapping; vegetation index; Stepwise; SR; Random Forest; KNN
Online: 9 December 2021 (15:39:34 CET)
The use of machine learning techniques to predict yield based on remote sensing is a no-return path and studies conducted on farm aim to help rural producers in decision-making. Thus, commercial fields equipped with technologies in Mato Grosso, Brazil, were monitored by satellite images to predict cotton yield using supervised learning techniques. The objective of this research was to identify how early in the growing season, which vegetation indices and which machine learning algorithms are best to predict cotton yield at the farm level. For that, we went through the following steps: 1) We observed the yield in 398 ha (3 fields) and eight vegetation indices (VI) were calculated on five dates during the growing season. 2) Scenarios were created to facilitate the analysis and interpretation of results: Scenario 1: All Data (8 indices on 5 dates = 40 inputs) and Scenario 2: best variable selected by Stepwise regression (1 input). 3) In the search for the best algorithm, hyperparameter adjustments, calibrations and tests using machine learning were performed to predict yield and performances were evaluated. Scenario 1 had the best metrics in all fields of study, and the Multilayer Perceptron (MLP) and Random Forest (RF) algorithms showed the best performances with adjusted R2 of 47% and RMSE of only 0.24 t ha-1, however, in this scenario all predictive inputs that were generated throughout the growing season (approx. 180 days) are needed, so we optimized the prediction and tested only the best VI in each field, and found that among the eight VIs, the Simple Ratio (SR), driven by the K-Nearest Neighbor (KNN) algorithm predicts with 0.26 and 0.28 t ha-1 of RMSE and 5.20% MAPE, anticipating the cotton yield with low error by ±143 days, and with important aspect of requiring less computational demand in the generation of the prediction when compared to MLP and RF, for example, enabling its use as a technique that helps predict cotton yield, resulting in time savings for planning, whether in marketing or in crop management strategies.