Preprint Article Version 2 Preserved in Portico This version is not peer-reviewed

Comparison of Machine Learning Techniques in Cotton Yield Prediction Using Satellite Remote Sensing

Version 1 : Received: 8 December 2021 / Approved: 8 December 2021 / Online: 8 December 2021 (14:41:31 CET)
Version 2 : Received: 8 December 2021 / Approved: 9 December 2021 / Online: 9 December 2021 (15:39:34 CET)

How to cite: Morelli-Ferreira, F.; Maia, N.J.C.; Tedesco, D.; Kazama, E.H.; Morlin Carneiro, F.; Santos, L.B.; Seben Junior, G.F.; Rolim, G.S.; Shiratsuchi, L.S.; Silva, R.P. Comparison of Machine Learning Techniques in Cotton Yield Prediction Using Satellite Remote Sensing. Preprints 2021, 2021120138. https://doi.org/10.20944/preprints202112.0138.v2

Abstract

The use of machine learning techniques to predict yield from remote sensing is a path of no return, and on-farm studies aim to help rural producers in decision-making. Commercial fields equipped with technologies in Mato Grosso, Brazil, were therefore monitored with satellite images to predict cotton yield using supervised learning techniques. The objective of this research was to identify how early in the growing season, with which vegetation indices, and with which machine learning algorithms cotton yield is best predicted at the farm level. We proceeded in three steps: 1) Yield was observed on 398 ha (3 fields), and eight vegetation indices (VIs) were calculated on five dates during the growing season. 2) Two scenarios were created to facilitate the analysis and interpretation of results: Scenario 1, all data (8 indices on 5 dates = 40 inputs), and Scenario 2, the best variable selected by Stepwise regression (1 input). 3) In the search for the best algorithm, machine learning models were tuned (hyperparameter adjustment), calibrated and tested to predict yield, and their performances were evaluated. Scenario 1 had the best metrics in all fields studied, and the Multilayer Perceptron (MLP) and Random Forest (RF) algorithms showed the best performances, with an adjusted R² of 47% and an RMSE of only 0.24 t ha⁻¹. However, this scenario requires all predictive inputs generated throughout the growing season (approx. 180 days). We therefore optimized the prediction by testing only the best VI in each field and found that, among the eight VIs, the Simple Ratio (SR) driven by the K-Nearest Neighbor (KNN) algorithm predicts with an RMSE of 0.26 and 0.28 t ha⁻¹ and a MAPE of 5.20%, anticipating cotton yield with low error by approximately 143 days. It also requires less computation to generate the prediction than MLP and RF, for example. This enables its use as a technique that helps predict cotton yield, saving time for planning, whether in marketing or in crop management strategies.
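As a rough illustration of the single-input scenario described in the abstract, the sketch below computes a Simple Ratio index, makes a K-Nearest Neighbor prediction from that one input, and evaluates it with RMSE and MAPE. It is not the authors' pipeline: the SR definition used here (near-infrared reflectance divided by red reflectance) is the conventional one and is an assumption, since the abstract does not spell it out, and all function names, the value of k, and the reflectance and yield values are hypothetical.

```python
import math


def simple_ratio(nir, red):
    """Simple Ratio index, assumed here to be NIR reflectance / red reflectance."""
    return nir / red


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training samples closest to `query` (1-D input)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(observed, predicted):
    """Root mean squared error, in the units of the target (e.g. t/ha)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


# Hypothetical calibration pixels: (NIR, red) reflectance and observed yield in t/ha.
calibration = [(0.45, 0.05, 4.9), (0.40, 0.06, 4.5), (0.38, 0.08, 4.1), (0.30, 0.10, 3.6)]
train_sr = [simple_ratio(nir, red) for nir, red, _ in calibration]
train_yield = [y for _, _, y in calibration]

# Hypothetical test pixels, evaluated against their observed yield.
test = [(0.42, 0.06, 4.6), (0.33, 0.09, 3.9)]
observed = [y for _, _, y in test]
predicted = [knn_predict(train_sr, train_yield, simple_ratio(nir, red), k=2) for nir, red, _ in test]

print(f"RMSE = {rmse(observed, predicted):.2f} t/ha, MAPE = {mape(observed, predicted):.2f}%")
```

With a single input the neighbor search reduces to a sort on one value per sample, which is consistent with the abstract's remark that this approach demands less computation than MLP or RF.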

Keywords

Yield mapping; vegetation index; Stepwise; SR; Random Forest; KNN

Subject

Biology and Life Sciences, Agricultural Science and Agronomy

Comments (1)

Comment 1
Received: 9 December 2021
Commenter: Luciano Shiratsuchi
Commenter's Conflict of Interests: Author
Comment: Complete author names correction and Figure 01.