Preprint
Article

This version is not peer-reviewed.

A Time-Series Hybrid Multi-Model Machine Learning Framework for Staple Crops Yield Prediction

Submitted:

01 January 2026

Posted:

04 January 2026


Abstract
Agriculture is pivotal for the economy of a country as it is a major source of food, employment and raw materials. However, challenges such as diseases, soil degradation, and water scarcity persist. Technology adoption can address these issues, improving production and quality. Machine learning enables prediction in agriculture. It optimizes irrigation, fertilization, and crop selection, aiding decision-making for food security and crop management. This study proposes multi-model machine learning models for yield prediction of eleven staple crops (bananas, maize, wheat, cassava, rice, soybeans, barley, potatoes, dry beans, dry peas, and cocoa beans). The comparative results show that the predictions of the proposed multi-model algorithm are significantly better than those of the linear model. The error-trend-seasonality artificial neural network (ETS-ANN) achieved an R² of 80% for cassava yield prediction, whereas bananas had the lowest R² (20%).

1. Introduction

Agriculture is essential for achieving global food security and economic stability, particularly in developing countries [1,2]. As the world population is expected to reach 9.7 billion by 2050 [3], enhancing crop productivity is crucial for meeting future food requirements. Crops such as rice, wheat, maize (corn), and soybeans are vital to the world’s food supply and are significant exports [4,5,6,7,8]. Precise crop yield predictions allow governments and organizations to make informed decisions about resource management and trade strategies. Traditional yield estimation techniques, such as surveys, expert evaluations, or trend-based models, frequently lack timely updates and flexibility, particularly when confronted with rapidly changing climate conditions [9]. This has driven a shift towards data-driven strategies that use machine learning (ML), deep learning (DL), the internet of things (IoT), and artificial intelligence (AI) to reveal intricate, non-linear relationships in crop, soil, and environmental data [10,11,12]. ML models such as support vector regression (SVR), k-nearest neighbors (KNN), decision tree (DT), gradient boosting (GB), and random forest (RF) have surpassed traditional regression methods in recent applications [13,14].

1.1. Need of Yield Prediction

Food is a necessity for every human being on this planet. A crop yield prediction system helps determine which crop should be grown in which season and in which year. Many farmers face severe distress, in some cases leading to suicide, because they are unable to decide which crop is suitable for a given year. Farmers also face unfavorable weather conditions and lower-than-expected yields. The combination of ML, DL, and IoT can help address this critical issue, enabling farmers to predict which crops are suitable for the current season.

1.2. Problem Statement

Accurate crop yield forecasting is critical for agricultural planning, food distribution policy, and market stability. Traditional statistical models such as ARIMA and ETS assume linearity and stationarity, which limits their ability to capture nonlinear and nonstationary agricultural yield patterns influenced by climate variability, technological change, and management practices. Recent studies have applied ML and DL models; however, most existing works:
  • Focus on single-model approaches.
  • Are restricted to short-term forecasting horizons.
  • Ignore yield instability and volatility characteristics.
  • Lack systematic comparison across statistical, ML, and DL models.
Therefore, there is a need for a comprehensive hybrid forecasting framework that integrates statistical, ML, and DL techniques to improve long-term crop yield forecasting accuracy and robustness.

1.3. Contribution of the Study

This study makes the following key contributions:
  • Proposes a unified hybrid artificial intelligence framework combining statistical, ML, and DL models for crop yield forecasting.
  • Performs a systematic comparison of classical models (ARIMA, ETS), volatility-based models (the generalized autoregressive conditional heteroscedasticity (GARCH) family), ML models (ANN, support vector machine (SVM), partial least squares (PLS)), and deep learning models (long short-term memory (LSTM)).
  • Introduces signal decomposition-based hybrid models (Wavelet-ANN and empirical mode decomposition (EMD)-ANN) to handle nonstationary yield series.
  • Incorporates yield instability and heteroscedasticity diagnostics using the Cuddy–Della Valle Index (CDVI) and autoregressive conditional heteroscedasticity Lagrange multiplier (ARCH-LM) tests.
  • Evaluates model performance using several metrics: root mean squared error (RMSE), mean absolute error (MAE), R-squared (R²), and mean absolute percentage error (MAPE).
  • Generates long-term (20-year) forecasts for multiple crops, which are rarely addressed in existing literature.

2. Literature Review

Researchers have employed various ML approaches to estimate crop yields. [13] developed a web application for crop recommendation and yield estimation, using ML classification and regression techniques. [15] proposed a novel system for crop yield prediction that utilizes crop yield data, meteorological data, and ML techniques. They evaluated KNN, GB, and multivariate logistic regression, achieving a remarkable R² of 99.99%. [16] used AI in agriculture for crop optimization and yield prediction with KNN, SVM, and XGBoost algorithms. [14] proposed machine learning models for yield prediction and crop classification, attaining a high accuracy of 99.7% for classification and 99.9% for yield prediction. [17] utilized regression models such as random forest, decision tree, and XGBoost, together with deep learning models (convolutional neural network (CNN), LSTM), for predicting crop yields in India. [18] developed a crop recommendation framework based on explainable AI (XAI), utilizing historical crop yields and soil nutrients. [19] also introduced a crop recommendation framework based on XAI, achieving an R² value of 0.94152. [20] used a reinforcement learning model to predict crop yield. [21] forecasted crop yield using ensemble learning on synthetic data generated by Prophet, with ET as the meta model and XGB, RF, KNN, and decision tree (DT) as weak learners. [22] employed Terra/MODIS-derived metrics, in particular vegetation indices such as EVI2, to estimate crop yield in the Canadian Prairies. Barley, canola, and spring wheat yields were better predicted by EVI2 than by NDVI, with relative root mean square error (RRMSE) values at the Census Agriculture Region level ranging from 14% to 20%. [23] proposed a framework for crop yield prediction named FLyer, which was based on edge computing and federated learning for agriculture 5.0.
[24] used a Bayesian-optimized gated recurrent unit (GRU) to enhance crop yield prediction. Compared with conventional methods, the experimental results demonstrate a significant improvement in yield forecasting by the proposed IQ-GRU framework for both mid-season and season-end predictions. The goal of [25] was to determine the optimal crop prediction model to assist farmers in selecting crops based on soil nutrients and climate. The study compared the RF, DT, and KNN algorithms using the Gini and entropy criteria, and found that RF achieved the highest accuracy. The CrYP tool, developed by [26], operates within the Google Earth Engine (GEE) framework, enabling spatially detailed predictions of crop yields. This open-source application enhances the accuracy of yield forecasts, which is crucial for ensuring food security and effective agricultural management. [27,28,29,30,31] studied corn yield prediction. [27] used ML and statistical tools to improve corn nitrogen recommendation tools by utilizing soil and weather data. The study by [31] focused on a parametric yield model and semiparametric neural networks for forecasting corn yield. [28] utilized neural network methods for predicting corn yield, achieving an accuracy of 98%. [30] ensembled different ML models to predict corn yield and achieved an RRMSE of 9.5% (with an optimized weighted ensemble model). The authors of [29] used leaf area and the normalized difference vegetation index to predict corn yield, obtaining an accuracy of 94%. [8,32,33,34,35] investigated various ML and DL techniques to predict soybean yield. [32] conducted a study on corn and soybean yield prediction in Canada using an ANN multi-layer perceptron (MLP). [8] designed a deep learning CNN-LSTM model to predict soybean yield using CONUS county-level data from 2003 to 2015.
[33] used deep learning and multimodal data fusion techniques to predict soybean yield from data collected by UAVs. [35] combined weather data and ML techniques to improve soybean yield prediction in southern Brazil. [34] employed a CNN and an LSTM model to predict soybean yield in the USA. The dataset used in that study was collected from NASA and GEE. The authors of [36] predicted soybean yield using a ConvLSTM-ViT (vision transformer) model with remotely sensed data and earth observations. [37] enhanced maize yield predictions in the Kaffa Zone of Southwestern Ethiopia by utilizing geographic information systems (GIS) and remote sensing data. These approaches provide precise and rapid data, which is essential for informed decision-making by farmers and governments. Research has indicated a strong association between maize production and NDVI, with correlations as high as 89% when rainfall data is included. Models created using satellite data have demonstrated strong predictive ability, with an R² of 0.89 and an RMSE of 1.54 q/ha. [38] demonstrated a statistical framework for corn and soybean yield prediction using ensemble ML models. The results show an average accuracy of 80% for corn and 81% for soybean. [39] predicted cotton yield using RF, SVR, MLR, and LightGBM models. The RF regressor achieved the highest accuracy, 97.75%. [40] used weather parameters to predict cotton yield from 2005 to 2020. The proposed random forest extreme gradient model achieved an RMSE of 0.05. [4] presented a Gaussian kernel regression method for rice yield prediction using combined optical and SAR imagery. [41] proposed a new LSTM-based model for rice yield prediction, known as target-aware yield prediction (TAYP). The TAYP model achieved an accuracy of 95%.
[42] predicted wheat yield using a climate-NDVI data fusion technique. [5] presented a framework for wheat yield prediction using least absolute shrinkage and selection operator (LASSO), RF, and XGB models based on remote sensing data. [43] predicted sugar beet yield using a temporal graph neural network and a multimodal meta-transformer model. [44] presented a comprehensive review of palm oil yield prediction using ML models. [45] employed data fusion and deep learning to predict tea yield, achieving an R² of 0.99. A comparative analysis of staple crops is presented in Table 1.

3. Methods and Materials

3.1. Dataset Description

The study uses secondary time-series data on crop yields collected from officially published agricultural statistics (https://ourworldindata.org/grapher/key-crop-yields). The dataset consists of annual yield observations (tonnes per hectare) for multiple major crops over several decades.
  • Temporal coverage: Multiple decades (annual frequency)
  • Spatial scope: International-level aggregated crop yields
  • Data type: Time-series
  • Source: Government agricultural databases - UNFAO (2025)
  • Dependent Variable: Crop yield (tonnes per hectare)
  • Independent Variables:
    • Lagged crop yield values (time-lag features)
    • Decomposed components (trend and residuals from Wavelet and EMD)
    • Volatility measures (for GARCH-based models)
    • Climatic and economic variables are not explicitly included; instead, their effects are implicitly captured through temporal yield dynamics.

3.2. Preprocessing Steps

  • Handling missing values using interpolation
  • Normalization for machine learning and deep learning models
  • Train–test split (80:20) preserving temporal order
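The three preprocessing steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names (`interpolate_missing`, `chronological_split`, `min_max_scale`) and the sample values are hypothetical, linear interpolation and min-max scaling are assumed because the text does not name the specific methods, and fitting the scaler on the training portion only is a common convention rather than something the study states.

```python
def interpolate_missing(values):
    """Linearly interpolate interior None gaps; edge gaps copy the nearest value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


def chronological_split(values, train_fraction=0.8):
    """Split without shuffling so the test set is strictly later in time."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]


def min_max_scale(train, test):
    """Scale both parts with the minimum and maximum of the training part only."""
    lo, hi = min(train), max(train)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant series
    return ([(x - lo) / span for x in train],
            [(x - lo) / span for x in test])


# Made-up annual yields (t/ha) with gaps, for illustration only.
yields = [2.0, None, 3.0, 3.5, None, None, 5.0, 5.2, 5.1, 6.0]
filled = interpolate_missing(yields)
train, test = chronological_split(filled)
train_scaled, test_scaled = min_max_scale(train, test)
```

Because the scaler sees only the training years, later test values can fall outside [0, 1], which is expected for a trending yield series.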

3.3. Proposed Solution

This study proposes a hybrid forecasting framework that:
  • Combines linear, nonlinear, and deep temporal representations.
  • Captures trend, volatility, and nonlinear dependencies.
  • Provides robust long-term yield forecasts.
Unlike earlier studies, the framework handles nonstationarity explicitly and selects the best-performing model for each crop. The methodology involved evaluating a diverse set of statistical and machine learning models for forecasting yield across multiple crops, as shown in Figure 1.

3.4. Model Selection

The following time-series models are used for crop yield prediction.
ARIMA
The ARIMA (p, d, q) model represents the crop yield series, after differencing it d times, as a function of its past values and past error terms.
Yₜ = c + φ₁Yₜ₋₁ + φ₂Yₜ₋₂ + … + φₚYₜ₋ₚ + θ₁εₜ₋₁ + θ₂εₜ₋₂ + … + θ_qεₜ₋q + εₜ
Where: Yₜ = crop yield at time t (differenced d times), φᵢ = autoregressive parameters, θⱼ = moving average parameters, εₜ = white noise error term, c = constant
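To make the recursion concrete, the sketch below computes a one-step-ahead forecast for the special case p = d = q = 1. It is a minimal illustration, not the study's fitting procedure: the parameters c, φ, and θ are passed in rather than estimated, the function name `arima_111_forecast` is hypothetical, and the pre-sample error is assumed to be zero.

```python
def arima_111_forecast(series, c, phi, theta):
    """One-step-ahead ARIMA(1,1,1) forecast with fixed, user-supplied parameters."""
    if len(series) < 2:
        raise ValueError("need at least two observations to difference once")
    # d = 1: work on first differences of the yield series.
    diffs = [b - a for a, b in zip(series, series[1:])]
    eps_prev = 0.0      # assumption: the error before the sample is zero
    prev = diffs[0]
    for z in diffs[1:]:
        pred = c + phi * prev + theta * eps_prev   # AR term plus MA term
        eps_prev = z - pred                        # one-step error at this point
        prev = z
    next_diff = c + phi * prev + theta * eps_prev
    # Undo the differencing by adding the forecast change to the last level.
    return series[-1] + next_diff


# A steadily rising toy series: the forecast continues the unit increase.
trend_forecast = arima_111_forecast([1.0, 2.0, 3.0, 4.0, 5.0], c=1.0, phi=0.0, theta=0.0)
```

In practice the orders and parameters would be estimated from data, for example by the information-criterion search mentioned later in the text.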
ETS
The additive ETS model is expressed as:
Yₜ = lₜ₋₁ + bₜ₋₁ + εₜ
lₜ = lₜ₋₁ + bₜ₋₁ + αεₜ
bₜ = bₜ₋₁ + βεₜ
Where: lₜ = level component, bₜ = trend component, α, β = smoothing parameters, εₜ = error term
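The three ETS equations translate directly into an update loop. The following sketch assumes the smoothing parameters and initial states are given (in practice they are estimated), and the function name `ets_additive_trend` is hypothetical. Multi-step forecasts extend the last level along the last trend, which follows from setting future errors to zero.

```python
def ets_additive_trend(series, alpha, beta, level0, trend0, horizon):
    """Run the additive-error, additive-trend ETS recursion and forecast ahead."""
    level, trend = level0, trend0
    fitted = []
    for y in series:
        y_hat = level + trend                 # one-step prediction l_{t-1} + b_{t-1}
        err = y - y_hat                       # epsilon_t
        level = level + trend + alpha * err   # l_t (uses the old trend)
        trend = trend + beta * err            # b_t
        fitted.append(y_hat)
    forecasts = [level + h * trend for h in range(1, horizon + 1)]
    return fitted, forecasts


fitted, forecasts = ets_additive_trend(
    [1.0, 2.0, 3.0, 4.0], alpha=0.5, beta=0.1, level0=0.0, trend0=1.0, horizon=2
)
```

The residuals `y - y_hat` produced inside the loop are the quantities that an ETS-based hybrid would pass on to a second model.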
ANN
The ANN model predicts crop yield as:
Ŷₜ = f(Σ wᵢXᵢ + b)
Where: Xᵢ = input variables (lagged yields), wᵢ = weights, b = bias, f() = nonlinear activation function, Ŷₜ = predicted yield
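As an illustration of the formula, the sketch below builds lagged-yield inputs and evaluates a single neuron. The weights are arbitrary placeholders, tanh is an assumed activation (the text does not name one), and both function names are hypothetical; a full ANN would stack many such units and learn the weights by training.

```python
import math


def make_lag_features(series, n_lags):
    """Turn a series into (inputs, targets): each target is preceded by n_lags values."""
    X = [series[i - n_lags:i] for i in range(n_lags, len(series))]
    y = series[n_lags:]
    return X, y


def neuron(x, w, b, activation=math.tanh):
    """Compute f(sum_i w_i * x_i + b) for one unit."""
    return activation(sum(wi * xi for wi, xi in zip(w, x)) + b)


X, y = make_lag_features([1.0, 2.0, 3.0, 4.0, 5.0], n_lags=2)
output = neuron(X[0], w=[0.5, -0.25], b=0.0)  # placeholder weights, not trained
```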
LSTM
LSTM captures long-term dependencies using memory cells.
Forget gate: fₜ = σ(W_f[hₜ₋₁, xₜ] + b_f)
Input gate: iₜ = σ(W_i[hₜ₋₁, xₜ] + b_i)
Cell update: Cₜ = fₜCₜ₋₁ + iₜtanh(W_c[hₜ₋₁, xₜ] + b_c)
Output gate: oₜ = σ(W_o[hₜ₋₁, xₜ] + b_o)
Hidden state: hₜ = oₜtanh(Cₜ)
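A single time step of these gate equations can be written out as below. To keep the sketch short it assumes a scalar input and a scalar hidden state, so each weight matrix reduces to a pair of numbers; the `params` layout and the function names are hypothetical, and the weights would normally be learned.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps gate name -> (w_h, w_x, b) for scalar states."""
    def pre(gate):
        w_h, w_x, b = params[gate]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre("f"))                          # forget gate
    i_t = sigmoid(pre("i"))                          # input gate
    o_t = sigmoid(pre("o"))                          # output gate
    c_t = f_t * c_prev + i_t * math.tanh(pre("c"))   # cell update
    h_t = o_t * math.tanh(c_t)                       # hidden state
    return h_t, c_t


# With all-zero weights every gate equals 0.5, so half of the old cell state is kept.
zero_params = {gate: (0.0, 0.0, 0.0) for gate in ("f", "i", "o", "c")}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
```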
Wavelet–ANN
Wavelet decomposition splits the yield series into components:
Yₜ = Aₜ + Σ Dₜᵢ
Where: Aₜ = approximation component, Dₜᵢ = detail components. The ANN is trained using Aₜ as input to predict future yield.
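The additive decomposition can be illustrated with a one-level Haar-style split. This is a deliberately simplified sketch: the text does not state which wavelet family or how many levels were used, so the Haar choice, the single level, and the function name `haar_additive` are all assumptions.

```python
def haar_additive(series):
    """One-level Haar-style split: pairwise means (approximation) plus the remainder (detail)."""
    if len(series) % 2:
        raise ValueError("this sketch expects an even-length series")
    approx, detail = [], []
    for i in range(0, len(series), 2):
        mean = (series[i] + series[i + 1]) / 2.0
        approx += [mean, mean]                                  # smooth component A_t
        detail += [series[i] - mean, series[i + 1] - mean]      # fluctuation D_t
    return approx, detail


approx, detail = haar_additive([1.0, 3.0, 2.0, 6.0])
reconstructed = [a + d for a, d in zip(approx, detail)]
```

Adding the two components back together recovers the original series, which is the property the decomposition equation expresses.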
EMD–ANN
EMD decomposes the series into intrinsic mode functions:
Yₜ = Σ IMFᵢ + rₜ
Where: IMFᵢ = intrinsic mode functions, rₜ = residual trend. The ANN uses the trend component rₜ for forecasting.
ETS–ANN
The hybrid ETS–ANN model is defined as:
Ŷₜ = Ŷₜᴱᵀˢ + Ŷₜᴬᴺᴺ
Where: Ŷₜ = hybrid forecast, Ŷₜᴱᵀˢ = ETS forecast, Ŷₜᴬᴺᴺ = ANN forecast of ETS residuals
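The additive structure of the hybrid can be sketched independently of the specific component models. In the example below, both components are simple stand-ins and not the models used in the study: `naive_fit` replaces ETS and `mean_of_lags` replaces the trained ANN, purely so that the combination step is runnable. All names are hypothetical.

```python
def hybrid_forecast(series, base_fit, residual_model, n_lags):
    """Next-step hybrid forecast: base-model forecast plus a forecast of its residuals."""
    fitted, next_base = base_fit(series)
    residuals = [y - f for y, f in zip(series, fitted)]
    next_resid = residual_model(residuals[-n_lags:])
    return next_base + next_resid


def naive_fit(series):
    """Stand-in for ETS: predicts each value with the previous observation."""
    fitted = [series[0]] + list(series[:-1])
    return fitted, series[-1]


def mean_of_lags(lags):
    """Stand-in for the ANN: predicts the next residual as the mean of recent ones."""
    return sum(lags) / len(lags)


next_yield = hybrid_forecast([1.0, 2.0, 3.0, 4.0], naive_fit, mean_of_lags, n_lags=2)
```

Here the naive base model lags the trend by one unit each year, and the residual model adds that systematic shortfall back, which is the role the ANN plays for ETS residuals.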
PLS
PLS projects predictors and response into latent space:
X = TPᵀ + E
Y = UQᵀ + F
Where: T, U = latent scores, P, Q = loading matrices, E, F = residuals
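For a single response and a single latent component, the projection reduces to a few sums. The sketch below follows the usual one-component PLS regression steps (center, compute a weight vector from the covariance with the response, form scores, regress the response on the scores). It is an assumption-laden illustration: the function name is hypothetical, the data are made up, and the study's number of components is not stated.

```python
def pls1_one_component(X, y):
    """Fit a one-component PLS regression and return a prediction function."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in predictor space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0.0:
        raise ValueError("predictors have zero covariance with the response")
    w = [v / norm for v in w]
    # Latent scores t and the response loading q.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)

    def predict(row):
        score = sum((row[j] - x_mean[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return predict


# Two collinear lag-style predictors; y is exactly twice the first column.
predict = pls1_one_component(
    [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]], [2.0, 4.0, 6.0, 8.0]
)
prediction = predict([5.0, 10.0])
```

Collinear predictors such as neighbouring lags are handled without trouble because the regression runs on the latent score rather than on the raw columns.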

3.5. Performance Evaluation Metrics

The evaluation metrics used for crop yield prediction are:
Mean Absolute Error (MAE):
MAE = (1/n) Σ |Yₜ − Ŷₜ|
Root Mean Square Error (RMSE):
RMSE = √[(1/n) Σ (Yₜ − Ŷₜ)²]
Coefficient of Determination (R²):
R² = 1 − [Σ(Yₜ − Ŷₜ)² / Σ(Yₜ − Ȳ)²]
Mean Absolute Percentage Error (MAPE):
MAPE = (100/n) Σ |(Yₜ − Ŷₜ)/Yₜ|
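The four formulas above can be checked with a short sketch. The function names and sample values are illustrative only. The second call shows that R² becomes negative whenever the forecasts fit worse than simply predicting the mean of the observed values, which helps when reading negative R² values.

```python
import math


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r2(y, y_hat):
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def mape(y, y_hat):
    # Undefined if any observed value is zero.
    return 100.0 / len(y) * sum(abs((a - b) / a) for a, b in zip(y, y_hat))


observed = [2.0, 4.0, 6.0]          # made-up yields
close = [2.5, 4.0, 5.0]             # forecasts that track the trend
reversed_trend = [6.0, 4.0, 2.0]    # forecasts that get the trend backwards

good_r2 = r2(observed, close)
bad_r2 = r2(observed, reversed_trend)
```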
Yield instability is measured using the Cuddy–Della Valle Index (CDVI):
CDVI = CV × √(1 − R²)
Where: CV = coefficient of variation, R² = goodness of fit from trend regression
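A small sketch of the CDVI calculation follows. It assumes a linear time trend for the trend regression, a coefficient of variation expressed in percent, and the sample standard deviation; the text does not specify these details, and the function names are hypothetical.

```python
import math
import statistics


def trend_r2(series):
    """R² of an ordinary least squares fit of the series on time (0, 1, 2, ...)."""
    n = len(series)
    t_mean = (n - 1) / 2.0
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in enumerate(series))
    ss_tot = sum((y - y_mean) ** 2 for y in series)
    return 1.0 - ss_res / ss_tot


def cdvi(series):
    """CDVI = CV * sqrt(1 - R²), with CV in percent."""
    cv = 100.0 * statistics.stdev(series) / statistics.mean(series)
    # Clamp at zero to guard against tiny negative values from rounding.
    return cv * math.sqrt(max(0.0, 1.0 - trend_r2(series)))


steady_growth = cdvi([1.0, 2.0, 3.0, 4.0, 5.0])   # all variation is trend
zigzag = [1.0, 3.0, 1.0, 3.0, 1.0]                 # no linear trend at all
zigzag_cdvi = cdvi(zigzag)
```

A series that grows perfectly linearly has zero instability under this index, while a series with no trend keeps its full coefficient of variation.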
The objective was to assess predictive performance comprehensively across linear time-series models, nonlinear machine learning architectures, and hybrid methods that combine elements of both. The models considered were ARIMA, LSTM, ANN, EMD-ANN, Wavelet-ANN, the ETS-based hybrids ETS-ANN, ETS-LSTM, and ETS-SVM, and Partial Least Squares (PLS). Each model was trained on historical crop yield data and validated using out-of-sample test sets to ensure comparability. Hyperparameter tuning was conducted using appropriate techniques: ARIMA relied on information criterion-based order selection, neural network architectures such as ANN and LSTM were optimized via iterative validation, while hybrid ETS approaches integrated exponential smoothing with neural network or support vector components. Forecasting performance was quantified using multiple error metrics: MAE, RMSE, coefficient of determination (R²), MAPE, the Cuddy–Della Valle Index (CDVI), and residual diagnostics for ARCH effects. This multi-metric evaluation ensured that models were compared not only on point forecast accuracy but also on their ability to explain variance, control relative errors, and capture volatility.

4. Results and Analysis

Table 2 provides aggregate results across all crops, which demonstrate clear differences in model performance. When ranked by mean MAE across all crops, PLS emerged as the most accurate, with an average MAE of 0.1420 and RMSE of 0.1756, paired with a slightly negative R² of -0.1161. This suggests that PLS consistently produced low forecast errors but did not explain more variance than the mean baseline, highlighting the difficulty of achieving high variance explanation in agricultural time-series data. Hybrid exponential smoothing approaches also performed strongly: ETS-ANN, ETS-LSTM, and ETS-SVM all achieved an average MAE of 0.1988 and RMSE of 0.2218, with MAPE around 3.06 percent. These models maintained moderate error rates while providing stable forecasts across crops. LSTM as a standalone neural network achieved an average MAE of 0.2226 and RMSE of 0.2588, with a MAPE of 3.69 percent. Although its R² was negative (-0.7609), it ranked favorably in error-based comparisons, outperforming ARIMA.
ARIMA, as the classical statistical benchmark, produced an average MAE of 0.3437 and RMSE of 0.3786, with MAPE around 4.93 percent. Its R² mean was -2.2886, indicating limitations in capturing variance. Nonetheless, ARIMA outperformed several nonlinear and hybrid models in error terms, underscoring the continued relevance of linear models in yield forecasting. In contrast, models such as ANN, EMD-ANN, and Wavelet-ANN performed poorly across all metrics. ANN recorded a mean MAE of 0.9506 and RMSE of 1.0761 with an extremely negative R² of -39.8991, while EMD-ANN and Wavelet-ANN recorded MAE above 1.0 and RMSE above 1.1, with R² values of approximately -52.59 and -54.90 respectively. Their percentage errors were also high, with MAPE values ranging between 16.84 and 21.10 percent. These results suggest that simple feedforward ANNs and hybrid decomposition-based architectures were not well-suited to this dataset, potentially due to overfitting and lack of effective temporal structure capture.
The 20-year forecast from the best model for each crop is shown in Figure 2. Residual diagnostics across models indicated no significant ARCH effects, implying that volatility clustering was not a primary concern in this dataset. CDVI values averaged consistently around 20.7 across models, reflecting inherent variability tied to crop characteristics rather than specific model outputs. Thus, variability indices did not differentiate strongly between approaches.
Figure 3 shows the RMSE and R² values of all models. The model with the lowest RMSE and the highest R² is the best performing among all hybrid models. In synthesis, the ranking of models based on overall crop performance shows PLS and ETS-based hybrids as the strongest performers, followed by standalone LSTM. ARIMA achieved moderate performance, while ANN and decomposition-driven hybrids underperformed significantly. The contrast between strong linear or hybrid models and weak standalone nonlinear models highlights that capturing agricultural yield dynamics benefits from either linear dimension reduction (as in PLS) or hybrid smoothing mechanisms (as in ETS-ANN), rather than relying solely on deep or shallow neural networks. The performance of crop yield forecasting for globally used crops is shown in Table 3. The ETS-ANN model achieved the highest R², whereas PLS achieved the lowest. The yield instability of the top five crops, measured by CDVI, is shown in Figure 4.
Table 4 lists the five best models for each crop. ETS-ANN often performs best, although the optimal model depends on the crop.

5. Discussion

The results indicate that ETS-based hybrid models and PLS outperform the standalone statistical benchmark (ARIMA) in error terms. Standalone LSTM also ranked above ARIMA, which suggests that recurrent models can capture some nonlinear temporal dependencies in the yield series. In contrast, Wavelet-ANN and EMD-ANN did not prove robust on these nonstationary series: both recorded the highest errors and strongly negative R² values, so signal decomposition alone did not improve forecasts in this dataset. The ARCH-LM diagnostics showed no significant heteroscedasticity, so volatility modelling with GARCH-family models offered little additional benefit here. The findings highlight that no single model is universally optimal, reinforcing the need for a hybrid and comparative framework.

6. Conclusions

The comparative evaluation across crops demonstrates that hybrid and dimension reduction methods are consistently more effective for crop yield forecasting than standalone neural networks or classical linear models. Partial Least Squares achieved the best aggregate error performance, while the ETS-ANN, ETS-LSTM, and ETS-SVM hybrids also provided highly competitive forecasts with low percentage errors. Standalone LSTM performed reasonably well and better than ARIMA in error terms, though both had negative R² values, suggesting limited capacity to explain variance. ARIMA, while not the best model, offered stable performance with lower errors than the ANN and decomposition-based ANN models, confirming its value as a robust baseline. By contrast, ANN, EMD-ANN, and Wavelet-ANN produced the weakest results, indicating that these models were not well adapted to the dataset. Overall, the study underscores that in agricultural yield forecasting, the integration of smoothing mechanisms or linear dimension reduction offers distinct advantages over raw machine learning models. Future work should investigate ensemble strategies that combine top performers such as PLS and the ETS hybrids to exploit complementary strengths. Furthermore, applying statistical significance tests such as the Diebold-Mariano test would provide deeper validation of whether observed error differences are systematic. The findings reinforce that careful methodological choice is critical for developing accurate and reliable forecasting systems in the agricultural domain.

Author Contributions

Conceptualization, Suraj Arya, Anju, and Ankit Yadav; methodology, Suraj Arya, Anju, and Ankit Yadav; software, Anju and Ankit Yadav; formal analysis, Suraj Arya, Anju, and Ankit Yadav; investigation, Suraj Arya, Anju, and Ankit Yadav; resources, Suraj Arya, Anju, and Ankit Yadav; data curation, Anju and Ankit Yadav; writing – original draft preparation, Suraj Arya, Anju, and Ankit Yadav; writing – review and editing, Suraj Arya, Sahimel Azwal Bin Sulaiman, and Dedek Andrian; visualization, Suraj Arya, Anju, Ankit Yadav, and Sahimel Azwal Bin Sulaiman; supervision, Suraj Arya, Sahimel Azwal Bin Sulaiman, and Dedek Andrian; project administration, Suraj Arya, Sahimel Azwal Bin Sulaiman, and Dedek Andrian; funding acquisition, Sahimel Azwal Bin Sulaiman and Dedek Andrian. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Research Innovation Department, Universiti Malaysia Pahang Al-Sultan Abdullah, through the Internal Grant UMPSA (RDU242706/UIC241507).

Institutional Review Board Statement

Not applicable, as this study did not involve human or animal subjects.

Data Availability Statement

Data may be available upon request.

Acknowledgments

All authors thank the Institute for providing valuable resources for conducting this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ANN Artificial Neural Network
ARIMA Autoregressive Integrated Moving Average
EMD-ANN Empirical Mode Decomposition–Artificial Neural Network
ETS-ANN Error–Trend–Seasonality–Artificial Neural Network
ETS-LSTM Error–Trend–Seasonality–Long Short-Term Memory
ETS-SVM Error–Trend–Seasonality–Support Vector Machine
LSTM Long Short-Term Memory
PLS Partial Least Squares
Wavelet-ANN Wavelet Transform–Artificial Neural Network

References

  1. Norton, G. W.; Alwang, J.; Masters, W. A. Economics of agricultural development: world food systems and resource use; Routledge, Taylor & Francis Group, 2022.
  2. Pawlak, K.; Kołodziejczak, M. The role of agriculture in ensuring food security in developing countries: Considerations in the context of the problem of sustainable food production. Sustainability (Switzerland) 2020, vol. 12(no. 13).
  3. Bahar, N. H. A. Meeting the food security challenge for nine billion people in 2050: What impact on forests? Global Environmental Change 2020, vol. 62, 102056.
  4. Alebele, Y. Estimation of crop yield from combined optical and SAR imagery using Gaussian kernel regression. IEEE J Sel Top Appl Earth Obs Remote Sens 2021, vol. 14, 10520–10534.
  5. Shafi, U. Tackling food insecurity using remote sensing and machine learning-based crop yield prediction. IEEE Access 2023, vol. 11, 108640–108657.
  6. Erenstein; Jaleta, M.; Sonder, K.; Mottaleb, K.; Prasanna, B. M. Global maize production, consumption and trade: trends and R&D implications; Springer Science and Business Media B.V, 01 Oct 2022.
  7. Zhou, Y.; Ma, S.; Zhang, H.; Aakur, S. Enhancing corn yield prediction: Optimizing data quality or model complexity? Smart Agricultural Technology 2024, vol. 9, 100671.
  8. Sun, J.; Di, L.; Sun, Z.; Shen, Y.; Lai, Z. County-level soybean yield prediction using deep CNN-LSTM model. Sensors 2019, vol. 19(no. 20), 4363.
  9. Bali, N.; Singla, A. Emerging Trends in Machine Learning to Predict Crop Yield and Study Its Influential Factors: A Survey. Archives of Computational Methods in Engineering 2022, vol. 29(no. 1), 95–112.
  10. Kumar, B. Varun; Rao, P. V. Gopi Krishna. An effective hybrid attention model for crop yield prediction using IoT-based three-phase prediction with an improved sailfish optimizer. Expert Syst Appl 2024, vol. 255, 124740.
  11. Yewle, A. D.; Mirzayeva, L.; Karakuş, O. Multi-modal data fusion and deep ensemble learning for accurate crop yield prediction. Remote Sens Appl 2025, vol. 38, 101613.
  12. Ben Ayed, R.; Hanana, M. Artificial Intelligence to Improve the Food and Agriculture Sector. In Hindawi Limited; 2021.
  13. Ashwitha, A.; Latha, C. A. Crop Recommendation and Yield Estimation Using Machine Learning. Journal of Mobile Multimedia 2022, vol. 18(no. 3), 861–884.
  14. Badshah; Yousef Alkazemi, B.; Din, F.; Zamli, K. Z.; Haris, M. Crop Classification and Yield Prediction Using Robust Machine Learning Models for Agricultural Sustainability. IEEE Access 2024, vol. 12, 162799–162813.
  15. Jiabul Hoque, M. D. Incorporating Meteorological Data and Pesticide Information to Forecast Crop Yields Using Machine Learning. IEEE Access 2024, vol. 12, 47768–47786.
  16. Screpnik; Zamudio, E.; Gimenez, L. Artificial Intelligence in Agriculture: A Systematic Review of Crop Yield Prediction and Optimization. IEEE Access, 2025.
  17. Sharma, P.; Dadheech, P.; Aneja, N.; Aneja, S. Predicting Agriculture Yields Based on Machine Learning Using Regression and Deep Learning. IEEE Access 2023, vol. 11, 111255–111264.
  18. Kumar, S.; Kumar, M. Developing an XAI-Based Crop Recommendation Framework Using Soil Nutrient Profiles and Historical Crop Yields. In IEEE Transactions on Consumer Electronics; 2025.
  19. Shams, M. Y.; Gamel, S. A.; Talaat, F. M. Enhancing crop recommendation systems with explainable artificial intelligence: a study on agricultural decision-making. Neural Comput Appl 2024, vol. 36(no. 11), 5695–5714.
  20. Elavarasan; Durairaj Vincent, P. M. Crop Yield Prediction Using Deep Reinforcement Learning Model for Sustainable Agrarian Applications. IEEE Access 2020, vol. 8, 86886–86901.
  21. Waqar, M.; Kim, Y. W.; Byun, Y. C. A Stacking Ensemble Framework Leveraging Synthetic Data for Accurate and Stable Crop Yield Forecasting. In IEEE Access; 2025.
  22. Liu, J. Crop Yield Estimation in the Canadian Prairies Using Terra/MODIS-Derived Crop Metrics. IEEE J Sel Top Appl Earth Obs Remote Sens 2020, vol. 13, 2685–2697.
  23. Dey, T.; Bera, S.; Mukherjee, A.; De, D.; Buyya, R. FLyer: Federated Learning-Based Crop Yield Prediction for Agriculture 5.0. IEEE Transactions on Artificial Intelligence 2025, vol. 6(no. 7), 1943–1952.
  24. Osibo, B. K. Enhancing Crop Yield Estimation Through Iterative Querying and Bayesian-Optimized Gated Networks. IEEE Geoscience and Remote Sensing Letters 2025, vol. 22.
  25. Rao, M. S.; Singh, A.; Reddy, N. V. S.; Acharya, D. U. Crop prediction using machine learning. In Journal of Physics: Conference Series; IOP Publishing Ltd, Jan 2022.
  26. Crecco, L.; Bajocco, S.; Bregaglio, S. CrYP: An open-source Google earth engine tool for spatially explicit crop yield predictions. Comput Electron Agric 2025, vol. 237.
  27. Ransom, J. Statistical and machine learning methods evaluated for incorporating soil and weather into corn nitrogen recommendations. Comput Electron Agric 2019, vol. 164, 104872.
  28. Panda, S. S.; Ames, D. P.; Panigrahi, S. Application of vegetation indices for agricultural crop yield prediction using neural network techniques. Remote Sens (Basel) 2010, vol. 2(no. 3), 673–696.
  29. Lykhovyd, P. Sweet corn yield simulation using normalized difference vegetation index and leaf area index. Journal of Ecological Engineering 2020, vol. 21(no. 3).
  30. Shahhosseini, M.; Hu, G.; Archontoulis, S. V. Forecasting corn yield with machine learning ensembles. Front Plant Sci 2020, vol. 11, 1120.
  31. Crane-Droesch, A. Machine learning methods for crop yield prediction and climate change impact assessment in agriculture. Environmental Research Letters 2018, vol. 13(no. 11), 114003.
  32. Kross. Using artificial neural networks and remotely sensed data to evaluate the relative importance of variables for prediction of within-field corn and soybean yields. Remote Sens (Basel) 2020, vol. 12(no. 14), 2230.
  33. Maimaitijiang, M.; Sagan, V.; Sidike, P.; Hartling, S.; Esposito, F.; Fritschi, F. B. Soybean yield prediction from UAV using multimodal data fusion and deep learning. Remote Sens Environ 2020, vol. 237, 111599.
  34. Terliksiz, A. S.; Altılar, D. T. Use of deep neural networks for crop yield prediction: A case study of soybean yield in Lauderdale County, Alabama, USA. 2019 8th International Conference on Agro-Geoinformatics (Agro-Geoinformatics), 2019; pp. 1–4.
  35. Schwalbert, R. A.; Amado, T.; Corassa, G.; Pott, L. P.; Prasad, P. V. V.; Ciampitti, I. A. Satellite-based soybean yield forecast: Integrating machine learning and weather data for improving crop yield prediction in southern Brazil. Agric For Meteorol 2020, vol. 284, 107886.
  36. MirhoseiniNejad, S. M.; Abbasi-Moghadam, D.; Sharifi, A. ConvLSTM-ViT: A deep neural network for crop yield prediction using Earth observations and remotely sensed data. IEEE J Sel Top Appl Earth Obs Remote Sens, 2024.
  37. Debalke, B.; Abebe, J. T. Maize yield forecast using GIS and remote sensing in Kaffa Zone, South West Ethiopia. Environmental Systems Research 2022, vol. 11(no. 1).
  38. Pei, J. Downscaling Administrative-Level Crop Yield Statistics to 1 km Grids Using Multisource Remote Sensing Data and Ensemble Machine Learning. IEEE J Sel Top Appl Earth Obs Remote Sens 2024, vol. 17, 14437–14453.
  39. Mitra, A. Cotton Yield Prediction: A Machine Learning Approach With Field and Synthetic Data. IEEE Access 2024, vol. 12, 101273–101288.
  40. Haider, S. T. An Ensemble Machine Learning Framework for Cotton Crop Yield Prediction Using Weather Parameters: A Case Study of Pakistan. IEEE Access 2024, vol. 12, 124045–124061.
  41. Chang, Y. J.; Lai, M. H.; Wang, C. H.; Huang, Y. S.; Lin, J. Target-Aware Yield Prediction (TAYP) Model Used to Improve Agriculture Crop Productivity. IEEE Transactions on Geoscience and Remote Sensing 2024, vol. 62, 1–11.
  42. Ashfaq, M.; Khan, I.; Alzahrani, A.; Tariq, M. U.; Khan, H.; Ghani, A. Accurate wheat yield prediction using machine learning and climate-NDVI data fusion. IEEE Access 2024, vol. 12, 40947–40961.
  43. Sarkar, S. Crop Yield Prediction Using Multimodal Meta-Transformer and Temporal Graph Neural Networks. IEEE Transactions on AgriFood Electronics 2024, vol. 2(no. 2), 545–553.
  44. Rashid, M.; Bari, B. S.; Yusup, Y.; Kamaruddin, M. A.; Khan, N. A Comprehensive Review of Crop Yield Prediction Using Machine Learning Approaches with Special Emphasis on Palm Oil Yield Prediction; Institute of Electrical and Electronics Engineers Inc, 2021.
  45. Ramzan, Z.; Asif, H. M. S.; Yousuf, I.; Shahbaz, M. A Multimodal Data Fusion and Deep Neural Networks Based Technique for Tea Yield Estimation in Pakistan Using Satellite Imagery. IEEE Access 2023, vol. 11, 42578–42594. [Google Scholar] [CrossRef]
  46. Sudhamathi, T.; Perumal, K. Ensemble regression based Extra Tree Regressor for hybrid crop yield prediction system. Measurement: Sensors 2024, vol. 35, 101277. [Google Scholar] [CrossRef]
  47. Mahmud, T. An Approach for Crop Prediction in Agriculture: Integrating Genetic Algorithms and Machine Learning. IEEE Access 2024, vol. 12, 173583–173598. [Google Scholar] [CrossRef]
  48. Saleem, R. M. Internet of Things Based Weekly Crop Pest Prediction by Using Deep Neural Network. IEEE Access 2023, vol. 11, 85900–85913. [Google Scholar] [CrossRef]
  49. Dangi, S.; Mullapudi, S. K.; Raghaw, C. S.; Dar, S. S.; Rehman, M. Z. U.; Kumar, N. A multi-temporal multi-spectral attention-augmented deep convolution neural network with contrastive learning for crop yield prediction. Comput Electron Agric 2025, vol. 239. [Google Scholar] [CrossRef]
  50. Liu, S.; Wang, D.; Guo, H.; Han, C.; Zeng, W. MT-CYP-Net: Multi-task network for pixel-level crop yield prediction under very few samples. International Journal of Applied Earth Observation and Geoinformation 2025, vol. 143. [Google Scholar] [CrossRef]
  51. Vemunuri, J.; Murthy, D. G. Precision agriculture through biochemical urease-aware crop yield prediction using enhanced fuzzy logic and deep learning. Smart Agricultural Technology 2025, vol. 12, 101340. [Google Scholar] [CrossRef]
  52. Mena. Adaptive fusion of multi-modal remote sensing data for optimal sub-field crop yield prediction. Remote Sens Environ 2025, vol. 318, 114547. [Google Scholar] [CrossRef]
  53. Jeong, S.; Ko, J.; Ban, J. oh; Shin, T.; Yeom, J. min. Deep learning-enhanced remote sensing-integrated crop modeling for rice yield prediction. Ecol Inform 2024, vol. 84, 102886. [Google Scholar] [CrossRef]
  54. Subramaniam, L. K.; Marimuthu, R. Crop yield prediction using effective deep learning and dimensionality reduction approaches for Indian regional crops. e-Prime - Advances in Electrical Engineering, Electronics and Energy 2024, vol. 8, 100611. [Google Scholar] [CrossRef]
  55. Demirhan. A deep learning framework for prediction of crop yield in Australia under the impact of climate change. Information Processing in Agriculture 2025, vol. 12(no. 1), 125–138. [Google Scholar] [CrossRef]
  56. Rajakumaran, M.; Arulselvan, G.; Subashree, S.; Sindhuja, R. Crop yield prediction using multi-attribute weighted tree-based support vector machine. Measurement: Sensors 2024, vol. 31, 101002. [Google Scholar] [CrossRef]
  57. Yang, M. Seasonal prediction of crop yields in Ethiopia using an analog approach. Agric For Meteorol 2023, vol. 331, 109347. [Google Scholar] [CrossRef]
  58. Iniyan, S.; Varma, V. Akhil; Naidu, C. Teja. Crop yield prediction using machine learning techniques. Advances in Engineering Software 2023, vol. 175, 103326. [Google Scholar] [CrossRef]
  59. Maya Gopal, P. S.; Bhargavi, R. A novel approach for efficient crop yield prediction. Comput Electron Agric 2019, vol. 165, 104968. [Google Scholar] [CrossRef]
  60. Cedric, L. S. Crops yield prediction based on machine learning models: Case of West African countries. Smart Agricultural Technology 2022, vol. 2, 100049. [Google Scholar] [CrossRef]
  61. Gawade, S. D.; Bhansali, A.; Chopade, S.; Kulkarni, U. Optimizing crop yield prediction with R2U-Net-AgriFocus: A deep learning architecture with leveraging satellite imagery and agro-environmental data. Expert Syst Appl 2026, vol. 296, 128942. [Google Scholar] [CrossRef]
Figure 1. Flowchart of proposed methodology of staple crop yield dataset.
Figure 2. Best model for crop yield forecast.
Figure 3. RMSE and R2 score.
Figure 4. Top five crops using CDVI.
Table 1. Comparative analysis of prior work.
| Sr. No. | Reference | Research subject/crop | ML/DL techniques | Accuracy | Limitation |
|---|---|---|---|---|---|
| 1 | [13] | Karnataka State crop | Decision tree, k-NN, XGBoost, SVM, DBSCAN, agglomerative, Random forest, logistic regression, naïve Bayes, gradient boosting-means, linear regression, stochastic gradient descent | 99.93% (RF) | Limited to the Karnataka State crop in India, not global |
| 2 | [46] | Cassava, Maize, Plantains and others, Potatoes, Rice (paddy), Sorghum, Soybeans, Sweet potatoes, Wheat, Yams | KPCA, LESSO, ER-ETR | 95% | Model overfitting due to small dataset |
| 3 | [14] | Mango, papaya, apple, banana, orange, pomegranate, grapes, watermelon, muskmelon, coconut, mung beans, chickpea, kidney beans, pigeon peas, black gram, cotton, coffee, jute, and moth beans | DT, RF, SVM, KNN, GNB, ETC, LR | 99.7% (RF) | Limited to regression models only |
| 4 | [43] | Sugarbeet crop | Temporal graph neural network, multimodal meta-transformer | 97% | Limited to the sugarbeet crop |
| 5 | [47] | Pigeon peas, Chickpea, Coffee, Pomegranate, Kidney beans, Apple, Muskmelon, Rice, Black gram, Cotton, Maize, Coconut, Grapes, Moth beans, Banana, Jute, Watermelon, Mung beans, Papaya, Lentil, Orange, and Mango | GA and ML | 99.3% | Focus on the ICAR dataset |
| 6 | [48] | Crop pests | IoT, DL | 94% | Focus on sensor-generated dataset |
| 7 | [49] | Crop | MTMS-YieldNet | MAPE = 0.331 (Sentinel-2 dataset) | Focus on Sentinel-1, Sentinel-2, and Landsat-8 datasets only |
| 8 | [50] | Soybean, maize, rice | MT-CYP-Net | RMSE = 0.1472, MAE = 0.0706 | Focus on the Sentinel-2 satellite dataset for soybean, maize, rice |
| 9 | [51] | Crops (apple, banana, blackgram, chickpea, coconut, coffee, cotton, grapes, jute, kidney beans, lentil, maize, mango, moth beans, mung beans, muskmelon, orange, papaya, pigeonpeas, pomegranate, rice, and watermelon) | EL-Fuzzy, TL-SFGRU | MSE = 0.1087, RMSE = 0.3296, MAE = 0.1057 | Limited to Indian crops only |
| 10 | [52] | Soybean, wheat, rapeseed | Multi-modal Gated Fusion (MMGF) | R2 = 0.80 | Depends on Sentinel-2 satellite and weather data; limited to crops in Argentina, Uruguay, and Germany |
| 11 | [53] | Rice crop | LSTM, bidirectional LSTM, FFNN, GRU | RMSE = 0.101, PBIAS = 0.74, NSE = 0.9960 | Focus only on the rice crop |
| 12 | [17] | Indian crop | Random forest, XGBoost, decision tree (regression), CNN, LSTM (deep learning) | 98.96% (RF) | Limited to Indian crops only |
| 13 | [54] | South Indian crop of both seasons | SEKPCA, WTDCNN | 98.96% | Limited to South Indian crops only |
| 14 | [55] | Oats, corn, rice, and wheat | DNN | MAE and RMSE (19-40%) | Focus on Australian crops |
| 15 | [56] | Indian crop | Z-score, GA, PCA, MAWT-SVM | Not specified | Overhead due to GA |
| 16 | [57] | Ethiopia seasonal crop | CREST, DSSAT | R2 = 0.60 (Dangishta sites) | Limited to the maize crop of Ethiopia |
| 17 | [58] | Not specified | LASSO, GB, LSTM, Ridge, MLR, DTR, PLS, Elastic Net | 86.3% (LSTM) | Unclear generalization of dataset |
| 18 | [59] | Tamilnadu State crop | SVR, RF, ANN, MLR, KNN, MLR-ANN | 0.99 (MLR-ANN) | Limited to the Tamilnadu State paddy (rice) crop in India, not global |
| 19 | [60] | West African | DT, Logistic regression, KNN | R2 = 95.3% (DT) | Focus on country-level prediction only |
| 20 | [61] | General crop | R2U-Net-AgriFocus, CNN, VGG16, HHPA | MSE = 0.002, MAE = 0.001, NMSE = 0, RMSE = 0.039, MAPE = 0 | Model computational complexity is very high |
Table 2. Average model performance.
| Model | MAE_mean | MAE_std | RMSE_mean | RMSE_std | R2_mean | R2_std | MAPE_mean | MAPE_std |
|---|---|---|---|---|---|---|---|---|
| ANN | 0.9506 | 0.9765 | 1.0761 | 1.1006 | -39.8991 | 32.7213 | 16.8396 | 6.2405 |
| ARIMA | 0.3437 | 0.5129 | 0.3786 | 0.5495 | -2.2886 | 3.0593 | 4.9255 | 3.123 |
| EMD-ANN | 1.0302 | 0.9323 | 1.155 | 1.0341 | -52.5893 | 32.3121 | 20.8727 | 9.2781 |
| ETS-ANN | 0.1988 | 0.2707 | 0.2218 | 0.2971 | -0.199 | 1.0167 | 3.0636 | 1.3533 |
| ETS-LSTM | 0.1988 | 0.2707 | 0.2218 | 0.2971 | -0.199 | 1.0167 | 3.0636 | 1.3533 |
| ETS-SVM | 0.1988 | 0.2707 | 0.2218 | 0.2971 | -0.199 | 1.0167 | 3.0636 | 1.3533 |
| LSTM | 0.2226 | 0.2353 | 0.2588 | 0.2784 | -0.7609 | 1.4087 | 3.6877 | 1.6926 |
| PLS | 0.142 | 0.1338 | 0.1756 | 0.1747 | -0.1161 | 1.3192 | 2.7283 | 1.3218 |
| Wavelet-ANN | 1.294 | 1.3847 | 1.4287 | 1.5167 | -54.9043 | 30.8017 | 21.1054 | 5.7466 |
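The strongly negative mean R2 values in Table 2 follow from the definition R2 = 1 - SS_res/SS_tot: whenever a model's squared error exceeds that of simply predicting the mean of the observed series, R2 falls below zero, and it has no lower bound. The sketch below illustrates this under stated assumptions: that MAPE is reported in percent, and that the mean and standard deviation columns aggregate per-crop scores (the table states neither). The function names `regression_metrics` and `summarize`, and the sample values, are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, stdev


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, R2 and MAPE (in percent, an assumption) for one series."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = mean([abs(e) for e in errors])
    rmse = math.sqrt(mean([e * e for e in errors]))
    y_bar = mean(y_true)
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)   # variation around the mean
    r2 = 1.0 - ss_res / ss_tot                       # negative when ss_res > ss_tot
    mape = 100.0 * mean([abs(e / t) for e, t in zip(errors, y_true)])
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MAPE": mape}


def summarize(per_crop_metrics):
    """Mean and sample standard deviation of each metric across crops.

    Using the sample standard deviation is an assumption of this sketch.
    """
    keys = per_crop_metrics[0].keys()
    return {
        k: (mean([m[k] for m in per_crop_metrics]),
            stdev([m[k] for m in per_crop_metrics]))
        for k in keys
    }


# Hypothetical, slowly varying yield series with a constant forecast offset of 1.0.
actual = [2.0, 2.1, 2.2, 2.3]
offset = regression_metrics(actual, [3.0, 3.1, 3.2, 3.3])
perfect = regression_metrics(actual, actual)
print(offset["R2"])   # roughly -79: the series barely varies, so SS_tot is tiny
print(summarize([offset, perfect])["MAE"])
```

Because yield series often vary little around their mean, SS_tot is small and even a modest constant bias produces a large negative R2, which is one plausible reading of the ANN, EMD-ANN, and Wavelet-ANN rows.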
Table 3. Model performance for staple crops.
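| Crop | Model | MAE | RMSE | R2 | MAPE |
|---|---|---|---|---|---|
| Bananas | PLS | 0.438 | 0.572 | 0.201 | 2.03 |
| Rice | PLS | 0.031 | 0.043 | 0.762 | 0.74 |
| Wheat | ETS-ANN | 0.050 | 0.066 | 0.600 | 1.53 |
| Soybeans | LSTM | 0.034 | 0.038 | 0.686 | 1.90 |
| Maize | ARIMA | 0.073 | 0.094 | 0.770 | 1.61 |
| Cassava | ETS-ANN | 0.147 | 0.166 | 0.809 | 1.20 |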
Table 4. Top five models for each crop.
| Crop Name | Rank 1 Model | Rank 2 Model | Rank 3 Model | Rank 4 Model | Rank 5 Model |
|---|---|---|---|---|---|
| Wheat | ETS-ANN | ETS-LSTM | ETS-SVM | PLS | LSTM |
| Rice | PLS | ARIMA | ETS-ANN | ETS-LSTM | ETS-SVM |
| Bananas | PLS | LSTM | ARIMA | ETS-ANN | ETS-LSTM |
| Maize | ARIMA | PLS | LSTM | ETS-ANN | ETS-LSTM |
| Soybeans | LSTM | ETS-ANN | ETS-LSTM | ETS-SVM | PLS |
| Potatoes | PLS | LSTM | ETS-ANN | ETS-LSTM | ETS-SVM |
| Beans (Dry) | ARIMA | ETS-ANN | ETS-LSTM | ETS-SVM | PLS |
| Peas (Dry) | ETS-ANN | ETS-LSTM | ETS-SVM | PLS | LSTM |
| Cassava | ETS-ANN | ETS-LSTM | ETS-SVM | ARIMA | PLS |
| Cocoa Beans | PLS | ETS-ANN | ETS-LSTM | ETS-SVM | ARIMA |
| Barley | PLS | ETS-ANN | ETS-LSTM | ETS-SVM | LSTM |
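A per-crop ranking like the one in Table 4 can be obtained by sorting each crop's candidate models on a single error metric. The sketch below is only an illustration: it assumes RMSE as the ranking criterion and alphabetical tie-breaking, neither of which is stated in the table, and the crop name, scores, and the function name `rank_models` are hypothetical.

```python
def rank_models(scores, top_k=5):
    """Rank models per crop by ascending error.

    scores: {crop: {model_name: error}}, where a lower error is better.
    Ties are broken alphabetically by model name (an assumption of this sketch).
    """
    ranking = {}
    for crop, by_model in scores.items():
        ordered = sorted(by_model.items(), key=lambda item: (item[1], item[0]))
        ranking[crop] = [model for model, _ in ordered[:top_k]]
    return ranking


# Hypothetical RMSE values for a single crop; not taken from the study.
scores = {
    "crop_a": {
        "ARIMA": 0.12,
        "ETS-ANN": 0.07,
        "ETS-LSTM": 0.07,
        "ETS-SVM": 0.07,
        "LSTM": 0.09,
        "PLS": 0.08,
    }
}

for crop, models in rank_models(scores).items():
    print(crop, models)
```

With tied scores, the ordering among the tied models depends entirely on the tie-breaking rule, so ranks assigned to models with identical errors should not be read as a performance difference.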
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
