Submitted:
12 July 2025
Posted:
15 July 2025
Abstract
Keywords:
1. Introduction
2. Literature Review
2.1. Aim and Objectives
- Crop yield prediction using a hybrid meta-classifier
- Crop recommendation using combined Yeo-Johnson transformer and multinomial logistic regression
- Price forecasting using MLP, RNN, CNN, ARIMA, and Prophet
3. Materials and Methods
- Data preprocessing
- Model training
- Evaluation of models
- Explainable AI (XAI) techniques such as LIME
- Integration into an agribusiness system
- Experimental results and analysis
3.1. Data Gathering
3.1.1. Price Forecasting Dataset
- Food categories include cereals and tubers; meat, fish, and eggs; miscellaneous food; vegetables and fruits; oil and fats; and pulses and nuts.
- The data cover 12 states and regions: Chin, Kachin, Kayah, Kayin, Magway, Mandalay, Mon, Rakhine, Sagaing, Shan, Tanintharyi, and Yangon.
- Commodities comprise chickpeas, garlic, maize, oil, onions, potatoes, rice, salt, soybeans, tomatoes, and pulses.
- Markets include Ah Nauk Pyin, Ah Nauk Ywe, Ahpauk Wa, Ai Cheng, Alel Than Kyaw, AnnMyo Thit, Aung San, Aung Zaya, Ba Yint Naung, Ban Wai, Barsara, Baw Du Pha, Bhamo, Bhamo Market 2, Bi Kin, Bilin Myo Ma Market, Bokyin, BoneSin, Butar, Buthidaung, Cherrygone, Chipwi, Chying Thung, Dein Aw, Demoso Myoma, Du Kahtawng, Falam, Galeng, Gangaw, Garayang, Gwa Myoma, Hakha Myoma Market, Hlaing Bwe Myo Ma Market, Hnaring, Ho Li, and Home Shop.
- Each commodity's price is given in both Myanmar kyat (MMK) and USD.
3.1.2. Remote Sensing Dataset for Crop Recommendation
3.1.3. Yield Prediction Dataset
3.2. Data Preprocessing
3.3. Data Visualization
3.4. Train and Testing Split
3.5. Using a Logistic Transformer to Recommend Crops with Remote Sensing Data
- Equation (1) gives the weighted sum of the input variables.
- Equation (2) estimates the probability that an unseen observation belongs to a specific class.
- Weighted sum of inputs
- Probability that an unseen observation belongs to the class
- In Equation (3), for y ≥ 0 and λ ≠ 0: add 1 to the data y, raise the result to the power λ, subtract 1, and divide by λ.
- In Equation (4), for y ≥ 0 and λ = 0: add 1 to the data y and take the log.
- In Equation (5), for y < 0 and λ ≠ 2: subtract y from 1, raise the result to the power 2 − λ, subtract 1, divide by 2 − λ, and negate.
- In Equation (6), for y < 0 and λ = 2: subtract y from 1, take the log, and negate.
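The four Yeo-Johnson cases and the class-probability step described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the λ value, and the weights and biases are hypothetical, and the softmax form is assumed here as the standard multinomial logistic formulation rather than taken from the paper's equations.

```python
import math

def yeo_johnson(y: float, lam: float) -> float:
    """Yeo-Johnson transform of a single value y with parameter lam."""
    if y >= 0:
        if lam != 0:
            return ((y + 1.0) ** lam - 1.0) / lam              # Equation (3)
        return math.log(y + 1.0)                               # Equation (4)
    if lam != 2:
        return -(((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam))  # Equation (5)
    return -math.log(1.0 - y)                                  # Equation (6)

def class_probabilities(x, weights, biases):
    """Softmax over the per-class weighted sums z_k = w_k . x + b_k."""
    scores = [sum(w * v for w, v in zip(w_k, x)) + b_k
              for w_k, b_k in zip(weights, biases)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical feature values, lambda, weights, and biases for three classes.
features = [yeo_johnson(v, lam=0.5) for v in (-12.676, 0.0, 1.0)]
probs = class_probabilities(
    features,
    weights=[[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.3, 0.0, -0.2]],
    biases=[0.0, 0.1, -0.1],
)
print(features, probs)
```

With λ = 1 the transform leaves every value unchanged, which is a convenient sanity check on the four branches.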
3.6. A Hybrid Approach for Yield Prediction
- Step 1: Import the crop dataset, which contains several thousand entries.
- Step 2: Import the required libraries and packages.
- Step 3: Preprocess the data.
- Step 4: Split the data into training and test sets.
- Step 5: Build a hybrid model that combines boosting algorithms (CatBoost Regressor, Gradient Boosting Regressor) with a bagging technique (Bagging Regressor) to forecast the yields that should be achievable.
- Step 6: Evaluate the meta-classifier's performance on the test set.
- Step 7: The hybrid model returns the Accuracy, Mean Score, MSE, and R2_score.
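The steps above can be sketched with scikit-learn. The text does not state how the base learners are combined, so stacking with a `RidgeCV` meta-learner is an assumption made for illustration; the synthetic data from `make_regression` stands in for the crop dataset, and CatBoost is left out only to keep the sketch free of extra dependencies.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    BaggingRegressor,
    GradientBoostingRegressor,
    StackingRegressor,
)
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Steps 1-3: synthetic stand-in for the preprocessed crop dataset (assumption).
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)

# Step 4: split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 5: combine boosting and bagging learners under a meta-learner.
# Stacking and the RidgeCV meta-learner are illustrative choices.
hybrid = StackingRegressor(
    estimators=[
        ("gbr", GradientBoostingRegressor(random_state=0)),
        ("bag", BaggingRegressor(random_state=0)),
    ],
    final_estimator=RidgeCV(),
)
hybrid.fit(X_train, y_train)

# Steps 6-7: evaluate on the held-out test set.
y_pred = hybrid.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE={mse:.3f}  R2={r2:.3f}")
```

Note that `score()` on a scikit-learn regressor returns R², so an "accuracy" obtained that way would coincide with the R2_score.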
3.7. Price Forecasting
- MLP (Multi-Layer Perceptron)
- CNN (Convolutional Neural Networks)
- RNN (Recurrent Neural Networks)
- ARIMA model
- Prophet model
3.7.1. Multi-Layer Perceptron (MLP)
3.7.2. Convolutional Neural Networks (CNNs)
- Convolutional layers for feature extraction
- Pooling layers to decrease dimensionality and retain essential information
- Fully connected layers to interpret extracted features
- Additional layers to enhance model performance
3.7.3. Recurrent Neural Networks (RNNs)
3.7.4. ARIMA Model
3.7.5. Prophet Model
- Compared with ARIMA, Prophet is more flexible and user-friendly: it automatically handles seasonality, changepoints, and missing data, and it offers separate representations of the model's components.
- ARIMA, by contrast, is more sensitive to outliers and requires more manual parameter tuning. It can also be harder to interpret and demands a firmer grasp of the underlying statistical concepts.
- ARIMA is nevertheless extensible: it can be expanded to the SARIMAX model, which permits the inclusion of exogenous variables.
- Model performance depends on the type of data and the task at hand.
- Both Prophet and ARIMA are used successfully for time series forecasting; nevertheless, it is important to test several models and approaches, assess each one's performance, and choose the best one for the dataset.
3.8. A Statistical Evaluation of Models
3.9. Challenges
- Careful preprocessing and the choice of an appropriate dataset are needed to achieve optimal outcomes.
- Minimal processing power is required for training regression models.
- Gathering several large datasets for prediction analysis is a challenge.
4. Results
4.1. Recommendation of Crops
4.1.1. Analysis of Crop Recommendation Results
- Size of the Dataset: The model using remote sensing data has access to a much larger dataset (325,834 data points), likely contributing to its higher accuracy. The crop data model uses 2,200 entries.
- Data Quality and Features: Remote sensing data likely provides richer, more detailed information on environmental and crop conditions, whereas crop data may be more limited in scope or not as diverse, affecting model performance.
- Overfitting and Underfitting: The crop data model could be underfitting due to a smaller dataset, while the remote sensing model has enough data to capture patterns without overfitting.
4.2. Yield Prediction
| Model | Accuracy | Mean Squared Error | R2_score |
|---|---|---|---|
| Hybrid Model | 0.982941 | 123845573.323863 | 0.982941 |
| Comparison of results with the other seven models | | | |
| Linear Regression | 0.028421 | 7201484234.197933 | 0.028421 |
| Random Forest | 0.157611 | 6243913909.361324 | 0.157611 |
| Gradient Boost | 0.143967 | 6345039142.634174 | 0.143967 |
| XGBoost | 0.042164 | 7099623711.649448 | 0.042164 |
| KNN | 0.353193 | 4794229490.966104 | 0.353193 |
| Decision Tree | -0.058247 | 7843881514.231441 | -0.058247 |
| Bagging Regressor | 0.158734 | 6235585480.723247 | 0.158734 |

The training accuracy of the meta-classifier model is 99%.
The testing accuracy of the meta-classifier model is 98%.
4.3. Price Forecasting

4.4. LIME (Local Interpretable Model-agnostic Explanations)
- Gather agricultural data such as crop types, product prices, yield predictions, and classifications.
- Preprocess the data to clean and prepare it by filling in the missing values, then train the predictive models.
- Instance Selection: Pick a particular prediction situation that needs to be explained.
- Feature variation: Change the feature values of the chosen instance to introduce variations and create a synthetic dataset.
- Model Prediction: Use the trained model to generate predictions for the modified cases to evaluate the impact of feature changes on the outcomes.
- Training Surrogate Models: Fit a simple, interpretable surrogate model to the predictions on the synthetic dataset.
- Producing Explanations: Examine the surrogate model to identify and highlight the crucial factors impacting the prediction, which helps users understand how the model makes its predictions.
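The instance-selection, feature-variation, prediction, and surrogate steps listed above can be illustrated with a simplified local surrogate in NumPy. This is a sketch of the general idea, not the `lime` library's implementation: `explain_instance`, the Gaussian perturbation scheme, the exponential kernel, and the `black_box` model are all assumptions chosen for illustration.

```python
import numpy as np

def explain_instance(predict_fn, instance, num_samples=500,
                     scale=1.0, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance."""
    rng = np.random.default_rng(seed)
    # Feature variation: Gaussian perturbations around the chosen instance.
    perturbed = instance + rng.normal(0.0, scale, size=(num_samples, instance.size))
    # Model prediction: query the trained model on the synthetic samples.
    preds = predict_fn(perturbed)
    # Weight each sample by its proximity to the instance (exponential kernel).
    dist = np.linalg.norm(perturbed - instance, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # Surrogate training: weighted least squares with an intercept column.
    design = np.hstack([np.ones((num_samples, 1)), perturbed])
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(design * sw, preds * sw[:, 0], rcond=None)
    return coef[1:]  # per-feature local effects (intercept dropped)

# Hypothetical stand-in for a trained model: linear in the first two features.
def black_box(X):
    return 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.0 * X[:, 2]

instance = np.array([1.0, 2.0, 0.5])
effects = explain_instance(black_box, instance)
print(effects)
```

Because the stand-in model is exactly linear, the surrogate recovers its coefficients; for a nonlinear model the coefficients would only describe behaviour near the selected instance.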
4.5. Integrated into an Agribusiness System
5. Discussion
6. Recommendations and Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Iniyan, S.; Varma, V.A.; Ch, T.N. Crop Yield Prediction Using Machine Learning Techniques. ScienceDirect, Elsevier Ltd. 2022. [CrossRef]
- Roshini, N.; Kumar, P.G.S.; Venkatesh, P.; Dhanabalan, G. Crop Price Prediction. Int. J. Res. Trends Innov. (JRTI) 2023, 8(4). ISSN 2456-3315.
- Ishita, G.; Ritesh, V.; Rohit, C.; Vidhate, A. An Intelligent Crop Price Prediction Using a Suitable Machine Learning Algorithm. ITM Web Conf. 2021, 40, 03040.
- Ishita, G.; Ritesh, V.; Rohit, C.; Vidhate, A. Crop Recommender System Using Machine Learning Approach. In Proceedings of the 5th Int. Conf. on Computing Methodologies and Communication (ICCMC), IEEE, Erode, India, 8–10 April 2021.
- Fan, M.; Shen, J.; Yuan, L.; Jiang, R.; Chen, X.; Davies, W.J.; Zhang, F. Improving Crop Productivity and Resource Use Efficiency to Ensure Food Security and Environmental Quality in China. J. Exp. Bot. 2011. [CrossRef]
- Borsellino, M.; Lupi, E.; Schimmenti, F.G. Impacts of Climate Change on Global Agri-Food Trade. Elsevier 2023.
- Goud, E.L.; Singh, J.; Kumar, P. Climate Change and Its Impact on Global Food Production. In: Microbiome Under Changing Climate: Implications and Solutions; Elsevier 2022; pp. 415–436.
- Zsuffa, A.; Sipos, A.; Németh, J.; Nyéki, A. Using the CERES-Maize Model to Simulate Crop Yield in a Long-Term Field Experiment in Hungary. MDPI 2022.
- Murtaza, G.; Hussain, N.; Alharbi, B. How and Why to Prevent Over-Fertilization to Get Sustainable Crop Production. In: Sustainable Plant Nutrition; Elsevier 2023; pp. 339–354.
- Miroshnychenko, O.; Pavlovska, V.; Levkivska, N.; Holub, J.; Mykhailov, I. The System of Effective Management of Crop Production in Modern Conditions. BIO Web Conf. 2020, 17, 00027.
- Ganesh, S.K.; Prabhakar, B.V.A.N.S.S. Crop Price Prediction Using Machine Learning. Int. Res. J. Mod. Eng. Technol. Sci. 2021, 3, 3477–3481.
- Mandal, M.K.; Ghosh, P.K.T.; Kar, S. Agricultural Commodity Price Prediction Model: A Machine Learning Framework. Neural Comput. Appl. 2023, 35, 15109–15128.
- Sun, F.; Ma, X.; Zhang, Y.; Wang, Y.; Jiang, H.; Liu, P. Agricultural Product Price Forecasting Methods: A Review. Sustainability 2023, 13(9). [CrossRef]
- Pravallika, K.; Karuna, G.; Anuradha, K.; Srilakshmi, V. A Deep Neural Network Model for Proficient Crop Yield Prediction. E3S Web Conf. 2021, 309, 01031.
- Kumari, P.; Pallavi, P.; Shrilatha, S.; Sushma; Sowmya, S. Crop Yield Forecasting Using Data Mining. Glob. Transit. Proc. 2021, 2, 402–407. [CrossRef]
- Roznik, M.; Boyd, M.; Porth, L. Improving Crop Yield Estimation by Applying Higher-Resolution Satellite NDVI Imagery and High-Resolution Cropland Masks. Remote Sens. Appl. Soc. Environ. 2022, 25, 100693. [CrossRef]
- Ali, A.M.; Abouelghar, M.; Belal, A.A.; Saleh, N.; Yones, M.; Selim, A.I.; Amin, M.E.S.; Elwesemy, A.; Kucher, D.E.; Maginan, S.; Savin, I. Crop Yield Prediction Using Multi Sensors Remote Sensing (Review Article). Egypt. J. Remote Sens. Space Sci. 2022, 25, 711–716. [CrossRef]
- Cornak, A.; Delina, R. Application of Remote Sensing Data in Crop Yield and Quality: Systematic Literature Review. Qual. Innov. Prosper. 2022. ISSN 1335-1745. [CrossRef]
- Jabbar, T.S.A.; Ziboon, A.T.; Albayati, M.M. Crop Yield Estimation Using Different Remote Sensing Data: Literature Review. IOP Conf. Ser. Earth Environ. Sci. 2023, 1129, 012004.
- Suljug, J.; Spisic, J.; Grgic, K.; Zagar, D. A Comparative Study of Machine Learning Models for Predicting Meteorological Data in Agricultural Applications. Electronics 2024, 13, 3284. [CrossRef]


















| Task | Data Size | Model Type |
|---|---|---|
| Crop Recommendation | 2220 rows (ICFA) | Combined Yeo-Johnson transformer and multinomial logistic regression + XAI (SHAP) + Real-time |
| | 325,834 data points (UCI) | |
| Yield Prediction | 259815 rows (German yield) | Hybrid Model + XAI (SHAP) + Real-time |
| | 19689 entries (FAO) | |
| Price Forecasting | 171 rows (United Nations) | Deep learning techniques + XAI (SHAP) + Real-time |
| | 47223 entries (World Food) | |
| date | admin1 | admin2 | market | latitude | longitude | category | commodity | unit | price flag | price type | currency | price | USD price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1/15/2008 | Kachin | Myitkyina | Wai Maw | 25.34222 | 97.44001 | cereals and tubers | Rice (low quality) | KG | actual | Retail | MMK | 400 | 63.0785 |
| 1/15/2008 | Kachin | Myitkyina | Wai Maw | 25.34222 | 97.44001 | meat, fish, and eggs | Meat (chicken) | KG | actual | Retail | MMK | 3636.36 | 573.4413 |
| 1/15/2008 | Kachin | Myitkyina | Wai Maw | 25.34222 | 97.44001 | meat, fish, and eggs | Meat (pork) | KG | actual | Retail | MMK | 3636.36 | 573.4413 |
| 1/15/2008 | Kachin | Myitkyina | Wai Maw | 25.34222 | 97.44001 | miscellaneous food | Salt | KG | actual | Retail | MMK | 242.42 | 38.2294 |
| 1/15/2008 | Kachin | Myitkyina | Wai Maw | 25.34222 | 97.44001 | vegetables and fruits | Onions | KG | actual | Retail | MMK | 969.7 | 152.9177 |
| Domain Code | Domain | Area Code (M49) | Area | Year Code | Year | Item Code | label | Months Code | Months | Element Code |
|---|---|---|---|---|---|---|---|---|---|---|
| CP | Consumer Price Indices | 36 | Australia | 2020 | 2020 | 23013 | Consumer Prices, Food Indices | 7001 | January | 6125 |
| CP | Consumer Price Indices | 36 | Australia | 2020 | 2020 | 23013 | Consumer Prices, Food Indices | 7002 | January | 6125 |
| CP | Consumer Price Indices | 36 | Australia | 2020 | 2020 | 23013 | Consumer Prices, Food Indices | 7003 | January | 6125 |
| CP | Consumer Price Indices | 36 | Australia | 2020 | 2020 | 23013 | Consumer Prices, Food Indices | 7003 | January | 6125 |
| CP | Consumer Price Indices | 36 | Australia | 2020 | 2020 | 23013 | Consumer Prices, Food Indices | 7003 | January | 6125 |
| Label | f1 | f2 | f3 | f4 | f5 | … | f171 | f172 | f173 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | -12.676 | -19.051 | -9.1362 | -12.962 | -10.5520 | … | 0.00000 | -0.00000 | 1.00000 |
| 5 | -12.595 | -19.695 | -14.0660 | -16.776 | -11.8360 | … | 0.33333 | 0.84869 | 0.50617 |
| 4 | -17.026 | -26.219 | -15.6050 | -19.435 | -16.2890 | … | 0.00000 | -0.00000 | 1.00000 |
Numerical Attributes:
Categorical Attributes:
| Variable | Description |
|---|---|
| Area | Country in which the crop is grown. |
| Item | Type of crop planted. |
| Year | Year of cultivation, from 1990 to 2013. |
| average_rain_fall_mm_per_year | Average annual rainfall, in millimetres. |
| pesticides_tonnes | Quantity of pesticides used, in tonnes. |
| avg_temp | Average temperature. |
| hg/ha_yield | Crop yield expressed in hectograms per hectare (hg/ha). |
| Area | Item | Year | hg/ha_yield | average_rain_fall_mm_per_year | pesticides_tonnes | avg_temp |
|---|---|---|---|---|---|---|
| Albania | Maize | 1990 | 36613 | 1485 | 121 | 16.37 |
| Albania | Potatoes | 1990 | 66667 | 1485 | 121 | 16.37 |
| Albania | Rice, paddy | 1990 | 23333 | 1485 | 121 | 16.37 |
| Albania | Sorghum | 1990 | 12500 | 1485 | 121 | 16.37 |
| Albania | Soybeans | 1990 | 7000 | 1485 | 121 | 16.37 |
| district_no | district | nuts_id | year | var | measure | value | outlier |
|---|---|---|---|---|---|---|---|
| 1001 | Flensburg, kreisfreie Stadt | DEF01 | 1979 | ArabLand | area | 891.0 | 0 |
| 1001 | Flensburg, kreisfreie Stadt | DEF01 | 1979 | district | area | 5673.0 | 0 |
| 1001 | Flensburg, kreisfreie Stadt | DEF01 | 1979 | grain_maize | area | NaN | 0 |
| 1001 | Flensburg, kreisfreie Stadt | DEF01 | 1979 | grain_maize | yield | NaN | 0 |
| 1001 | Flensburg, kreisfreie Stadt | DEF01 | 1979 | oats | yield | 42.0 | 0 |
| Hyperparameter | Value/Description |
|---|---|
| Model Type | Logistic Transformer |
| Input Features | optical-radar data, crop types, satellite images |
| Number of Features | f174 |
| variance_inflation_factor | variable, i |
| random_state | 2 |
| Feature Scaling | X_train, X_test |
| cross-validation score | cv=5 |
| Mean cross-validation accuracy | cv_scores.mean() |
| Evaluation Metrics | Accuracy, Precision, Recall, F-Score |
| Layer (type) | Output Shape | Parameters |
|---|---|---|
| dense_14 (Dense) | (None, 100) | 2500 |
| dense_15 (Dense) | (None, 50) | 5050 |
| dense_16 (Dense) | (None, 3) | 153 |

Total parameters: 7703
Trainable parameters: 7703
Non-trainable parameters: 0
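The parameter counts in the layer summary above can be checked by hand, since a dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below reproduces the arithmetic; the input width of 24 is not stated in the summary and is inferred here from the first layer's count (2500 = (24 + 1) × 100), and `dense_params` is a hypothetical helper.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Inferred from the first layer: 2500 = (n_inputs + 1) * 100.
n_inputs = 2500 // 100 - 1

layer_units = [100, 50, 3]  # dense_14, dense_15, dense_16
counts = []
previous = n_inputs
for units in layer_units:
    counts.append(dense_params(previous, units))
    previous = units

print(n_inputs, counts, sum(counts))  # 24 [2500, 5050, 153] 7703
```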
| Step 1 | An input layer is defined first to specify the shape of the input data. |
| Step 2 | The CNN architecture is then defined by building a sequential model. |
| Step 3 | Three convolutional blocks are added to the sequential model, each consisting of a convolutional layer and a max-pooling layer, followed by a flattening layer. |
| Step 4 | The output of the convolutional layers is flattened, turning the 2D matrix data into a 1D vector. This flattened representation is then fed to two fully connected layers, and the final layer is the output layer. |
| Step 5 | Finally, the inputs and outputs are specified to construct the model. |
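To make Steps 3 and 4 concrete, the sketch below runs a single convolution, max-pooling, and flattening pass over a short series in NumPy. It is a minimal illustration under stated assumptions, not the paper's network: the price values and kernels are hypothetical, only one convolutional block is shown, and the fully connected layers are omitted.

```python
import numpy as np

def conv1d(x, kernels):
    """Valid 1D convolution (cross-correlation) followed by ReLU.

    x has shape (steps,); kernels has shape (n_filters, width).
    """
    width = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, width)
    return np.maximum(windows @ kernels.T, 0.0)  # (steps - width + 1, n_filters)

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling along the time axis."""
    steps = (feature_map.shape[0] // size) * size  # drop any incomplete window
    return feature_map[:steps].reshape(-1, size, feature_map.shape[1]).max(axis=1)

# Hypothetical price series and two hand-picked kernels.
prices = np.array([1.0, 2.0, 4.0, 3.0, 5.0, 6.0, 4.0, 7.0])
kernels = np.array([[1.0, -1.0],   # responds to price drops
                    [0.5, 0.5]])   # two-step moving average

features = conv1d(prices, kernels)  # 2D feature map, shape (7, 2)
pooled = max_pool(features)         # shape (3, 2)
flat = pooled.reshape(-1)           # 1D vector passed on to dense layers
print(flat)  # [0.  3.  1.  4.  2.  5.5]
```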
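    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import SimpleRNN, BatchNormalization, Dense
    from tensorflow.keras.metrics import RootMeanSquaredError
    from tensorflow_addons.metrics import RSquare  # assumed source of RSquare

    # Define the RNN model; n_steps, n_features, n_out and the data splits are defined earlier
    model_rnn = Sequential()
    model_rnn.add(SimpleRNN(50, activation='relu', input_shape=(n_steps, n_features)))
    model_rnn.add(BatchNormalization())
    model_rnn.add(Dense(n_out))

    # Compile and fit the model
    model_rnn.compile(loss='mse', optimizer='adam',
                      metrics=['mae', 'mape', RootMeanSquaredError(), RSquare()])
    hist_rnn = model_rnn.fit(x_train, y_train, validation_data=(x_valid, y_valid),
                             shuffle=False, epochs=100, batch_size=32, verbose=2)

    # Evaluate on the test set
    eval_rnn = model_rnn.evaluate(x=x_test, y=y_test, return_dict=True)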
    INPUT: time series data
    DETERMINE p, d, and q (ARIMA parameters)
        - Plot the autocorrelation function (ACF) to choose the MA order (q)
        - Plot the partial autocorrelation function (PACF) to choose the AR order (p)
    SET p = value read from the PACF, or user-defined
    SET d = total number of differences applied
    SET q = value specified by the user from the ACF
    MODEL: ARIMA(order=(p, d, q)) on the time series data
    FIT the model
    ASSESS the model
    PREDICT future values
    OUTPUT: results
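Two ingredients of the procedure above, differencing (which fixes d) and the sample autocorrelation function (inspected to pick q), can be sketched in plain Python. The price series is hypothetical, `difference` and `acf` are illustrative helper names, and the PACF and the ARIMA fit itself are left to a statistics library.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

# Hypothetical price series with an accelerating upward trend.
prices = [400, 410, 425, 445, 470, 500, 535, 575]

diff1 = difference(prices, 1)  # still trending: [10, 15, 20, 25, 30, 35, 40]
diff2 = difference(prices, 2)  # constant after two differences: [5, 5, 5, 5, 5, 5]
rho = acf(diff1, max_lag=3)    # autocorrelations used to judge the MA order
print(diff1, diff2, rho)
```

In this toy series two rounds of differencing remove the trend, which corresponds to d = 2 in the ARIMA order.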
| Price Forecasting Models | Hyperparameter Value / Explanation |
|---|---|
| ARIMA parameters | p, d, and q |
| p | PACF (Partial Autocorrelation) or user-defined |
| d | total number of applied differences |
| q | specified by the user from ACF (Autocorrelation) |
| Prophet parameters | seasonality, change points, and missing data |
| seasonality mode | additive |
| uncertainty samples | 1000 |
| Changepoint prior scale | 0.05 |
| Proposed Models | Range Index | Training Accuracy | Testing Accuracy |
|---|---|---|---|
| Combined Yeo-Johnson transformer and multinomial logistic regression with Remote Sensing Data | 325,834 data points | 0.986873 | 0.984877 |
| Logistic Transformer with Crop Data | 2200 entries | 0.773000 | 0.713524 |
| Algorithms | Accuracy |
|---|---|
| Recommendation using Remote Sensing Data | 98% |
| Recommendation using Crop Data | 71% |
| Multivariate Linear Regression (MLR) [4] | 60% |
| Support Vector Machine (SVM) [4] | 75% |
| Artificial Neural Network (ANN) [4] | 86% |
| K Nearest Neighbor (KNN) [4] | 90% |
| Random Forest (RF) [4] | 95% |
| Models | Loss | MAE | MAPE | RMSE | R squared |
|---|---|---|---|---|---|
| MLP | 0.019 | 0.016 | 0.440 | 0.139 | 0.985 |
| CNN | 0.019 | 0.019 | 0.596 | 0.138 | 0.985 |
| RNN | 0.015 | 0.016 | 0.486 | 0.124 | 0.988 |
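The error measures reported in the table above follow standard definitions, sketched below in plain Python. The function name and the sample values are hypothetical, and MAPE is expressed here as a percentage, which may differ from the scaling used in the table.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, MAPE (in percent), and R squared."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "mape": mape, "r2": r2}

# Hypothetical observed and predicted prices.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

When the training loss is the mean squared error, RMSE is the square root of the loss, which is consistent with the table (for example, 0.139² ≈ 0.019 for the MLP).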
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).