1. Introduction
Rice (Oryza sativa L.) is the most valued agricultural product globally. It provides a significant share of global caloric intake and plays a critical role in global food security [1,4,16]. As a staple food for a large proportion of people, rice accounts for up to 50 percent of dietary caloric intake in many Asian countries, strongly linking it to food security. Each year, global rice production reaches approximately 520 million tons. This amounts to approximately 57 kilograms of milled rice per person and provides approximately 20 percent of the world’s calorie needs [11]. Given its social and economic importance, sustaining and improving rice production is essential for global food security. Longer growing seasons can boost annual yields. However, higher temperatures, especially in tropical and subtropical regions, can reduce crop suitability and yields by limiting fruit set and accelerating crop development [15]. More than 3.5 billion people worldwide rely on rice as their staple food. Rice is cultivated as a versatile crop and consumed in forms such as brown rice, polished rice, and rice-derived protein products. In the Philippines, only 1.7 million of the 3.1 million hectares were irrigated, according to the National Irrigation Administration (NIA). While some farmers use alternate wetting and drying as a preferred irrigation technique to minimize water use, this method can result in lower yields [17].
Soil components cause fluctuations in both the qualitative and quantitative features of rice yield. These counteract nitrogen surpluses or deficits, which may affect production. Nitrogen (N), phosphorus (P), and magnesium (Mg) are among the most important nutrients for plants. Agricultural producers can make fertilizer practices more environmentally sound. Crop yields are determined by climate, crop type, soil status, and management techniques. Farmers can enhance production and soil health by adjusting fertilization practices to meet their crops’ nutrient requirements. This ensures sustainable agriculture [8]. Maintaining soil quality by applying organic fertilizer and avoiding excessive use of synthetic fertilizers supports healthy rice growth, optimal yields, and long-term sustainability.
Increased rice production plays a vital role in the agricultural sector, contributing significantly to both the national economy and farmers’ livelihoods [9,12,22]. Future challenges in sustainable rice production require effective management strategies and precise timing, including the adoption of alternate wetting and drying [17]. The development of climate-resilient rice varieties requires substantial scientific effort to mitigate the effects of climate change, such as heat, cold, and drought [10]. Enhancing rice’s heat stress tolerance through genetic improvements is particularly crucial; thus, several issues associated with heat stress in crops urgently require attention [24].
In the Philippines, water scarcity and sustainability are pressing concerns for farmers. Alternate Wetting and Drying (AWD) is a practical approach that uses intervals of soil drying before re-irrigation, significantly reducing overall water use. Various water-saving methods for estimating rice crop water requirements enable farmers to improve the performance of irrigated rice agronomy [2]. This method can reduce water use by up to 30-40 percent.
Achieving long-term agricultural sustainability requires collaborative efforts, including farmer education and supportive government policies that promote the adoption of innovative water-saving technologies [19,21]. Addressing these challenges through climate-smart agricultural practices, balanced nutrient management, and advanced irrigation technologies can enhance rice yield without compromising global food security [14,15]. The integration of machine learning (ML) and new technologies provides valuable insights for agricultural planning. Forecasting crop yield is a complex but essential process, and ML can improve yield prediction depending on various factors [23].
The Philippines is a leading rice producer in Southeast Asia, with a crucial rice sector that includes many traders and retailers, contributing significantly to the nation’s economy [9]. The government emphasizes enhancing rice productivity and competitiveness to improve food security [12]. The study in [7] indicates that while productivity increases, there are no significant differences in ecotoxicity, eutrophication potential, or nutrient balance between irrigated and rainfed farms, suggesting that environmental impacts and data-driven agricultural methods merit further analysis.
Continuing the discussion on environmental and management influences on crop yields, the geographic trend analysis in [13] estimates yields in the Philippines using data from PAGASA and the Philippine Statistics Authority. This approach helps farmers [...] crops, and improve harvesting efficiency. Crop forecasting not only increases production rates but also has the potential to benefit the Philippine economy, raising questions about the global applicability of these data-driven methods.
Applying these analytical methods beyond the Philippines, the research in [21] seeks to enhance agricultural forecasting in Africa by analyzing five crops and compiling a comprehensive dataset. This data-driven framework simultaneously assesses crop yields across multiple crops, accounting for both positive and negative environmental factors, such as soil properties. For successful ML/AI modeling, it is essential to address overfitting and underfitting to achieve optimal generalization performance. In this regard, lessons from various regions continue to inform advancements in prediction methods.
Similarly, [6] utilized various machine learning techniques, such as ANNs, DNNs, ELMs, RFs, and SVRs, for agricultural predictions, demonstrating their performance across different locations. The evaluation revealed that the second and third test areas showed competitive or superior performance compared to others. Notably, the best results were linked to methods that processed more information, although SVR struggled in zone A. Additionally, incorporating more features, such as air moisture, generally improved outcomes, especially in zone B. While ANN, RF, and ELM excelled with fewer input features, DNNs were superior when a larger feature set was available, highlighting the importance of tailoring algorithms to specific regions and feature configurations.
2. Materials and Methods
This research uses Spearman’s Rank Correlation as the primary statistical method to assess the strength and direction of relationships among soil fertility, nutrient levels, and crop yield in irrigated and rainfed regions. The researcher also aims to categorize the data, identify patterns, and make predictions from secondary data. These data are processed using machine learning techniques, with models developed on the training dataset and evaluated on the test dataset.
Sampling Technique
The data come from the PalayStat System, a core initiative of the Philippine Rice Research Institute, overseen by the Socioeconomics Division [18], and from the Soil Health Assessment National Soil Sampling [3]. Due to heterogeneity in the dataset, which encompasses various levels of soil fertility, nutrient status, and crop-yield indicators, a stratified sampling method is employed. This technique divides the population into distinct subgroups (strata) based on characteristics such as irrigation type (irrigated and rainfed).
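The stratified sampling step described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the record layout, the `irrigation` field name, the 20 percent sampling fraction, and the helper `stratified_sample` are all assumptions introduced for the example.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_key, fraction, seed=0):
    """Draw the same fraction of records from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[stratum_key]].append(rec)
    sample = []
    for name in sorted(strata):
        group = strata[name]
        # Keep at least one record per stratum so no subgroup disappears.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical records: 60 irrigated and 40 rainfed observations.
records = [
    {"id": i, "irrigation": "irrigated" if i < 60 else "rainfed"}
    for i in range(100)
]
sample = stratified_sample(records, "irrigation", fraction=0.2)
counts = {
    "irrigated": sum(r["irrigation"] == "irrigated" for r in sample),
    "rainfed": sum(r["irrigation"] == "rainfed" for r in sample),
}
print(counts)
```

Sampling within each stratum keeps the irrigated and rainfed shares of the sample equal to their shares in the full dataset.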
Table 1.
Data Attributes and their Values.
| Attributes | Values |
| --- | --- |
| Soil Fertility (Irrigated) | 0.0, 9.59, 6.61 |
| Soil Fertility (Rainfed) | 25.00, 15.41, 18.39 |
| Nitrogen (Irrigated) | 21.79, 30.27, 5.91 |
| Nitrogen (Rainfed) | 22, 2,558, 153 |
| Phosphorus (Rainfed) | 00, 2,566, 152 |
| Phosphorus (Irrigated) | 4,394, 6,826, 5.052 |
| Magnesium (Irrigated) | 0, 0, 0, 81.0 |
| Magnesium (Rainfed) | 17,291, 0, 2, 4,447 |
| Rainfed Area (HA) | 24,167.00, 17,812.00 |
| Yield Rainfed Area (MT) | 2.74, 3.53, 3.23, 2.25 |
| Irrigated Area (HA) | 54,639.00, 77,589.00 |
Data Source
The data were collected from various government offices. The primary source was the PalayStat System, a large project of the Philippine Rice Research Institute [18]. Additional data came from the Soil Health Assessment National Soil Sampling [3]. The researcher requested data by emailing the concerned agency. The request adhered to ethical guidelines. After approval, the agency sent an email containing the required research data.
Statistical Analysis
The researcher aims to analyze correlations in rainfed and irrigated rice crop production. Essential characteristics include soil fertility levels, nutrient content (nitrogen, phosphorus, magnesium), and other crop production metrics. The data include both continuous and possibly ordinal types. To assess the direction and strength of relationships among crop yield, nutrient concentration, and soil fertility, we used Spearman’s Rank Correlation as the primary statistical test.
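As an illustration of how Spearman's rank correlation works, the sketch below ranks both variables (tied values share the mean of their ranks) and then computes the Pearson correlation of the ranks. The function names and the sample values are hypothetical and are not taken from the study's dataset; significance testing (the p-value) is left out.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (raises) when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical nitrogen levels and yields, for illustration only.
nitrogen = [10, 20, 30, 40, 50]
crop_yield = [2.1, 2.5, 2.4, 3.0, 3.3]
rho = spearman(nitrogen, crop_yield)
print(round(rho, 3))
```

Because only ranks enter the calculation, the coefficient captures monotonic relationships and is less sensitive to extreme values than a correlation computed on the raw measurements.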
Pre-Processing
Effective data cleaning is essential for developing precise and credible crop-yield models. This is especially true when dealing with real agricultural data, which may have missing values, outliers, and inconsistencies. For this process, we used the interquartile range (IQR), specifically the winsorizing method. This approach identifies extreme values and lessens their influence by capping them at IQR-based bounds, so every observation is retained.
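A minimal sketch of IQR-based winsorizing is shown below. The source does not state the fence multiplier or the quantile method, so the conventional 1.5 × IQR fences and the `inclusive` quantile method are assumptions here. The first four values mirror the rainfed yields listed in Table 1, while the value 9.80 is a hypothetical outlier added for illustration.

```python
import statistics


def winsorize_iqr(values, k=1.5):
    """Cap values lying outside [Q1 - k*IQR, Q3 + k*IQR] at those bounds."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Extreme values are pulled to the nearest bound; nothing is dropped.
    return [min(max(v, lower), upper) for v in values]


# 9.80 is a made-up outlier; the other values follow Table 1.
yields = [2.25, 2.74, 3.23, 3.53, 9.80]
cleaned = winsorize_iqr(yields)
print(cleaned)
```

The output has the same length as the input: the outlier is replaced by the upper bound, and values inside the fences pass through unchanged.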
Model Selection and Training
The researchers applied several machine learning models, including Decision Trees, Random Forests, Support Vector Regression, and K-Nearest Neighbors. The best-fit model was selected based on its predictive accuracy. We divided the data into 80 percent for training and 20 percent for testing. Evaluating on held-out data helps detect overfitting and gauges how well each model generalizes to new data.
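The 80/20 split can be sketched in plain Python as follows, using a simple K-Nearest Neighbors regressor as a stand-in for the models listed above. The data, the random seed, the choice of k = 3, and the helper names are assumptions for illustration; the study's actual implementation and hyperparameters are not given in the text.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, features, k=3):
    """Average the targets of the k training rows closest to `features`."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical rows of (features, yield); not the study's data.
rows = [((float(i), float(i % 5)), 2.0 + 0.05 * i) for i in range(50)]
train, test = train_test_split(rows)
predictions = [knn_predict(train, feats) for feats, _ in test]
print(len(train), len(test))
```

The model only ever sees the training rows, so the errors measured on the held-out rows estimate how it would behave on unseen data.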
Notation:
$y_i$: Actual (observed) value of the i-th observation.
$\hat{y}_i$: Predicted value of the i-th observation.
$\bar{y}$: The mean (average) of the observed values.
$n$: The number of training samples (observations).
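For reference, the standard definitions of the four evaluation metrics used in this study are given below, with $y_i$ the observed value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the observed values, and $n$ the number of observations. These are the conventional textbook forms and are assumed, rather than confirmed, to match the ones applied here.

```latex
\begin{align}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \\
R^2           &= 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
                          {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
\end{align}
```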
To determine which model performs best, we compared values across various computational approaches and examined each model’s interpretability, computational efficiency, and risk of overfitting. This careful assessment provides a clear overview of each model’s strengths and weaknesses, helping guide the decision on the best approach.
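A model comparison of this kind can be sketched as below: each model's predictions are scored with MAE, MSE, RMSE, and R-squared, and the model with the lowest RMSE is picked. The model names and numbers are hypothetical, and selecting on RMSE alone is an assumption made for the example; the text describes weighing several criteria.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE, and R-squared for paired observations."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - (mse * n) / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


# Hypothetical observed yields and predictions from two placeholder models.
actual = [2.0, 3.0, 4.0, 5.0]
model_predictions = {
    "model_a": [2.5, 3.0, 3.5, 5.0],
    "model_b": [3.0, 3.0, 3.0, 6.0],
}
scores = {
    name: regression_metrics(actual, preds)
    for name, preds in model_predictions.items()
}
best = min(scores, key=lambda name: scores[name]["RMSE"])
print(best, scores[best])
```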
Figure 1.
Learning Architecture on Crop Yield Variability.
In the final stage, to enhance production and resource estimation under varying environmental conditions, it is essential to evaluate contrasting predictions to assess the model’s effectiveness. This ensures precise, accurate, and dependable agricultural data.
3. Results
Table 2 shows that nitrogen application under irrigated conditions has a strong positive correlation (r = 0.773, p < 0.001), indicating that higher irrigation is strongly positively correlated with higher nitrogen application. Micronutrient application shows a moderately positive correlation, with greater irrigation associated with higher magnesium content (r = 0.346, p = 0.001). This shows that controlling irrigation positively affects micronutrient application. However, there was no significant correlation between irrigation and crop yield (r = 0.142, p = 0.199), irrigated area (r = 0.116, p = 0.296), or phosphorus application (r = 0.036, p = 0.749). The findings indicate that while irrigation positively impacts micronutrients and nitrogen, its effect on crop yield, irrigated area, and phosphorus application is negligible.
Table 3 shows that plant density is strongly negatively correlated with both nitrogen fertilization (Rainfed Nitrogen Content, r = -0.562, p < 0.001) and phosphorus fertilization (Rainfed Potassium Content, r = -0.565, p < 0.001). As fertilizer rates rise, rainfed nitrogen and phosphorus concentrations drop, likely due to environmental limits or poor soil fertility. Fertilization was not significantly correlated with micronutrient supplementation (Rainfed Magnesium Content, r = 0.083, p = 0.454), rainfed area (Rainfed Area, r = 0.127, p = 0.251), or crop yield (Rainfed Yield, r = -0.149, p = 0.178). These results highlight the complexity of nutrient management under rainfed conditions and show that macronutrient fertilization has little effect on micronutrient practices.
Figure 2 illustrates how four regression models (Decision Tree, Random Forest, SVR, and KNN) perform in forecasting rainfed area yield. The models are compared using Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R2).
Figure 3 shows how well various regression models—Decision Tree, Random Forest, SVR, and KNN—perform in predicting the yield under irrigated conditions based on R-squared (R2), Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) values.
Table 4 compares model performance in irrigated and rainfed environments using four criteria: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R2). The Decision Tree model performed worse in rainfed conditions, with MAE increasing from 0.3272 to 0.4568, MSE from 0.1868 to 0.4206, RMSE from 0.4322 to 0.6486, and R2 dropping from 0.8602 to 0.6648. The Random Forest model remained stable, showing high R2 values (0.8661 to 0.8095) and only modest error increases: MAE (0.3107 to 0.3607), MSE (0.1790 to 0.2391), and RMSE (0.4230 to 0.4890). The K-Nearest Neighbors (KNN) method showed inconsistent performance, with MSE, RMSE, and R2 changing from 0.5973 to 0.4927, 0.7728 to 0.7020, and 0.5529 to 0.6074, while MAE increased from 0.5186 to 0.5804. Overall, the Random Forest model showed high accuracy and reliability. This is supported by [2] and [23], who note its ability to prevent overfitting and handle unbalanced datasets.
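Since RMSE is by definition the square root of MSE, the figures reported above can be checked for internal consistency. The short sketch below does this for the MSE and RMSE pairs quoted in the text. The dictionary layout is an assumption, as is reading the first value of each pair as irrigated and the second as rainfed; the tolerance allows for four-decimal rounding.

```python
import math

# (MSE, RMSE) pairs as quoted in the text for Table 4.
reported = {
    ("Decision Tree", "irrigated"): (0.1868, 0.4322),
    ("Decision Tree", "rainfed"): (0.4206, 0.6486),
    ("Random Forest", "irrigated"): (0.1790, 0.4230),
    ("Random Forest", "rainfed"): (0.2391, 0.4890),
    ("KNN", "irrigated"): (0.5973, 0.7728),
    ("KNN", "rainfed"): (0.4927, 0.7020),
}

# Absolute gap between sqrt(MSE) and the reported RMSE for each entry.
gaps = {
    key: abs(math.sqrt(mse) - rmse) for key, (mse, rmse) in reported.items()
}
consistent = all(gap < 5e-4 for gap in gaps.values())
print(consistent)
```

All six pairs agree to within rounding, so the reported MSE and RMSE values are mutually consistent.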
4. Conclusions
Soil nutrients, fertilization, and irrigation are strongly linked in both rainfed and irrigated systems. In irrigated areas, high nitrogen and micronutrient use were tied to adequate water. In rainfed systems, more fertilization was linked to less nutrient availability, likely from water scarcity. These findings suggest adjusting agricultural practices. In irrigated areas, manage water and nutrients together. In rainfed zones, prioritize water saving and soil quality.
The Random Forest model, which combines many decision trees and averages their outcomes, was the most powerful and dependable for both irrigated and rainfed zones. In rainfed areas, the Decision Tree model’s accuracy and fit, referring to how well the model matches observed data, dropped sharply. However, Random Forest still gave strong predictions with only slight changes in performance metrics. The Support Vector Regression (SVR) model, which predicts continuous values by finding the best-fitting line in a high-dimensional space, performed poorly in both cases. The K-Nearest Neighbors (KNN) model, which makes predictions based on the closest data points, showed inconsistent results: it improved in rainfed areas but had higher Mean Absolute Error (MAE), which measures average prediction errors. Based on MAE, Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (a measure of prediction quality), Random Forest is the best model for prediction in these settings.
Socioeconomic and policy-oriented research remains essential. Assessing the cost-effectiveness of implementing Random Forest-based decision support systems in agricultural settings can inform strategies for large-scale adoption. Additionally, policy frameworks that promote water-saving technologies and soil health initiatives in rainfed regions facilitate sustainable agricultural development. The development of accessible digital tools that convert complex model outputs into actionable recommendations can enhance the practical application of data science, enabling farmers to make informed decisions in diverse environmental contexts.