The first stage involves the development of a sensor system for data acquisition of soil parameters. This system is designed to measure nitrogen (N), phosphorus (P), potassium (K) content, soil pH, moisture, electrical conductivity, and soil temperature. The collected data serves as the basis for determining real-time soil conditions.
The second stage focuses on simulations using seven machine learning models to train a dataset consisting of the parameters obtained from the sensor measurements, including soil type and rainfall data. The aim of this simulation is to evaluate the performance of each model and identify the best-performing model with high predictive accuracy. The selected model is then deployed in the developed application.
The third stage involves the development of the UG-AgroPlan system with IoT and Machine Learning. This study introduces UG-AgroPlan, a decision support application powered by machine learning algorithms, which provides real-time fertilizer recommendations based on soil sensor data. Unlike static models, UG-AgroPlan dynamically adjusts fertilization recommendations in response to changes in nutrient levels, pH, and environmental parameters. This application utilizes the K-Nearest Neighbors (KNN) algorithm, which has been deployed to support two main functions: first, providing crop recommendations based on soil parameter values acquired by the sensors; and second, offering fertilization recommendations when the parameter values decrease. This ensures that nutrient requirements are optimized to maintain soil fertility and support plant growth effectively.
In the last stage, a direct cultivation trial was conducted using the Uzbekistan melon variety to evaluate the practical applicability of the UG-AgroPlan system. This crop was selected due to its specific nutrient requirements and economic value. The trial aimed to assess the effectiveness of UG-AgroPlan in guiding fertilizer dosage per growth phase and improving overall input efficiency. The results of this field experiment serve as the foundation for discussing the system’s accuracy and its potential impact on sustainable crop management.
2.2.1. Implementation of the Sensor System for Detecting Soil Nutrients and Parameters
After implementing the prototype, a calibration process was conducted to compare the laboratory sensor’s measurements with those of the sensor system designed by the researchers for detecting soil nutrients. This sensor system is capable of measuring nitrogen (N), phosphorus (P), potassium (K), soil pH, moisture, temperature, and electrical conductivity. The calibration process ensured that the designed sensor system provided accurate and reliable measurements, which are crucial for generating precise crop recommendations for users.
The calibration process between laboratory measurements and the prototype sensor system was a critical step in ensuring the reliability and accuracy of the data used for further analysis. Initially, the prototype sensor exhibited a wide range of accuracy, with values varying between 34% and 95%, depending on the nutrient parameter measured. This variation highlighted the need for a robust calibration process to align the prototype’s readings with laboratory-standard measurements.
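The accuracy ($A$) for both the laboratory instrument and the prototype sensor was calculated using the formula:
$$A = \left(1 - \frac{\left|X_{\text{sensor}} - X_{\text{lab}}\right|}{X_{\text{lab}}}\right) \times 100\% \quad (1)$$
Where:
$X_{\text{sensor}}$ is the measured output from our prototype sensor instrument.
$X_{\text{lab}}$ is the reference standard value obtained from a certified laboratory.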
During the calibration process, discrepancies between the prototype sensor’s measurements and laboratory results were systematically analyzed. A regression-based correction algorithm was applied to minimize errors, thereby aligning the prototype’s readings with the laboratory standards. This iterative process refined the sensor’s output for nitrogen (N), phosphorus (P), potassium (K), pH, moisture, and electrical conductivity. As a result of the calibration process, the prototype sensor’s accuracy improved significantly, achieving an overall accuracy ranging from 97.41% to 97.61% across all nutrient parameters (N, P, K) and other soil characteristics. This enhancement demonstrates the sensor’s capability to provide precise and reliable measurements comparable to laboratory standards.
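The regression-based correction described above can be illustrated with a minimal sketch. It assumes a simple linear correction fitted by least squares and an accuracy defined as 100% minus the relative error against the laboratory value; the readings, the function names `fit_linear_correction` and `accuracy_percent`, and the linear form itself are illustrative assumptions, not the study's data or its exact algorithm.

```python
from statistics import mean


def fit_linear_correction(raw, reference):
    """Least-squares fit of reference ~ slope * raw + intercept."""
    raw_mean, ref_mean = mean(raw), mean(reference)
    sxx = sum((r - raw_mean) ** 2 for r in raw)
    sxy = sum((r - raw_mean) * (y - ref_mean) for r, y in zip(raw, reference))
    slope = sxy / sxx
    intercept = ref_mean - slope * raw_mean
    return slope, intercept


def accuracy_percent(measured, reference):
    """Accuracy as 100% minus the relative error (assumed definition)."""
    return (1 - abs(measured - reference) / reference) * 100


# Hypothetical nitrogen readings in ppm (illustrative, not from the study).
raw_sensor = [30.0, 42.0, 55.0, 61.0, 78.0]
lab_reference = [40.0, 52.0, 66.0, 71.0, 90.0]

slope, intercept = fit_linear_correction(raw_sensor, lab_reference)
corrected = [slope * r + intercept for r in raw_sensor]

before = mean(accuracy_percent(r, y) for r, y in zip(raw_sensor, lab_reference))
after = mean(accuracy_percent(c, y) for c, y in zip(corrected, lab_reference))
print(f"mean accuracy before: {before:.2f}%, after: {after:.2f}%")
```

In such a scheme, one correction would be fitted per parameter (N, P, K, pH, moisture, electrical conductivity), since each sensor channel may drift differently.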
The successful calibration process underscores the importance of integrating mathematical evaluation and laboratory standards in the development of agricultural sensor technologies. It ensures that the sensor system can serve as a dependable tool for data acquisition in precision agriculture applications. Calibration results at this level of accuracy are considered valid, and the calibrated data is subsequently used as part of the training or testing dataset. A comparison between the laboratory equipment and the prototype sensor results for soil nutrients and various soil parameters is also conducted to validate the sensor’s performance and reliability for further research.
2.2.2. Evaluation of Model Training with Seven Models
Following the completion of the calibration process in the prototype sensor system, the next stage involves training the dataset. This process begins with the collection of data to be used for model training. Subsequently, data preprocessing is performed to prepare the raw data for analysis. At this stage, soil nutrient values (N, P, K) along with other parameters—such as pH, EC, moisture, and temperature—are labeled according to the associated plant types based on the measured nutrient levels and soil conditions.
Once preprocessing is completed, an outlier analysis is conducted to identify and address any anomalies that may compromise the accuracy of the machine learning model. Outlier data is either treated or removed according to predefined criteria to ensure that the resulting model is robust and capable of generating precise and reliable recommendations. This meticulous approach to training and preprocessing ensures the integrity and quality of the dataset, forming a solid foundation for further stages of the research.
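As an illustration of the outlier step, the sketch below removes rows whose value falls outside the interquartile-range (IQR) fences. The IQR rule with a factor of 1.5 is a common convention used here as an assumption; the text does not state which predefined criteria the study applied, and the sample rows are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the lower and upper IQR fences for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows, column):
    """Keep only rows whose value in `column` lies inside the fences."""
    low, high = iqr_bounds([row[column] for row in rows])
    return [row for row in rows if low <= row[column] <= high]


# Hypothetical nitrogen readings in ppm; 250.0 mimics a faulty reading.
samples = [{"N": n} for n in (48.0, 50.0, 52.0, 49.0, 51.0, 47.0, 53.0, 250.0)]

cleaned = drop_outliers(samples, "N")
print(len(samples), "->", len(cleaned), "rows after outlier removal")
```

The same filter could be applied column by column to the other parameters, or replaced by a treatment step (for example capping values at the fences) when removing rows would discard too much data.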
The next phase of the training process involves applying machine learning algorithms to generate crop recommendations. The algorithms used include Support Vector Machine (SVM), Logistic Regression, Decision Tree, K-Nearest Neighbors (KNN), Gradient Boosting (XGBoost), Random Forest, and Gaussian Naive Bayes. This stage focuses on analyzing the relationships between soil parameters and nutrient values (N, P, K) to identify the most suitable crops for specific conditions. The training simulation concludes with the selection of the most effective model based on the performance metrics of each algorithm, whose accuracies range from 95% to 100%. The K-Nearest Neighbors (KNN) model, with an accuracy of 98%, is the one deployed within the UG-AgroPlan system.
The recommendation of crops using the K-Nearest Neighbors (KNN) algorithm is based on the nutrient content and soil parameters measured through sensors. The KNN model utilizes a labeled dataset comprising historical data of soil nutrient levels (N, P, K), soil pH, moisture, and temperature, along with the types of crops that thrive under those specific conditions. When new sensor data is inputted into the UG-AgroPlan system, the KNN algorithm identifies the "k" nearest neighbors in the dataset—those with the most similar soil nutrient profiles and environmental conditions.
The algorithm calculates the Euclidean distance between the input data and each entry in the training dataset. The entries with the shortest distances are considered the nearest neighbors, and the majority class (crop type) among these neighbors determines the recommendation. For example, if the sensor readings indicate high nitrogen levels, moderate phosphorus, low potassium, and neutral pH, the algorithm compares these values with historical data to recommend a crop that thrives under such conditions, such as maize or mango. After the data training phase is finalized, the seven models are evaluated according to the accuracy levels they achieve, as presented in Table 1.
Table 1 presents the accuracy comparison of seven machine learning models applied for crop classification based on soil parameters. Among these, Random Forest and XGBoost achieved perfect accuracy (100%). However, despite the high scores, these models pose potential overfitting risks due to their complexity and lack of interpretability. In contrast, the K-Nearest Neighbors (KNN) algorithm, with an accuracy of 98%, offers a simpler and more interpretable approach. It was therefore selected as the primary model in the UG-AgroPlan system due to its robustness, efficiency, and suitability for real-time classification in resource-constrained environments. Additionally, the KNN model ensures adaptability by updating the dataset with new soil readings and successful crop outcomes, enhancing its accuracy over time. The recommendation process enables farmers to make informed decisions, optimizing soil productivity and crop yield.
The next stage of the process uses the UG-AgroPlan application, developed to assist farmers in receiving crop recommendations tailored to the specific soil conditions of the land they cultivate.
2.2.3. UG-AgroPlan System: Integrating Our Prototype with Artificial Intelligence
In Table 1, the performance evaluation of various machine learning algorithms for crop recommendation based on soil parameters revealed differences in accuracy. Logistic Regression and Decision Tree models both reached an accuracy of 95.00%, whereas SVM and K-NN achieved higher results at 98.00%, indicating strong effectiveness in modeling complex data patterns. Naïve Bayes further increased the accuracy to 99.00%, and both Random Forest and XGBoost reached a perfect accuracy of 100.00%, highlighting excellent classification strength and outstanding results. Overall, the average accuracy obtained from the seven models was 97.86%.
Several studies emphasize that extremely high accuracy results—such as perfect scores—may indicate potential overfitting, particularly when models are applied to datasets that are limited in size or lack diversity. In our study, this condition is observed in models such as Naïve Bayes, Random Forest, and XGBoost, which achieved exceptionally high accuracy, even reaching 100%. Although these models demonstrate excellent classification performance, such results should be interpreted with caution, as they may compromise the model’s generalizability to unseen data.
Based on findings by [44, 45], selecting models whose accuracy is near-perfect but not absolute, with the average accuracy as a benchmark, has been considered optimal, as it balances efficiency and accuracy. Similarly, this strategy has been reported to ensure consistent model performance and adaptability, particularly in real-world applications where data variability is significant [46]. This perspective aligns with the goal of achieving reliable predictions while avoiding overfitting, making it a practical choice for precision agriculture. To ensure generalization, models with accuracy values above the average (97.86%) but below the perfect 100.00% were prioritized for implementation. In the proposed web-based application, the K-Nearest Neighbors (KNN) algorithm serves as the core model for providing crop recommendations based on input features such as soil parameters, climate data, and historical crop yields. This model was selected due to its ability to effectively analyze and classify data based on soil parameters and nutrient values, ensuring accurate and reliable crop suggestions for users. This approach balances high performance with practical reliability, minimizing overfitting risks and maintaining robust predictions.
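The selection rule stated above (accuracy above the average but below 100%) can be checked with a short sketch. The accuracies are the values reported for Table 1 in the text; the dictionary layout and variable names are illustrative.

```python
# Accuracies (%) as reported for the seven models in Table 1.
accuracies = {
    "Logistic Regression": 95.0,
    "Decision Tree": 95.0,
    "SVM": 98.0,
    "KNN": 98.0,
    "Naive Bayes": 99.0,
    "Random Forest": 100.0,
    "XGBoost": 100.0,
}

average = sum(accuracies.values()) / len(accuracies)

# Keep models strictly above the average and strictly below a perfect score.
candidates = [name for name, acc in accuracies.items() if average < acc < 100.0]

print(f"average accuracy: {average:.2f}%")
print("candidates:", candidates)
```

The filter alone leaves more than one candidate, so the final choice of KNN also rests on the qualitative criteria discussed in the text, such as simplicity and interpretability.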
The subsequent step involves providing crop or fertilizer recommendations based on data collected by soil sensors and processed through the web-based UG-AgroPlan system, which has already been deployed. Soil sensors, such as the NPK sensor, detect nutrient levels and soil parameters in a specific agricultural field [47]. The collected data is transmitted to the UG-AgroPlan system, where the KNN model analyzes it to recommend the most suitable crops based on current soil conditions.
The K-Nearest Neighbors (KNN) algorithm is employed to recommend the most suitable crops based on soil nutrient levels and other parameters. The process begins by considering the sensor data, such as nitrogen (N), phosphorus (P), potassium (K) levels, pH, moisture, and temperature. These parameters are used to represent the data point for the agricultural field in a multidimensional space.
The KNN algorithm operates by calculating the distance between new soil data inputs and all data in the dataset. To determine the similarity between current soil conditions and existing crop data, the system uses the Euclidean distance formula in the K-Nearest Neighbors (KNN) algorithm. The distance between the input soil feature vector and each training data point is calculated as:
$$d = \sqrt{\sum_{i=1}^{n} \left(x_i - y_i\right)^2} \quad (2)$$
Where $d$ is the value of the Euclidean distance for each row of data, $n$ is the total number of parameters, $x_i$ represents the values of the soil parameters of the new input from the prototype sensor, and $y_i$ denotes the values of the soil parameters from the dataset. The variables $x_i$ and $y_i$ are the values of soil parameter $i$ (for example, $x_1, y_1$ for nitrogen (N), $x_2, y_2$ for phosphorus (P), $x_3, y_3$ for potassium (K), $x_4, y_4$ for pH, $x_5, y_5$ for moisture, $x_6, y_6$ for temperature, and $x_7, y_7$ for rain intensity) in the input and training sample, respectively. The KNN algorithm selects the top-k crops with the smallest distance as the most suitable for the current soil profile.
After calculating the distances, the algorithm identifies the K nearest neighbors, where K is the number of data points with the smallest distance from the new input. The majority class or crop label of these nearest neighbors is then used to determine the crop recommendation. In this study, the value of K is chosen to balance accuracy and computational efficiency. For example, if the new input data (the sensor output) closely matches the data for "Corn" in the dataset, the algorithm will recommend "Corn" as the most suitable crop.
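The distance and voting steps can be sketched in a few lines. The training rows, the choice of `k=3`, the function name `knn_recommend`, and the rule that breaks ties in favor of the closest neighbor are all assumptions made for illustration. The features are also left unscaled, which in practice lets wide-range parameters such as rainfall dominate the distance.

```python
from collections import Counter
from math import dist


def knn_recommend(sample, training, k):
    """Return the majority crop label among the k nearest training rows."""
    ranked = sorted(training, key=lambda row: dist(sample, row[0]))
    neighbors = ranked[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    # Tie-break (assumption): the closest neighbor among the tied labels wins.
    for _, label in neighbors:
        if votes[label] == top:
            return label


# Hypothetical rows: (N, P, K, pH, moisture, temperature, rainfall), crop.
training = [
    ((52.0, 31.0, 22.0, 6.4, 41.0, 25.0, 121.0), "Corn"),
    ((49.0, 29.0, 19.0, 6.6, 39.0, 26.0, 118.0), "Corn"),
    ((80.0, 45.0, 40.0, 6.0, 80.0, 24.0, 240.0), "Rice"),
    ((78.0, 47.0, 42.0, 6.1, 82.0, 23.0, 235.0), "Rice"),
    ((20.0, 18.0, 30.0, 5.8, 35.0, 28.0, 95.0), "Mango"),
]

sample = (50.0, 30.0, 20.0, 6.5, 40.0, 25.0, 120.0)
print(knn_recommend(sample, training, k=3))
```

Here `math.dist` computes the Euclidean distance between two equal-length sequences, which matches the distance used by the system.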
The dataset utilized in this study includes soil parameters for various crops under optimal growth conditions. The test results demonstrated that KNN effectively provides accurate crop recommendations by considering the non-linear relationships between soil parameters and crop types. For instance, in a trial scenario, new soil data with the parameters N = 50 ppm, P = 30 ppm, K = 20 ppm, pH = 6.5, Moisture = 40%, Temperature = 25 °C, and Rainfall = 120 mm were processed using KNN. The resulting distances to three crops in the dataset were: Corn: 5.39, Mango: 3.61, Rice: 10.34. The two nearest neighbors were Mango and Corn, with Mango the closest, leading the algorithm to recommend "Mango" as the most suitable crop.
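The ranking in this trial scenario can be reproduced directly from the three reported distances. The sketch below only sorts those values; it does not recompute them, because the reference vectors for the three crops are not given in the text.

```python
# Distances reported in the trial scenario (one reference entry per crop).
reported_distances = {"Corn": 5.39, "Mango": 3.61, "Rice": 10.34}

# Sort crop names from the smallest to the largest distance.
ranked = sorted(reported_distances, key=reported_distances.get)

nearest_two = ranked[:2]
recommended = ranked[0]

print("nearest two:", nearest_two)
print("recommended:", recommended)
```

Because each crop appears once among the neighbors, a majority vote over the two nearest entries is tied, and the smaller distance is what decides in favor of Mango.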
The implementation of the KNN algorithm yielded satisfactory results for soil data-based crop recommendation applications. With its simple yet effective capabilities, KNN can accommodate varied data, making it a suitable choice to support precision agriculture. This study also adopted a rigorous evaluation approach to ensure that the model provides recommendations that are not only accurate but also practically relevant.
This method ensures that the recommendation is tailored to the specific soil conditions of the field. Additionally, if adjustments to nutrient levels are required to optimize the soil for the recommended crop, the system provides detailed guidance for such adjustments. By leveraging KNN, the UG-AgroPlan system offers accurate and actionable crop recommendations based on real-time soil data.
In addition, the system offers flexibility by providing recommendations for nutrient adjustments when the soil is intended for crops other than those initially suggested by the machine learning model. If the user chooses to plant alternative crops, the UG-AgroPlan system generates fertilizer recommendations based on soil data analysis, including nitrogen, phosphorus, and potassium (NPK) requirements and other key parameters such as soil pH, moisture, and temperature. For instance, if a farmer opts to grow a crop that requires higher nitrogen levels than currently available in the soil, the application will recommend appropriate nitrogen supplementation. Similarly, if phosphorus levels are too high for the selected crop, the system will suggest adjustments to optimize soil conditions, ensuring successful cultivation.
Calculating surpluses or deficits of nutrients (also other parameters) begins by defining the standard nutrient requirements of the crop to be cultivated, such as nitrogen (N), phosphorus (P), potassium (K), soil pH, moisture, rainfall, and temperature. These standards serve as benchmarks for comparison. Subsequently, soil sensors collect real-time data on the actual nutrient content and soil conditions in the targeted area. To determine the suitability of current soil conditions for a target crop, the UG-AgroPlan system performs a comparative analysis between sensor readings and the reference average values for each nutrient and parameter. The difference is calculated by subtracting the sensor value from the standard reference (mean) value:
$$\Delta X = \bar{X} - X_{\text{sensor}} \quad (3)$$
Where $\bar{X}$ is the mean reference value, $X_{\text{sensor}}$ is the sensor reading, and $X$ represents each parameter (e.g., N, P, K, and the other soil parameters). The interpretation of the result is as follows:
If $\Delta X > 0$, this indicates a deficiency, meaning the parameter is below the optimal range and must be increased. For nutrients such as nitrogen, phosphorus, and potassium, this triggers a fertilizer recommendation.
If $\Delta X < 0$, this indicates a surplus, meaning the parameter is above the recommended level. In such cases, no fertilizer application is advised, and the system may suggest delaying further nutrient input to prevent over-fertilization.
This rule-based logic allows the UG-AgroPlan system to provide dynamic and precise recommendations, particularly for macronutrients (N, P, K), which have a significant influence on plant growth. Adjustments for other environmental parameters such as pH, temperature, and moisture are noted but not automatically corrected within the current system and require agronomic intervention or separate ecosystem management.
To illustrate the application, consider a set of real soil data with the following values: N = 50 ppm, P = 30 ppm, K = 20 ppm, pH = 6.5, Moisture = 40%, Temperature = 25 °C, and Rainfall = 120 mm. When the farmer intends to cultivate rice (Oryza sativa) on this field, it becomes necessary to compare these soil conditions against the optimal nutrient and environmental requirements for rice, as presented in Table 2. This comparison enables the UG-AgroPlan system to identify which nutrients require adjustment through fertilization, while also highlighting non-nutrient parameters that may limit crop suitability and should be managed through external interventions such as irrigation, drainage, or microclimate modification.
In this study, the adjustment of soil conditions is carried out using Equation (3). To demonstrate the nutrient adjustment process, consider the nitrogen (N) content detected by the soil sensor. In this case, the sensor reading shows a nitrogen concentration of 50 ppm, while based on the crop requirement listed in Table 2, the acceptable nitrogen range for optimal rice cultivation is from 60 ppm to 99 ppm. For standardization in this study, the system uses the average required value as the reference target, calculated as:
$$\bar{X}_{\text{N}} = \frac{60 + 99}{2} = 79.50 \text{ ppm}$$
The difference between the average target value and the actual sensor reading is then computed to determine the nitrogen deficit by Equation (3):
$$\Delta X_{\text{N}} = 79.50 - 50 = 29.50 \text{ ppm}$$
The result indicates that the soil is lacking 29.50 ppm of nitrogen compared to the standard requirement for rice. This value is then used by the UG-AgroPlan system to calculate the appropriate fertilizer dosage needed to close the nitrogen gap.
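The nitrogen calculation and the deficiency or surplus rule can be sketched as follows. The range (60 to 99 ppm) and the sensor reading (50 ppm) are taken from the text; the function names and the "on target" label for a zero difference are illustrative assumptions.

```python
def reference_mean(low, high):
    """Midpoint of the acceptable range, used as the reference target."""
    return (low + high) / 2


def difference(reference, reading):
    """Reference minus sensor reading; positive values mean a deficit."""
    return reference - reading


def interpret(delta):
    """Map the sign of the difference to the rule described in the text."""
    if delta > 0:
        return "deficiency"
    if delta < 0:
        return "surplus"
    return "on target"  # assumption: the text does not define this case


target_n = reference_mean(60.0, 99.0)  # nitrogen range for rice, in ppm
delta_n = difference(target_n, 50.0)   # sensor reading of 50 ppm

print(f"target: {target_n:.2f} ppm, difference: {delta_n:.2f} ppm")
print("status:", interpret(delta_n))
```

The same three functions apply unchanged to phosphorus and potassium once their ranges from Table 2 are supplied.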
Based on the analyzed data, the recommended adjustments to nutrients and soil parameters for rice cultivation are as follows:
The nitrogen (N) level needs to be increased by 29.50 ppm from the current level.
The phosphorus (P) level needs to be increased by 17.50 ppm to reach optimal levels.
The potassium (K) level needs to be increased by 20.00 ppm.
The soil temperature needs to be reduced by 1.51 °C to reach optimal conditions.
Soil moisture needs to be increased by 42.55%.
The soil pH needs to be reduced by 0.06 to match the optimal pH for rice cultivation.
Rainfall also needs to be increased by 120.56 mm to meet water requirements.
The system focuses on balancing the three primary macronutrients: nitrogen (N), phosphorus (P), and potassium (K). The adjustment process is executed by applying fertilizers specifically formulated to address deficiencies of these macronutrients, or by withholding them in the case of a surplus, ensuring the soil meets the nutritional requirements of the intended crop. Other environmental parameters, such as pH, moisture, temperature, and rainfall, are not yet adjusted within the current system framework and would require additional treatment strategies and ecosystem management practices specific to the agricultural land’s characteristics.
To convert the nutrient deficit from ppm to kilograms per hectare (kg/ha), the following general formula is used:
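$$\text{Nutrient (kg/ha)} = \Delta X\,(\text{ppm}) \times \rho_b\,(\text{g/cm}^3) \times h\,(\text{cm}) \times 0.1$$
Where $h$ is the soil depth of 20 cm, representing the effective root zone, and $\rho_b$ is the bulk density of 1.3 g/cm³, which is a typical average for mineral soils.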
These calculated values represent the recommended amount of nutrients that need to be added to the soil via fertilization in order to meet the average optimal requirements for rice cultivation, and are used by UG-AgroPlan to generate precise fertilizer recommendations in the field.
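The conversion can be sketched from first principles, assuming the stated soil depth (20 cm) and bulk density (1.3 g/cm³) and treating 1 ppm as 1 mg of nutrient per kg of soil. The function name `ppm_to_kg_per_ha` is illustrative, and the result is what this sketch computes rather than a figure reported by the study.

```python
def ppm_to_kg_per_ha(deficit_ppm, depth_cm=20.0, bulk_density_g_cm3=1.3):
    """Convert a nutrient deficit in ppm (mg/kg soil) to kg per hectare."""
    # Soil mass per hectare: 10,000 m^2 * depth (m) * bulk density (kg/m^3).
    soil_mass_kg = 10_000 * (depth_cm / 100) * (bulk_density_g_cm3 * 1000)
    # 1 ppm corresponds to 1e-6 kg of nutrient per kg of soil.
    return deficit_ppm * 1e-6 * soil_mass_kg


# Nitrogen deficit of 29.50 ppm from the rice example in the text.
nitrogen_kg_ha = ppm_to_kg_per_ha(29.5)
print(f"nitrogen to add: {nitrogen_kg_ha:.1f} kg/ha")
```

With these defaults, one hectare of topsoil weighs 2.6 million kg, so each ppm of deficit corresponds to 2.6 kg/ha of the nutrient.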
To address nutrient imbalances, the system provides targeted recommendations. Excess nutrients can be managed by reducing fertilizer application or employing soil leaching techniques, while deficits are addressed by adding specific amounts of fertilizer. For example, if the deficit for nitrogen is 20 kg/ha and urea (46% nitrogen content) is used, the required amount of urea is calculated as 20 / 0.46 ≈ 43.5 kg/ha. This process ensures precise nutrient management, and when integrated into the UG-AgroPlan system, it automates recommendations for maintaining optimal soil conditions for crop cultivation.
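The urea example can be expressed as a small helper that divides the nutrient deficit by the nutrient fraction of the fertilizer product. The function name `fertilizer_amount` and the input validation are illustrative assumptions; the 20 kg/ha deficit and the 46% nitrogen content are the values used in the text.

```python
def fertilizer_amount(nutrient_deficit_kg_ha, nutrient_fraction):
    """Product mass (kg/ha) needed to supply a given nutrient deficit."""
    if not 0 < nutrient_fraction <= 1:
        raise ValueError("nutrient_fraction must be in the interval (0, 1]")
    return nutrient_deficit_kg_ha / nutrient_fraction


# Nitrogen deficit of 20 kg/ha supplied with urea (46% nitrogen).
urea_kg_ha = fertilizer_amount(20.0, 0.46)
print(f"urea required: {urea_kg_ha:.1f} kg/ha")
```

Other products can be handled by changing the fraction, provided the fraction refers to the same form of the nutrient as the deficit.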