3. Results
Tomato cultivation is one of the most developed agricultural sectors in Turkey. Because tomatoes are a widely consumed staple food, ensuring high productivity is of great importance. This study therefore aims to detect and classify four common diseases affecting tomato plants. Table 4 lists the training and testing accuracy rates and the execution times obtained on the augmented dataset for 21 different deep learning algorithms. As Table 4 shows, the NasNet-Large algorithm achieved the highest accuracy, with a training accuracy of 88.07% and a testing accuracy of 87.23%. However, this performance came at a significant computational cost: the training time of 2729 minutes and 39 seconds is disproportionate to the accuracy gained.
Considering both time and performance metrics, the ResNet-50 algorithm demonstrated a more balanced profile, achieving a training accuracy of 88.07% and a testing accuracy of 86.85%, with a training duration of 625 minutes and 25 seconds. This indicates that ResNet-50 offers a favorable trade-off between computational efficiency and classification performance.
Moreover, the EfficientNet-b0 algorithm yielded 86.53% training accuracy and 85.76% testing accuracy, while completing its training in just 140 minutes and 51 seconds. This highlights its rapid execution time combined with high classification accuracy, making it a promising candidate for time-sensitive applications.
Based on the evaluations and as illustrated in Figure 5 and Table 4, the CNN algorithm, with a training accuracy of 66.17% and a testing accuracy of 60.23%, and the DarkNet-19 algorithm, with a training accuracy of 62.67% and a testing accuracy of 57.16%, were identified as the two least effective models in terms of classification performance.
Table 4.
Training and test accuracy rates and run times of deep learning algorithms.
| No | Algorithm | Training Accuracy (%) | Test Accuracy (%) | Time |
|---|---|---|---|---|
| 1 | NasNet-Large | 88.07 | 87.23 | 2729 min 39 s |
| 2 | ResNet-50 | 88.07 | 86.85 | 625 min 25 s |
| 3 | DenseNet-201 | 86.59 | 85.82 | 604 min 50 s |
| 4 | EfficientNet-b0 | 86.53 | 85.76 | 140 min 51 s |
| 5 | Places365-GoogLeNet | 86.36 | 85.76 | 273 min 55 s |
| 6 | Inception-v3 | 85.58 | 84.31 | 359 min 54 s |
| 7 | Xception | 84.80 | 83.15 | 733 min 17 s |
| 8 | NasNet-Mobile | 84.18 | 82.53 | 214 min 7 s |
| 9 | ResNet-101 | 84.10 | 82.18 | 1371 min 33 s |
| 10 | Inception-ResNet-v2 | 84.02 | 80.94 | 594 min 8 s |
| 11 | GoogLeNet | 82.93 | 82.17 | 140 min 39 s |
| 12 | ShuffleNet | 82.70 | 80.75 | 60 min 2 s |
| 13 | DarkNet-53 | 80.75 | 77.60 | 566 min 44 s |
| 14 | MobileNet-v2 | 79.58 | 77.14 | 319 min 33 s |
| 15 | AlexNet | 79.27 | 76.37 | 95 min 34 s |
| 16 | SqueezeNet | 77.71 | 74.31 | 110 min 39 s |
| 17 | ResNet-18 | 77.63 | 74.34 | 187 min 36 s |
| 18 | VGG-19 | 75.14 | 72.49 | 368 min 19 s |
| 19 | VGG-16 | 68.28 | 69.60 | 382 min 26 s |
| 20 | CNN | 66.17 | 60.23 | 403 min 13 s |
| 21 | DarkNet-19 | 62.67 | 57.16 | 262 min 26 s |
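The trade-off between training time and test accuracy discussed for Table 4 can be made explicit with a small sketch. The Pareto criterion used here is an illustrative choice and not a method reported in the study: a model is kept only if no faster model is at least as accurate. The function name `pareto_front` and the dictionary `TABLE_4` are hypothetical; the numbers are transcribed from Table 4.

```python
# Test accuracy (%) and training time (minutes, seconds), transcribed from Table 4.
TABLE_4 = {
    "NasNet-Large": (87.23, 2729, 39),
    "ResNet-50": (86.85, 625, 25),
    "DenseNet-201": (85.82, 604, 50),
    "EfficientNet-b0": (85.76, 140, 51),
    "Places365-GoogLeNet": (85.76, 273, 55),
    "Inception-v3": (84.31, 359, 54),
    "Xception": (83.15, 733, 17),
    "NasNet-Mobile": (82.53, 214, 7),
    "ResNet-101": (82.18, 1371, 33),
    "Inception-ResNet-v2": (80.94, 594, 8),
    "GoogLeNet": (82.17, 140, 39),
    "ShuffleNet": (80.75, 60, 2),
    "DarkNet-53": (77.60, 566, 44),
    "MobileNet-v2": (77.14, 319, 33),
    "AlexNet": (76.37, 95, 34),
    "SqueezeNet": (74.31, 110, 39),
    "ResNet-18": (74.34, 187, 36),
    "VGG-19": (72.49, 368, 19),
    "VGG-16": (69.60, 382, 26),
    "CNN": (60.23, 403, 13),
    "DarkNet-19": (57.16, 262, 26),
}


def pareto_front(results):
    """Return the models that no faster model matches or beats in test accuracy."""
    # Sort by training time in seconds (fastest first); ties go to the more accurate model.
    ranked = sorted(
        ((minutes * 60 + seconds, accuracy, name)
         for name, (accuracy, minutes, seconds) in results.items()),
        key=lambda item: (item[0], -item[1]),
    )
    front, best_accuracy = [], float("-inf")
    for _, accuracy, name in ranked:
        # A slower model is kept only if it is strictly more accurate
        # than every faster model seen so far.
        if accuracy > best_accuracy:
            front.append(name)
            best_accuracy = accuracy
    return front


if __name__ == "__main__":
    print(pareto_front(TABLE_4))
```

Under this criterion, models off the front are dominated: some other model trains faster and classifies at least as well, which is one way to formalize the notion of a "balanced profile".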
Figure 5.
Training and test accuracy rates of deep learning algorithms.
Figure 6 presents the confusion matrices and classification accuracy rates for the four most successful deep learning algorithms (NasNet-Large, ResNet-50, DenseNet-201, and EfficientNet-b0) in the context of tomato disease classification. These matrices provide a comparative overview of each model's ability to distinguish between disease classes and demonstrate their overall predictive effectiveness.
Figure 6.
Disease classification accuracy rates of the top 4 deep learning methods, derived from the test confusion matrices: (a) NasNet-Large (b) ResNet-50 (c) DenseNet-201 (d) EfficientNet-b0.
Table 5 shows the per-class accuracy rates of the 21 deep learning methods in disease classification, derived from the test confusion matrices.
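As a minimal sketch of how per-class accuracy rates can be read off a confusion matrix, the example below divides each diagonal entry by its row total. It assumes that rows hold the true classes and columns the predicted classes, which the text does not state explicitly. The 3-class matrix is invented for illustration and is not taken from the study; the function names are hypothetical.

```python
def per_class_accuracy(confusion):
    """Fraction of each true class (row) that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal of the matrix."""
    correct = sum(row[i] for i, row in enumerate(confusion))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical confusion matrix (rows = true class, columns = predicted class).
# These counts are made up for illustration only.
example = [
    [45, 3, 2],
    [5, 40, 5],
    [0, 10, 40],
]

if __name__ == "__main__":
    print(per_class_accuracy(example))
    print(overall_accuracy(example))
```

A class with a high per-class rate can coexist with a modest overall rate when other classes are frequently confused with each other, which is why both views are reported.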
As shown in Table 5, Late Blight (Mildiyö) was classified with over 80% accuracy by 18 models, Early Blight by 8 models, Tomato Gray Mold (the Powdery Mildew column of Table 5) by 9 models, and the Healthy Tomato class by 20 models. In contrast, the Bacterial Canker and Spot class was classified with lower accuracy by every model, and none exceeded 80%. Late Blight achieved its highest classification accuracy of 92.2% with the Inception-ResNet-v2 algorithm, while the lowest accuracy of 34.68% was observed with VGG-16. Early Blight was best classified by Places365-GoogLeNet, ResNet-50, and EfficientNet-b0, with accuracies of 89.69%, 89.18%, and 87.11%, respectively, whereas CNN and DarkNet-19 performed poorly, with accuracies of 41.75% and 44.33%, respectively. Tomato Gray Mold was best classified by the VGG-16 and GoogLeNet models, with accuracies of 91.54% and 90.05%, respectively. Bacterial Canker and Spot reached its highest accuracy of 77.99% with DenseNet-201 and EfficientNet-b0; seven models remained below 60% in this class, and DarkNet-19 reached only 11.96%. For the Healthy Tomato class, very high accuracies exceeding 98% were achieved by Inception-ResNet-v2, DenseNet-201, Inception-v3, Xception, and NasNet-Mobile.
Table 5.
Accuracy rates of deep learning methods for disease classification with test data.
| No | Algorithm | Late Blight (Mildew) (%) | Early Blight (%) | Powdery Mildew (%) | Bacterial Canker and Spot (%) | Healthy Tomato (%) | Test Accuracy (%) |
|---|---|---|---|---|---|---|---|
| 1 | NasNet-Large | 86.13 | 86.60 | 89.55 | 76.56 | 97.30 | 87.23 |
| 2 | ResNet-50 | 89.31 | 89.18 | 85.57 | 73.21 | 97.00 | 86.85 |
| 3 | DenseNet-201 | 81.21 | 84.02 | 87.06 | 77.99 | 98.80 | 85.82 |
| 4 | EfficientNet-b0 | 84.68 | 87.11 | 81.09 | 77.99 | 97.90 | 85.76 |
| 5 | Places365-GoogLeNet | 81.50 | 89.69 | 83.08 | 77.51 | 97.00 | 85.76 |
| 6 | Inception-v3 | 82.66 | 81.44 | 82.09 | 76.56 | 98.80 | 84.31 |
| 7 | Xception | 83.82 | 80.41 | 77.11 | 75.60 | 98.80 | 83.15 |
| 8 | NasNet-Mobile | 82.37 | 73.71 | 84.08 | 73.68 | 98.80 | 82.53 |
| 9 | ResNet-101 | 85.55 | 80.41 | 79.10 | 67.94 | 97.90 | 82.18 |
| 10 | Inception-ResNet-v2 | 92.20 | 78.87 | 65.17 | 69.38 | 99.10 | 80.94 |
| 11 | GoogLeNet | 75.43 | 78.87 | 90.05 | 68.90 | 97.60 | 82.17 |
| 12 | ShuffleNet | 83.82 | 79.90 | 78.61 | 64.11 | 97.30 | 80.75 |
| 13 | DarkNet-53 | 90.46 | 70.10 | 74.63 | 57.89 | 94.89 | 77.60 |
| 14 | MobileNet-v2 | 81.79 | 74.23 | 74.63 | 58.37 | 96.70 | 77.14 |
| 15 | AlexNet | 83.53 | 65.46 | 74.63 | 61.24 | 97.00 | 76.37 |
| 16 | SqueezeNet | 89.31 | 72.16 | 67.16 | 50.72 | 92.19 | 74.31 |
| 17 | ResNet-18 | 86.42 | 72.16 | 67.16 | 51.67 | 94.29 | 74.34 |
| 18 | VGG-19 | 75.72 | 60.31 | 78.61 | 52.63 | 95.20 | 72.49 |
| 19 | VGG-16 | 34.68 | 65.98 | 91.54 | 60.29 | 95.50 | 69.60 |
| 20 | CNN | 85.84 | 41.75 | 48.26 | 34.93 | 90.39 | 60.23 |
| 21 | DarkNet-19 | 88.73 | 44.33 | 62.69 | 11.96 | 78.08 | 57.16 |
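The per-class summaries drawn from Table 5 (how many models exceed 80% in a class, and which model is best in a class) can be recomputed with a short sketch such as the following. The values are transcribed from Table 5 in column order; the identifiers `CLASSES`, `TABLE_5`, `count_above`, and `best_model` are hypothetical names introduced only for this illustration.

```python
# Column order follows Table 5: late blight (mildew), early blight,
# powdery mildew, bacterial canker and spot, healthy tomato.
CLASSES = ("late_blight", "early_blight", "powdery_mildew",
           "bacterial_canker_spot", "healthy")

# Per-class test accuracy (%) transcribed from Table 5.
TABLE_5 = {
    "NasNet-Large": (86.13, 86.60, 89.55, 76.56, 97.30),
    "ResNet-50": (89.31, 89.18, 85.57, 73.21, 97.00),
    "DenseNet-201": (81.21, 84.02, 87.06, 77.99, 98.80),
    "EfficientNet-b0": (84.68, 87.11, 81.09, 77.99, 97.90),
    "Places365-GoogLeNet": (81.50, 89.69, 83.08, 77.51, 97.00),
    "Inception-v3": (82.66, 81.44, 82.09, 76.56, 98.80),
    "Xception": (83.82, 80.41, 77.11, 75.60, 98.80),
    "NasNet-Mobile": (82.37, 73.71, 84.08, 73.68, 98.80),
    "ResNet-101": (85.55, 80.41, 79.10, 67.94, 97.90),
    "Inception-ResNet-v2": (92.20, 78.87, 65.17, 69.38, 99.10),
    "GoogLeNet": (75.43, 78.87, 90.05, 68.90, 97.60),
    "ShuffleNet": (83.82, 79.90, 78.61, 64.11, 97.30),
    "DarkNet-53": (90.46, 70.10, 74.63, 57.89, 94.89),
    "MobileNet-v2": (81.79, 74.23, 74.63, 58.37, 96.70),
    "AlexNet": (83.53, 65.46, 74.63, 61.24, 97.00),
    "SqueezeNet": (89.31, 72.16, 67.16, 50.72, 92.19),
    "ResNet-18": (86.42, 72.16, 67.16, 51.67, 94.29),
    "VGG-19": (75.72, 60.31, 78.61, 52.63, 95.20),
    "VGG-16": (34.68, 65.98, 91.54, 60.29, 95.50),
    "CNN": (85.84, 41.75, 48.26, 34.93, 90.39),
    "DarkNet-19": (88.73, 44.33, 62.69, 11.96, 78.08),
}


def count_above(results, threshold=80.0):
    """Count, for each class, the models whose accuracy exceeds the threshold."""
    counts = dict.fromkeys(CLASSES, 0)
    for accuracies in results.values():
        for cls, accuracy in zip(CLASSES, accuracies):
            if accuracy > threshold:
                counts[cls] += 1
    return counts


def best_model(results, cls):
    """Return the first model (in table order) with the highest accuracy for a class."""
    idx = CLASSES.index(cls)
    return max(results, key=lambda name: results[name][idx])


if __name__ == "__main__":
    print(count_above(TABLE_5))
    for cls in CLASSES:
        print(cls, best_model(TABLE_5, cls))
```

Note that `best_model` reports only the first of any tied models, so ties such as equal accuracies in one column need to be checked separately.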
The graphical representation of classification accuracy rates for each deep learning method is presented in Figure 7.
Figure 7.
Disease classification accuracy rates of the deep learning methods.
Among the five most successful algorithms overall, the highest per-class accuracies were as follows: 89.31% for late blight (mildew) with ResNet-50; 89.69% for early blight with Places365-GoogLeNet, closely followed by 89.18% with ResNet-50; 89.55% for powdery mildew with NasNet-Large; 77.99% for bacterial canker and spot with DenseNet-201 and EfficientNet-b0; and 98.80% for healthy tomatoes with DenseNet-201.
Considering the training and test accuracy rates in Figure 6 and Figure 7, the five most successful deep learning methods are NasNet-Large, ResNet-50, DenseNet-201, EfficientNet-b0, and Places365-GoogLeNet.
Table 6.
Performance comparison of classification results using 100 features selected by feature selection methods.
| Deep feature extraction method | MRMR: Machine Learning Model | MRMR: Training Accuracy (%) | MRMR: Test Accuracy (%) | Chi2: Machine Learning Model | Chi2: Training Accuracy (%) | Chi2: Test Accuracy (%) | ReliefF: Machine Learning Model | ReliefF: Training Accuracy (%) | ReliefF: Test Accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|
| NasNet-Large | Fine KNN | 83.53 | 86.70 | Fine KNN | 82.40 | 87.40 | Fine KNN | 84.30 | 86.60 |
| | Fine Gaussian SVM | 77.70 | 80.80 | Cubic SVM | 79.50 | 81.90 | Cubic SVM | 78.40 | 83.20 |
| | Wide Neural Network | 72.80 | 75.60 | Wide Neural Network | 73.10 | 75.80 | Wide Neural Network | 72.60 | 75.40 |
| ResNet-50 | Subspace KNN | 88.3 | 88.8 | Subspace KNN | 86.7 | 89.4 | Subspace KNN | 87.8 | 90.7 |
| | Cubic SVM | 83.4 | 84.4 | Cubic SVM | 82.0 | 83.7 | Cubic SVM | 83.0 | 86.2 |
| | Wide Neural Network | 77.8 | 79.3 | Wide Neural Network | 77.6 | 78.6 | Wide Neural Network | 76.8 | 79.1 |
| DenseNet-201 | Subspace KNN | 86.6 | 90.8 | Subspace KNN | 86.3 | 90.4 | Subspace KNN | 87.5 | 90.2 |
| | Cubic SVM | 82.6 | 84.6 | Cubic SVM | 81.9 | 86.2 | Cubic SVM | 82.9 | 84.9 |
| | Wide Neural Network | 76.0 | 78.9 | Wide Neural Network | 76.0 | 78.4 | Wide Neural Network | 77.0 | 79.8 |
| EfficientNet-b0 | Subspace KNN | 89.3 | 91.7 | Fine KNN | 88.4 | 92.0 | Subspace KNN | 89.6 | 91.2 |
| | Cubic SVM | 84.7 | 85.4 | Cubic SVM | 83.6 | 86.6 | Cubic SVM | 85.4 | 87.1 |
| | Wide Neural Network | 79.3 | 77.7 | Wide Neural Network | 78.0 | 79.4 | Wide Neural Network | 78.9 | 81.0 |
| Places365-GoogLeNet | Subspace KNN | 84.4 | 86.7 | Fine KNN | 82.8 | 85.4 | Subspace KNN | 85.6 | 87.9 |
| | Cubic SVM | 79.4 | 83.1 | Cubic SVM | 79.6 | 82.4 | Cubic SVM | 80.5 | 81.2 |
| | Wide Neural Network | 73.3 | 75.0 | Wide Neural Network | 73.8 | 76.2 | Wide Neural Network | 73.7 | 74.4 |
Evaluation of the results in Table 4 and the heat map in Figure 8 shows that the five deep learning algorithms with the highest test accuracy are NasNet-Large, ResNet-50, DenseNet-201, EfficientNet-b0, and Places365-GoogLeNet. Each of these networks was used to extract 1000 features. The number of features was then reduced to 100 with the MRMR, Chi2, ReliefF, ANOVA, and Kruskal-Wallis methods in order to select the features most useful for classification. Because classification with the datasets obtained using the ANOVA and Kruskal-Wallis feature selection methods yielded lower accuracy than the other methods, the results of these two methods are not reported. Each new dataset was split into 80% training and 20% test data and reclassified with machine learning algorithms using 5-fold cross-validation. Table 6 shows the results of the three machine learning algorithms that produced the best classification results using 100 features.
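The selection and reclassification steps described above can be sketched with scikit-learn, which is an assumption of this example; the text does not name the toolchain used in the study. The sketch uses synthetic data in place of the 1000 deep features, selects 100 features with the Chi2 score, and evaluates a 1-nearest-neighbour classifier, an assumed stand-in for a "Fine KNN" model, with an 80/20 split and 5-fold cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for 1000 deep features per image and 5 classes.
# Assumption: the real features would come from a pretrained network.
X, y = make_classification(
    n_samples=500, n_features=1000, n_informative=50,
    n_classes=5, random_state=0,
)

# 80% training / 20% test split, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# chi2 requires non-negative inputs, so features are scaled to [0, 1] first.
# n_neighbors=1 is an assumed analogue of a "Fine KNN" classifier.
model = make_pipeline(
    MinMaxScaler(),
    SelectKBest(chi2, k=100),
    KNeighborsClassifier(n_neighbors=1),
)

# 5-fold cross-validation on the training portion; scaling and selection
# are refitted inside each fold because they are part of the pipeline.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Final fit on all training data, then evaluation on the held-out test set.
model.fit(X_train, y_train)
n_selected = int(model.named_steps["selectkbest"].get_support().sum())
test_accuracy = model.score(X_test, y_test)

print("features kept:", n_selected)
print("mean CV accuracy:", cv_scores.mean())
print("test accuracy:", test_accuracy)
```

Keeping the scaler and selector inside the pipeline means the 100 features are chosen from training folds only, which avoids leaking information from validation or test data into the selection step.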
Figure 8.
Heat map of the 5 deep learning algorithms with the highest test accuracy.
Table 7.
Training and test accuracy results obtained with 100 features.
| Deep feature extraction method | MRMR: Machine Learning Model | MRMR: Training Accuracy (%) | MRMR: Test Accuracy (%) | Chi2: Machine Learning Model | Chi2: Training Accuracy (%) | Chi2: Test Accuracy (%) | ReliefF: Machine Learning Model | ReliefF: Training Accuracy (%) | ReliefF: Test Accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|
| NasNet-Large | Fine KNN | 83.53 | 86.70 | Fine KNN | 82.40 | 87.40 | Fine KNN | 84.30 | 86.60 |
| ResNet-50 | Subspace KNN | 88.3 | 88.8 | Subspace KNN | 86.7 | 89.4 | Subspace KNN | 87.8 | 90.7 |
| DenseNet-201 | Subspace KNN | 86.6 | 90.8 | Subspace KNN | 86.3 | 90.4 | Subspace KNN | 87.5 | 90.2 |
| EfficientNet-b0 | Subspace KNN | 89.3 | 91.7 | Fine KNN | 88.4 | 92.0 | Subspace KNN | 89.6 | 91.2 |
| Places365-GoogLeNet | Subspace KNN | 84.4 | 86.7 | Fine KNN | 82.8 | 85.4 | Subspace KNN | 85.6 | 87.9 |
| Mean | | 86.43 | 88.94 | | 85.32 | 88.92 | | 86.96 | 89.32 |
| Standard deviation | | 2.20 | 2.06 | | 2.33 | 2.31 | | 1.84 | 1.77 |
| Variance | | 4.86 | 4.23 | | 5.45 | 5.32 | | 3.38 | 3.13 |
Table 6 shows that KNN-based classifiers provide the highest test accuracy in every dataset with 100 features: Subspace KNN in most cases, and Fine KNN for the NasNet-Large features and for the Chi2-selected EfficientNet-b0 and Places365-GoogLeNet features. The SVM classifiers generally perform well but do not reach the accuracy of the KNN-based classifiers, and the Wide Neural Network classifier gives the lowest test accuracy in every case. For the DenseNet-201 and ResNet-50 features, Subspace KNN provides the highest test accuracy under all three feature selection methods.
Table 7 compares training and test accuracy rates based on 100 features selected from a set of 1000 deep features using various feature selection methods, including MRMR, Chi2, and ReliefF, applied to different deep feature extraction techniques. According to this table, the highest test accuracy of 92% was achieved when features extracted from the EfficientNet-b0 model were selected using the Chi2 method and classified using the Fine K-Nearest Neighbors (Fine KNN) machine learning algorithm.
To determine the most consistent classification performance, the mean, standard deviation, and variance of the obtained results were calculated. Based on these calculations, the ReliefF method demonstrated the most stable performance, with an average test accuracy of 89.32%.
Among the deep feature extraction models examined, features derived from EfficientNet-b0 consistently yielded the highest accuracy rates across all feature selection methods. Features extracted from DenseNet-201 and ResNet-50 also showed strong classification performance with high accuracy values. In contrast, features derived from the NasNet-Large model resulted in relatively lower classification accuracy, around 86%, compared to other models.
When considering standard deviation and variance analyses, the ReliefF method exhibited lower standard deviation values in both training and test accuracies, indicating more stable performance. In contrast, the Chi2 method showed greater variability in accuracy rates.
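The summary statistics in Table 7 can be recomputed from the five test accuracies in each column. The sketch below uses Python's `statistics` module with the population formulas (division by n rather than n - 1); this is an inference from the fact that those formulas appear to reproduce the tabulated values, not something stated in the text. The dictionary name `TEST_ACCURACY` is hypothetical.

```python
from statistics import mean, pstdev, pvariance

# Best-classifier test accuracies (%) per feature selection method, in the
# row order of Table 7: NasNet-Large, ResNet-50, DenseNet-201,
# EfficientNet-b0, Places365-GoogLeNet.
TEST_ACCURACY = {
    "MRMR": [86.70, 88.8, 90.8, 91.7, 86.7],
    "Chi2": [87.40, 89.4, 90.4, 92.0, 85.4],
    "ReliefF": [86.60, 90.7, 90.2, 91.2, 87.9],
}


def summarize(values):
    """Return (mean, population standard deviation, population variance)."""
    return mean(values), pstdev(values), pvariance(values)


if __name__ == "__main__":
    for method, values in TEST_ACCURACY.items():
        avg, std, var = summarize(values)
        print(f"{method}: mean={avg:.2f} std={std:.2f} var={var:.2f}")
```

A lower standard deviation across the five feature extractors is what the text refers to as more stable performance for a feature selection method.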
Consequently, among the classifications using 100 features, the most stable results were obtained when features extracted from the EfficientNet-b0 model were selected with the ReliefF method and then classified with machine learning algorithms.
Since the best result was obtained using the EfficientNet-b0 algorithm, the training and test confusion matrices based on the Chi2 feature selection method applied to this algorithm are presented in Figure 9. As seen in Figure 9, when the number of features was reduced using the Chi2 method with the EfficientNet-b0 algorithm, a training accuracy of 88.4% and a test accuracy of 92% were achieved.
Figure 9.
Training and test confusion matrices for the EfficientNet-b0 algorithm with the Chi2 method: (a) Training confusion matrix (b) Disease classification accuracy rates from the training confusion matrix (c) Test confusion matrix (d) Disease classification accuracy rates from the test confusion matrix.
In this study, a five-class classification was performed, covering four of the most common tomato diseases (late blight, early blight, gray mold, and bacterial canker) together with healthy tomatoes. The dataset used in the study is entirely original, consisting of 6,414 images collected from three groups: leaves, red tomatoes, and green tomatoes, captured in the production field. These data were processed using 21 deep learning algorithms, and their results were evaluated. Among these, the best-performing models were identified as NasNet-Large, ResNet-50, DenseNet-201, EfficientNet-b0, and Places365-GoogLeNet.
From each of the selected models, 1,000 features were extracted. Feature selection algorithms (MRMR, Chi2, ReliefF, ANOVA, and Kruskal-Wallis) were then applied to select 100 features from each set. The newly constructed datasets were split into 80% training and 20% test sets, and models were trained using 5-fold cross-validation with various machine learning algorithms. As a result, a total of 51 combinations were evaluated, comprising 21 deep learning algorithms and 5 feature selection methods.
Among these combinations, the highest performance was achieved with the EfficientNet-b0 algorithm, where the Chi2 feature selection method yielded a training accuracy of 88.4% and a test accuracy of 92%. Furthermore, it can be concluded that the most stable results in classifications using 100 features were obtained when features extracted from EfficientNet-b0 were selected using the ReliefF method and classified with machine learning algorithms.
In conclusion, this study demonstrates the potential and applicability of deep learning-based disease diagnosis systems in the agricultural sector. More effective integration of technology in agricultural practices can enhance both production processes and product quality, thus contributing to the widespread adoption of sustainable agricultural practices. In this context, the study can be considered an important step that may inspire future research.