1. Introduction
Diabetes is a debilitating and persistent condition characterized by insufficient production or use of insulin in the body. The number of individuals with diabetes worldwide is projected to increase from 536.6 million in 2021 to 783.2 million individuals aged 20 to 79 by 2045 [1]. Type 1 and Type 2 are the two forms of diabetes. Retinopathy typically affects 60% of type 2 diabetes patients and nearly all type 1 diabetes patients within 20 years of diagnosis. The worldwide prevalence of diabetes has been reported to be 9.8% [2].
Diabetic retinopathy (DR) is the leading cause of vision loss worldwide, affecting around 33% of individuals diagnosed with diabetes. Timely diagnosis and treatment of DR can therefore significantly reduce the likelihood of floaters, permanent visual impairment, and eventual blindness. Nevertheless, DR can progress without exhibiting any symptoms until it poses a risk to vision.
Manually inspecting retinal images and grading the severity of DR is time-consuming and prone to errors. Precise evaluation of DR’s impact requires the skills of highly experienced ophthalmologists [3]. The screening approach depends on an expert’s evaluation of color fundus photographs, which can misdiagnose cases and delay treatment [4].
Medical imaging has employed deep-learning algorithms for disease screening and detection. Diverse approaches are used to identify DR at its early stages, with machine learning playing a significant role in this effort. Machine learning enables systems to improve from data without direct human intervention, and deep learning is often used in predicting medical conditions [5]. Deep learning and machine learning benefit many fields, including computer vision, medical image analysis, and fraud detection systems [6,7].
Deep learning extracts fundamental features from massive datasets, which data mining tools then assess and use for prediction. These algorithms produce precise outputs, simplifying early disease prediction and potentially saving millions of lives [8]. Deep learning techniques, notably CNNs, have been used extensively in computer vision and medical image processing applications [8,9,10,11,12].
Most conventional techniques cannot reveal hidden characteristics and their relationships. Instead of learning valuable information, such models retain nuisance features like size, brightness, and rotation, which decreases performance and accuracy [13]. Another concern is that learning small, irrelevant details in retinal images impairs the model’s ability to generalize and adapt to other datasets [14].
Non-proliferative (NonPrDR) and proliferative (PrDR) are the two primary stages of DR. NonPrDR refers to the initial stage of DR and can be mild, moderate, or severe. Conversely, PrDR is the advanced stage of the illness. Many people do not experience symptoms until the disease progresses to the PrDR stage [15].
Figure 1 shows several stages of retinal imaging with DR from the APTOS dataset [16].
The combination of machine learning and deep learning shows potential to transform the categorization of DR [17]. However, the growing volume of collected patient data creates an increasing demand for automated technologies that can efficiently evaluate vast datasets while maintaining high accuracy. Efficient and cost-effective techniques for the early identification of diabetic retinopathy are therefore of utmost importance [18]. As each dataset varies in demographics and clinical situations, this research contributes to the study of DR by combining two different datasets to improve the model’s capacity to generalize to unseen DR images.
Our work aimed to develop a deep learning model that successfully predicts DR stages from fundus images using a CNN. The model was trained, validated, and tested on the combined APTOS-DDR fundus datasets. The images were preprocessed with CLAHE to perform adaptive histogram equalization and then with the DWT to extract wavelet coefficients. The CNN extracts the relevant features from the preprocessed images. For training, a multi-class classification scheme categorizes DR into five stages: normal (no DR), mild, moderate, severe, and PrDR. Little academic research has evaluated CLAHE and DWT on combined datasets, focusing on their performance in categorizing DR stages.
The significant contributions of this study include:
The study utilized two public datasets, APTOS and DDR, to categorize each stage of DR. The two datasets were merged to train, validate, and test the model. Blending diverse datasets and evaluating the model’s performance on an unseen test set mitigates the issue of generalizability.
A CNN model was constructed to predict DR stages. The learning rate is tuned during model training, and optimizers such as adaptive moment estimation (Adam), root mean square propagation (RMSProp), Adamax, the adaptive gradient algorithm (Adagrad), and stochastic gradient descent (SGD) are evaluated on the model. Data augmentation is applied to the training set to mitigate overfitting.
We provide a novel image classification framework that carefully blends several effective processing algorithms to improve classification performance. Each image undergoes preprocessing with CLAHE, which enhances contrast by altering the intensity distribution in local regions of the image. We then employ the DWT to decompose images into frequency components and recover spatially hidden features.
Our methodology also offers new research paths, notably in integrating DWT with other deep learning architectures and applying it to complicated image processing problems.
The remainder of the article is organized as follows: Section 2 presents the literature review on DR, followed by Section 3, an overview of the methodology used in the investigation. The results are presented in Section 4 and discussed in Section 5. Finally, Section 6 concludes the study.
2. Literature Review
Deep learning has been employed in medical imaging for illness diagnosis. Several research papers have investigated the prediction of DR utilizing CNNs and pre-trained models, including DenseNet, VGG, ResNet, and Inception [19]. Moreover, further study is required to enhance the efficiency of deep learning frameworks [20]. Usman et al. [4] observed that a pre-trained CNN model detected lesions better, achieving 94.4% accuracy; the researchers used principal component analysis to characterize fundus images.
Gargeya and Leng [21] introduced a sophisticated automated technique for identifying DR using deep learning methods; it offers a neutral alternative for detecting DR that can be used on any device. Another work by Areeb and Nadeem [22] categorizes DR images using support vector machines and random forest methods, with convolutional layers employed to extract features. The analysis revealed that random forest had high sensitivity and specificity, surpassing the other approaches in performance.
The surge in transfer learning over the past few years can be linked to the scarcity of supervised learning options for a diverse array of practical scenarios and the abundance of pre-trained models. Several studies have constructed DR models utilizing VGG16 [23,24], DenseNet [25], InceptionV3 [26], and ResNet [27]. In their study, Qian et al. [28] utilized transfer learning and attention approaches to classify DR and obtained an accuracy rate of 83.2%. Another paper [25] utilized a convolutional block attention module (CBAM) in conjunction with the APTOS dataset; the DR grading task was completed with an accuracy rate of 82%.
Zhang et al. [29] employed a limited retinal dataset to refine a pre-existing CNN model through fine-tuning. Using the EyePACS and Messidor datasets, they outperformed previous DR grading techniques in accuracy and sensitivity. The authors of [30] investigated DR classification by combining a Swin transformer with the wavelet transform; their performance reached an accuracy of 82%, along with a sensitivity of 0.4626. An attention model was used in conjunction with a pretrained DenseNet model by Dinpajhouh and Seyyedsalehi to determine the severity of DR on the APTOS database [31]. The evaluation resulted in an accuracy of 83.69%.
A machine-learning approach was suggested to determine the main causes of DR in individuals with elevated glucose levels [32]. Employing transfer learning, this process isolates and organizes the features of DR into several classes. Before the classification phase, an entropy technique selects the most distinguishing attributes from a set of features. The model aims to ascertain the level of severity in the patient’s eye and is valuable for precisely categorizing the severity of DR. The study conducted by Murugappan et al. [33] utilized a few-shot learning approach to develop a tool for detecting and evaluating DR. The approach employs an attention mechanism and episodic learning to train the model with limited training data. When evaluated on the APTOS dataset, the model achieves high DR classification and grading accuracy and sensitivity.
The U-Net CNN architecture is widely utilized for image segmentation [34]. The study by Jena et al. [35] employed an asymmetric deep learning architecture based on U-Net for DR image segmentation; green-channel images are analyzed, and CLAHE is applied to improve performance. The paper [36] presents a computerized system that combines U-Net and transfer learning to diagnose DR automatically from fundus images. The researchers utilize the publicly accessible Kaggle DR Detection dataset and fine-tune a pre-trained Inception V3 model to extract features. The extracted features are then input into a U-Net model to segment the retina and optic disc. The segmented images are further categorized into five degrees of DR severity using a multi-layer perceptron classifier. On the Kaggle dataset, their strategy outperforms other cutting-edge techniques with an accuracy of 88.3%.
Transfer learning has been utilized in the domain of DR to overcome the limitation of having little annotated data available to train deep learning models. Researchers have utilized AlexNet, InceptionV3, VGG16, DenseNet169, VGG19, and ResNet50, trained on large datasets like ImageNet, as feature extractors for DR images [36,37,38,39,40]. To achieve high precision in DR classification, the pre-existing models were fine-tuned on labeled DR datasets such as APTOS, EyePACS, and Messidor. C. Zhang et al. [29] suggested a source-free transfer learning method for assessing diabetic retinopathy: a pre-trained CNN model whose parameters were adjusted on a limited number of retinal images without additional source data. The evaluation on the EyePACS and Messidor datasets surpassed other cutting-edge methodologies in accuracy and sensitivity for assessing DR.
Data augmentation is widely regarded as the predominant technique for addressing the problem of imbalanced data in image classification tasks [41]. It involves creating new data elements from existing data to enlarge the dataset artificially. Image augmentation techniques, such as cropping, rotating, zooming, vertical and horizontal flipping, width shifting, and rescaling, can modify the images [42]. Mungloo-Dilmohamud [43] found that the quality of transfer learning in DR classification is enhanced by data augmentation. Different datasets and CLAHE techniques were employed by Ashwini et al. [44] for detecting DR, with the images preprocessed using the DWT method; according to their study, the individual datasets performed better than the combined datasets.
In a study by Dihin et al. [45], a wavelet transform with a shifted-window (Swin) model is utilized on the APTOS dataset. Compared to binary classification, which reaches 98% accuracy, the multi-class classification reaches only 86% accuracy. Image segmentation was carried out by Cornforth et al. [46], who employed a combination of wavelet analysis, supervised classifier likelihoods, adaptive threshold methods, and morphology-based approaches. Different retinal diseases were monitored by Yagmur et al. [47] by incorporating the DWT in image processing. Another study by Rehman et al. [48] applied the DWT with different machine learning classifiers, such as k-nearest neighbors (KNN) and support vector machine (SVM) models, using the Messidor data with binary classification.
The challenges of DR detection include choosing proper image processing methods and developing accurate models that generalize across different datasets [49]. Most studies have employed a single image set for training and testing. The common performance measures used for analysis are accuracy, recall (or sensitivity), and precision [50].
Table 1 summarizes the inferences drawn from previous DR-related studies.
3. Materials and Methods
The workflow of the study is shown in Figure 2. The stages involved collecting retinal images from public data, preprocessing the images, designing the model, training and validating the model with fine-tuning, and testing the model with an external dataset.
3.1. Dataset
The datasets employed in this study are publicly accessible. The APTOS [16] and DDR [60] data are combined for training, validation, and testing. The APTOS dataset contains 3662 images, each accompanied by its corresponding label. The DDR dataset has 12,524 images depicting various stages of DR. The two datasets are merged, resulting in 16,186 images fed into the model with five labels (no DR, mild, moderate, severe, and PrDR). However, 16 images were excluded due to missing valid image names in the label file, so a total of 16,170 images are employed in the study.
The class label no DR (class 0) has 8062 images, mild DR (class 1) has 999 images, moderate DR (class 2) has 5472 images, severe DR (class 3) has 429 images, and PrDR (class 4) has 1208 images. Following a 70:15:15 data split, 11,682 images are utilized for training, 2062 for validation, and 2426 for testing. The complete count for each class is depicted in Table 2.
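For illustration, a stratified 70:15:15 partition could be reproduced with scikit-learn as in the sketch below; the dummy identifiers, the stratification, and the random seed are our assumptions, and the authors' actual split counts differ slightly from an exact 15:15 division.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 16,170 dummy indices with the class counts reported
# above stand in for the real image list and its integer labels (0-4).
image_ids = np.arange(16170)
labels = np.repeat([0, 1, 2, 3, 4], [8062, 999, 5472, 429, 1208])

# 70% training, stratified so each split preserves the class mix.
train_x, rest_x, train_y, rest_y = train_test_split(
    image_ids, labels, train_size=0.70, stratify=labels, random_state=42)
# Split the remaining 30% in half: 15% validation, 15% test.
val_x, test_x, val_y, test_y = train_test_split(
    rest_x, rest_y, test_size=0.50, stratify=rest_y, random_state=42)
```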
3.2. Data Augmentation
Deep learning methods exhibit optimal performance when applied to large-scale datasets, and the model’s performance typically improves with more data. Data augmentation is used to supplement the model’s training data, and each generated image is assigned the same label as the original image from which it was derived. Before the dataset was fed to the model, additional images were incorporated into the training set to rectify the class imbalance [41].
Common techniques involve rotating, shifting, flipping, zooming, and adjusting the brightness of an image. This study applies a rotation range of 40 degrees, a width shift of 0.2, a height shift of 0.2, and horizontal flipping to the training set.
Figure 3 shows the augmented images of a random sample image.
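As a minimal sketch, these settings map directly onto the Keras ImageDataGenerator API; the use of this particular class is our assumption, since the paper does not name the implementation.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings from Section 3.2: rotation up to 40 degrees,
# width/height shifts of 0.2, and horizontal flipping (training set only).
train_datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)
```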
3.3. Image Preprocessing
Image preprocessing is crucial for machine learning and deep learning to function effectively in their respective domains. During the preprocessing stage, it is customary to normalize images, for example by rescaling pixel values to the range [0, 1] or by standardizing them to zero mean and unit standard deviation. Such scaling of the input data promotes faster learning and convergence in neural networks.
CLAHE is an image processing method employed to enhance the contrast and visibility of fine features in images, particularly those with uneven lighting or poor contrast. CLAHE prevents the contrast over-amplification that occurs in adaptive histogram equalization (AHE). CLAHE processes small tiles rather than the complete image and removes spurious tile borders by combining neighboring tiles with bilinear interpolation, boosting visual contrast [57]. CLAHE can also be applied to color photographs, typically to the luminance channel, which yields better results than equalizing every channel of a BGR image. CLAHE was initially used to enhance the quality of low-contrast medical images.
Contrary to typical AHE, CLAHE restricts contrast: it establishes a clipping limit to alleviate the problem of noise amplification [61]. In this study, after applying CLAHE, the images were resized to a common size. The image size varies between datasets; for example, APTOS images are 3216 × 2136 pixels, while DDR images are 512 × 512 pixels. To standardize them, the image size parameter is set to 150 × 150 pixels. The DWT is then applied to each image, and pixel values are rescaled to [0, 1].
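A minimal OpenCV sketch of the CLAHE-plus-resize step is shown below; the clip limit and tile grid size are assumed defaults, as the paper does not report these values.

```python
import cv2

def preprocess(path, size=150, clip_limit=2.0, grid=(8, 8)):
    """Apply CLAHE to the luminance channel and resize; clip_limit and
    grid are assumed defaults, not values reported in the paper."""
    bgr = cv2.imread(path)
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)   # work on luminance only
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=grid)
    lab = cv2.merge((clahe.apply(l), a, b))      # contrast-limited equalization
    enhanced = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
    return cv2.resize(enhanced, (size, size))    # standardize to 150 x 150
```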
3.3.1. Discrete Wavelet Transform
A mathematical method for processing signals and images, the Discrete Wavelet Transform (DWT) allows the examination of distinct signal components at varying resolutions or scales. In particular, the DWT decomposes a signal into two sets of coefficients: approximation coefficients and detail coefficients, which represent the signal’s low-frequency and high-frequency components, respectively [62]. The wavelet transform is among the most frequently used frequency-domain techniques in image processing, as it represents the information contained in images compactly [63]. Using the Python wavelet framework, the 2-D DWT is represented by

cA, (cH, cV, cD) = dwt2(image, wavelet)  (1)

where cA represents the approximation coefficients, cH the horizontal coefficients, cV the vertical coefficients, and cD the diagonal coefficients; cH, cV, and cD together form the detail coefficients.
The first step in the DWT is to pass the original image through two filters: a high-pass filter to extract the detail components (cH, cV, and cD) and a low-pass filter to extract the approximation component cA, as shown in Equation (1). These filters separate the signal’s fine and coarse features. After filtering, both sets of coefficients are downsampled by a factor of two, reducing the image resolution by half.
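Equation (1) corresponds to the dwt2 call in the PyWavelets package; a minimal sketch follows, where the Haar wavelet is an assumed choice since the paper does not name the wavelet family.

```python
import pywt

def dwt_bands(image):
    """Single-level 2-D DWT over the spatial axes of an H x W (x C) array,
    returning the four coefficient bands of Equation (1). The 'haar'
    wavelet is an assumed choice; the paper does not state the family."""
    cA, (cH, cV, cD) = pywt.dwt2(image, 'haar', axes=(0, 1))
    return cA, cH, cV, cD  # each band is roughly half the input resolution
```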
3.4. The Deep Learning Model
The proposed model aims to precisely classify and predict DR stages by analyzing the image. The preprocessed image data serve as the features, while the label assigned to each image acts as the target variable. Multiple filters and layers extract image information and attributes during the feature extraction process; the images are then categorized according to the target labels. The proposed model’s architecture is shown in Figure 4.
The input layer is shaped with three channels, corresponding to the red, green, and blue channels of the RGB color model, and the image width and height are set to 150 pixels. The preprocessed image (with CLAHE and DWT applied) is fed into the CNN model; the image resolution is halved after the DWT is applied.
Convolutional layers apply convolution operations to summarize and condense image features. The resulting feature maps capture corners and edges of the image, and later layers use these maps to learn higher-level features of the input. The pooling layer reduces the size of the convolved feature maps to lower computational cost; it does so by reducing connections between layers and operating on each feature map independently. The kernel size is 3, and the filter counts of the convolutional layers are 16, 32, 64, and 128, respectively. The activation function employed is ReLU. Max pooling selects the largest element in each window of the feature map, with a pool size of 2. Batch normalization normalizes the activations between layers; it improves the efficiency and stability of model training and has a regularizing effect. The Flatten layer converts the final feature maps into a one-dimensional vector.
The fully connected part consists of two dense layers. The first dense layer has 256 units with ReLU activation. Overfitting is mitigated by a dropout layer with a rate of 0.5. These layers connect to the output dense layer, which has softmax activation and classifies the data into the five DR stages. The loss function is sparse categorical cross-entropy, as the class labels are encoded as integers.
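A compact Keras sketch of this architecture is given below; the ordering of pooling and batch normalization within each block, the 'same' padding, and the 75 × 75 input size (150 × 150 images halved by the DWT) are our assumptions.

```python
from tensorflow.keras import layers, models

def build_model(input_shape=(75, 75, 3), num_classes=5):
    """Four Conv2D blocks (16/32/64/128 filters, 3x3 kernels, ReLU),
    each followed by 2x2 max pooling and batch normalization, then
    Flatten -> Dense(256) -> Dropout(0.5) -> softmax over 5 DR stages."""
    model = models.Sequential([layers.Input(shape=input_shape)])
    for filters in (16, 32, 64, 128):
        model.add(layers.Conv2D(filters, 3, activation='relu', padding='same'))
        model.add(layers.MaxPooling2D(pool_size=2))
        model.add(layers.BatchNormalization())
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(num_classes, activation='softmax'))
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```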
3.4.1. Model Optimizer
Optimization methods are essential to machine and deep learning model training for enhancing the model’s predictions. Various techniques have been created to tackle specific optimization problems, such as handling sparse gradients or accelerating convergence [64]. One of the most straightforward yet most powerful optimization methods is SGD; it uses a small portion of the training data to update the model’s weights instead of the entire dataset, which speeds up computation significantly. Adagrad adapts the learning rate of each parameter by scaling it inversely proportional to the square root of the sum of all of its previous squared gradients; it works exceptionally well with sparse data.
RMSProp uses a moving average of squared gradients to address Adagrad’s problem of continually decaying learning rates. This technique dynamically adjusts the learning rate for each parameter, increasing it for steps with small gradients to expedite convergence and decreasing it for steps with large gradients to avoid divergence. By adapting the learning rate according to moving averages of the gradients and their squares, Adam combines the benefits of Adagrad and RMSProp. Because it traverses the optimization landscape efficiently, Adam is one of the most widely used optimizers for training deep learning models. Adamax is a variant of Adam based on the infinity norm; despite being less popular than Adam, it can be helpful in situations where its normalization technique has benefits.
The choice of optimizer can be influenced by the particulars of the training task and the nature of the data, since each optimizer has its own advantages and uses [65]. For example, Adagrad or RMSProp may be favored for tasks with sparse data, while Adam is frequently chosen for its broad applicability across various situations. To ascertain which optimizer is most effective in this study, we evaluate the model with all five optimizers mentioned above.
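A sketch of this comparison under the Table 3 settings (initial learning rate 0.001, 70 epochs, batch size 32) might look as follows; x_train, y_train, x_val, and y_val are placeholders for the prepared data, and build_model refers to the architecture sketch above.

```python
from tensorflow.keras import optimizers

# The five candidates evaluated in this study, all at the initial
# learning rate of 0.001 listed in Table 3.
candidates = {
    'Adam':    optimizers.Adam(learning_rate=1e-3),
    'SGD':     optimizers.SGD(learning_rate=1e-3),
    'Adamax':  optimizers.Adamax(learning_rate=1e-3),
    'Adagrad': optimizers.Adagrad(learning_rate=1e-3),
    'RMSProp': optimizers.RMSprop(learning_rate=1e-3),
}
results = {}
for name, opt in candidates.items():
    model = build_model()  # fresh, untrained weights for a fair comparison
    model.compile(optimizer=opt,  # recompile with the optimizer under test
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    history = model.fit(x_train, y_train,
                        validation_data=(x_val, y_val),
                        epochs=70, batch_size=32, verbose=0)
    results[name] = max(history.history['val_accuracy'])
```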
3.5. Performance Measures
Accuracy in DR detection measures how often the model correctly identifies positive and negative cases (Equation (2)). Precision measures how well the model recognizes true positive cases (actual DR) among all predicted positive cases (Equation (3)). Recall, or sensitivity (Equation (4)), measures how well the model detects all real occurrences of DR, minimizing false negatives. The F1-score is essential because it incorporates both false positives and false negatives (Equation (5)); it is especially useful when positive and negative cases are imbalanced. We also calculate the AUC (area under the curve), the portion of the plot that lies below the ROC curve; the closer the AUC is to 1, the better the model [66].
Accuracy = (truePos + trueNeg) / (truePos + trueNeg + falsePos + falseNeg)  (2)

Precision = truePos / (truePos + falsePos)  (3)

Recall = truePos / (truePos + falseNeg)  (4)

F1-score = 2 × (Precision × Recall) / (Precision + Recall)  (5)

where truePos is true positive, trueNeg is true negative, falsePos is false positive, and falseNeg is false negative.
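These measures can be computed from test-set predictions with scikit-learn as sketched below; the weighted averaging and one-vs-rest AUC for the five classes are our assumptions, since the paper does not state the averaging scheme.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Placeholder predictions for illustration: y_true holds integer labels 0-4,
# y_prob holds per-class probabilities as returned by model.predict(x_test).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 5, size=200)
y_prob = rng.dirichlet(np.ones(5), size=200)

y_pred = np.argmax(y_prob, axis=1)
metrics = {
    'accuracy':  accuracy_score(y_true, y_pred),                      # Eq. (2)
    'precision': precision_score(y_true, y_pred, average='weighted'), # Eq. (3)
    'recall':    recall_score(y_true, y_pred, average='weighted'),    # Eq. (4)
    'f1':        f1_score(y_true, y_pred, average='weighted'),        # Eq. (5)
    'auc':       roc_auc_score(y_true, y_prob, multi_class='ovr',
                               average='weighted'),
}
print(metrics)
```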
Figure 1.
The stages of DR from the APTOS dataset. (a) No DR or normal retina, (b) mild DR, (c) moderate DR, (d) severe DR, and (e) proliferative DR.
Figure 2.
The workflow of the proposed study.
Figure 3.
Example of an augmented image from a random image in training data. From left to right: the original image, followed by a horizontally flipped image, height-shifted image, width-shifted image, and rotated image.
Figure 4.
The architecture of the proposed model.
Figure 5.
The original and preprocessed image in the model. (a) Raw image, (b) CLAHE-preprocessed image, (c) sample of approximation coefficients from DWT preprocessing.
Figure 6.
The accuracy plot of training and validation data. (a) Adam optimizer, (b) RMSProp optimizer, (c) SGD optimizer, (d) Adagrad optimizer, (e) Adamax optimizer.
Figure 7.
The loss plot of training and validation data. (a) Adam optimizer, (b) RMSProp optimizer, (c) SGD optimizer, (d) Adagrad optimizer, (e) Adamax optimizer.
Figure 8.
Comparison of the proposed model’s performance on the test dataset.
Table 1.
Inference from the previous studies related to the classification of DR stages.
| Reference | Model | Dataset | Advantage | Disadvantage |
|---|---|---|---|---|
| [19] | DenseNet121 | APTOS, EyePACS, ODIR | Able to classify based on different datasets. | The stage of DR is not detected. |
| [51] | CNN | Kaggle | The model achieves an accuracy of 89%. | The model needs to be tested on different datasets. |
| [52] | VGG16 | IDRiD | The model was capable of detecting different stages of DR. | The model was tested on very few images. |
| [53] | Vision Transformer | DDR, IDRiD | The model was capable of detecting different stages of DR. | The model has a class imbalance problem and tests on a few images. |
| [54] | ResNext, DenseNet | APTOS | The model classifies the DR with high performance. | The model has a class imbalance problem and tests on a single dataset. |
| [55] | Capsule network | Messidor | The model detects the stages of DR. | Only four stages are detected and tested only on a single dataset. |
| [56] | InceptionV3, ResNet50, CNN | Messidor, IDRiD | The model classifies the DR with high performance. | The features are extracted using only two pretrained models. |
| [35] | U-Net | APTOS, Messidor | The model segments and detects the stages of DR. | Only four stages are detected, and parameter tuning can be performed. |
| [57] | EfficientNet, VGG16, InceptionV3 | APTOS | The model classifies DR after CLAHE preprocessing. | The stage of DR is not detected, and only a single dataset is evaluated. |
| [58] | InceptionV3 | APTOS | The model classifies DR stages after CLAHE preprocessing. | Only a single dataset and model are evaluated. |
| [59] | Swin Transformer | APTOS | The model classifies DR with high performance. | The stage of DR is not detected, and the model can be tuned with more datasets. |
| [43] | VGG16 | APTOS, Mauritius | The model detects the stages of DR. | The model needs to be tuned for moderate and proliferative DR. |
| [45] | Wavelet with Swin Transformer | APTOS | The classification accuracy was improved. | The study only utilized a single image set for testing the model. |
| [48] | DWT with KNN, SVM | Messidor | The model classifies the normal and DR images perfectly. | The stage of DR is not detected, and the dataset contains fewer samples. |
Table 2.
The count of each class in the training, validation, and testing datasets.
| Dataset | Class 0 | Class 1 | Class 2 | Class 3 | Class 4 |
|---|---|---|---|---|---|
| Training | 5824 | 722 | 3953 | 310 | 873 |
| Validation | 1028 | 127 | 698 | 55 | 154 |
| Testing | 1210 | 150 | 821 | 64 | 181 |
Table 3.
The training parameters of the model.
| Parameter | Value |
|---|---|
| Image size | 150 |
| Initial learning rate | 0.001 |
| Optimizer | Adam, SGD, Adamax, Adagrad, RMSProp |
| Loss function | Sparse categorical cross-entropy |
| Epochs | 70 |
| Batch size | 32 |
Table 4.
The validation accuracy of the APTOS-DDR dataset.
| Optimizer | Accuracy | Loss |
|---|---|---|
| Adam | 68.77% | 0.8432 |
| SGD | 61.29% | 1.0354 |
| Adamax | 60.74% | 0.9731 |
| Adagrad | 64.95% | 0.9069 |
| RMSProp | 65.06% | 0.9185 |
Table 5.
The test results for all optimizers.
| Optimizer | Accuracy | Recall | Precision | F1-score | Loss |
|---|---|---|---|---|---|
| Adam | 69.51% | 0.70 | 0.64 | 0.65 | 0.8215 |
| SGD | 62.21% | 0.62 | 0.55 | 0.58 | 1.0032 |
| Adamax | 62.61% | 0.63 | 0.57 | 0.57 | 0.9573 |
| Adagrad | 65.05% | 0.65 | 0.63 | 0.64 | 0.8988 |
| RMSProp | 66.67% | 0.67 | 0.62 | 0.62 | 0.8906 |
Table 6.
The AUC score of the test result.
| Optimizer | AUC |
|---|---|
| Adam | 0.8153 |
| SGD | 0.7253 |
| Adamax | 0.7783 |
| Adagrad | 0.7885 |
| RMSProp | 0.7772 |
Table 7.
Comparison of the proposed study with previous studies.
| Reference | Year | Model | Class type | Dataset | Accuracy | F1-score | AUC |
|---|---|---|---|---|---|---|---|
| [48] | 2018 | KNN, DWT | Binary | Messidor | 98.16% | - | - |
| [69] | 2022 | CNN | 5-class | DDR | 66.68% | - | - |
| [53] | 2023 | Vision Transformer | 6-class | DDR | 91.54% | 0.67 | - |
| [45] | 2024 | Swin Transformer, DWT | 5-class | APTOS | 86.00% | - | - |
| Proposed model | - | CNN with CLAHE, DWT | 5-class | APTOS+DDR | 69.51% | 0.65 | 0.81 |
0.81 |