Submitted: 29 April 2025
Posted: 30 April 2025
Abstract
Keywords:
1. Introduction
1.1. Research Objectives
2. Literature Review
2.1. Evolution of Skin Disease Classification Methods
2.2. Deep Learning Approaches in Dermatological Image Analysis
2.3. Multi-Class Classification of Diverse Skin Conditions
2.4. Addressing Diversity and Fairness in Skin Disease Classification
2.5. Interpretability and Explainability in Dermatological AI
2.6. Comparative Analysis of Current Approaches
3. Research Methodology
3.1. Dataset Acquisition and Preparation
3.1.1. Data Sources
3.1.2. Data Preprocessing and Augmentation
- Random rotations (±30°)
- Horizontal and vertical flips
- Random brightness and contrast adjustments (±15%)
- Random zoom (0.8-1.2×)
- Slight color jitter (hue and saturation shifts of ±10%)
- Random cropping with a minimum of 85% area coverage
- Cutout regularization with random 32×32-pixel patches
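The augmentation ranges above can be made concrete with a small, framework-free sketch that draws one parameter set per image and applies cutout to a single-channel image stored as nested lists. It assumes square crops (so the side fraction is the square root of the kept area), hue shifts expressed as a fraction of the hue circle, a 50% flip probability, and cutout patches that lie fully inside the image; none of these details are stated in the text, and the names below are hypothetical.

```python
import math
import random

# Hypothetical constants mirroring the listed augmentation ranges.
ROTATION_DEG = 30.0          # rotations drawn from [-30, +30] degrees
BRIGHTNESS_CONTRAST = 0.15   # multiplicative factors in [0.85, 1.15]
ZOOM_RANGE = (0.8, 1.2)
HUE_SAT_SHIFT = 0.10         # assumed: fraction of hue circle / saturation factor
MIN_CROP_AREA = 0.85
CUTOUT_SIZE = 32


def sample_augmentation(rng: random.Random) -> dict:
    """Draw one set of augmentation parameters within the stated ranges."""
    area = rng.uniform(MIN_CROP_AREA, 1.0)
    return {
        "rotation_deg": rng.uniform(-ROTATION_DEG, ROTATION_DEG),
        "hflip": rng.random() < 0.5,  # assumed flip probability of 0.5
        "vflip": rng.random() < 0.5,
        "brightness": 1.0 + rng.uniform(-BRIGHTNESS_CONTRAST, BRIGHTNESS_CONTRAST),
        "contrast": 1.0 + rng.uniform(-BRIGHTNESS_CONTRAST, BRIGHTNESS_CONTRAST),
        "zoom": rng.uniform(*ZOOM_RANGE),
        "hue_shift": rng.uniform(-HUE_SAT_SHIFT, HUE_SAT_SHIFT),
        "saturation": 1.0 + rng.uniform(-HUE_SAT_SHIFT, HUE_SAT_SHIFT),
        # Assumed square crop: side fraction = sqrt(area) keeps >= 85% of the area.
        "crop_side_frac": math.sqrt(area),
    }


def cutout(image, rng, size=CUTOUT_SIZE, fill=0):
    """Return a copy of a 2D image (list of rows) with one size x size patch filled."""
    h, w = len(image), len(image[0])
    top = rng.randrange(0, h - size + 1)   # patch kept fully inside the image
    left = rng.randrange(0, w - size + 1)
    out = [row[:] for row in image]         # leave the input untouched
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = fill
    return out
```

In practice an image library would consume these parameters; drawing them in one place keeps every random choice reproducible from a single seed.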
3.1.3. Dataset Stratification
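Stratification is commonly done by splitting each class separately so that train, validation, and test sets keep the same class proportions. The sketch below illustrates that idea; the 70/15/15 fractions and the fixed seed are hypothetical defaults, not the ratios used in this study.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=42):
    """Split sample indices into train/val/test while preserving class proportions.

    `fractions` and `seed` are illustrative assumptions.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):          # deterministic class order
        idxs = by_class[label]
        rng.shuffle(idxs)
        n = len(idxs)
        n_train = round(n * fractions[0])
        n_val = round(n * fractions[1])
        train.extend(idxs[:n_train])
        val.extend(idxs[n_train:n_train + n_val])
        test.extend(idxs[n_train + n_val:])  # remainder goes to test
    return train, val, test
```

Because every class is split on its own, a rare condition with only a few dozen images still contributes examples to each subset.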
3.2. CNN Architecture and Implementation
3.2.1. Base Architecture Selection
3.2.2. Architectural Modifications
- Global average pooling to squeeze spatial information
- A two-layer fully connected network with a bottleneck structure to generate channel-wise attention weights
- Rescaling of the original feature maps using these weights
- Global average pooling layer to aggregate spatial information
- Dropout layer (rate = 0.5) to reduce overfitting
- Fully connected layer with 1024 units and ReLU activation
- Batch normalization layer to stabilize training
- Dropout layer (rate = 0.3) for additional regularization
- Final fully connected layer with softmax activation, outputting probabilities for N disease classes
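The two component lists above (the squeeze-and-excitation steps and the classification head) can be traced with a minimal, framework-free sketch. It uses nested lists for a C×H×W feature map; the weight layouts, the SE bottleneck width, and the batch-norm epsilon of 1e-3 (the Keras default) are assumptions. In the described model the hidden layer has 1024 units and the output layer N units; here those widths are set only by the shapes of the weight lists. Dropout is the identity at inference, so it does not appear in the forward pass.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(vec, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def global_avg_pool(fmap):
    """C x H x W nested lists -> length-C vector of channel means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def se_block(fmap, w1, b1, w2, b2):
    """Squeeze (GAP), excite (bottleneck FC-ReLU-FC-sigmoid), then rescale channels."""
    z = global_avg_pool(fmap)                          # squeeze
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]   # bottleneck FC + ReLU
    s = [sigmoid(a) for a in dense(hidden, w2, b2)]    # channel weights in (0, 1)
    return [[[x * s_c for x in row] for row in ch] for ch, s_c in zip(fmap, s)]


def softmax(logits):
    m = max(logits)                                    # subtract max for stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def head_forward(fmap, p, eps=1e-3):
    """Inference-mode pass: GAP -> FC+ReLU -> batch norm -> FC -> softmax.

    Dropout layers are identities at inference; batch norm uses running stats.
    `p` is a hypothetical dict of weights and batch-norm statistics.
    """
    x = global_avg_pool(fmap)
    x = [max(0.0, h) for h in dense(x, p["w_fc"], p["b_fc"])]
    x = [g * (v - mu) / math.sqrt(var + eps) + beta
         for v, g, beta, mu, var in zip(x, p["gamma"], p["beta"], p["mean"], p["var"])]
    return softmax(dense(x, p["w_out"], p["b_out"]))
```

With all SE weights set to zero every channel weight is sigmoid(0) = 0.5, which makes the rescaling step easy to check by hand.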
3.2.3. Implementation Details
3.3. Training Strategy and Optimization
3.3.1. Transfer Learning Approach
- Stage 1: the EfficientNet-B3 base model weights were frozen and only the custom classification head was trained (learning rate 1e-3; Adam optimizer).
- Stage 2: the final 50 layers of the base model were unfrozen and trained together with the classification head (learning rate 5e-4; Adam with weight decay 1e-5).
- Stage 3: the entire network was unfrozen and trained end-to-end (learning rate 1e-4; Adam with weight decay 1e-5), using a cosine annealing learning-rate schedule with warm restarts.
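The staged settings and the cosine schedule with warm restarts can be written down as follows. This is a sketch: the stage table simply transcribes the listed learning rates and weight decays (stage 1 lists no weight decay, encoded as 0.0), while the cycle length `t0=10` and multiplier `t_mult=2` are hypothetical, since the text gives neither. Within a cycle of length T_i the rate is lr_min + 0.5·(lr_max − lr_min)·(1 + cos(π·t_cur/T_i)); at the end of a cycle t_cur resets to 0 and the next cycle is t_mult times longer.

```python
import math

# Transcribed stage settings; epoch counts are not given in the text.
STAGES = [
    {"name": "head_only", "trainable": "head", "lr": 1e-3, "weight_decay": 0.0},
    {"name": "partial", "trainable": "last_50_layers+head", "lr": 5e-4, "weight_decay": 1e-5},
    {"name": "full", "trainable": "all", "lr": 1e-4, "weight_decay": 1e-5},
]


def cosine_warm_restarts(step, lr_max, lr_min=0.0, t0=10, t_mult=2):
    """Learning rate at `step` under cosine annealing with warm restarts.

    `t0` (first cycle length) and `t_mult` (cycle growth) are assumptions.
    """
    t_cur, t_i = step, t0
    while t_cur >= t_i:          # skip completed cycles
        t_cur -= t_i
        t_i *= t_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_i))
```

For the final stage one would call `cosine_warm_restarts(epoch, STAGES[2]["lr"])` once per epoch. PyTorch's `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` implements the same rule through its `T_0`, `T_mult`, and `eta_min` parameters.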
3.3.2. Loss Function and Class Weighting
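The exact weighting scheme is not specified here; one common choice, shown as an assumption, is inverse-frequency weighting w_c = N / (K·n_c) for N samples, K classes, and n_c samples of class c, combined with a weighted cross-entropy normalized by the sum of sample weights (the same normalization PyTorch's `CrossEntropyLoss` uses with `weight=` and mean reduction).

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """w_c = N / (K * n_c); rare classes get larger weights (assumed scheme)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def weighted_cross_entropy(probs, targets, class_weights):
    """Weighted mean of -log p(target), normalized by the total sample weight."""
    total, norm = 0.0, 0.0
    for p, t in zip(probs, targets):
        w = class_weights[t]
        total += -w * math.log(max(p[t], 1e-12))  # clamp to avoid log(0)
        norm += w
    return total / norm
```

With a 90/10 class split, the minority class receives weight 5.0 and the majority about 0.56, so the weights still average to 1 over the training samples.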
3.3.3. Regularization and Early Stopping
3.4. Evaluation Framework
3.4.1. Performance Metrics
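The comparison table reports accuracy, balanced accuracy, F1-score, and top-k accuracy. The study's exact metric set is not listed here, so the following is only a sketch of how those multi-class metrics are typically computed from predicted labels or class probabilities.

```python
from collections import defaultdict


def per_class_counts(y_true, y_pred):
    """True positives, false positives, and false negatives per class."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    return tp, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    tp, _, fn = per_class_counts(y_true, y_pred)
    classes = set(y_true)
    return sum(tp[c] / (tp[c] + fn[c]) for c in classes) / len(classes)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    scores = []
    for c in set(y_true) | set(y_pred):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def top_k_accuracy(y_true, probs, k=3):
    """Fraction of samples whose true class is among the k highest probabilities."""
    hits = 0
    for t, p in zip(y_true, probs):
        top = sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:k]
        hits += t in top
    return hits / len(y_true)
```

Balanced accuracy and macro-F1 weight every class equally, which is why they fall below plain accuracy when rare conditions are misclassified.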
3.4.2. Comparative Analysis
- Impact of attention mechanisms (with vs. without SE blocks)
- Effect of multi-scale feature fusion
- Contribution of different data augmentation strategies
- Influence of transfer learning and progressive unfreezing
3.4.3. Fairness and Bias Assessment
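A basic form of subgroup analysis, in the spirit of the Fitzpatrick-annotated evaluations cited in the comparison table, is to compute accuracy separately for each skin-type group and report the largest gap. The grouping labels below are hypothetical, and the study may use additional fairness criteria.

```python
from collections import defaultdict


def subgroup_accuracy(y_true, y_pred, groups):
    """Accuracy within each subgroup (e.g., grouped Fitzpatrick skin types)."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def max_accuracy_gap(per_group):
    """Largest accuracy difference between any two subgroups."""
    values = list(per_group.values())
    return max(values) - min(values)
```

A gap of 0.10 to 0.15 would correspond to the 10-15% lower accuracy on darker skin tones reported by Daneshjou et al.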
3.4.4. Model Interpretability Analysis
References
- American Cancer Society, “Cancer Facts & Figures 2023,” American Cancer Society, Atlanta, GA, 2023.
- G. Argenziano, et al., “Twenty years of dermoscopy,” Journal of the American Academy of Dermatology, vol. 81, no. 4, pp. 1088-1086, 2019.
- C. Barata, M. E. Celebi, and J. S. Marques, “Explainable skin lesion diagnosis using taxonomies,” Pattern Recognition, vol. 110, Art. no. 107413, 2021.
- M. E. Celebi, et al., “A methodological approach to the classification of dermoscopy images,” Computerized Medical Imaging and Graphics, vol. 31, no. 6, pp. 362-373, 2007.
- M. E. Celebi, H. A. Kingravi, H. Iyatomi, Y. A. Aslandogan, W. V. Stoecker, R. H. Moss, J. M. Malters, J. M. Grichnik, A. A. Marghoob, H. S. Rabinovitz, and S. W. Menzies, “Border detection in dermoscopy images using statistical region merging,” Skin Research and Technology, vol. 14, no. 3, pp. 347-353, 2008.
- N. C. F. Codella, D. Gutman, M. E. Celebi, B. Helba, M. A. Marchetti, S. W. Dusza, A. Kalloo, K. Liopyris, N. Mishra, H. Kittler, and A. Halpern, “Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC),” IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pp. 168-172, 2018.
- N. C. F. Codella, V. Rotemberg, P. Tschandl, M. E. Celebi, S. Dusza, D. Gutman, B. Helba, A. Kalloo, K. Liopyris, M. Marchetti, H. Kittler, and A. Halpern, “Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC),” arXiv:1902.03368, 2019.
- R. Daneshjou, M. Vodrahalli, V. Novoa, M. Marsch, M. Alam, J. Y. Lee, S. Abrouk, H. Rabinovitz, M. Frazier, S. Sadeghpour, A. Young, L. Phillips, and J. Zou, “Disparities in dermatology AI: Fewer skin of color images, lower performance, and fairness warnings,” Journal of the American Academy of Dermatology, vol. 86, no. 1, pp. 103-114, 2021.
- R. Daneshjou, R. Zakeri, and J. Zou, “Artificial intelligence and dermatology: opportunities, challenges, and future directions,” JAMA Dermatology, vol. 158, no. 3, pp. 318-324, 2022.
- A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, vol. 542, no. 7639, pp. 115-118, 2017.
- H. Feng, J. Berk-Krauss, P. W. Feng, and J. A. Stein, “Comparison of dermatologist density between urban and rural counties in the United States,” JAMA Dermatology, vol. 154, no. 11, pp. 1265-1271, 2018.
- A. Y. Finlay, R. J. Hay, N. C. Dlova, S. A. Garg, R. Joshipura, and S. Lulla, “Global alliance for patients with serious skin diseases: a way forward,” Journal of the American Academy of Dermatology, vol. 76, no. 2, pp. 368-370, 2017.
- H. Ganster, A. Pinz, R. Röhrer, E. Wildling, M. Binder, and H. Kittler, “Automated melanoma recognition,” IEEE Transactions on Medical Imaging, vol. 20, no. 3, pp. 233-239, 2001.
- A. Garza-Mayers and K. C. McClain, “Telemedicine in deep disparities: a persistent pandemic illuminates the need to address a perennial problem,” Journal of the American Academy of Dermatology, vol. 83, no. 6, pp. e401-e402, 2020.
- N. Gessert, T. Sentker, F. Madesta, R. Schmitz, H. Kniep, I. Baltruschat, R. Werner, and A. Schlaefer, “Skin lesion classification using CNNs with patch-based attention and diagnosis-guided loss weighting,” IEEE Transactions on Biomedical Engineering, vol. 67, no. 2, pp. 495-503, 2020.
- M. Groh, C. Harris, L. Soenksen, F. Lau, R. Han, A. Kim, A. Koochek, and O. Badri, “Evaluating deep neural networks trained on clinical images in dermatology with the Fitzpatrick 17k dataset,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 1820-1828, 2021.
- H. A. Haenssle, C. Fink, R. Schneiderbauer, F. Toberer, T. Buhl, A. Blum, A. Kalloo, A. Ben Hadj Hassen, L. Thomas, A. Enk, and L. Uhlmann, “Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists,” Annals of Oncology, vol. 29, no. 8, pp. 1836-1842, 2018.
| Study | Architecture | Dataset | Performance | Limitations |
|---|---|---|---|---|
| Esteva et al. [10] (2017) | Inception v3 | 129,450 images, 2,032 diseases | AUC 0.96 for keratinocyte carcinoma, 0.94 for melanoma | Limited demographic diversity; binary classification focus |
| Han et al. [19] (2018) | ResNet-152 | 15,408 images, 12 diseases | Mean AUC 0.95 across all classes | Limited to common conditions; single-center dataset |
| Tschandl et al. [36] (2019) | Ensemble of ResNet-50, SE-ResNeXt-50 | HAM10000 dataset, 7 skin conditions | Accuracy 87.3%, mean AUC 0.93 | Limited to dermoscopic images; seven disease classes only |
| Liu et al. [11] (2020) | Dual-branch CNN with SENet | 8,545 images (clinical and dermoscopic), 9 diseases | Accuracy 89.7%, F1-score 0.87 | Requires both clinical and dermoscopic images for optimal performance |
| Gessert et al. [15] (2020) | Multi-resolution ResNet-50 | ISIC 2019 dataset, 8 disease classes | Balanced accuracy 63.9%, AUC 0.93 for melanoma | Limited to dermoscopic images; moderate performance on rare classes |
| Daneshjou et al. [8] (2021) | DenseNet-121 | Multi-source dataset with Fitzpatrick skin type annotations | 10-15% lower accuracy on darker skin tones | Highlights disparities but does not fully resolve them |
| Wu et al. [37] (2022) | EfficientNet-B4 with attention | 45,000 images, 26 disease classes | Top-1 accuracy 82.6%, top-3 accuracy 95.7% | Limited evaluation across demographic subgroups |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).