Submitted: 12 December 2023
Posted: 13 December 2023
Abstract
Keywords:
1. Introduction
- How do different attention mechanisms affect the performance of transfer learning models in the field of skin cancer detection and classification?
- What is the trade-off between computational complexity and performance when incorporating attention mechanisms into transfer learning models for skin cancer detection and classification?
2. Related Work
2.1. Transfer Learning in Skin Cancer Detection and Classification
2.2. Attention Mechanisms in Skin Cancer Detection and Classification
3. Background
3.1. Transfer Learning in Computer Vision
- DenseNet
- DenseNet is a deep learning architecture known for its dense connectivity structure, in which each layer is directly connected to every subsequent layer. This design promotes efficient feature reuse and makes feature extraction highly effective. DenseNet's architecture enables the development of larger networks that remain resource-efficient, making it a favored option for a wide range of computer vision tasks [37]. DenseNet has also been shown to mitigate problems that arise in deep architectures, such as poor convergence, overfitting, and vanishing gradients [38].
- Inception
- Inception, also known as GoogLeNet, is a deep convolutional neural network architecture that introduced the notion of inception modules. These modules apply several filter sizes within the same layer to capture features at multiple scales. Inception achieves high accuracy in image classification tasks while keeping computational complexity low [39].
- MobileNet
- MobileNet, a deep convolutional neural network, was introduced by Google in 2017. Its distinguishing characteristic is its economical use of computational resources and its small model size, which make it suitable for resource-constrained environments such as mobile devices and embedded systems. MobileNet achieves this efficiency through depth-wise separable convolutions, which reduce the number of parameters while still allowing the model to extract meaningful features [40].
- VGG
- VGG (Visual Geometry Group) is a classical convolutional neural network (CNN) architecture from the University of Oxford, known for its simplicity and efficacy in image analysis. It has a uniform structure of 16 or 19 weight layers built from 3x3 convolutional layers and max-pooling layers. VGG's impact extends beyond ImageNet, and it performs well on a variety of tasks and datasets. Its architecture starts with 64-channel 3x3 convolutional layers with ReLU units (one configuration also uses 1x1 convolution filters) and concludes with three fully connected layers: two with 4096 channels each and a final layer with 1000 outputs, one per ImageNet class [41].
- Xception
- Xception, a deep learning architecture, was introduced by Google in 2016. Its use of depth-wise separable convolutions improves the performance and efficiency of convolutional neural networks (CNNs). By separating the spatial (depth-wise) convolution from the cross-channel (point-wise) convolution, Xception reduces the number of parameters while preserving the ability to capture complex features. The result is a more compact and computationally efficient model [42].
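As a rough illustration of why the depth-wise separable convolutions used by MobileNet and Xception shrink a model, the sketch below counts the weights of a standard convolution and of its depth-wise separable counterpart. The layer sizes (3x3 kernel, 256 input and 256 output channels) and the function names are illustrative assumptions, not values taken from either architecture, and biases are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depth-wise separable convolution (bias ignored)."""
    depthwise = kernel * kernel * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer sizes (assumed, not taken from the paper).
standard = standard_conv_params(3, 256, 256)
separable = separable_conv_params(3, 256, 256)
print(standard, separable, round(standard / separable, 2))
```

For these assumed sizes the separable layer needs roughly one-ninth of the weights, which is where most of the savings in such architectures come from.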
3.2. Attention Mechanisms
- 1. Spatial Attention
- Spatial Attention is a deep learning technique that sharpens a model's focus on specific regions of an image while reducing attention to others, allowing the model to prioritize the important parts of the input data. The layer proceeds in three steps:
  - 1. Average Pooling: First, an average pooling operation is applied along the channel axis (axis=-1) to calculate the average value of each spatial location across all channels. The result, avg_pool, has dimensions H x W x 1. This operation is expressed mathematically as:
    $$\mathrm{avg\_pool}_{i,j} = \frac{1}{C} \sum_{c=1}^{C} I_{i,j,c}$$
    where $I_{i,j,c}$ represents the value of the input tensor at spatial position (i, j) and channel c, and C is the number of channels.
  - 2. Sigmoid Convolution: Next, a convolution with a kernel size of (1, 1), a single output channel, and a sigmoid activation function is applied. Its purpose is to learn a spatial attention weight for each spatial location. The output, conv_output, has dimensions H x W x 1 and contains values between 0 and 1 because of the sigmoid activation:
    $$\mathrm{conv\_output}_{i,j} = \delta\left(W \cdot \mathrm{avg\_pool}_{i,j}\right)$$
    where $\delta$ represents the sigmoid activation function, $W$ represents the learned convolution kernel weights, and $\mathrm{avg\_pool}_{i,j}$ represents the average-pooled value at spatial position (i, j).
  - 3. Spatial Attention Applied to Input: Finally, the output of the sigmoid convolution is multiplied element-wise with the original input tensor $I_{i,j,c}$ to produce the final output of the Spatial Attention layer. This operation assigns higher weights to the spatial locations that the learned attention values mark as more important:
    $$\mathrm{output}_{i,j,c} = I_{i,j,c} \cdot \mathrm{conv\_output}_{i,j}$$
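The three spatial-attention steps can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the feature-map shape, the placeholder `weight` and `bias` values, and the function names are assumptions. The bias term is also an assumption, suggested by the two layer parameters reported for Spatial Attention in the model complexity tables.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def spatial_attention(inputs: np.ndarray, weight: float, bias: float) -> np.ndarray:
    """Apply the three spatial-attention steps to an H x W x C tensor."""
    # Step 1: average over the channel axis -> H x W x 1.
    avg_pool = inputs.mean(axis=-1, keepdims=True)
    # Step 2: a 1x1 convolution on a single channel is a scalar affine map,
    # followed by a sigmoid so every weight lies in (0, 1).
    conv_output = sigmoid(weight * avg_pool + bias)
    # Step 3: broadcast the H x W x 1 weights over all C channels.
    return inputs * conv_output


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(7, 7, 1024))  # assumed backbone output shape
attended = spatial_attention(feature_map, weight=0.5, bias=0.0)  # placeholder parameters
print(attended.shape)
```

Because the attention weights lie strictly between 0 and 1, the layer can only attenuate activations; it never amplifies them.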
- 2. Channel Attention
  - Average Pooling: First, an average pooling operation is applied to the input tensor I along the spatial dimensions (height and width). This operation calculates the average value for each channel, resulting in a tensor avg_pool with dimensions 1 x 1 x C. The average pooling operation is represented mathematically as:
    $$\mathrm{avg\_pool}_{c} = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} I_{i,j,c}$$
    where $I_{i,j,c}$ represents the value of the input tensor at spatial position (i, j) and channel c.
  - Fully Connected Layers (FC1 and FC2): Two fully connected (dense) layers are used to process the avg_pool tensor:
    - fc1: This layer reduces the dimensionality of the avg_pool tensor by applying a linear transformation followed by a ReLU activation. The output of fc1, denoted as fc1_output, has dimensions 1 x 1 x (C/8), that is, one-eighth of the original channel dimension:
      $$\mathrm{fc1\_output} = \mathrm{ReLU}\left(W_1 \cdot \mathrm{avg\_pool} + b_1\right)$$
    - fc2: This layer takes the output of fc1 and applies another linear transformation followed by a sigmoid activation function. The output of fc2, denoted as channel_attention, has the same dimensions as avg_pool, which is 1 x 1 x C:
      $$\mathrm{channel\_attention} = \delta\left(W_2 \cdot \mathrm{fc1\_output} + b_2\right)$$
      where $\delta$ represents the sigmoid activation function, $W_1$ and $W_2$ are learnable weight matrices, and $b_1$ and $b_2$ are learnable bias vectors.
  - Channel Attention Application: Finally, the channel attention tensor channel_attention is multiplied element-wise with the original input tensor I to produce the final output of the Channel Attention layer:
    $$\mathrm{output}_{i,j,c} = I_{i,j,c} \cdot \mathrm{channel\_attention}_{c}$$
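A minimal NumPy sketch of the channel-attention steps is given below, assuming a 7 x 7 x 1024 feature map and randomly initialized weights; the shapes, initialization, and function names are illustrative assumptions rather than the authors' code.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(inputs, w1, b1, w2, b2):
    """Apply channel attention to an H x W x C tensor."""
    # Average over height and width -> one value per channel, shape (C,).
    avg_pool = inputs.mean(axis=(0, 1))
    # fc1: reduce to C/8 units with a ReLU activation.
    fc1_output = np.maximum(avg_pool @ w1 + b1, 0.0)
    # fc2: restore C units with a sigmoid activation.
    weights = sigmoid(fc1_output @ w2 + b2)
    # Broadcast the per-channel weights over all spatial positions.
    return inputs * weights


rng = np.random.default_rng(0)
C = 1024  # assumed channel count of the backbone output
feature_map = rng.normal(size=(7, 7, C))
w1, b1 = rng.normal(scale=0.01, size=(C, C // 8)), np.zeros(C // 8)
w2, b2 = rng.normal(scale=0.01, size=(C // 8, C)), np.zeros(C)

attended = channel_attention(feature_map, w1, b1, w2, b2)
n_params = w1.size + b1.size + w2.size + b2.size
print(attended.shape, n_params)
```

With C = 1024 the sketch has 263296 trainable values, which coincides with the Layer Params entry listed for Channel Attention on DenseNet121 in the model complexity tables.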
- 3. Positional Attention
  - Query Computation: The first step is to compute a query tensor, denoted as query, which determines the attention weight for each position in the input tensor. This is done using a 1x1 convolution layer with a sigmoid activation. The output of this convolution, query, has the same spatial dimensions as the input tensor, H x W x 1, and contains values between 0 and 1 because of the sigmoid activation:
    $$\mathrm{query}_{i,j} = \delta\left(\sum_{c=1}^{C} W_{c} \cdot I_{i,j,c}\right)$$
    where $\delta$ represents the sigmoid activation function, $W_{c}$ represents the learned convolution kernel weights, and $I_{i,j,c}$ represents the value of the input tensor at spatial position (i, j) and channel c.
  - Positional Attention Application: The final output of the Positional Attention layer is obtained by multiplying the original input tensor I element-wise with the query tensor query:
    $$\mathrm{output}_{i,j,c} = I_{i,j,c} \cdot \mathrm{query}_{i,j}$$
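The two positional-attention steps can be sketched as follows. A 1x1 convolution with a single output channel reduces to a dot product over the channel axis at every position, which is what the code relies on. The input shape, the random kernel, the bias term, and the function names are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def positional_attention(inputs: np.ndarray, kernel: np.ndarray, bias: float) -> np.ndarray:
    """inputs: H x W x C, kernel: (C,) weights of the 1x1 convolution."""
    # 1x1 convolution to one channel = dot product over channels per position.
    query = sigmoid(inputs @ kernel + bias)  # shape H x W
    # Re-weight every channel at position (i, j) by query[i, j].
    return inputs * query[..., np.newaxis]


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(7, 7, 1024))  # assumed backbone output shape
kernel = rng.normal(scale=0.01, size=1024)   # placeholder weights
attended = positional_attention(feature_map, kernel, bias=0.0)
print(attended.shape)
```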
- 4. Nonlocal Attention
  - 1. Query, Key, and Value Computation:
    - Query (Q): Calculate query matrices by applying a 1x1 convolution operation to the input tensor I. The query tensor, denoted as queries, has the same dimensions as the input tensor, H x W x C.
    - Key (K): Calculate key matrices by applying another 1x1 convolution operation to I. The key tensor, denoted as keys, also has dimensions H x W x C.
    - Value (V): Calculate value matrices by applying a third 1x1 convolution operation to I. The value tensor, denoted as values, has dimensions H x W x C.
  - 2. Attention Score Calculation: Compute the attention scores attention_scores by taking the dot product of the queries and keys, followed by a Softmax operation along the last dimension (channel dimension):
    $$\mathrm{attention\_scores} = \mathrm{Softmax}\left(\mathrm{queries} \cdot \mathrm{keys}\right)$$
    where $(\cdot)$ represents the dot product, and the Softmax operation is applied along the channel dimension to obtain normalized attention scores for each position in the input.
  - 3. Applying Attention to Values: Apply the computed attention scores to the values to obtain the attended values attention_values:
    $$\mathrm{attention\_values} = \mathrm{attention\_scores} \cdot \mathrm{values}$$
  - 4. Combining Attention Values with Inputs: Multiply the attended values attention_values element-wise with the original input tensor I:
    $$\mathrm{output} = I \odot \mathrm{attention\_values}$$
- 5. Global Context Attention
  - 1. Global Average Pooling: The first step is to compute a global context vector by applying global average pooling to the input tensor I. Global average pooling computes the average value across all spatial positions, resulting in a tensor of size 1x1xC:
    $$\mathrm{global\_context}_{c} = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} I_{i,j,c}$$
  - 2. Context Vector Generation: To generate a context vector that can be applied to the input tensor, a fully connected (dense) layer is used. This layer takes the global context tensor as input and produces a context vector context_vector with dimensions 1x1xC using a sigmoid activation function:
    $$\mathrm{context\_vector} = \delta\left(W \cdot \mathrm{global\_context} + b\right)$$
    where $\delta$ represents the sigmoid activation function, and $W$ and $b$ are the weights and bias of the dense layer.
  - 3. Expanding the Context Vector: To match the dimensions of the input tensor, the context vector context_vector is expanded by adding two dimensions of size 1 at the beginning. This results in a context vector with dimensions 1x1x1xC:
    $$\mathrm{context\_vector} \leftarrow \mathrm{reshape}\left(\mathrm{context\_vector},\ 1 \times 1 \times 1 \times C\right)$$
  - 4. Applying Global Context to Input: Finally, the input tensor I is multiplied element-wise by the context vector context_vector to produce the final output of the Global Context Attention layer:
    $$\mathrm{output}_{i,j,c} = I_{i,j,c} \cdot \mathrm{context\_vector}_{c}$$
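The four global-context steps can be sketched in NumPy for a single image (no batch axis, so the context vector is reshaped to 1 x 1 x C rather than 1 x 1 x 1 x C). The shapes, the random weights, and the function names are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def global_context_attention(inputs, weights, bias):
    """inputs: H x W x C, weights: C x C dense kernel, bias: (C,)."""
    # Step 1: global average pooling over all spatial positions -> (C,).
    global_context = inputs.mean(axis=(0, 1))
    # Step 2: a dense layer with sigmoid produces one weight per channel.
    context_vector = sigmoid(global_context @ weights + bias)
    # Step 3: add singleton spatial axes so the vector broadcasts over H x W.
    context_vector = context_vector.reshape(1, 1, -1)
    # Step 4: element-wise re-weighting of the input.
    return inputs * context_vector


rng = np.random.default_rng(0)
C = 256  # small channel count chosen only to keep the example light
feature_map = rng.normal(size=(7, 7, C))
weights = rng.normal(scale=0.01, size=(C, C))  # placeholder weights
attended = global_context_attention(feature_map, weights, np.zeros(C))
print(attended.shape)
```

Unlike the channel-attention sketch, this layer uses a single C x C dense layer without a bottleneck, which is why its parameter count grows quadratically with the channel count.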
- 6. Guided Attention
  - 1. Attention Map Calculation: The first step is to compute an attention map, denoted as attention_map, using a convolutional layer. The convolutional layer, with a kernel size of (3, 3) and a sigmoid activation function, calculates an attention map with the same spatial dimensions as the input tensor (H x W) but with a single channel:
    $$\mathrm{attention\_map} = \delta\left(\mathrm{Conv}_{3 \times 3}(I)\right)$$
    where $\delta$ represents the sigmoid activation function.
  - 2. Guided Attention Application: The final output of the Guided Attention layer is obtained by multiplying the original input tensor I element-wise with the attention map attention_map:
    $$\mathrm{output}_{i,j,c} = I_{i,j,c} \cdot \mathrm{attention\_map}_{i,j}$$
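The guided-attention steps can be sketched with an explicit 3x3 convolution in NumPy. Zero ("same") padding is assumed here because the text states that the attention map keeps the spatial size of the input; the padding mode, the bias term, the shapes, and the function names are assumptions.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def guided_attention(inputs: np.ndarray, kernel: np.ndarray, bias: float) -> np.ndarray:
    """inputs: H x W x C, kernel: 3 x 3 x C (single output channel)."""
    height, width, _ = inputs.shape
    # Zero-pad one pixel on each spatial side so the output stays H x W.
    padded = np.pad(inputs, ((1, 1), (1, 1), (0, 0)))
    logits = np.full((height, width), bias, dtype=float)
    for di in range(3):
        for dj in range(3):
            # Shifted H x W x C window times the kernel tap at (di, dj),
            # summed over channels.
            logits += padded[di:di + height, dj:dj + width, :] @ kernel[di, dj]
    attention_map = sigmoid(logits)  # H x W, values in (0, 1)
    return inputs * attention_map[..., np.newaxis]


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(7, 7, 1024))       # assumed backbone output shape
kernel = rng.normal(scale=0.01, size=(3, 3, 1024))  # placeholder weights
attended = guided_attention(feature_map, kernel, bias=0.0)
print(attended.shape, kernel.size + 1)
```

The printed parameter count (kernel weights plus one bias) is 9217 for 1024 input channels, the same value listed for Guided Attention on DenseNet121 in the model complexity tables.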
4. Methodology
4.1. Data Collection and Preprocessing
4.1.1. Data Collection
- 1. HAM10000 dataset
- 2. ISIC2017 dataset
- 3. Melanoma dataset
4.1.2. Data Preprocessing
- 1. Prepare Datasets
  - I. HAM10000 dataset
  - II. ISIC2017 dataset
  - III. Melanoma dataset
- 2. Preprocessing data
- 3. Data augmentation
4.2. Transfer Learning Framework
4.2.1. Transfer Learning Models
4.2.2. Details of Transfer Learning Framework
4.3. Attention Mechanisms Integration
4.4. Evaluation Metrics
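- Accuracy: the proportion of all samples that are classified correctly.
  $$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
- Precision: the proportion of predicted positives that are truly positive.
  $$\mathrm{Precision} = \frac{TP}{TP + FP}$$
- Recall (Sensitivity or True Positive Rate): the proportion of actual positives that are correctly identified.
  $$\mathrm{Recall} = \frac{TP}{TP + FN}$$
- F1-Score: the harmonic mean of precision and recall.
  $$\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
  Here TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.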
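The four metrics can be computed from a confusion matrix as sketched below. The results tables report "Weighted" precision, recall, and F1-score; the sketch assumes this means averaging the per-class scores weighted by class support (the convention used by scikit-learn's `average='weighted'`), which the text does not confirm. The confusion matrix is made up for illustration only.

```python
import numpy as np


def weighted_metrics(confusion: np.ndarray) -> dict:
    """confusion[i, j] = number of samples of true class i predicted as class j."""
    true_pos = np.diag(confusion).astype(float)
    support = confusion.sum(axis=1)    # samples per true class
    predicted = confusion.sum(axis=0)  # samples per predicted class

    # Per-class scores; classes with an empty denominator score 0.
    precision = np.divide(true_pos, predicted,
                          out=np.zeros_like(true_pos), where=predicted > 0)
    recall = np.divide(true_pos, support,
                       out=np.zeros_like(true_pos), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(true_pos), where=denom > 0)

    weights = support / support.sum()  # support-weighted average (assumed)
    return {
        "accuracy": float(true_pos.sum() / confusion.sum()),
        "weighted_precision": float(precision @ weights),
        "weighted_recall": float(recall @ weights),
        "weighted_f1": float(f1 @ weights),
    }


# Toy 3-class confusion matrix (made-up counts, not data from the paper).
toy = np.array([[8, 1, 1],
                [2, 6, 2],
                [0, 1, 9]])
metrics = weighted_metrics(toy)
print(metrics)
```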
5. Experimental Setup
5.1. Experiment Design
5.2. Model Complexity
6. Experimental Results
- 1. Experiments on the HAM10000 dataset
  - A. Impact of Attention Mechanisms on Base Models
  - B. Performance of Different Transfer Learning Models
  - C. Influence of Attention Mechanisms
- 2. Experiments on the ISIC2017 dataset
  - A. Performance of Base Transfer Learning Models
  - B. Performance from the Perspective of Transfer Learning Models
  - C. Performance from the Perspective of Attention Mechanisms
- 3. Experiments on the Melanoma dataset
  - A. Base Transfer Learning Model Performance
  - B. Transfer Learning Model Performance with Attention Mechanisms
  - C. Attention Mechanism Performance
7. Discussion
- Dataset-specific results
- The following are some specific observations from the results of each dataset:
  - HAM10000 dataset: The spatial attention and channel attention mechanisms were the most effective for all models on this dataset. The other attention mechanisms were generally less effective, and sometimes even decreased the performance of the models.
  - ISIC Archive dataset: The spatial attention mechanism was the most effective for all models on this dataset. The channel attention mechanism was also effective for most models, but it decreased the performance of the Xception model. The other attention mechanisms were generally less effective, and sometimes even decreased the performance of the models.
  - Melanoma dataset: The spatial attention and channel attention mechanisms were the most effective for all models on this dataset. The global context attention mechanism was also effective for the DenseNet121, VGG16, and Xception models, but it decreased the performance of the InceptionV3 and MobileNet models. The other attention mechanisms were generally less effective, and sometimes even decreased the performance of the models.
- Consideration of Model Complexity:
- Attention Mechanism Selection:
8. Conclusion and Future Work
Data Availability Statement
Conflicts of Interest
References
- Fernandez, R.M. SDG3 good health and well-being: integration and connection with other SDGs. Good Health Well-Being 2020, 629–636.
- Gandhi, S.A.; Kampp, J. Skin cancer epidemiology, detection, and management. Med. Clin. 2015, 99, 1323–1335.
- Ashim, L.K.; Suresh, N.; Prasannakumar, C.V. A comparative analysis of various transfer learning approaches skin cancer detection. In 2021 5th International Conference on Trends in Electronics and Informatics (ICOEI); IEEE: Piscataway, NJ, USA, 2021; pp. 1379–1385.
- Kondaveeti, H.K.; Edupuganti, P. Skin cancer classification using transfer learning. In 2020 IEEE International Conference on Advent Trends in Multidisciplinary Research and Innovation (ICATMRI); IEEE: Piscataway, NJ, USA, 2020; pp. 1–4.
- Fraiwan, M.; Faouri, E. On the automatic detection and classification of skin cancer using deep transfer learning. Sensors 2022, 22, 4963.
- Suganthe, R.C.; Shanthi, N.; Geetha, M.; Manjunath, R.; Krishna, S.M.; Balaji, P.M. Performance Evaluation of Transfer Learning Based Models On Skin Disease Classification. In 2023 International Conference on Computer Communication and Informatics (ICCCI); IEEE: Piscataway, NJ, USA, 2023; pp. 1–7.
- Naik, P.P.; Annappa, B.; Dodia, S. An Efficient Deep Transfer Learning Approach for Classification of Skin Cancer Images. In International Conference on Computer Vision and Image Processing; Springer: Berlin/Heidelberg, Germany, 2022; pp. 524–537.
- Haque, M.E.; Ahmed, M.R.; Nila, R.S.; Islam, S. Classification of human monkeypox disease using deep learning models and attention mechanisms. arXiv 2022, arXiv:2211.15459.
- Cheng, Z.; Huo, G.; Li, H. A multi-domain collaborative transfer learning method with multi-scale repeated attention mechanism for underwater side-scan sonar image classification. Remote Sens. 2022, 14, 355.
- Song, Z.; Zhou, Y. Skin cancer classification based on CNN model with attention mechanism. In Second International Conference on Medical Imaging and Additive Manufacturing (ICMIAM 2022); SPIE: Bellingham, WA, USA, 2022; pp. 281–287.
- Datta, S.K.; Shaikh, M.A.; Srihari, S.N.; Gao, M. Soft attention improves skin cancer classification performance. In Interpretability of Machine Intelligence in Medical Image Computing, and Topological Data Analysis and Its Applications for Medical Data: 4th International Workshop, iMIMIC 2021, and 1st International Workshop, TDA4MedicalData 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, September 27, 2021, Proceedings 4; Springer: Berlin/Heidelberg, Germany, 2021; pp. 13–23.
- Wang, X.; Yang, Y.; Mandal, B. Automatic detection of skin cancer melanoma using transfer learning in deep network. In AIP Conference Proceedings; AIP Publishing: Long Island, NY, USA, 2023.
- Sathish, K.; Mohanraj, A.; Raman, R.; Sudha, V.; Kumar, A.; Vijayabhaskar, V. IoT based Mobile App for Skin Cancer Detection using Transfer Learning. In 2022 Sixth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC); IEEE: Piscataway, NJ, USA, 2022; pp. 16–22.
- Hemalatha, D.; Latha, K.N.; Latha, P.M. Skin Cancer Detection Using Deep Learning Technique. In 2023 2nd International Conference for Innovation in Technology (INOCON); IEEE: Piscataway, NJ, USA, 2023; pp. 1–5.
- Barman, S.; Biswas, M.R.; Marjan, S.; Nahar, N.; Hossain, M.S.; Andersson, K. Transfer Learning Based Skin Cancer Classification Using GoogLeNet. In International Conference on Machine Intelligence and Emerging Technologies; Springer: Berlin/Heidelberg, Germany, 2022; pp. 238–252.
- Rashid, J.; et al. Skin cancer disease detection using transfer learning technique. Appl. Sci. 2022, 12, 5714.
- Khandizod, S.; Patil, T.; Dode, A.; Banale, V.; Bawankar, C.D. Deep Learning based Skin Cancer Classifier using MobileNet.
- Agrahari, P.; Agrawal, A.; Subhashini, N. Skin Cancer Detection Using Deep Learning. In Futuristic Communication and Network Technologies; Sivasubramanian, A., Shastry, P.N., Hong, P.C., Eds.; Lecture Notes in Electrical Engineering; Springer Nature: Singapore, 2022; pp. 179–190.
- Bansal, N.; Sridhar, S. Skin lesion classification using ensemble transfer learning. In Second International Conference on Image Processing and Capsule Networks: ICIPCN 2021 2; Springer: Berlin/Heidelberg, Germany, 2022; pp. 557–566.
- Hum, Y.C.; et al. The development of skin lesion detection application in smart handheld devices using deep neural networks. Multimed. Tools Appl. 2022, 81, 41579–41610.
- Khan, M.R.H.; Uddin, A.H.; Nahid, A.-A.; Bairagi, A.K. Skin cancer detection from low-resolution images using transfer learning. In Intelligent Sustainable Systems: Proceedings of ICISS 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 317–334.
- Panthakkan, A.; Anzar, S.M.; Jamal, S.; Mansoor, W. Concatenated Xception-ResNet50—A novel hybrid approach for accurate skin cancer prediction. Comput. Biol. Med. 2022, 150, 106170.
- Mehmood, A.; Gulzar, Y.; Ilyas, Q.M.; Jabbari, A.; Ahmad, M.; Iqbal, S. SBXception: A Shallower and Broader Xception Architecture for Efficient Classification of Skin Lesions. Cancers 2023, 15, 3604.
- Shaaban, S.; Atya, H.; Mohammed, H.; Sameh, A.; Raafat, K.; Magdy, A. Skin Cancer Detection Based on Deep Learning Methods. In The International Conference on Artificial Intelligence and Computer Vision; Springer: Berlin/Heidelberg, Germany, 2023; pp. 58–67.
- La Salvia, M.; et al. Attention-based Skin Cancer Classification Through Hyperspectral Imaging. In 2022 25th Euromicro Conference on Digital System Design (DSD); IEEE: Piscataway, NJ, USA, 2022; pp. 871–876.
- He, X.; Wang, Y.; Zhao, S.; Chen, X. Co-attention fusion network for multimodal skin cancer diagnosis. Pattern Recognit. 2023, 133, 108990.
- Li, P.; Han, T.; Ren, Y.; Xu, P.; Yu, H. Improved YOLOv4-tiny based on attention mechanism for skin detection. PeerJ Comput. Sci. 2023, 9, e1288.
- Aggarwal, A.; Das, N.; Sreedevi, I. Attention-guided deep convolutional neural networks for skin cancer classification. In 2019 Ninth International Conference on Image Processing Theory, Tools and Applications (IPTA); IEEE: Piscataway, NJ, USA, 2019; pp. 1–6.
- Ravi, V. Attention cost-sensitive deep learning-based approach for skin cancer detection and classification. Cancers 2022, 14, 5872.
- Nehvi, A.; Dar, R.; Assad, A. Visual Recognition of Local Kashmiri Objects with Limited Image Data using Transfer Learning. In 2021 International Conference on Emerging Techniques in Computational Intelligence (ICETCI); IEEE: Piscataway, NJ, USA, 2021; pp. 49–52.
- Liu, X.; Yu, W.; Liang, F.; Griffith, D.; Golmie, N. Toward deep transfer learning in industrial internet of things. IEEE Internet Things J. 2021, 8, 12163–12175.
- Antonio, E.; Rael, C.; Buenavides, E. Changing Input Shape Dimension Using VGG16 Network Model. In 2021 IEEE International Conference on Automatic Control & Intelligent Systems (I2CACIS); IEEE: Piscataway, NJ, USA, 2021; pp. 36–40.
- Roy, S.; Kumar, S.S. Feature Construction Through Inductive Transfer Learning in Computer Vision. In Cybernetics, Cognition and Machine Learning Applications: Proceedings of ICCCMLA 2020; Springer: Berlin/Heidelberg, Germany, 2021; pp. 95–107.
- Maschler, B.; Braun, D.; Jazdi, N.; Weyrich, M. Transfer learning as an enabler of the intelligent digital twin. Procedia CIRP 2021, 100, 127–132.
- Brodzicki, A.; Piekarski, M.; Kucharski, D.; Jaworek-Korjakowska, J.; Gorgon, M. Transfer learning methods as a new approach in computer vision tasks with small datasets. Found. Comput. Decis. Sci. 2020, 45, 179–193.
- Liang, A. Effectiveness of Transfer Learning, Convolutional Neural Network and Standard Machine Learning in Computer Vision Assisted Bee Health Assessment. In 2022 International Communication Engineering and Cloud Computing Conference (CECCC); IEEE: Piscataway, NJ, USA, 2022; pp. 7–11.
- Chen, B.; Zhao, T.; Liu, J.; Lin, L. Multipath feature recalibration DenseNet for image classification. Int. J. Mach. Learn. Cybern. 2021, 12, 651–660.
- Soniya; Paul, S.; Singh, L. Sparsely Connected DenseNet for Malaria Parasite Detection. In Advances in Systems Engineering: Select Proceedings of NSC 2019; Springer: Berlin/Heidelberg, Germany, 2021; pp. 801–807.
- Long, M.; Long, S.; Peng, F.; Hu, X. Identifying natural images and computer-generated graphics based on convolutional neural network. Int. J. Auton. Adapt. Commun. Syst. 2021, 14, 151–162.
- Howard, A.G.; et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015. Available online: http://arxiv.org/abs/1409.1556 (accessed on 12 September 2023).
- Sharma, S.; Kumar, S. The Xception model: A potential feature extractor in breast cancer histology images classification. ICT Express 2022, 8, 101–108.
- Tschandl, P.; Rosendahl, C.; Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 2018, 5, 1–9.
- Codella, N.C.; et al. Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018); IEEE: Piscataway, NJ, USA, 2018; pp. 168–172.
| 1 | |
| 2 | |
| 3 | Alexander Scarlat (2019) 'Dermoscopic pigmented skin lesions from HAM10k'. Available at: https://www.kaggle.com/datasets/drscarlat/melanoma |
| 4 | This highlighting policy is applied also for the experiments associated with ISIC2017 dataset and Melanoma dataset. |
| 5 | It is worth mentioning that all the performance comparisons in this study are compared to the performance of base models. |

| ID | Research | Transfer Learning Model | Dataset | Achievements |
|---|---|---|---|---|
| 1 | Wang et al. [12] | VGG model | ISIC 2019 | Achieved an accuracy of 0.9067 and an AUROC above 0.93 |
| 2 | Sathish et al. [13] | VGG-16 and AlexNet | Skin cancer dataset | The metrics showed that AlexNet performed better for cancer prediction |
| 3 | Hemalatha et al. [14] | Inception V3 | NA | Increased accuracy to 84% |
| 4 | Barman et al. [15] | GoogLeNet, Xception, InceptionResNetV2, and DenseNet | ISIC dataset | Increased the training accuracy to 91.16% and the testing accuracy to 89.93% |
| 5 | Rashid et al. [16] | MobileNetV2 | Melanoma dataset | Achieved better accuracy compared to other deep learning techniques |
| 6 | Khandizod et al. [17] | MobileNet | Skin lesion images | Obtained high accuracy in diagnosing skin cancer |
| 7 | Agrahari et al. [18] | MobileNet | Skin cancer dataset | Achieved high performance comparable to that of a dermatology expert |
| 8 | Bansal et al. [19] | MobileNet | HAM10000 | Increased accuracy to 91% |
| 9 | Hum et al. [20] | MobileNetV2 | Skin lesion dataset | Achieved an evaluation accuracy of 93.9% |
| 10 | Khan et al. [21] | DenseNet169 | HAM10000 | Achieved a performance of 78.56% |
| 11 | Panthakkan et al. [22] | Xception and ResNet50 | NA | Achieved a prediction accuracy of 97.8% |
| 12 | Mehmood et al. [23] | SBXception | NA | Achieved a reduction of 54.27% in training parameters |
| 13 | Shaaban et al. [24] | Xception | Skin cancer dataset | Achieved an accuracy of 96.66% |
| 14 | Ashim et al. [3] | DenseNet and Xception | Skin cancer dataset | DenseNet achieved an accuracy of 81.94%, and Xception achieved an accuracy of 78.41% |

| ID | Research | Attention Mechanism | Dataset | Achievements |
|---|---|---|---|---|
| 1 | Song et al. [10] | Squeeze-and-excitation attention | Skin moles dataset | Obtained similar performance using fewer parameters |
| 2 | La Salvia et al. [25] | Vision Transformers | 76 skin cancer images | Outperformed the state-of-the-art in terms of false negatives |
| 3 | He et al. [26] | Co-attention fusion network | Dataset comprising a seven-point checklist | Achieved a mean accuracy of 76.8% |
| 4 | Datta et al. [11] | Soft-Attention | HAM10000 and ISIC-2017 | Precision rate of 93.7% on the HAM10000 dataset and a sensitivity score of 91.6% on the ISIC-2017 dataset |
| 5 | Li et al. [27] | CBAM attention | Skin cancer dataset | Achieved a good balance between model size and detection accuracy |
| 6 | Aggarwal et al. [28] | Guided attention | Skin cancer dataset | Enhanced the precision by around 12% |
| 7 | Ravi [29] | Cost-sensitive attention | Skin cancer dataset | Classification accuracy of 99% for skin diseases |

| Class | AKIEC | BCC | BKL | DF | MEL | NV | VASC | Total |
|---|---|---|---|---|---|---|---|---|
| Training | 237 | 373 | 788 | 82 | 801 | 4825 | 104 | 7210 |
| Validation | 62 | 86 | 206 | 24 | 191 | 1212 | 22 | 1803 |
| Testing | 28 | 55 | 105 | 9 | 121 | 668 | 16 | 1002 |
| Total | 327 | 514 | 1099 | 115 | 1113 | 6705 | 142 | 10015 |

| Class | Melanoma | Nevus | Seborrheic Keratosis | Total |
|---|---|---|---|---|
| Training | 374 | 1372 | 254 | 2000 |
| Validation | 30 | 78 | 42 | 150 |
| Testing | 117 | 393 | 90 | 600 |
| Total | 521 | 1843 | 386 | 2750 |

| Class | Melanoma | Not Melanoma | Total |
|---|---|---|---|
| Training | 5341 | 5341 | 10682 |
| Validation | 1781 | 1781 | 3562 |
| Testing | 1781 | 1780 | 3561 |
| Total | 8903 | 8902 | 17805 |

| Hyper-Parameter | Value |
|---|---|
| Number of epochs | 20 |
| Loss function | Cross-entropy (Binary/Categorical) |
| Optimizer | Adam |
| Learning rate | 0.001 |
| Batch Size | 4 |

| DenseNet121 | Total Params | Trainable Params | Non-Trainable Params | Layer Params | Ranked Based on Layer Params |
|---|---|---|---|---|---|
| Base (No Attention) | 7563843 | 526339 | 7037504 | 0 | 0 |
| Channel Attention | 7827139 | 789635 | 7037504 | 263296 | 2 |
| Global Context Attention | 8613443 | 1575939 | 7037504 | 1049600 | 5 |
| Guided Attention | 7573060 | 535556 | 7037504 | 9217 | 6 |
| Nonlocal Attention | 10712643 | 3675139 | 7037504 | 3148800 | 4 |
| Positional Attention | 7564868 | 527364 | 7037504 | 1025 | 3 |
| Spatial Attention | 7563845 | 526341 | 7037504 | 2 | 1 |

| InceptionV3 | Total Params | Trainable Params | Non-Trainable Params | Layer Params | Ranked Based on Layer Params |
|---|---|---|---|---|---|
| Base (No Attention) | 22853411 | 1050627 | 21802784 | 0 | 0 |
| Channel Attention | 23904291 | 2101507 | 21802784 | 1050880 | 2 |
| Global Context Attention | 27049763 | 5246979 | 21802784 | 4196352 | 5 |
| Guided Attention | 22871844 | 1069060 | 21802784 | 18433 | 6 |
| Nonlocal Attention | 35442467 | 13639683 | 21802784 | 12589056 | 4 |
| Positional Attention | 22855460 | 1052676 | 21802784 | 2049 | 3 |
| Spatial Attention | 22853413 | 1050629 | 21802784 | 2 | 1 |

| MobileNet | Total Params | Trainable Params | Non-Trainable Params | Layer Params | Ranked Based on Layer Params |
|---|---|---|---|---|---|
| Base (No Attention) | 3755203 | 526339 | 3228864 | 0 | 0 |
| Channel Attention | 4018499 | 789635 | 3228864 | 263296 | 2 |
| Global Context Attention | 4804803 | 1575939 | 3228864 | 1049600 | 5 |
| Guided Attention | 3764420 | 535556 | 3228864 | 9217 | 6 |
| Nonlocal Attention | 6904003 | 3675139 | 3228864 | 3148800 | 4 |
| Positional Attention | 3756228 | 527364 | 3228864 | 1025 | 3 |
| Spatial Attention | 3755205 | 526341 | 3228864 | 2 | 1 |

| VGG16 | Total Params | Trainable Params | Non-Trainable Params | Layer Params | Ranked Based on Layer Params |
|---|---|---|---|---|---|
| Base (No Attention) | 14978883 | 264195 | 14714688 | 0 | 0 |
| Channel Attention | 15044995 | 330307 | 14714688 | 66112 | 2 |
| Global Context Attention | 15241539 | 526851 | 14714688 | 262656 | 5 |
| Guided Attention | 14983492 | 268804 | 14714688 | 4609 | 6 |
| Nonlocal Attention | 15766851 | 1052163 | 14714688 | 787968 | 4 |
| Positional Attention | 14979396 | 264708 | 14714688 | 513 | 3 |
| Spatial Attention | 14978885 | 264197 | 14714688 | 2 | 1 |

| Xception | Total Params | Trainable Params | Non-Trainable Params | Layer Params | Ranked Based on Layer Params |
|---|---|---|---|---|---|
| Base (No Attention) | 21912107 | 1050627 | 20861480 | 0 | 0 |
| Channel Attention | 22962987 | 2101507 | 20861480 | 1050880 | 2 |
| Global Context Attention | 26108459 | 5246979 | 20861480 | 4196352 | 5 |
| Guided Attention | 21930540 | 1069060 | 20861480 | 18433 | 6 |
| Nonlocal Attention | 34501163 | 13639683 | 20861480 | 12589056 | 4 |
| Positional Attention | 21914156 | 1052676 | 20861480 | 2049 | 3 |
| Spatial Attention | 21912109 | 1050629 | 20861480 | 2 | 1 |
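
The Layer Params column in the five model complexity tables above can be reproduced from the layer descriptions in Section 3.2 if every convolution and dense layer carries a bias and the backbone's final feature map has C channels. The sketch below assumes C = 1024 for DenseNet121 and MobileNet, C = 512 for VGG16, and C = 2048 for InceptionV3 and Xception; these channel counts come from the standard architectures and are not stated in the text, and the function name is hypothetical.

```python
def attention_layer_params(channels: int) -> dict:
    """Parameter count of each attention layer for a C-channel feature map."""
    c = channels
    reduced = c // 8  # bottleneck width of the channel-attention fc1 layer
    return {
        "Spatial Attention": 1 + 1,                # 1x1 conv on one channel: weight + bias
        "Positional Attention": c + 1,             # 1x1 conv, C -> 1
        "Guided Attention": 3 * 3 * c + 1,         # 3x3 conv, C -> 1
        "Channel Attention": (c * reduced + reduced) + (reduced * c + c),
        "Global Context Attention": c * c + c,     # dense layer, C -> C
        "Nonlocal Attention": 3 * (c * c + c),     # three 1x1 convs, C -> C
    }


# Assumed final-feature-map channel counts of the backbones.
for name, c in [("DenseNet121", 1024), ("VGG16", 512), ("InceptionV3", 2048)]:
    print(name, attention_layer_params(c))
```

The counts depend only on C, which explains why DenseNet121 and MobileNet share one set of Layer Params values and InceptionV3 and Xception share another.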
| Accuracy | DenseNet121 | InceptionV3 | MobileNet | VGG16 | Xception |
|---|---|---|---|---|---|
| Base (No Attention) | 91.87 | 91.87 | 93.50 | 91.25 | 93.16 |
| Channel Attention | 92.76 | 92.64 | 93.70 | 91.70 | 92.64 |
| Global Context Attention | 92.81 | 92.07 | 93.19 | 91.93 | 93.44 |
| Guided Attention | 92.02 | 91.25 | 91.87 | 91.79 | 91.87 |
| Nonlocal Attention | 92.13 | 90.73 | 92.22 | 92.22 | 91.93 |
| Positional Attention | 93.33 | 91.50 | 92.96 | 91.90 | 92.99 |
| Spatial Attention | 92.56 | 92.53 | 93.19 | 91.65 | 93.16 |

| Weighted Precision | DenseNet121 | InceptionV3 | MobileNet | VGG16 | Xception |
|---|---|---|---|---|---|
| Base (No Attention) | 66.44 | 67.00 | 74.97 | 67.94 | 74.55 |
| Channel Attention | 74.27 | 71.35 | 76.26 | 66.68 | 70.83 |
| Global Context Attention | 74.51 | 67.53 | 74.40 | 68.78 | 75.47 |
| Guided Attention | 67.73 | 64.51 | 68.48 | 69.33 | 69.47 |
| Nonlocal Attention | 67.31 | 55.55 | 68.87 | 70.11 | 70.15 |
| Positional Attention | 75.25 | 65.07 | 74.15 | 69.59 | 75.55 |
| Spatial Attention | 72.67 | 72.87 | 75.34 | 65.94 | 74.76 |

| Weighted Recall | DenseNet121 | InceptionV3 | MobileNet | VGG16 | Xception |
|---|---|---|---|---|---|
| Base (No Attention) | 71.56 | 71.56 | 77.25 | 69.36 | 76.05 |
| Channel Attention | 74.65 | 74.25 | 77.94 | 70.96 | 74.25 |
| Global Context Attention | 74.85 | 72.26 | 76.15 | 71.76 | 77.05 |
| Guided Attention | 72.06 | 69.36 | 71.56 | 71.26 | 71.56 |
| Nonlocal Attention | 72.46 | 67.56 | 72.75 | 72.75 | 71.76 |
| Positional Attention | 76.65 | 70.26 | 75.35 | 71.66 | 75.45 |
| Spatial Attention | 73.95 | 73.85 | 76.15 | 70.76 | 76.05 |

| Weighted F1-Score | DenseNet121 | InceptionV3 | MobileNet | VGG16 | Xception |
|---|---|---|---|---|---|
| Base (No Attention) | 65.41 | 67.66 | 74.65 | 62.35 | 74.56 |
| Channel Attention | 73.63 | 70.92 | 76.43 | 66.51 | 70.97 |
| Global Context Attention | 73.58 | 66.78 | 72.38 | 70.00 | 75.37 |
| Guided Attention | 68.16 | 65.33 | 67.02 | 69.03 | 67.38 |
| Nonlocal Attention | 69.08 | 57.93 | 68.63 | 69.37 | 69.99 |
| Positional Attention | 75.57 | 66.68 | 73.60 | 67.40 | 74.34 |
| Spatial Attention | 72.51 | 71.52 | 75.14 | 66.16 | 74.03 |
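
For readers unfamiliar with the weighted metrics reported above, the following minimal sketch shows one common way to compute them. It assumes that "weighted" carries its usual meaning (per-class precision, recall, and F1 averaged with weights proportional to class support); the function name and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall, and F1 for single-label predictions."""
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        # True positives and number of predictions for this class.
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        # Each class contributes in proportion to its share of the true labels.
        weight = count / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Hypothetical toy labels, not data from the paper.
y_true = ["nv", "nv", "nv", "nv", "mel", "mel", "bcc", "bcc"]
y_pred = ["nv", "nv", "nv", "mel", "nv", "mel", "bcc", "nv"]
precision, recall, f1 = weighted_scores(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under this definition the weighted F1 is a weighted mean of per-class F1 values, not the harmonic mean of the weighted precision and weighted recall, so it can fall below both of them, as it does in several cells of the tables above.
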
| Accuracy | DenseNet121 | InceptionV3 | MobileNet | VGG16 | Xception |
|---|---|---|---|---|---|
| Base (No Attention) | 77.48 | 78.04 | 80.57 | 75.61 | 80.91 |
| Channel Attention | 77.15 | 77.37 | 78.92 | 78.81 | 79.80 |
| Global Context Attention | 75.39 | 77.04 | 78.81 | 75.72 | 78.81 |
| Guided Attention | 73.73 | 74.94 | 77.04 | 77.92 | 74.83 |
| Nonlocal Attention | 74.39 | 73.40 | 76.82 | 78.37 | 78.81 |
| Positional Attention | 77.81 | 77.04 | 78.92 | 76.93 | 80.57 |
| Spatial Attention | 72.63 | 80.13 | 79.91 | 76.49 | 75.06 |
| Weighted Precision | DenseNet121 | InceptionV3 | MobileNet | VGG16 | Xception |
|---|---|---|---|---|---|
| Base (No Attention) | 66.52 | 56.01 | 71.65 | 57.11 | 69.88 |
| Channel Attention | 64.64 | 65.50 | 68.90 | 60.96 | 70.77 |
| Global Context Attention | 64.38 | 63.64 | 67.61 | 61.08 | 70.99 |
| Guided Attention | 52.63 | 51.53 | 42.98 | 52.40 | 62.32 |
| Nonlocal Attention | 62.00 | 64.38 | 70.24 | 65.34 | 67.76 |
| Positional Attention | 74.90 | 67.23 | 68.27 | 58.06 | 69.09 |
| Spatial Attention | 67.60 | 68.84 | 70.35 | 60.94 | 68.39 |
| Weighted Recall | DenseNet121 | InceptionV3 | MobileNet | VGG16 | Xception |
|---|---|---|---|---|---|
| Base (No Attention) | 66.23 | 67.05 | 70.86 | 63.41 | 71.36 |
| Channel Attention | 65.73 | 66.06 | 68.38 | 68.21 | 69.70 |
| Global Context Attention | 63.08 | 65.56 | 68.21 | 63.58 | 68.21 |
| Guided Attention | 60.60 | 62.42 | 65.56 | 66.89 | 62.25 |
| Nonlocal Attention | 61.59 | 60.10 | 65.23 | 67.55 | 68.21 |
| Positional Attention | 66.72 | 65.56 | 68.38 | 65.40 | 70.86 |
| Spatial Attention | 58.94 | 70.20 | 69.87 | 64.74 | 62.58 |
| Weighted F1-Score | DenseNet121 | InceptionV3 | MobileNet | VGG16 | Xception |
|---|---|---|---|---|---|
| Base (No Attention) | 60.80 | 60.83 | 67.63 | 58.11 | 69.53 |
| Channel Attention | 61.26 | 65.66 | 68.11 | 60.95 | 68.19 |
| Global Context Attention | 62.08 | 64.41 | 67.10 | 59.62 | 65.05 |
| Guided Attention | 55.93 | 56.40 | 51.93 | 55.83 | 59.06 |
| Nonlocal Attention | 60.18 | 60.47 | 66.24 | 65.86 | 61.88 |
| Positional Attention | 60.91 | 62.24 | 67.51 | 56.45 | 66.76 |
| Spatial Attention | 61.06 | 68.73 | 67.90 | 59.85 | 62.45 |
| Accuracy | DenseNet121 | InceptionV3 | MobileNet | VGG16 | Xception |
|---|---|---|---|---|---|
| Base (No Attention) | 93.37 | 92.92 | 93.57 | 84.02 | 93.23 |
| Channel Attention | 93.43 | 93.18 | 93.29 | 85.82 | 93.37 |
| Global Context Attention | 93.68 | 92.84 | 93.15 | 85.17 | 93.79 |
| Guided Attention | 93.01 | 92.73 | 93.18 | 86.44 | 93.34 |
| Nonlocal Attention | 93.37 | 92.90 | 93.37 | 85.40 | 93.04 |
| Positional Attention | 93.32 | 92.87 | 93.20 | 85.40 | 93.23 |
| Spatial Attention | 93.54 | 93.26 | 93.77 | 84.84 | 93.63 |
| Precision | DenseNet121 | InceptionV3 | MobileNet | VGG16 | Xception |
|---|---|---|---|---|---|
| Base (No Attention) | 88.76 | 88.51 | 89.55 | 86.59 | 88.26 |
| Channel Attention | 89.12 | 88.37 | 89.53 | 85.48 | 88.76 |
| Global Context Attention | 89.09 | 88.07 | 91.42 | 85.09 | 89.83 |
| Guided Attention | 87.88 | 87.97 | 88.21 | 85.85 | 88.67 |
| Nonlocal Attention | 89.07 | 88.46 | 89.15 | 82.14 | 88.49 |
| Positional Attention | 89.18 | 88.07 | 88.07 | 84.16 | 88.26 |
| Spatial Attention | 89.58 | 89.41 | 89.42 | 86.26 | 89.72 |
| Weighted Recall | DenseNet121 | InceptionV3 | MobileNet | VGG16 | Xception |
|---|---|---|---|---|---|
| Base (No Attention) | 99.33 | 98.65 | 98.65 | 80.51 | 99.72 |
| Channel Attention | 98.93 | 99.44 | 98.03 | 86.29 | 99.33 |
| Global Context Attention | 99.55 | 99.10 | 95.22 | 85.28 | 98.76 |
| Guided Attention | 99.78 | 98.99 | 99.66 | 87.25 | 99.38 |
| Nonlocal Attention | 98.88 | 98.65 | 98.76 | 90.45 | 98.93 |
| Positional Attention | 98.60 | 99.16 | 99.94 | 87.19 | 99.72 |
| Spatial Attention | 98.54 | 98.15 | 99.27 | 82.87 | 98.54 |
| Weighted F1-Score | DenseNet121 | InceptionV3 | MobileNet | VGG16 | Xception |
|---|---|---|---|---|---|
| Base (No Attention) | 93.74 | 93.30 | 93.88 | 83.44 | 93.64 |
| Channel Attention | 93.77 | 93.58 | 93.59 | 85.88 | 93.74 |
| Global Context Attention | 94.03 | 93.26 | 93.29 | 85.19 | 94.09 |
| Guided Attention | 93.45 | 93.15 | 93.59 | 86.54 | 93.72 |
| Nonlocal Attention | 93.72 | 93.28 | 93.71 | 86.10 | 93.42 |
| Positional Attention | 93.65 | 93.29 | 93.63 | 85.65 | 93.64 |
| Spatial Attention | 93.85 | 93.57 | 94.09 | 84.53 | 93.92 |
| Model | Total Params | Trainable Params | Rank of Total Params | Rank of Trainable Params | Number of “Increase Performance” in 18 Experiments | Percentage of “Increase Performance” (%) | Rank of Percentage of “Increase Performance” |
|---|---|---|---|---|---|---|---|
| DenseNet121 | 7563843 | 526339 | 2 | 3 | 13 | 72 | 2 |
| InceptionV3 | 22853411 | 1050627 | 5 | 5 | 8 | 44 | 3 |
| MobileNet | 3755203 | 526339 | 1 | 2 | 5 | 28 | 5 |
| VGG16 | 14978883 | 264195 | 3 | 1 | 16 | 89 | 1 |
| Xception | 21912107 | 1050627 | 4 | 4 | 6 | 33 | 4 |
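
The percentage and rank columns in the table above follow directly from the raw counts. The sketch below recomputes them, assuming the percentage is rounded to the nearest integer and that rank 1 goes to the highest percentage, with equal percentages sharing a rank; the helper name `competition_rank` is hypothetical.

```python
# Counts copied from the table above; each model was used in 18 experiments.
increase_counts = {
    "DenseNet121": 13,
    "InceptionV3": 8,
    "MobileNet": 5,
    "VGG16": 16,
    "Xception": 6,
}
N_EXPERIMENTS = 18

# Percentage of experiments in which attention improved on the base model.
percentages = {
    model: round(100 * count / N_EXPERIMENTS)
    for model, count in increase_counts.items()
}


def competition_rank(scores):
    """Rank 1 is the highest score; tied scores receive the same rank."""
    return {
        key: 1 + sum(other > value for other in scores.values())
        for key, value in scores.items()
    }


ranks = competition_rank(percentages)
for model in increase_counts:
    print(f"{model}: {percentages[model]}% (rank {ranks[model]})")
```

The same ranking rule would give a shared rank to two entries with identical percentages, which matches how the tied attention mechanisms are ranked in the table that follows.
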
| Attention Mechanism | Number of “Increase Performance” in 15 Experiments | Percentage of “Increase Performance” (%) | Rank of Percentage of “Increase Performance” | Layer Params - VGG16 | Layer Params - DenseNet121 and MobileNet | Layer Params - InceptionV3 and Xception |
|---|---|---|---|---|---|---|
| Channel Attention | 12 | 80 | 2 | 66112 | 263296 | 1050880 |
| Global Context Attention | 9 | 60 | 3 | 262656 | 1049600 | 4196352 |
| Guided Attention | 4 | 27 | 5 | 4609 | 9217 | 18433 |
| Nonlocal Attention | 4 | 27 | 5 | 787968 | 3148800 | 12589056 |
| Positional Attention | 6 | 40 | 4 | 513 | 1025 | 2049 |
| Spatial Attention | 13 | 87 | 1 | 2 | 2 | 2 |
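
The "Increase Performance" counts in the two summary tables can be reproduced from the three weighted F1 tables. The sketch below assumes that an experiment counts as an increase when the weighted F1 of the attention variant is greater than or equal to that of the base model on the rounded values shown; this criterion is inferred from the numbers and is not stated in the tables. The single tie (Xception with positional attention in the third F1 table) has to be counted as an increase for the totals to match.

```python
MODELS = ["DenseNet121", "InceptionV3", "MobileNet", "VGG16", "Xception"]

# Weighted F1 scores copied from the three F1 tables, in document order.
# Each row lists scores in MODELS order.
F1_TABLES = [
    {
        "Base": [65.41, 67.66, 74.65, 62.35, 74.56],
        "Channel": [73.63, 70.92, 76.43, 66.51, 70.97],
        "Global Context": [73.58, 66.78, 72.38, 70.00, 75.37],
        "Guided": [68.16, 65.33, 67.02, 69.03, 67.38],
        "Nonlocal": [69.08, 57.93, 68.63, 69.37, 69.99],
        "Positional": [75.57, 66.68, 73.60, 67.40, 74.34],
        "Spatial": [72.51, 71.52, 75.14, 66.16, 74.03],
    },
    {
        "Base": [60.80, 60.83, 67.63, 58.11, 69.53],
        "Channel": [61.26, 65.66, 68.11, 60.95, 68.19],
        "Global Context": [62.08, 64.41, 67.10, 59.62, 65.05],
        "Guided": [55.93, 56.40, 51.93, 55.83, 59.06],
        "Nonlocal": [60.18, 60.47, 66.24, 65.86, 61.88],
        "Positional": [60.91, 62.24, 67.51, 56.45, 66.76],
        "Spatial": [61.06, 68.73, 67.90, 59.85, 62.45],
    },
    {
        "Base": [93.74, 93.30, 93.88, 83.44, 93.64],
        "Channel": [93.77, 93.58, 93.59, 85.88, 93.74],
        "Global Context": [94.03, 93.26, 93.29, 85.19, 94.09],
        "Guided": [93.45, 93.15, 93.59, 86.54, 93.72],
        "Nonlocal": [93.72, 93.28, 93.71, 86.10, 93.42],
        "Positional": [93.65, 93.29, 93.63, 85.65, 93.64],
        "Spatial": [93.85, 93.57, 94.09, 84.53, 93.92],
    },
]


def count_increases(tables):
    """Count, per model and per attention mechanism, scores at or above the base."""
    by_model = {model: 0 for model in MODELS}
    by_attention = {}
    for table in tables:
        base = table["Base"]
        for name, scores in table.items():
            if name == "Base":
                continue
            by_attention.setdefault(name, 0)
            for model, score, reference in zip(MODELS, scores, base):
                # Assumption: a tie on the rounded value counts as an increase.
                if score >= reference:
                    by_model[model] += 1
                    by_attention[name] += 1
    return by_model, by_attention


by_model, by_attention = count_increases(F1_TABLES)
print(by_model)
print(by_attention)
```

Each model appears in 6 attention mechanisms across 3 tables (18 experiments), and each attention mechanism appears in 5 models across 3 tables (15 experiments), which matches the denominators in the two summary tables.
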
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).