Submitted:
21 July 2025
Posted:
23 July 2025
Abstract
Keywords:
1. Introduction
- Dyed-lifted-polyps: Images showing polyps that have been stained and elevated using submucosal injection, aiding in visual contrast and resection planning.
- Dyed-resection-margins: Post-polypectomy images highlighting the margins of resected areas, stained to assess completeness of removal.
- Esophagitis: Inflammatory lesions of the esophageal mucosa, often appearing as erythematous streaks or erosions near the Z-line.
- Polyps: Unstained mucosal protrusions, typically benign growths that may serve as precursors to colorectal cancer.
- Ulcerative-colitis: Chronic inflammatory changes in the colon, characterized by mucosal ulceration, granularity, and vascular pattern loss.
- Normal-cecum: Anatomical landmark at the beginning of the large intestine, often used to confirm complete colonoscopy.
- Normal-pylorus: The muscular opening between the stomach and duodenum, appearing as a round, symmetric structure in healthy individuals.
- Normal-z-line: The gastroesophageal junction, where the squamous epithelium of the esophagus transitions to the columnar epithelium of the stomach.
- Benchmarking of CNN Architectures: A set of CNN models, including classical architectures such as AlexNet and VGGNet and more advanced designs such as GoogLeNet (Inception), ResNet, DenseNet, and CapsNet, was tested on the Kvasir dataset [27], which is composed of real-world endoscopic images, to assess their effectiveness in gastrointestinal lesion categorization.
- Optimizations: DenseNet121 and ResNet50 were fine-tuned using transfer learning and dynamic class weighting, while CapsNet was extended with attention mechanisms to improve feature localization and reduce overfitting, especially in classes with limited samples. The selection of ResNet50 is therefore itself a contribution of this study, guided by both parameter tuning and empirical performance metrics, including validation accuracy and loss behavior across multiple folds.
- Biomimetic Model Selection: The study introduces a biomimetic framework for selecting CNN architectures, inspired by the hierarchical and layered processing of the human visual cortex [14]. This approach guided the prioritisation of models that emulate biological feature abstraction, such as residual and capsule-based networks.
- Explainability Integration: To improve performance on sparse and imbalanced medical datasets, this study combines Explainable Artificial Intelligence (XAI) techniques, namely Grad-CAM, LIME, and SHAP, with transfer learning from large-scale datasets such as ImageNet. The novelty lies in the task-specific adaptation of XAI methods to guide model and error analysis, enabling iterative feedback during training.
- Benchmarking: The comparative analysis showed that ResNet50, DenseNet121, and MobileNetV2 offered the best trade-off between accuracy, inference speed, and generalisation. In contrast, deeper models such as NASNetLarge and EfficientNetB8 showed signs of overfitting and slower inference.
2. Materials and Methods
2.1. Dataset and Methodology
2.2. Pre-Trained Models in Endoscopic Imaging
2.3. Pre-Trained Architectures and Transfer Learning
2.4. Inception Architecture
- InceptionV1 (GoogLeNet): Introduced parallel convolutions of varying sizes (1×1, 3×3, 5×5) and auxiliary classifiers to improve gradient flow.
- InceptionV2: Replaced expensive 5×5 convolutions with stacked 3×3 layers, introduced batch normalization, and improved computational efficiency.
- InceptionV3: Built upon V2 with additional optimisations such as factorised 7×7 convolutions, label smoothing, and RMSprop optimisation. It also included deeper modules and more efficient grid size reduction strategies, resulting in improved accuracy and reduced training cost [33].
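The efficiency gain from replacing a 5×5 convolution with two stacked 3×3 layers can be checked with simple parameter arithmetic. The following is a minimal sketch, not code from the study: the helper names and the channel width of 64 are hypothetical, and biases are ignored for simplicity.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Number of learnable weights in one 2D convolution (bias ignored)."""
    return kernel_h * kernel_w * in_channels * out_channels


def single_5x5(channels):
    """One 5x5 convolution that keeps the channel count unchanged."""
    return conv_params(5, 5, channels, channels)


def stacked_3x3(channels):
    """Two 3x3 convolutions in sequence, covering the same 5x5 receptive field."""
    return 2 * conv_params(3, 3, channels, channels)


channels = 64  # hypothetical channel width, chosen only for illustration
print(single_5x5(channels))   # 102400
print(stacked_3x3(channels))  # 73728
```

For equal input and output widths, the stacked variant needs 18/25 of the weights of the single 5×5 layer, independent of the channel count, while adding one extra non-linearity between the two layers.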
2.5. Preprocessing Techniques
3. Experimental Results and Performance Analysis
3.1. Experimental Environment
3.2. Model Training-Validation-Testing
3.3. Evaluation Methodology: Accuracy, Confusion Matrix and Inference Time
- Accuracy provides a general overview of correct classifications but can be misleading when classes are imbalanced. For example, if one polyp type is overrepresented, a high accuracy may mask poor performance on rare classes.
- Confusion Matrix visualises misclassifications across all classes. It reveals patterns such as false positives or confusion between visually similar polyp types, guiding further refinement or reannotation.
- Precision reflects the proportion of true positives among predicted positives for each class. In clinical settings, high precision is crucial to minimise false diagnoses.
- Sensitivity (Recall) indicates the proportion of true positives detected among all actual instances of a class. This is especially important in medicine, where failing to detect a pathology (false negative) can be more dangerous than over-detection.
- F1-score balances precision and recall. It is particularly valuable when the dataset is unbalanced or when both false positives and false negatives carry clinical risk.
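As an illustration of how the metrics above relate to one another, the sketch below derives accuracy and per-class precision, recall, and F1-score from a confusion matrix. It is a minimal example under stated assumptions: the function name is hypothetical, and the 3-class matrix contains made-up counts, not results from this study.

```python
def per_class_metrics(confusion):
    """Return (accuracy, per-class metrics) for a square confusion matrix.

    confusion[i][j] is the number of images of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(confusion[k]) - tp                       # truly k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    accuracy = correct / total if total else 0.0
    return accuracy, metrics


# Made-up, imbalanced toy matrix: class 0 is common, class 2 is rare.
toy = [
    [90, 5, 5],
    [10, 8, 2],
    [5, 0, 5],
]
accuracy, metrics = per_class_metrics(toy)
print(f"accuracy = {accuracy:.3f}")
for k, m in enumerate(metrics):
    print(k, {name: round(value, 3) for name, value in m.items()})
```

In this toy case the overall accuracy is dominated by the majority class, while the rare class reaches a recall of only 0.5, which is the masking effect described in the accuracy item above.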

4. Discussion
4.1. Improvements
- Customised Data Augmentation: While augmentation is a standard technique, this study applies a task-specific augmentation pipeline (rotation, scaling, flipping, and contrast adjustment) optimised for gastrointestinal polyp morphology. This approach reduces overfitting.
- Dynamic Class Weighting: A novel weighting scheme was implemented based on the real-time class distribution during training, rather than static frequency-based weights. This improves learning stability across imbalanced classes.
- Explainability: In addition to visualising model decisions, Grad-CAM was integrated as a feedback mechanism during model refinement. This dual use helped identify misclassified regions and guided architectural adjustments [51].
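The text does not give the exact formula behind the dynamic class weighting, so the following is only a hedged sketch of one plausible variant: class frequencies are tracked with an exponential moving average over training batches and converted into inverse-frequency weights with a mean of 1. The class name `DynamicClassWeights`, the `momentum` parameter, and the normalisation are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


class DynamicClassWeights:
    """Inverse-frequency class weights updated from the labels seen during training.

    Hypothetical illustration: the smoothing and normalisation choices are assumptions.
    """

    def __init__(self, num_classes, momentum=0.9):
        self.num_classes = num_classes
        self.momentum = momentum
        # Start from a uniform estimate of the class distribution.
        self.freq = [1.0 / num_classes] * num_classes

    def update(self, batch_labels):
        """Blend the batch class frequencies into the running estimate."""
        if batch_labels:
            counts = Counter(batch_labels)
            size = len(batch_labels)
            for c in range(self.num_classes):
                batch_freq = counts.get(c, 0) / size
                self.freq[c] = (self.momentum * self.freq[c]
                                + (1.0 - self.momentum) * batch_freq)
        return self.weights()

    def weights(self, eps=1e-6):
        """Inverse-frequency weights, rescaled so that their mean is 1."""
        inverse = [1.0 / (f + eps) for f in self.freq]
        mean_inverse = sum(inverse) / len(inverse)
        return [w / mean_inverse for w in inverse]


tracker = DynamicClassWeights(num_classes=3, momentum=0.5)
current = tracker.update([0, 0, 0, 1])  # a batch dominated by class 0
print([round(w, 3) for w in current])
```

The returned list could be passed to a weighted loss at each step, so that classes seen less often in recent batches contribute more to the gradient than under fixed, dataset-level weights.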
4.2. Error Analysis
- Confusions between visually similar classes: The most frequent misclassifications involved visually similar classes, such as polyps and inflamed mucosa, which shared overlapping color gradients, mucosal textures, and ill-defined boundaries.
- Class Imbalance Sensitivity: Precision and recall were lower for classes with fewer representative samples, such as uncommon polyp subtypes, which reflects the bias introduced by the imbalanced training data. The resulting errors include inter-class confusion (e.g., misidentifying hyperplastic polyps as adenomatous ones) and incorrect localisation or attention to irrelevant image regions.
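The most frequent confusions described above can be read directly from the off-diagonal entries of a confusion matrix. The sketch below shows one way to rank them; the function name is hypothetical, and the counts are made up for illustration and are not results from this study.

```python
def top_confusions(confusion, class_names, k=3):
    """Return the k largest off-diagonal entries as (count, true_class, predicted_class)."""
    pairs = []
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if i != j and count > 0:
                pairs.append((count, class_names[i], class_names[j]))
    # Largest misclassification counts first.
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    return pairs[:k]


# Made-up counts for three of the dataset's classes, for illustration only.
names = ["polyps", "ulcerative-colitis", "normal-cecum"]
toy = [
    [80, 15, 5],
    [12, 85, 3],
    [2, 4, 94],
]
for count, true_class, predicted in top_confusions(toy, names, k=2):
    print(f"{true_class} -> {predicted}: {count}")
```

Ranking the pairs in this way indicates which class boundaries deserve closer inspection, for example with Grad-CAM overlays or a review of the annotations.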
4.3. Limitations
- Dataset Size: The Kvasir dataset contains a relatively limited number of images for certain polyp subtypes, which can impair generalisation and lead to classifier bias.
- Overfitting in Complex Models: Architectures like EfficientNetB8, InceptionResNetV2 and NASNetLarge demonstrated high variance between training and validation metrics, indicating overfitting. These models were excluded from the final evaluation.
- Domain Limitation: The trained models performed well on Kvasir data, but may experience performance degradation when applied to endoscopic images from different institutions, due to differences in lighting conditions, device variability, and annotation inconsistency.
- Hardware Constraints: Training and tuning were conducted on an RTX 3050 GPU (4GB), which limited the ability to perform extensive hyperparameter tuning or ensemble testing across large architectures.
4.4. Proposals for Future Work (Integration with Segmentation, Multi-Label Classification)
5. Conclusions
6. Testing the ResNet50 Model Using a Graphical User Interface (GUI)

Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Ichihara, M.; et al. Long noncoding RNA 01534 maintains cancer stemness by downregulating endoplasmic reticulum stress response in colorectal cancer. Annals of Gastroenterological Surgery 2023, 7, 458–470. [CrossRef]
- Chitca, D.D.; et al. Advancing Colorectal Cancer Diagnostics from Barium Enema to AI-Assisted Colonoscopy. Diagnostics 2025, 15. [CrossRef]
- Puzzo, M.; et al. Colorectal Cancer: Current and Future Therapeutic Approaches and Related Technologies Addressing Multidrug Strategies Against Multiple Level Resistance Mechanisms. International Journal of Molecular Sciences 2025, 26. [CrossRef]
- Waldum, H.; Fossmark, R. Gastritis, Gastric Polyps and Gastric Cancer. International Journal of Molecular Sciences 2021, 22, 6548. [CrossRef]
- Siegel, R.L.; Giaquinto, A.N.; J. Global Cancer Statistics. CA: A Cancer Journal for Clinicians 2024, 74(2), 203. [CrossRef]
- Cincar, K.; Sima, I. Machine Learning algorithms approach for Gastrointestinal Polyps classification. In Proceedings of the International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2020, Novi Sad, Serbia, August 24-26, 2020. IEEE, 2020, pp. 1–6. [CrossRef]
- Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpretable Machine Learning. Advances in Neural Information Processing Systems 2017, 30.
- Bishop, C.M. Pattern Recognition and Machine Learning; Information Science and Statistics, Springer: New York, NY, 2006.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef]
- Dong, Y.; Li, J.; Wang, Z.; Jia, W. CoDC: Accurate Learning with Noisy Labels via Disagreement and Consistency. Biomimetics 2024, 9, 92. [CrossRef]
- Wang, S.; Chen, H.; Zhang, Y. Bionic Artificial Neural Networks in Medical Image Analysis. Biomimetics 2023, 8, 211. [CrossRef]
- Consortium, M. A map of neural signals and circuits traces the logic of brain computation. Nature 2025. [CrossRef]
- Mienye, I.D.; et al. Deep Convolutional Neural Networks in Medical Image Analysis: A Review. Information 2025, 16. [CrossRef]
- Kountchev, R.; Iantovics, B.; Kountcheva, R. Hierarchical Third-Order Tensor Decomposition through Inverse Difference Pyramid, Based on the 3D Walsh-Hadamard Transform with Applications in Data Mining. WIREs Data Mining and Knowledge Discovery 2020, 10. [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems 2012, 25. [CrossRef]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), 2015, pp. 1–14.
- Szegedy, C.; et al. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9. [CrossRef]
- He, K.; et al. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017, pp. 4700–4708.
- Gong, E.J.; Bang, C.S.; Lee, J.J. Edge Artificial Intelligence Device in Real-Time Endoscopy for Classification of Gastric Neoplasms: Development and Validation Study. Biomimetics 2024, 9. [CrossRef]
- Nazir, A.; et al. A deep learning-based novel hybrid CNN-LSTM architecture for efficient detection of threats in the IoT ecosystem. Ain Shams Engineering Journal 2024, 15. [CrossRef]
- Zhen, L.; Bărbulescu, A. Comparative Analysis of Convolutional Neural Network-Long Short-Term Memory, Sparrow Search Algorithm-Backpropagation Neural Network, and Particle Swarm Optimization-Extreme Learning Machine Models for the Water Discharge of the Buzău River, Romania. Water 2024, 16. [CrossRef]
- Jalan, A.; Mishra, D.; Marisha; Gupta, M. Diagnosis of Schizophrenia Using Feature Extraction from EEG Signals Based on Markov Transition Fields and Deep Learning. Biomimetics 2025, 10, 449. [CrossRef]
- Iman, M.; Arabnia, H.R.; Rasheed, K. A Review of Deep Transfer Learning and Recent Advancements. Technologies 2023, 11, 40.
- Mallouk, O.; Joudar, N.E.; Ettaouil, M. A Selective Model for Transfer Learning in CNNs: Optimization of Fine-Tuning Layers. International Journal of Data Science and Analytics 2024.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Proceedings of the IEEE International Conference on Computer Vision 2017, pp. 618–626.
- Pogorelov, K.; et al. Kvasir: A Multi-Class Image-Dataset for Computer Aided Gastrointestinal Disease Detection, 2017. [CrossRef]
- Muratarat, M. Kvasir Dataset v2 Classifier. https://github.com/mmuratarat/kvasir-v2-ViT-classifier, 2024. GitHub Repository.
- Demirbaş, A.A.; Üzen, H.; Fırat, H. Spatial-attention ConvMixer architecture for classification and detection of gastrointestinal diseases using the Kvasir dataset. Health Information Science and Systems 2024, 12, 32.
- Demirbaş, A.A.; Üzen, H.; Fırat, H. Automated classification of gastrointestinal diseases using deep learning. Medical & Biological Engineering & Computing 2024, 63, 293–320. [CrossRef]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 4700–4708.
- Sandler, M.; et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 4510–4520.
- Szegedy, C.; et al. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2818–2826.
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1251–1258.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.
- Iandola, F.N.; et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv preprint arXiv:1602.07360, 2016.
- Howard, J.; Gugger, S. xResNet architectures within FastAI. https://docs.fast.ai/vision.models.xresnet.html, 2020.
- He, K.; et al. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
- Esmaeilzadeh, H.; Ghodrati, S.; Kahng, A.B. Performance Analysis of DNN Inference/Training with Convolution and Non-Convolution Operations. arXiv 2023.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted Residuals and Linear Bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018, pp. 4510–4520.
- Yang, X.; Zhang, W.; Li, J. Challenges in CNN-Based Medical Image Analysis and Future Directions. Journal of Medical AI Research 2024, 12, 45–62. [CrossRef]
- Zoph, B.; et al. Learning Transferable Architectures for Scalable Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 8697–8710.
- Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML), 2019, pp. 6105–6114.
- Liu, Z.; et al. A ConvNet for the 2020s. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022, pp. 11976–11986.
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. arXiv preprint arXiv:1602.07261, 2017.
- Huang, G.; et al. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 4700–4708.
- Kaur, G.; Saini, S. Comparative Analysis of RMSE and MAP Metrics for Evaluating CNN and LSTM Models. AIP Conference Proceedings 2024, 3121, 040003.
- Zhukov, A.; Benois-Pineau, J.; Giot, R. Reference-based and No-reference Metrics to Evaluate Explanation Methods of AI - CNNs in Image Classification Tasks. arXiv 2024.
- Nazir, Z.; Yarovenko, V.; Park, J.G. Interpretable ML Enhanced CNN Performance Analysis of cuBLAS, cuDNN, and TensorRT. ResearchGate 2023.
- Düntsch, I.; Gediga, G. Confusion Matrices and Rough Set Data Analysis. In Proceedings of the 2019 International Conference on Pattern Recognition and Intelligent Systems (PRIS). arXiv, 2019. [CrossRef]
- Team, T.R. Multiclass Confusion Matrix for Object Detection. Edge AI and Vision Alliance 2023.
- Dosovitskiy, A.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv preprint arXiv:2010.11929 2020.
- Yi, X.; Walia, E.; Babyn, P. Generative Adversarial Network in Medical Imaging: A Review. Medical Image Analysis 2019, 58, 101552. [CrossRef]




| Model | Training Accuracy | Validation Accuracy | Precision | Recall | F1-score | Predicted Class |
|---|---|---|---|---|---|---|
| ResNet50 | 88% | 90% | 92% | 89% | 90.5% | dyed-lifted-polyps |
| DenseNet121 | 84% | 87% | 89% | 86% | 87.5% | dyed-resection-margins |
| MobileNetV2 | 86% | 87% | 88% | 85% | 86.5% | esophagitis |
| InceptionV3 | 83% | 85% | 86% | 83% | 84.5% | normal-cecum |
| FastAI xResNet18 | 64% | 70% | 74% | 68% | 71.0% | normal-pylorus |
| VGG16/VGG19 | 60% | 68% | 70% | 63% | 66.0% | normal-z-line |
| SqueezeNet | 57% | 63% | 68% | 59% | 63.0% | polyps |
| Xception | 82% | 85% | 85% | 84% | 84.5% | ulcerative-colitis |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).