Submitted: 09 June 2026
Posted: 09 June 2026
Abstract
Keywords:
1. Introduction
2. Related Literature
2.1. Machine Learning Approaches
2.2. Deep Learning Approaches
3. Methodology
3.1. Dataset Preparation
3.2. Data Augmentation
- Geometric: horizontal flip, small rotations, translations, and mild scaling to simulate slide handling variability.
- Photometric: limited brightness/contrast jitter and color perturbations to reflect staining variability.
- Regularization: light Gaussian blur/noise where appropriate.

Each transformation used clinically conservative ranges and per-operation probabilities so that augmented images remained biologically plausible. All augmentations were applied on the fly during training (virtual augmentation): each image was transformed stochastically at load time rather than expanded into a fixed, enlarged dataset on disk. The nominal dataset size of 917 images was therefore unchanged, while the effective diversity seen across epochs increased. Augmentations were applied only to training data.
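
The on-the-fly scheme described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' pipeline: the `AugmentedDataset` class, the helper functions, and the default `flip_p` and `max_delta` values are all hypothetical, and images are represented as nested lists of grayscale values to keep the example dependency-free.

```python
import random


def horizontal_flip(image):
    """Mirror each row of a 2D image stored as a list of rows."""
    return [row[::-1] for row in image]


def brightness_jitter(image, max_delta, rng):
    """Shift every pixel by one random offset, clamped to [0, 255]."""
    delta = rng.uniform(-max_delta, max_delta)
    return [[min(255.0, max(0.0, px + delta)) for px in row] for row in image]


class AugmentedDataset:
    """Applies stochastic transforms at load time; nothing is written to disk."""

    def __init__(self, images, labels, train, flip_p=0.5, max_delta=10.0, seed=None):
        self.images = images
        self.labels = labels
        self.train = train
        self.flip_p = flip_p        # hypothetical per-operation probability
        self.max_delta = max_delta  # hypothetical conservative jitter range
        self.rng = random.Random(seed)

    def __len__(self):
        # The nominal dataset size never changes under virtual augmentation.
        return len(self.images)

    def __getitem__(self, idx):
        image, label = self.images[idx], self.labels[idx]
        if self.train:  # augment training data only
            if self.rng.random() < self.flip_p:
                image = horizontal_flip(image)
            image = brightness_jitter(image, self.max_delta, self.rng)
        return image, label
```

Because the transforms run inside `__getitem__`, the same index can yield a different view in every epoch, while a dataset built with `train=False` always returns the stored image unchanged.
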
3.3. Class Weighting
3.4. ViT Model Training
3.5. Grad-CAM Application
3.6. Evaluation Protocol
4. Results and Discussion
4.1. Dataset Preparation Results
4.2. Data Augmentation Results
4.3. Class Weighting Results
4.4. ViT Model Training Results
4.5. Grad-CAM Interpretability Results
4.6. Comparison with CNN Baselines and Computational Efficiency
4.7. Comparison with Prior Work
5. Conclusions
References
- Sung, H.; et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J. Clin. 2021, vol. 71(no. 3), 209–249. [Google Scholar] [CrossRef]
- Stoler, M. H.; Schiffman, M.; for the Atypical Squamous Cells of Undetermined Significance–Low-grade Squamous Intraepithelial Lesion Triage Study (ALTS) Group. Interobserver Reproducibility of Cervical Cytologic and Histologic Interpretations: Realistic Estimates From the ASCUS-LSIL Triage Study. JAMA 2001, vol. 285(no. 11), 1500–1505. [Google Scholar] [CrossRef]
- Koonmee, S.; et al. False-Negative Rate of Papanicolaou Testing: A National Survey from the Thai Society of Cytology. Acta Cytol. 2017, vol. 61(no. 6), 434–440. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597. [Google Scholar] [CrossRef]
- Shorten, C.; Khoshgoftaar, T. M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, vol. 6(no. 1), 60. [Google Scholar] [CrossRef]
- Greenspan, H.; Van Ginneken, B.; Summers, R. M. Guest Editorial Deep Learning in Medical Imaging: Overview and Future Promise of an Exciting New Technique. IEEE Trans. Med. Imaging 2016, vol. 35(no. 5), 1153–1159. [Google Scholar] [CrossRef]
- Litjens, G.; et al. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, vol. 42, 60–88. [Google Scholar] [CrossRef] [PubMed]
- da Silva, E. L. P. Combining machine learning and deep learning approaches to detect cervical cancer in cytology images. 2021. [Google Scholar]
- Tan, X.; et al. Automatic model for cervical cancer screening based on convolutional neural network: a retrospective, multicohort, multicenter study. Cancer Cell Int. 2021, vol. 21(no. 1), 35. [Google Scholar] [CrossRef] [PubMed]
- Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the International Conference on Machine Learning, PMLR, 2017; pp. 3319–3328. [Google Scholar]
- Samek, W.; Wiegand, T.; Müller, K.-R. Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. ArXiv Prepr. 2017, ArXiv17080829. [Google Scholar]
- Al-Zoubi, H.; Al-Bzoor, N. Toward driverless AI: Automating leukemia detection and classification using hyperautomation, a case study. 2022. [Google Scholar] [CrossRef] [PubMed]
- Albzour, N. Do Hybrid Deep Learning–Gradient Boosting Ensembles Generalize? A Cross-Cohort Evaluation of Breast Cancer Survival Prediction Using SEER and METABRIC. SSRN preprint, 2026. Available online: https://ssrn.com/abstract=6810699.
- Dosovitskiy, A.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Vaswani, A.; et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, vol. 30. [Google Scholar]
- Han, K.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, vol. 45(no. 1), 87–110. [Google Scholar] [CrossRef]
- Selvaraju, R. R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, 2017; pp. 618–626. [Google Scholar]
- Yuan, F.; Zhang, Z.; Fang, Z. An effective CNN and Transformer complementary network for medical image segmentation. Pattern Recognit. 2023, vol. 136, 109228. [Google Scholar] [CrossRef]
- Guo, X.; Lin, X.; Yang, X.; Yu, L.; Cheng, K.-T.; Yan, Z. UCTNet: Uncertainty-guided CNN-Transformer hybrid networks for medical image segmentation. Pattern Recognit. 2024, vol. 152, 110491. [Google Scholar] [CrossRef]
- Magaraja, A.D.; et al. A hybrid linear iterative clustering and Bayes classification-based GrabCut segmentation scheme for dynamic detection of cervical cancer. Appl. Sci. 2022, vol. 12(no. 20), 10522. [Google Scholar] [CrossRef]
- Mehmood, M.; Rizwan, M.; Gregus Ml, M.; Abbas, S. Machine Learning Assisted Cervical Cancer Detection. Front. Public Health 2021, vol. 9, 788376. [Google Scholar] [CrossRef]
- Mariarputham, E. J.; Stephen, A. Nominated Texture Based Cervical Cancer Classification. Comput. Math. Methods Med. 2015, vol. 2015, 1–10. [Google Scholar] [CrossRef]
- Iliyasu, A.M.; Fatichah, C. A quantum hybrid PSO combined with fuzzy k-NN approach to feature selection and cell classification in cervical cancer detection. Sensors 2017, vol. 17(no. 12), 2935. [Google Scholar] [CrossRef]
- Pranuthi, Tenali. Predicting Cervical Cancer Cases Resulting in Biopsies Using Machine Learning Techniques. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. 2021, 28–37. [Google Scholar] [CrossRef]
- Prusty, S.; Patnaik, S.; Dash, S. K. SKCV: Stratified K-fold cross-validation on ML classifiers for predicting cervical cancer. Front. Nanotechnol. 2022, vol. 4, 972421. [Google Scholar] [CrossRef]
- HassanMbaga, A.; ZhiJun, P. Pap Smear Images Classification for Early Detection of Cervical Cancer. Int. J. Comput. Appl. 2015, vol. 118(no. 7), 10–16. [Google Scholar] [CrossRef]
- Al-Batah, M. S.; Alzyoud, M.; Alazaidah, R.; Toubat, M.; Alzoubi, H.; Olaiyat, A. Early prediction of cervical cancer using machine learning techniques. Jordanian J. Comput. Inf. Technol. 2022, vol. 8(no. 4). [Google Scholar] [CrossRef]
- Nandanwar, P. D.; Dhonde, S. B. A Novel Approach to Cervical Cancer Detection Using Hybrid Stacked Ensemble Models and Feature Selection. Int. J. Electr. Electron. Res. 2023, vol. 11(no. 2), 582–589. [Google Scholar] [CrossRef]
- K., D. Cervical Cancer Classification. Int. J. Emerg. Trends Eng. Res. 2020, vol. 8(no. 3), 804–807. [Google Scholar] [CrossRef]
- Dash, S.; Sethy, P. K.; Behera, S. K. Cervical Transformation Zone Segmentation and Classification based on Improved Inception-ResNet-V2 Using Colposcopy Images. Cancer Inform. 2023, vol. 22. [Google Scholar] [CrossRef]
- Battula, K. Prasad; Chandana, B. Sai. Multi-class Cervical Cancer Classification using Transfer Learning-based Optimized SE-ResNet152 model in Pap Smear Whole Slide Images. Int. J. Electr. Comput. Eng. Syst. 2023, vol. 14(no. 6), 623–623. [Google Scholar] [CrossRef]
- Alsubai, S.; et al. Privacy Preserved Cervical Cancer Detection Using Convolutional Neural Networks Applied to Pap Smear Images. Comput. Math. Methods Med. 2023, vol. 2023(no. 1). [Google Scholar] [CrossRef]
- Huang, P.; Tan, X.; Chen, C.; Lv, X.; Li, Y. AF-SENet: Classification of Cancer in Cervical Tissue Pathological Images Based on Fusing Deep Convolution Features. Sensors 2020, vol. 21(no. 1), 122. [Google Scholar] [CrossRef] [PubMed]
- Kurnianingsih, et al. Segmentation and Classification of Cervical Cells Using Deep Learning. IEEE Access 2019, vol. 7, 116925–116941. [Google Scholar] [CrossRef]
- Suguna, S. P.; Balamurugan. Multi-Class Segmentation with Deep Learning based Pap Smear Image Analysis for Cervical Cancer Detection and Classification Model. Tuijin Jishu/Journal Propuls. Technol. 2023, vol. 44(no. 3), 4475–4487. [Google Scholar] [CrossRef]
- Albzour, N.; Lam, S. S. Segmentation and Classification of Pap Smear Images for Cervical Cancer Detection Using Deep Learning. arXiv 2025, arXiv:2508.17728. [Google Scholar] [CrossRef]
- Chowdary, G. J.; S. G, P. M.; Yogarajah, P. Nucleus segmentation and classification using residual SE-UNet and feature concatenation approach in cervical cytopathology cell images. Technol. Cancer Res. Treat. 2023, vol. 22. [Google Scholar] [CrossRef]
- Park, J.; Yang, H.; Roh, H.-J.; Jung, W.; Jang, G.-J. Encoder-Weighted W-Net for Unsupervised Segmentation of Cervix Region in Colposcopy Images. Cancers 2022, vol. 14(no. 14), 3400. [Google Scholar] [CrossRef]
- Shinde, S.; Kalbhor, M.; Wajire, P. DeepCyto: a hybrid framework for cervical cancer classification by using deep feature fusion of cytology images. Math. Biosci. Eng. 2022, vol. 19(no. 7), 6415–6434. [Google Scholar] [CrossRef] [PubMed]
- Kalbhor, M.; Shinde, S.; Popescu, D. E.; Hemanth, D. J. Hybridization of Deep Learning Pre-Trained Models with Machine Learning Classifiers and Fuzzy Min–Max Neural Network for Cervical Cancer Diagnosis. Diagnostics 2023, vol. 13(no. 7), 1363. [Google Scholar] [CrossRef] [PubMed]
- Jiménez Gaona, Y.; et al. Radiomics Diagnostic Tool Based on Deep Learning for Colposcopy Image Classification. Diagnostics 2022, vol. 12(no. 7), 1694. [Google Scholar] [CrossRef]
- Nirmala, G.; Nayudu, P. P.; Kumar, A. R.; Sagar, R. Automatic cervical cancer classification using adaptive vision transformer encoder with CNN for medical application. Pattern Recognit. 2025, vol. 160, 111201. [Google Scholar] [CrossRef]
- Şahin, E.; Özdemir, D.; Temurtaş, H. Multi-objective optimization of ViT architecture for efficient brain tumor classification. Biomed. Signal Process. Control 2024, vol. 91, 105938. [Google Scholar] [CrossRef]
- Liu, X.; Hu, Y.; Chen, J. Hybrid CNN-Transformer model for medical image segmentation with pyramid convolution and multi-layer perceptron. Biomed. Signal Process. Control 2023, vol. 86, 105331. [Google Scholar] [CrossRef]
- Albzour, N.; Lam, S. S. A reproducible benchmark of ViT-Tiny against CNN baselines for cervical cell classification: Accuracy, statistical validation, and deployment efficiency. SSRN preprint, 2025. Available online: https://ssrn.com/abstract=6839541.
- Jantzen, J.; Norup, J.; Dounias, G.; Bjerregaard, B. Pap-smear benchmark data for pattern classification. Proc. Nature Inspired Smart Information Systems (NiSIS), Albufeira, Portugal, 2005; pp. 1–9. [Google Scholar]
- Albzour, N.; Agarwal, S.; Althnaibat, H.; Lu, S. S. Predicting post-stroke activities of daily living: Enhancing Machine Learning with Feature Selection. IISE Annual Conference Proceedings, 2025; pp. 1–6. [Google Scholar]
- Marinakis, Y.; Dounias, G.; Jantzen, J. Pap smear diagnosis using a hybrid intelligent scheme focusing on genetic algorithm based feature selection and nearest neighbor classification. Comput. Biol. Med. 2009, vol. 39(no. 1), 69–78. [Google Scholar] [CrossRef]
- Yilmaz, E.; Kantar, M. Comparison of deep learning and traditional machine learning techniques for classification of normal and abnormal cervical cells. arXiv 2020, arXiv:2009.06366. [Google Scholar] [CrossRef]
- Ghoneim, A.; Muhammad, G.; Hossain, M. S. Cervical cancer classification using convolutional neural networks and extreme learning machines. Future Gener. Comput. Syst. 2020, vol. 102, 643–649. [Google Scholar] [CrossRef]
- Deo, B.S.; Pal, M.; Panigrahi, P. K.; Pradhan, A. CerviFormer: A Pap-smear-based cervical cancer classification method using cross-attention and latent transformer. arXiv 2023, arXiv:2303.10222. [Google Scholar] [CrossRef]
- Pirovano; Almeida, L. G.; Ladjal, S. Regression Constraint for an Explainable Cervical Cancer Classifier. arXiv 2019, arXiv:1908.02650. [Google Scholar] [CrossRef]
- Kaur, H.; Sharma, R.; Kaur, J. Comparison of deep transfer learning models for classification of cervical cancer from pap smear images. Sci. Rep. 2025, vol. 15(no. 1), 3945. [CrossRef]






| Augmentation Strategy | Precision (%) | Recall (%) | F1-score (%) | Accuracy (%) |
|---|---|---|---|---|
| Color Jitter | 80.40 | 90.60 | 83.30 | 89.87 |
| Horizontal Flip | 89.70 | 91.30 | 90.00 | 94.77 |
| Random Affine | 86.90 | 83.40 | 83.10 | 91.39 |
| Color Jitter + Horizontal Flip | 89.10 | 90.10 | 89.40 | 94.33 |
| Color Jitter + Random Affine | 84.10 | 95.00 | 88.70 | 93.23 |
| Horizontal Flip + Random Affine | 83.60 | 89.70 | 85.70 | 92.26 |
| All Three Combined | 89.40 | 89.70 | 89.10 | 94.22 |
| Case # | Weight Multiplier (Abnormal×Normal) | Abnormal Weight | Normal Weight | Precision (%) | Recall (%) | F1-score (%) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| 1 | 1.0×1.0 | 0.68 | 1.90 | 92.10 | 84.30 | 87.40 | 93.67 |
| 2 | 0.8×0.8 | 0.54 | 1.52 | 84.40 | 90.60 | 85.80 | 91.93 |
| 3 | 1.2×1.2 | 0.82 | 2.27 | 83.00 | 93.00 | 86.70 | 91.72 |
| 4 | 0.7×1.3 | 0.48 | 2.46 | 90.90 | 93.40 | 91.90 | 95.64 |
| 5 | 1.3×0.7 | 0.88 | 1.33 | 90.70 | 88.80 | 89.70 | 94.55 |
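
The weights in the table above can be reproduced to within rounding by a standard inverse-frequency rule, `n_samples / (n_classes * n_c)`, followed by a per-class multiplier. This is a reading that fits the tabulated values, not a procedure stated in the text. The class counts of 675 abnormal and 242 normal images are an assumption based on the usual Herlev composition (917 images in total), and the function names are hypothetical.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * n_c) for each class c."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


def scale_weights(base, multipliers):
    """Apply a per-class multiplier to the base weights (e.g. case 4: 0.7 and 1.3)."""
    return {cls: base[cls] * multipliers[cls] for cls in base}


# Assumed Herlev class counts; not stated in this section.
counts = {"abnormal": 675, "normal": 242}
base = balanced_class_weights(counts)

# Case 4 in the table: abnormal weight scaled by 0.7, normal weight by 1.3.
case4 = scale_weights(base, {"abnormal": 0.7, "normal": 1.3})
print({cls: round(w, 2) for cls, w in case4.items()})
```

Under these assumptions the base weights come out near 0.68 and 1.89, and the scaled weights for cases 2 to 5 agree with the table to two decimals. The tabulated base value of 1.90 for the normal class differs in the last digit, so the exact counts or rounding used by the authors may differ slightly.
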
| Experiment # | Batch Size | Learning Rate | Epochs | Precision (%) | Recall (%) | F1-score (%) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| 1 | 16 | 0.0001 | 5 | 79.14 | 90.93 | 84.28 | 90.95 |
| 2 | 16 | 0.0001 | 10 | 89.91 | 90.07 | 89.66 | 94.55 |
| 3 | 16 | 0.0001 | 15 | 93.36 | 90.52 | 91.82 | 95.75 |
| 4 | 16 | 0.0005 | 5 | 53.84 | 87.17 | 61.0 | 65.32 |
| 5 | 16 | 0.0005 | 10 | 58.96 | 84.23 | 68.55 | 78.96 |
| 6 | 16 | 0.0005 | 15 | 65.7 | 83.97 | 72.31 | 82.54 |
| 7 | 16 | 0.001 | 5 | 53.53 | 79.31 | 58.01 | 62.83 |
| 8 | 16 | 0.001 | 10 | 53.03 | 83.38 | 60.91 | 68.72 |
| 9 | 16 | 0.001 | 15 | 59.24 | 78.87 | 63.69 | 73.07 |
| 10 | 32 | 0.0001 | 5 | 85.31 | 90.47 | 87.56 | 93.34 |
| 11 | 32 | 0.0001 | 10 | 90.47 | 95.06 | 92.59 | 95.96 |
| 12 | 32 | 0.0001 | 15 | 96.23 | 90.49 | 93.1 | 96.51 |
| 13 | 32 | 0.0005 | 5 | 56.57 | 77.67 | 61.21 | 70.13 |
| 14 | 32 | 0.0005 | 10 | 68.24 | 78.98 | 71.36 | 82.43 |
| 15 | 32 | 0.0005 | 15 | 53.52 | 95.88 | 68.11 | 74.94 |
| 16 | 32 | 0.001 | 5 | 67.09 | 60.26 | 57.91 | 77.53 |
| 17 | 32 | 0.001 | 10 | 52.15 | 81.38 | 61.44 | 70.77 |
| 18 | 32 | 0.001 | 15 | 49.43 | 86.44 | 60.69 | 67.26 |
| 19 | 64 | 0.0001 | 5 | 78.96 | 95.46 | 86.05 | 91.71 |
| 20 | 64 | 0.0001 | 10 | 84.55 | 95.03 | 89.39 | 94.0 |
| 21 | 64 | 0.0001 | 15 | 93.83 | 92.15 | 92.87 | 96.29 |
| 22 | 64 | 0.0005 | 5 | 50.99 | 84.75 | 59.55 | 67.39 |
| 23 | 64 | 0.0005 | 10 | 61.6 | 93.38 | 73.71 | 82.02 |
| 24 | 64 | 0.0005 | 15 | 71.84 | 82.19 | 75.72 | 85.72 |
| 25 | 64 | 0.001 | 5 | 51.95 | 69.81 | 57.18 | 72.73 |
| 26 | 64 | 0.001 | 10 | 42.27 | 88.89 | 55.77 | 61.49 |
| 27 | 64 | 0.001 | 15 | 58.11 | 88.87 | 69.27 | 78.17 |
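
The 27 experiments above form a full grid over three batch sizes, three learning rates, and three epoch counts. A minimal sketch of how such a grid can be enumerated in the same order as the table is shown here; the constant names and the `build_grid` helper are hypothetical, and the training call itself is omitted.

```python
from itertools import product

# Values taken from the hyperparameter table above.
BATCH_SIZES = (16, 32, 64)
LEARNING_RATES = (0.0001, 0.0005, 0.001)
EPOCHS = (5, 10, 15)


def build_grid():
    """Enumerate configurations with batch size outermost and epochs innermost."""
    combos = product(BATCH_SIZES, LEARNING_RATES, EPOCHS)
    return [
        {"experiment": i, "batch_size": b, "learning_rate": lr, "epochs": e}
        for i, (b, lr, e) in enumerate(combos, start=1)
    ]


grid = build_grid()
print(len(grid), grid[11])
```

With this ordering, `grid[11]` corresponds to experiment 12 (batch size 32, learning rate 0.0001, 15 epochs), the row with the highest accuracy in the table.
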
| Configuration (B = Batch size, E = Epochs) | CV Precision (%) | CV Recall (%) | CV F1-score (%) | CV Accuracy (%) | App Precision (%) | App Recall (%) | App F1-score (%) | App Accuracy (%) |
|---|---|---|---|---|---|---|---|---|
| B16_E15 | 91.08 ± 4.10 | 90.99 ± 1.43 | 90.74 ± 1.93 | 95.05 ± 1.20 | 96.98 ± 4.15 | 96.65 ± 6.85 | 96.63 ± 4.27 | 98.27 ± 2.10 |
| B32_E10 | 86.91 ± 3.86 | 93.02 ± 2.82 | 89.36 ± 2.06 | 93.93 ± 1.41 | 97.59 ± 2.48 | 99.55 ± 0.47 | 98.54 ± 1.32 | 99.21 ± 0.72 |
| B32_E15 | 90.78 ± 2.41 | 91.91 ± 1.49 | 91.03 ± 0.87 | 95.15 ± 0.57 | 93.57 ± 14.15 | 99.96 ± 0.12 | 96.00 ± 9.02 | 97.18 ± 6.67 |
| B64_E15 | 91.22 ± 3.48 | 90.38 ± 1.94 | 90.53 ± 1.91 | 94.92 ± 1.22 | 98.41 ± 2.71 | 99.55 ± 0.84 | 98.95 ± 1.34 | 99.43 ± 0.73 |
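
The CV columns above are means and standard deviations over cross-validation folds, and the comparison table names the protocol as stratified 5-fold cross-validation with 10 repetitions. A dependency-free sketch of that splitting scheme follows. The function name, the round-robin dealing strategy, and the seed handling are illustrative choices, not details taken from the paper.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) with class ratios kept per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)  # a fresh shuffle for every repetition
            # Deal each class round-robin so folds get near-equal shares.
            for pos, idx in enumerate(shuffled):
                folds[pos % n_splits].append(idx)
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            yield repeat, fold, train_idx, sorted(test_idx)


# Toy usage: 15 abnormal (1) and 5 normal (0) samples.
toy_labels = [1] * 15 + [0] * 5
toy_splits = list(repeated_stratified_kfold(toy_labels))
print(len(toy_splits))
```

Each repetition produces five disjoint test folds, so 10 repetitions give 50 train/test splits over which per-fold metrics can be averaged.
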
| Comparison | Mean1 (Exp A) | Mean2 (Exp B) | Diff (A-B) | 95% CI (Diff) | p-value | Significant? (p < 0.05) |
|---|---|---|---|---|---|---|
| Exp1 vs Exp2 | 95.05 | 93.93 | +1.122 | (−0.179, 2.423) | 0.087 | No |
| Exp1 vs Exp3 | 95.05 | 95.15 | −0.099 | (−1.064, 0.865) | 0.826 | No |
| Exp1 vs Exp4 | 95.05 | 94.92 | +0.128 | (−1.077, 1.334) | 0.825 | No |
| Exp2 vs Exp3 | 93.93 | 95.15 | −1.221 | (−2.336, −0.106) | 0.035 | Yes |
| Exp2 vs Exp4 | 93.93 | 94.92 | −0.993 | (−2.306, 0.319) | 0.129 | No |
| Exp3 vs Exp4 | 95.15 | 94.92 | +0.228 | (−0.753, 1.209) | 0.622 | No |
| Comparison | Mean1 (Exp A) | Mean2 (Exp B) | Diff (A-B) | 95% CI (Diff) | p-value | Significant? (p < 0.05) |
|---|---|---|---|---|---|---|
| Exp1 vs Exp2 | 90.74 | 89.36 | +1.384 | (−0.603, 3.371) | 0.160 | No |
| Exp1 vs Exp3 | 90.74 | 91.03 | −0.287 | (−1.824, 1.250) | 0.692 | No |
| Exp1 vs Exp4 | 90.74 | 90.53 | +0.212 | (−1.697, 2.121) | 0.817 | No |
| Exp2 vs Exp3 | 89.36 | 91.03 | −1.671 | (−3.297, −0.045) | 0.045 | Yes |
| Exp2 vs Exp4 | 89.36 | 90.53 | −1.172 | (−3.149, 0.805) | 0.228 | No |
| Exp3 vs Exp4 | 91.03 | 90.53 | +0.499 | (−1.024, 2.021) | 0.489 | No |
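
The pairwise tables above report a mean difference, a 95% confidence interval, and a significance flag for each pair of configurations, and the comparison table names paired t-tests as the statistical test. The sketch below shows how the difference, the t statistic, and the interval follow from per-fold scores. The function name is hypothetical, and the critical value is passed in by the caller because the number of paired observations behind each row is not stated in this section.

```python
import math
import statistics


def paired_comparison(scores_a, scores_b, t_critical):
    """Paired t statistic and confidence interval for the mean of (A - B).

    t_critical is the two-sided critical value for n - 1 degrees of freedom,
    e.g. 2.776 for a 95% interval with n = 5 paired scores.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # sample standard deviation
    half_width = t_critical * std_err
    ci = (mean_diff - half_width, mean_diff + half_width)
    return {
        "mean_diff": mean_diff,
        "t": mean_diff / std_err,
        "df": n - 1,
        "ci": ci,
        # A two-sided test rejects at the matching level exactly when the
        # interval excludes zero.
        "significant": ci[0] > 0 or ci[1] < 0,
    }


# Synthetic per-fold accuracies, for illustration only.
result = paired_comparison([91, 92, 93, 94, 95], [90, 90, 90, 90, 90], t_critical=2.776)
print(result)
```

The same rule links the interval and significance columns in the tables: Exp2 vs Exp3 is the only pair whose interval excludes zero, and it is the only pair flagged as significant.
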
| Study | Method | Acc (%) | F1 (%) | Validation technique | Split type | Dataset | Statistical test | Interpretability |
|---|---|---|---|---|---|---|---|---|
| Yilmaz & Kantar [49] | XGBoost/k-NN and Custom CNN | 85.0 / 93.0 | 87.0 / 95.0 | Train/test split (85/15) | Single split | Herlev (917) | No | No |
| Ghoneim et al. [50] | CNN + extreme learning machine | 99.5 | — | 5-fold CV | Cross-validation | Herlev (917) | No | No |
| Deo et al., CerviFormer [51] | Cross-attention transformer | 94.57 | 92.5† | Train/test split (90/10) | Single split | Herlev (917) | No | No |
| Pirovano et al. [52] | ResNet-101 + Integrated Gradients | 94.0 | 96.0 | 4-fold CV | Cross-validation | Herlev (917) | No | Yes (Integrated Gradients) |
| Kaur et al. [53] | ResNet50 (best of 16 TL models) | 95.0 | 94.0 | Train/val/test (60/20/20) | Single split | Herlev (917) | No | No |
| Present work — ViT-Tiny | ViT-Tiny (~5.5 M) + Grad-CAM | 95.15 | 91.03 | Stratified 5-fold CV (×10 reps) | Cross-validation | Herlev (917) | Yes (paired t-tests) | Yes (Grad-CAM) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).