Systematic Literature Review: Machine Learning Approaches in Cervical Cancer Prediction

Abstract
Objectives: Cervical cancer continues to pose a significant global health burden, particularly in low- and middle-income countries (LMICs), where access to routine screening is limited. This systematic review aims to examine recent applications of machine learning (ML) techniques in cervical cancer prediction, with a focus on model performance, clinical applicability, and future directions. Methods: A systematic literature search was conducted across PubMed, IEEE Xplore, Scopus, and arXiv databases for studies published between January 2018 and March 2024. Inclusion criteria focused on peer-reviewed articles that applied ML methods for cervical cancer prediction and reported quantitative performance metrics. Study selection followed PRISMA guidelines, and data were extracted on ML models, datasets, evaluation metrics, and clinical relevance. Results: Out of 512 initially retrieved studies, 30 met the inclusion criteria. Convolutional Neural Networks (CNNs) showed the highest diagnostic accuracy in image-based prediction tasks, with an average Area Under the Curve (AUC) of 0.95. Ensemble learning models such as XGBoost and AdaBoost demonstrated strong performance (AUC 0.93) and offered improved interpretability. Key challenges identified include data heterogeneity, limited model explainability, regulatory hurdles, and ethical issues regarding implementation in clinical settings, particularly in LMICs. Conclusions: ML approaches, especially deep learning and ensemble methods, exhibit promising capabilities in enhancing cervical cancer prediction. However, broader clinical adoption requires addressing issues related to data diversity, transparency, regulatory compliance, and ethical deployment, with particular attention to the needs of resource-constrained environments.

1. Introduction

An estimated 342,000 women die from cervical cancer each year, making it the fourth most common cancer among women worldwide [1]. Despite being preventable and treatable through early detection, low- and middle-income countries (LMICs) bear nearly 90% of the global burden [2]. The World Health Organization (WHO) launched its Global Strategy to Eliminate Cervical Cancer in 2020, targeting a 90-70-90 approach: 90% of girls fully vaccinated against HPV by age 15, 70% of women screened with a high-performance test, and 90% of women with cervical disease treated [3].
Traditional screening methods such as Pap smears and HPV DNA testing, while effective, demand significant infrastructure, skilled personnel, and repeated follow-up, all of which are often unavailable in resource-limited settings [4]. Consequently, machine learning (ML) methods offer a promising alternative, enabling automated, scalable, and highly sensitive cervical cancer prediction systems.
ML models excel at pattern recognition across vast datasets, and deep learning models such as Convolutional Neural Networks (CNNs) have already achieved radiologist-level accuracy in several imaging domains [5]. However, despite impressive laboratory results, real-world deployment of ML for cervical cancer detection remains limited due to multiple technical, clinical, and ethical challenges.
This systematic review aims to comprehensively assess recent research trends, model performances, existing barriers, and future directions in the use of ML approaches for cervical cancer prediction.

2. Methods

2.1. Eligibility Criteria

This review included peer-reviewed articles published between January 2018 and March 2024 that proposed, applied, or evaluated machine learning (ML) models for cervical cancer prediction. Eligible studies were required to:
  • Be published in English
  • Focus on predictive modeling for cervical cancer using ML techniques
  • Provide sufficient methodological detail (e.g., dataset, ML algorithm, performance metrics)
  • Include original research (excluding reviews, editorials, or commentaries)
We excluded studies focused solely on HPV detection or cervical image segmentation without predictive modeling, as well as studies lacking full-text access.

2.2. Information Sources

We conducted a comprehensive literature search using the following electronic databases:
  • PubMed
  • Scopus
  • IEEE Xplore
  • arXiv (preprint server)
The final search was performed on March 28, 2024.

2.3. Search Strategy

Database-specific search strings combined Medical Subject Headings (MeSH) and keywords related to "machine learning", "artificial intelligence", "cervical cancer", and "prediction". An example search string for PubMed was:
("machine learning" OR "artificial intelligence") AND ("cervical cancer" OR "cervical neoplasia") AND ("prediction" OR "classification")
Full search strategies for each database are available in Appendix A.
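For illustration, such a query can also be run programmatically against the PubMed E-utilities. The sketch below is a hypothetical example assuming the Biopython package and a placeholder contact address; it is not the retrieval script used in this review.

    from Bio import Entrez

    # NCBI requests a contact e-mail with E-utilities calls (placeholder here).
    Entrez.email = "reviewer@example.org"

    query = ('("machine learning" OR "artificial intelligence") '
             'AND ("cervical cancer" OR "cervical neoplasia") '
             'AND ("prediction" OR "classification")')

    # Search PubMed within the review window (publication dates) and
    # retrieve the matching record identifiers.
    handle = Entrez.esearch(db="pubmed", term=query, datetype="pdat",
                            mindate="2018/01/01", maxdate="2024/03/31",
                            retmax=1000)
    record = Entrez.read(handle)
    print(record["Count"], record["IdList"][:5])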

2.4. Selection Process

All identified articles were imported into EndNote for duplicate removal. Screening was conducted in two stages:
  • Title and abstract screening
  • Full-text review
Two independent reviewers assessed eligibility. Discrepancies were resolved through discussion or by involving a third reviewer.

2.5. Data Collection Process

A structured data extraction form was developed using Microsoft Excel. Two reviewers independently extracted data, and discrepancies were resolved by consensus.

2.6. Data Items

For each included study, the following data were extracted:
  • Study metadata (authors, year, country)
  • Dataset source and size
  • Machine learning model(s) used
  • Feature types (e.g., clinical, demographic, cytological)
  • Evaluation metrics (accuracy, sensitivity, specificity, AUC, etc.)
  • Validation strategy (cross-validation, external test set)
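As an illustration only (field names are ours, not taken from any included study), the extraction form maps naturally onto a small record type, which makes downstream synthesis and tabulation straightforward:

    from dataclasses import dataclass
    from typing import Dict, List, Optional

    @dataclass
    class ExtractionRecord:
        """One row of the data extraction form described in Section 2.6."""
        authors: str
        year: int
        country: str
        dataset_source: str                # e.g., "Herlev", "UCI", "private hospital"
        dataset_size: int
        ml_models: List[str]               # e.g., ["CNN", "XGBoost"]
        feature_types: List[str]           # e.g., ["clinical", "cytological"]
        metrics: Dict[str, float]          # e.g., {"AUC": 0.95, "sensitivity": 0.92}
        validation: str                    # "cross-validation" or "external test set"
        notes: Optional[str] = None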

2.7. Risk of Bias Assessment

We assessed the methodological quality of included studies using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). Two reviewers independently rated each study across four domains:
  • Participants
  • Predictors
  • Outcome
  • Analysis
Any disagreements were resolved by consensus.

2.8. Synthesis Methods

A narrative synthesis approach was used to compare and contrast the ML models, datasets, and performance metrics. Due to the heterogeneity in study designs and outcomes, meta-analysis was not performed.
3. Results

3.1. Study Selection

Figure 1 presents the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram outlining study selection. The database search initially identified 512 records across PubMed, IEEE Xplore, Scopus, and arXiv. After removal of duplicates, 450 unique records remained and were screened by title and abstract. During this screening phase, 370 records were excluded as irrelevant to cervical cancer prediction, lacking a machine learning methodology, or reporting insufficient data. The remaining 80 full-text articles were assessed against the inclusion and exclusion criteria, and 50 were excluded for reasons such as absence of quantitative performance metrics, non-peer-reviewed status, or exclusive reliance on traditional statistical methods without ML integration. The final 30 studies were included in the qualitative synthesis. The PRISMA diagram supports transparency and reproducibility by visualizing the selection methodology, as recommended for systematic reviews.

3.2. Quality Assessment of Included Studies

To assess the risk of bias and applicability of the included studies, we adapted the Prediction model Risk Of Bias ASsessment Tool (PROBAST), which is recommended for evaluating studies that develop or validate prediction models. Each study was assessed on key domains: participant selection, outcome definition, predictor measurement, model validation strategy, handling of missing data, and performance metrics reported. Studies were then graded as having low, moderate, or high risk of bias, and their applicability to real-world clinical settings was noted. Table 1 summarizes this evaluation for five representative studies.

Table 1. PROBAST-based quality assessment of representative included studies.
Study (Author, Year) | Data Source | Outcome Clearly Defined | Participants Representative | Predictors Clearly Defined | Model Validation | Handling of Missing Data | Performance Reporting | Risk of Bias | Applicability
Singh et al., 2018 | UCI dataset | Yes | Partial | Yes | Internal only | Not reported | AUC, Accuracy | High | Moderate
Zhang et al., 2022 | SIPaKMeD | Yes | Yes | Yes | External | Imputed | AUC, Sensitivity, Specificity | Low | High
Silva et al., 2020 | Private data | Yes | Unclear | Yes | Internal | Not reported | Accuracy, Precision | Moderate | Low
Wu et al., 2021 | Herlev | Yes | Yes | Yes | Internal + cross-validation | Reported | AUC, F1-score | Low | High
Jantzen, 2005 | Pap smears | Yes | Unclear | Yes | Internal | Unclear | Accuracy only | High | Moderate

Across the 30 included studies, most clearly defined their outcomes and predictors. However, external validation was reported in only 20% of studies, and handling of missing data was frequently omitted. Risk of bias was rated high in 12 studies, moderate in 10, and low in 8. Studies using public datasets such as Herlev or SIPaKMeD tended to report stronger performance metrics and better reproducibility, while others lacked transparency or validation. Studies with low risk of bias typically used robust cross-validation, reported comprehensive performance metrics (e.g., AUC, sensitivity, specificity), and addressed missing data explicitly.

3.3. Machine Learning Models Employed

Traditional ML models such as Random Forest (RF) [6], Support Vector Machines (SVM) [7], Logistic Regression (LR) [8], and Decision Trees (DT) [9] performed reasonably well on structured clinical datasets, achieving AUCs ranging from 0.84 to 0.89.
In contrast, deep learning models, particularly CNNs [10,11], achieved superior performance on imaging tasks, averaging AUC values around 0.95. Transfer learning approaches, leveraging pre-trained networks such as VGG-16 and ResNet50, further enhanced diagnostic accuracy [12,32,33,34,35].
Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks demonstrated effectiveness for sequential patient data [13]. Ensemble models such as Extreme Gradient Boosting (XGBoost) [14] and AdaBoost [15] achieved competitive AUCs (~0.93), balancing performance and interpretability.
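To make the transfer-learning pattern concrete, the sketch below fine-tunes an ImageNet-pretrained ResNet50 for binary cytology classification. This is a minimal illustration assuming TensorFlow/Keras; the input size, dropout rate, and frozen backbone are generic choices, not any study's reported configuration.

    import tensorflow as tf

    # Pretrained backbone with the ImageNet classification head removed.
    base = tf.keras.applications.ResNet50(
        weights="imagenet", include_top=False,
        input_shape=(224, 224, 3), pooling="avg")
    base.trainable = False  # freeze backbone; train only the new head

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # normal vs. abnormal
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="auc")])
    # model.fit(train_images, train_labels, validation_split=0.2, epochs=10)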
Table 2. Comparative summary of the machine learning model families reported across the included studies.
Model Type | Example Techniques | Average AUC | Strengths | Weaknesses
Traditional ML | RF, SVM, DT, LR | 0.84–0.89 | Interpretability, simplicity | Lower imaging performance
Deep Learning | CNNs, RNNs | ~0.95 | High accuracy in imaging | Black-box nature
Ensemble Models | XGBoost, AdaBoost | ~0.93 | Balance of accuracy and explainability | Training complexity
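To make the ensemble row of Table 2 concrete, here is a minimal sketch of the kind of pipeline such studies report: an AdaBoost classifier scored by cross-validated AUC on imbalanced tabular data. The synthetic data below are placeholders, not the UCI risk-factor dataset.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    # Synthetic stand-in for a tabular risk-factor dataset (~10% positives).
    X, y = make_classification(n_samples=800, n_features=20,
                               weights=[0.9, 0.1], random_state=0)

    model = AdaBoostClassifier(n_estimators=200, random_state=0)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"AUC: {aucs.mean():.3f} +/- {aucs.std():.3f}")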

3.4. Data Sources and Dataset Utilization

Data sources varied across studies:
  • Pap Smear Image Datasets: Herlev dataset [16], SIPaKMeD [17], and private hospital collections [18].
  • Clinical Demographic Data: UCI Cervical Cancer Risk Factors Dataset [19].
  • Synthetic Data: Data augmentation via Generative Adversarial Networks (GANs) [20].
Distribution across studies:
  • Pap smear images: 45%
  • Clinical/demographic datasets: 30%
  • Augmented datasets: 25%
Most studies utilized stratified sampling for training/testing splits, although few externally validated their models, raising concerns about generalizability.
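As a brief illustration of the stratified splitting most studies report (scikit-learn assumed; the data below are synthetic placeholders):

    import numpy as np
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(42)
    X = rng.normal(size=(500, 10))            # placeholder feature matrix
    y = (rng.random(500) < 0.1).astype(int)   # ~10% positives, as in screening data

    # stratify=y preserves the class ratio in both partitions, which matters
    # when positive (cancerous) cases are rare.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    print(y_train.mean(), y_test.mean())      # near-identical prevalence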

3.5. Performance Metrics

Performance was primarily evaluated through:
  • AUC: Most consistent and comparable metric
  • Accuracy, precision, recall, F1-score: complementary measures, with precision, recall, and F1 especially informative on imbalanced datasets
  • Sensitivity and specificity: critical for screening applications, where minimizing false negatives is paramount
CNNs consistently reported sensitivity >92% and specificity >89%. Ensemble models achieved slightly lower sensitivity but better interpretability scores.
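Sensitivity, specificity, and AUC all derive from the same model outputs; the sketch below shows the standard computation with scikit-learn, using placeholder predictions rather than any study's results.

    import numpy as np
    from sklearn.metrics import confusion_matrix, roc_auc_score

    # Placeholder ground truth and predicted probabilities.
    y_true  = np.array([0, 0, 0, 1, 1, 0, 1, 0, 1, 0])
    y_score = np.array([0.1, 0.3, 0.2, 0.8, 0.7, 0.6, 0.9, 0.2, 0.4, 0.3])
    y_pred  = (y_score >= 0.5).astype(int)

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)   # TP / (TP + FN): share of true cases detected
    specificity = tn / (tn + fp)   # TN / (TN + FP): share of healthy cases cleared
    auc = roc_auc_score(y_true, y_score)
    print(sensitivity, specificity, auc)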

4. Discussion

4.1. Clinical Integration Challenges

  • Data Heterogeneity
Variation in staining protocols, imaging hardware, and patient demographics introduces significant domain shift, reducing model robustness [21].
  • Model Interpretability
The black-box nature of deep models creates barriers to clinician trust and adoption [22]. While methods such as SHAP and LIME offer post hoc interpretability, few cervical cancer models have integrated them systematically (a minimal usage sketch follows this list).
  • Regulatory and Deployment Barriers
As of 2024, very few AI models for cervical cancer diagnosis have received FDA or EMA regulatory approval [23]. Existing medical device frameworks are often ill-equipped to evaluate adaptive ML models.
  • Ethical Concerns
Bias in training datasets can perpetuate healthcare inequities, particularly against underrepresented populations [24]. Data privacy and security also remain pressing concerns in federated or cloud-based AI systems.
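For instance, a tree-based risk model can be explained post hoc with SHAP. The sketch below is a generic illustration assuming the shap package and synthetic data; it is not drawn from any included study.

    import shap
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # TreeExplainer computes SHAP values efficiently for tree ensembles,
    # attributing each prediction to individual input features.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)
    shap.summary_plot(shap_values, X)   # global view of feature influence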

4.2. Future Directions

  • Multimodal Learning
Combining imaging data with clinical, genomic, and sociodemographic features could improve predictive accuracy and model resilience [25].
  • Lightweight AI Models
TinyML and Edge AI solutions will enable real-time cervical cancer screening on mobile devices, crucial for deployment in LMICs [26].
  • Federated Learning
Decentralized training preserves data privacy while enabling model improvements across hospitals and countries without raw data sharing [27] (a stylized FedAvg sketch follows this list).
  • Explainable AI
Systematic integration of XAI frameworks like SHAP, LIME, and counterfactual explanations can enhance model transparency and clinical adoption [28,29].
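To ground the federated-learning item above, here is a stylized sketch of the federated-averaging (FedAvg) step in plain NumPy; the two-client setup and size-proportional weighting are illustrative assumptions, not a production protocol.

    import numpy as np

    def fedavg(client_weights, client_sizes):
        """Average per-layer parameters, weighting clients by dataset size."""
        total = sum(client_sizes)
        n_layers = len(client_weights[0])
        return [
            sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
            for k in range(n_layers)
        ]

    # Two hospitals train locally and exchange only parameters, never patient data.
    hospital_a = [np.array([0.2, 0.8]), np.array([0.1])]   # layer weights, bias
    hospital_b = [np.array([0.4, 0.6]), np.array([0.3])]
    global_model = fedavg([hospital_a, hospital_b], client_sizes=[1000, 3000])
    print(global_model)   # parameters pulled toward the larger cohort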
5. Conclusions

Machine learning approaches have demonstrated immense potential in improving cervical cancer prediction and diagnosis, particularly CNNs for image-based tasks and ensemble methods for structured data analysis. Nevertheless, successful clinical integration will depend on resolving challenges related to data heterogeneity, model transparency, regulatory approval, and ethical compliance. Future research must prioritize equitable, explainable, and globally accessible AI systems to bridge the cervical cancer care gap, especially in resource-constrained settings [30,31].

Acknowledgements

The authors acknowledge the valuable contributions of all co-authors in the preparation of this manuscript. Seid Mehammed Abdu conceptualized and led the project, performed the literature review, and drafted the initial manuscript. Adem Yesuf contributed to data extraction, project administration, and manuscript review. Md Nasre Alam supported technical validation and editing. Demeke Getaneh participated in software management and final manuscript revisions. All authors have read and approved the final version of the manuscript. No specific funding was received for this study. The authors declare no financial conflicts of interest. There was no involvement of any sponsor in the study design, data collection, analysis, or decision to publish the manuscript.

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

References

  1. WHO, "Cervical cancer." [Online]. Available: https://www.who.int/health-topics/cervical-cancer.
  2. S. Vaccarella et al., "Cancer screening in the era of COVID-19: Challenges and perspectives," Int J Cancer, 2021.
  3. WHO, "Global strategy to accelerate the elimination of cervical cancer as a public health problem," 2020.
  4. E. E. Onuiri, "Machine Learning Algorithms for Cervical Cancer Detection," Int. J. Adv. Res. Comput. Sci., 2024.
  5. A. Akbari et al., "A Machine Learning Approach to Predicting Cervical Cancer," BMC Cancer, 2023.
  6. Y. Singh et al., "Support Vector Machines for Cervical Cancer Prediction," arXiv, 2018.
  7. P. Jiang et al., "SVM-Based Cervical Cancer Detection," AI Review, 2023.
  8. S. Mohammed et al., "ML Approaches in Cervical Cancer," 2024.
  9. S. Khare et al., "Deep Learning for Cytology Screening," WIREs Comput. Mol. Sci., 2024.
  10. R. Zhang et al., "Deep RNNs for Medical Diagnosis," J Biomed Inform, 2022.
  11. X. Wu et al., "Transfer Learning in Cancer Detection," IEEE Access, 2021.
  12. E. Selvano et al., "Hybrid Learning Models for Cancer Screening," 2023.
  13. L. Ledwaba et al., "Ensemble Learning in Cervical Cancer," 2024.
  14. R. D. Viñals et al., "Boosting Models for Cancer Diagnosis," Diagnostics, 2023.
  15. J. Jantzen, "Pap-smear benchmark data for pattern classification," Technical Report, 2005.
  16. M. Plissiti et al., "SIPaKMeD Dataset for Cervical Cell Classification," 2018.
  17. M. Silva et al., "Histopathology Dataset for Cervical Cancer Research," 2020.
  18. UCI Machine Learning Repository, "Cervical Cancer Risk Factors Dataset."
  19. M. Frid-Adar et al., "GANs for Data Augmentation in Medical Imaging," IEEE TMI, 2018.
  20. P. Rajpurkar et al., "CheXNet: Radiologist-Level Pneumonia Detection," PLoS Medicine, 2018. [CrossRef]
  21. S. Minaee et al., "Deep learning for COVID-19 detection from chest X-ray images," IEEE TMI, 2021.
  22. N. Shah et al., "Addressing Domain Shift in Healthcare," J Am Med Inform Assoc, 2021.
  23. F. Tjoa et al., "Explainable Artificial Intelligence (XAI) in Healthcare," Informatics in Medicine Unlocked, 2020.
  24. FDA, "Artificial Intelligence and Machine Learning in Software as a Medical Device," White Paper, 2021.
  25. Z. Obermeyer et al., "Dissecting racial bias in an algorithm," Science, 2019.
  26. Y. Li et al., "Multimodal Machine Learning in Healthcare," IEEE Access, 2022.
  27. S. Han et al., "TinyML: Enabling Energy-Efficient ML," IEEE Design & Test, 2020.
  28. M. Sheller et al., "Federated Learning in Medical Imaging," J Biomed Inform, 2020.
  29. M. Ribeiro et al., "'Why Should I Trust You?': Explaining the Predictions of Any Classifier," KDD, 2016. [CrossRef]
  30. S. Mehammed and D. Lemma, "Improving the Performance of Proof of Work-Based Bitcoin Mining Using CUDA," Int. J. Innov. Sci. Res. Technol. (IJISRT), vol. 6, no. 12, pp. 1138–1143, Dec. 2021. [Online]. Available: https://zenodo.org/records/5910835/files/IJISRT21DEC657%20%281%29.pdf.
  31. S. Lundberg et al., "A Unified Approach to Interpreting Model Predictions," NIPS, 2017. [CrossRef]
  32. S. M. Abdu and M. N. Alam, "Adaptive Fuzzy-PSO DBSCAN: An enhanced density-based clustering approach for smart city data analysis," Preprint, May 15, 2025. [CrossRef]
  33. S. M. Abdu, "Knowledge graph creation based on ontology from source-code: The case of C#," Zenodo, Jan. 31, 2023. [CrossRef]
  34. S. M. Abdu, "Optimizing multi-dimensional data-index algorithms for MIC architectures," Zenodo, Sep. 27, 2022. [CrossRef]
  35. S. Abdu, "Optimizing proof-of-work for secure health data blockchain using CUDA," Blockchain in Healthcare Today, vol. 8, no. 2, Aug. 29, 2025. [Online]. Available: https://blockchainhealthcaretoday.com/index.php/journal/article/view/421.
Figure 1. PRISMA 2020 flow diagram of the study selection process.