A Machine Learning Model Based on Routine Preoperative Laboratory Parameters for the Auxiliary Differentiation of HSIL and Cervical Cancer

Zhaoquan Su; Weiwei Deng; Weina Kong; Zhenzhen Cheng; Dewei Li; Xiumin Ma

doi:10.20944/preprints202606.1874.v1

Submitted: 24 June 2026

Posted: 25 June 2026

Abstract

Background/Objectives: High-grade squamous intraepithelial lesion (HSIL) is a precursor of cervical cancer, but the two conditions require different clinical management. This study aimed to develop a machine learning model based on routine preoperative laboratory parameters to differentiate HSIL from cervical cancer. Methods: A total of 702 patients, including 370 with HSIL and 332 with cervical cancer, were retrospectively enrolled. Patients were divided into training and test sets at a 7:3 ratio. Candidate variables were selected using LASSO regression and Pearson correlation analysis. Five machine learning models were constructed, and SHAP analysis was used for model interpretation. Results: Ten variables were retained for modeling. Among the five models, random forest achieved the highest AUC in the test set (0.977), with an accuracy of 0.886 and an F1 score of 0.876. SHAP analysis identified progesterone as the most important contributor. Conclusions: A random forest model using routine laboratory parameters may provide a low-cost, accessible, and interpretable tool for differentiating HSIL from cervical cancer.

Keywords: high-grade squamous intraepithelial lesion; cervical cancer; machine learning; random forest; routine laboratory parameters; SHAP; preoperative differentiation

Subject: Public Health and Healthcare - Other

1. Introduction

Cervical cancer is one of the most common malignancies affecting women worldwide [1,2]. Its development usually follows a stepwise progression from cervical intraepithelial neoplasia to invasive cancer [3,4]. HSIL is an important precancerous lesion. Timely detection and treatment may prevent progression in some patients [3,5]. However, HSIL and cervical cancer are not simply adjacent stages of the same disease process. They differ in preoperative assessment, treatment planning and follow-up. Patients with HSIL are usually treated with local excision, conization or close surveillance. Patients with cervical cancer require tumor staging, imaging assessment and individualized treatment [5,6]. More accurate preoperative differentiation is therefore important. It helps define the lesion and guides subsequent treatment.

Current screening and diagnosis of cervical lesions rely mainly on HPV testing, cytology, colposcopy and histopathology [5,6,7]. HPV testing is sensitive and helps identify high-risk individuals. Cytology and colposcopy can detect abnormal lesions and guide biopsy. Histopathology remains the gold standard for diagnosis. These methods still have practical limitations. Cytology and colposcopy can be affected by sampling quality, operator experience and subjective interpretation. Biopsy is invasive and may be affected by sampling site, lesion heterogeneity and the extent of local disease. Access to organized screening and diagnostic services also remains uneven across settings [7,8]. For some patients, a single local examination may not fully reflect disease status. Objective and readily available supplementary information may improve preoperative differentiation between HSIL and cervical cancer.

Routine laboratory parameters are obtained during routine clinical care. They are accessible, inexpensive and relatively objective. A single routine parameter is usually not specific for cervical cancer. However, progression from a precancerous lesion to invasive cancer may involve systemic changes in inflammation and coagulation [9,10]. Lipid metabolism and broader metabolic reprogramming may also change during malignant progression [11,12]. Serum inflammatory markers and plasma proteomic profiling also suggest that systemic alterations may accompany increasing severity of cervical lesions [13,14]. These findings are consistent with the concept that cancer progression involves multiple biological processes rather than one isolated pathway [15]. Such changes may not appear as marked abnormalities in one parameter. They may instead appear as small, scattered and interrelated changes across several parameters. Traditional statistical methods have limits in detecting nonlinear multivariable patterns. Machine learning can integrate multidimensional information and identify potential interactions. This provides a practical way to reuse routine laboratory data.

Machine learning has been widely used in medical diagnosis, risk stratification and prognosis prediction. In cervical lesion research, models have used routine blood indicators, colposcopic images and clinical parameters for risk prediction [16,17,18]. Other studies have used cytology images, HSIL-specific management variables and HPV genotype information to improve lesion classification or risk stratification [19,20,21]. Recent work on AI-assisted cytology and systematic reviews also supports the potential of artificial intelligence in cervical lesion diagnosis [22,23]. Colposcopic lesion segmentation models provide another example of image-based auxiliary diagnosis [24]. Most studies have focused on screening, CIN grading or HSIL+ risk assessment. Few studies have directly used routine preoperative laboratory parameters to distinguish HSIL from cervical cancer. Routine laboratory parameters do not come directly from local cervical lesions. However, they can provide systemic information. If these weak signals are integrated by machine learning, they may offer an additional reference for preoperative differentiation. Therefore, this study aimed to extract combined features from routine preoperative test data. The goal was to explore a low-cost and accessible auxiliary stratification method that can fit into existing clinical workflows.

2. Materials and Methods

2.1. Study Population

We retrospectively reviewed age and preoperative laboratory data from patients at the Affiliated Cancer Hospital of Xinjiang Medical University. Eligible patients were pathologically diagnosed with HSIL or cervical cancer between April 2024 and June 2025. A total of 370 patients with HSIL and 332 patients with cervical cancer were included. The inclusion criteria were as follows: (1) pathologically confirmed HSIL or cervical cancer; (2) complete clinical information and peripheral blood parameters; and (3) no antitumor therapy before enrollment. The exclusion criteria were as follows: (1) concurrent malignant tumors; and (2) infection, hematological disease, autoimmune disease, or severe hepatic or renal insufficiency. The study was approved by the hospital Ethics Committee (K-2025095).

2.2. Statistical Analysis

Descriptive statistics and normality testing were performed using SPSS 23.0. Model development and evaluation were performed in R software (version 4.4.2). Normality was assessed using the Shapiro-Wilk test. Normally distributed variables are presented as mean ± standard deviation and were compared using the independent-samples t-test. Homogeneity of variance was assessed using Levene's test. Welch's t-test was used when this assumption was violated. Non-normally distributed variables are presented as median and interquartile range and were compared using the Mann-Whitney U test. A two-sided P value <0.05 was considered statistically significant.
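The difference between the independent-samples t-test and Welch's t-test can be illustrated from summary statistics alone. The sketch below is written in Python for illustration (the study itself used SPSS and R) and uses the age row of Table 1 as input. The function name `t_from_summary` is hypothetical. Because the inputs are rounded means and standard deviations, the results are close to, but not identical to, the reported statistic of t = -7.055.

```python
import math

def t_from_summary(m1, s1, n1, m2, s2, n2):
    """Return (student_t, welch_t, welch_df) from group means, SDs and sizes."""
    # Student's t: pooled variance, assumes equal variances in both groups.
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    student_t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Welch's t: each group keeps its own variance estimate.
    v1, v2 = s1**2 / n1, s2**2 / n2
    welch_t = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    welch_df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return student_t, welch_t, welch_df

# Age row of Table 1: HSIL 36.74 ± 4.98 (n = 370), cervical cancer 39.31 ± 4.68 (n = 332).
student_t, welch_t, welch_df = t_from_summary(36.74, 4.98, 370, 39.31, 4.68, 332)
print(f"Student t = {student_t:.3f}, Welch t = {welch_t:.3f}, Welch df = {welch_df:.1f}")
```

With similar variances and group sizes, as here, the two statistics differ only slightly. The choice matters more when one group is both smaller and more variable.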

2.3. Model Construction

Stratified random sampling was performed using the rsample package in R. This approach was used to maintain a balanced distribution of the outcome between the training and test sets. The 702 patients were divided into a training set (n = 491) and a test set (n = 211) at a 7:3 ratio. The training set was used for model construction, hyperparameter tuning and internal validation. The test set was used to assess model performance. A stepwise feature-selection strategy was used to reduce model complexity. LASSO regression was first applied to reduce variable dimensionality. The optimal regularization parameter lambda was selected by 10-fold cross-validation using minimum binomial deviance [25]. Pearson correlation analysis was then performed. Variables with stronger outcome relevance and lower intervariable redundancy were retained. The selected features were used to build five models. Random forest was included as an ensemble tree-based method [26]. XGBoost and LightGBM were included as gradient-boosting methods [27,28]. Support vector machine and K-nearest neighbors were included as comparator algorithms [29,30].
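The stratified 7:3 split can be sketched in a few lines. This is a minimal Python illustration of the idea, not the rsample implementation used in the study. The function name, the seed, and the nearest-integer rounding rule are assumptions. With that rounding rule, the class-wise split of 370 and 332 patients happens to give the reported set sizes.

```python
import random

def stratified_split(labels, train_tenths=7, seed=42):
    """Return (train_idx, test_idx), sampling within each class separately."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Integer arithmetic, rounded to the nearest whole patient.
        n_train = (len(members) * train_tenths + 5) // 10
        train_idx.extend(members[:n_train])
        test_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(test_idx)

# Cohort sizes from the text: 370 HSIL and 332 cervical cancer.
labels = ["HSIL"] * 370 + ["cancer"] * 332
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 491 211
```

Sampling within each class keeps the HSIL-to-cancer ratio nearly identical in both sets (259:232 in training and 111:100 in testing under this sketch).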

2.4. Model Evaluation

Accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score were calculated in the test set. ROC curves and AUC values were used to assess discrimination. Calibration curves and comparisons of classification metrics were also generated. SHAP analysis was used to improve model interpretability. Mean absolute SHAP value was used as the feature-importance metric.
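As a reminder of what the AUC measures, the sketch below computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. It is a Python illustration with made-up scores. Coding cervical cancer as the positive class is an assumption, since the text does not state which class was treated as positive.

```python
def auc_from_scores(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities; 1 = cervical cancer, 0 = HSIL (assumed coding).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = auc_from_scores(y_true, scores)
print(round(auc, 3))  # 0.889
```

Because the AUC depends only on the ranking of scores, it can be high even when a fixed classification threshold yields modest sensitivity, which is why threshold-based metrics are reported alongside it.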

Figure 1. Study workflow for machine learning-based differentiation between HSIL and cervical cancer. A total of 702 pathologically confirmed patients were enrolled, including 370 patients with HSIL and 332 patients with cervical cancer. Patients were divided into a training set (n = 491) and a test set (n = 211) using stratified random sampling. Feature selection was performed using LASSO regression and Pearson correlation analysis. Five machine learning models were developed and evaluated in the test set. The best-performing model was interpreted using SHAP analysis.

3. Results

3.1. Clinical Features and Variable Selection

A total of 702 female patients were included. There were 370 patients with HSIL and 332 patients with cervical cancer. All patients had complete age and preoperative laboratory data. Forty-one variables met the predefined screening criteria and entered feature selection. In the training set, LASSO regression retained 25 candidate features with nonzero coefficients at lambda = 1.0 (lambda.1se). Pearson correlation analysis was then used to reduce redundancy. A variable was retained when its absolute correlation with the outcome exceeded 0.2 (|r| > 0.2) and its absolute pairwise correlation with each of the other retained variables was below 0.2 (|r| < 0.2). Ten features were selected as model inputs: P4, TT, CA125, Lp(a), PT, AFP, HDL-C, HCO₃⁻, GGT and age. The selected variables are summarized in Table 1, and the comparison of all 41 candidate variables is provided in Supplementary Table 1.
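One way to apply the two correlation thresholds is a greedy filter: rank candidates by their absolute correlation with the outcome, then keep each one only if it is weakly correlated with everything already kept. The Python sketch below illustrates this on toy data. The greedy ordering, the function names and the feature names are assumptions, because the text does not specify the order in which variables were examined.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither sequence is constant

def correlation_filter(features, outcome, min_outcome_r=0.2, max_pair_r=0.2):
    """Greedy filter: strongest outcome correlation first, skip redundant features."""
    relevance = {name: abs(pearson(col, outcome)) for name, col in features.items()}
    candidates = sorted(
        (name for name in relevance if relevance[name] > min_outcome_r),
        key=lambda name: relevance[name],
        reverse=True,
    )
    kept = []
    for name in candidates:
        if all(abs(pearson(features[name], features[k])) < max_pair_r for k in kept):
            kept.append(name)
    return kept

# Toy data only; outcome coded 0/1, so Pearson's r is the point-biserial correlation.
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "signal_a": [0, 0, 0, 1, 0, 1, 1, 1],   # r = 0.5 with the outcome
    "near_copy": [0, 0, 0, 1, 0, 1, 1, 0],  # relevant but redundant with signal_a
    "signal_c": [0, 0, 1, 0, 1, 0, 1, 1],   # r = 0.5 with the outcome, r = 0 with signal_a
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],      # r = 0 with the outcome
}
kept = correlation_filter(features, outcome)
print(kept)  # ['signal_a', 'signal_c']
```

A greedy pass of this kind depends on the order of examination, so a different ordering rule could retain a different subset from the same candidates.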

Figure 2. Feature selection using LASSO regression. (A) LASSO coefficient profiles of candidate variables. (B) Ten-fold cross-validation curve for binomial deviance. The vertical dashed lines indicate the selected lambda values.
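For reference, the quantities plotted in Figure 2 can be written out as follows. This is the standard L1-penalized logistic regression objective, not a formula taken from the manuscript. The symbols are assumptions: $y_i \in \{0,1\}$ is the outcome, $x_i$ the vector of candidate variables, $n$ the number of training patients and $p$ the number of candidates.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta_0,\,\beta}
  \left\{ -\frac{1}{n} \sum_{i=1}^{n}
    \left[ y_i \left( \beta_0 + x_i^{\top}\beta \right)
           - \log\!\left( 1 + e^{\beta_0 + x_i^{\top}\beta} \right) \right]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}

D = -2 \sum_{i=1}^{n}
  \left[ y_i \log \hat{p}_i + (1 - y_i) \log\left( 1 - \hat{p}_i \right) \right],
\qquad
\hat{p}_i = \frac{1}{1 + e^{-(\hat{\beta}_0 + x_i^{\top}\hat{\beta})}}
```

Larger values of $\lambda$ shrink more coefficients exactly to zero, which produces the coefficient paths in panel A. Panel B plots the cross-validated binomial deviance $D$ on held-out folds against $\lambda$.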

3.2. Development and Evaluation of Diagnostic Models

After grid search with 5-fold cross-validation, five machine learning models were compared. All models showed discriminatory ability for differentiating HSIL from cervical cancer. Random forest achieved the highest test-set AUC (0.977). Its sensitivity, specificity, positive predictive value, negative predictive value, F1 score and accuracy were 0.787, 0.990, 0.988, 0.815, 0.876 and 0.886, respectively. XGBoost had the highest accuracy (0.895) and F1 score (0.897) but a lower AUC (0.951). The optimal random forest hyperparameters were mtry = 1, trees = 200 and min_n = 30. The detailed performance metrics are shown in Table 2.
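The threshold-based metrics all derive from the four cells of a confusion matrix, as the Python sketch below shows. The manuscript does not report the confusion matrix. The counts used here are back-calculated by us as one set that reproduces the rounded random forest row of Table 2. They sum to 210 rather than the 211 test-set patients, so they should be read as illustrative only.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }

# Illustrative, back-calculated counts (an assumption, not reported data).
metrics = classification_metrics(tp=85, fn=23, fp=1, tn=101)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these counts a single false positive accounts for the very high specificity and positive predictive value, while 23 false negatives account for the lower sensitivity and negative predictive value.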

Figure 3. Performance of five machine learning models for differentiating HSIL from cervical cancer. (A) ROC curves of the models. (B) AUC comparison in the test set. (C) Calibration curves. (D) Comparison of classification metrics across models. AUC, area under the receiver operating characteristic curve.

3.3. SHAP Analysis of the Random Forest Model

SHAP analysis showed that each variable contributed differently to the random forest output in the test set. In the SHAP beeswarm plot, variables were ranked as follows: progesterone, thrombin time, CA125, Lp(a), prothrombin time, alpha-fetoprotein, HDL-C, serum bicarbonate, GGT and age. Progesterone had the largest influence on prediction. TT, CA125, Lp(a), PT and AFP also made substantial contributions.

Figure 4. SHAP interpretation of the random forest model. (A) SHAP beeswarm plot showing the direction and magnitude of feature effects. (B) Mean absolute SHAP value ranking of the 10 input variables. SHAP, SHapley Additive exPlanations.

4. Discussion

This study included 702 patients with pathologically confirmed HSIL or cervical cancer. We developed a diagnostic model based on routine preoperative laboratory parameters. After LASSO regression and Pearson correlation analysis, 10 variables were retained. The random forest model showed good performance in the test set. Its AUC was 0.977, and the positive predictive value, negative predictive value, F1 score and accuracy were 0.988, 0.815, 0.876 and 0.886, respectively.

Among the five algorithms, random forest achieved the highest AUC. This may be related to the structure of the data. Random forest constructs multiple decision trees and repeatedly partitions samples. It can capture nonlinear relationships and interactions among variables. This may be useful for the multivariable, nonlinear and weak-signal patterns in this study. We also used SHAP to interpret the model. SHAP is based on Shapley values from cooperative game theory. It quantifies the contribution of each feature to model output and helps explain the model at both the global and individual levels.
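The Shapley value idea can be made concrete with a small exact computation. The Python sketch below enumerates all feature subsets for a three-feature toy function. It replaces "absent" features with a single baseline value, which is a simplification of what SHAP software does for tree models. The toy model, the baseline and the function names are assumptions and are unrelated to the study's random forest.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample, replacing absent features by a baseline."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the sample's values; the rest take the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical toy model with an interaction term between the last two features.
def toy_model(features):
    a, b, c = features
    return 2.0 * a + b * c

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 3) for v in phi])  # [2.0, 3.0, 3.0]
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline (8.0 here), and the interaction term is shared equally between the two features involved. Averaging the absolute contributions of each feature over all samples gives the mean absolute SHAP value used as a global importance ranking.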

The variables identified in this study reflect systemic differences between patients with HSIL and those with cervical cancer. Progesterone reflects endocrine status and may be influenced by ovarian function, the menstrual cycle and the hormonal environment. TT and PT reflect different stages of coagulation. Patients with cervical cancer may have greater coagulation fluctuation or inflammation-related coagulation changes than patients with HSIL [10]. CA125 and AFP are not specific markers for cervical cancer. CA125 may be associated with tumor burden and selected pathological features in cervical cancer. AFP is mainly used as an adjunctive marker for hepatocellular carcinoma and some germ cell tumors [31,32]. Lp(a) is associated with lipid transport, endothelial function and chronic inflammation. HDL-C participates in reverse cholesterol transport and has anti-inflammatory and antioxidant effects [11,12]. HCO₃⁻ reflects acid-base balance and metabolic compensation. GGT is related to hepatobiliary metabolism, glutathione metabolism and oxidative stress [33]. Age may reflect differences in persistent HPV infection risk, endocrine status and immune response. Together, these routine indicators describe the preoperative systemic status of patients with HSIL and cervical cancer across several dimensions, including endocrine function, coagulation, metabolism and oxidative stress.

The random forest model performed well in differentiating HSIL from cervical cancer. Its predictive performance was relatively high compared with previous studies. Yue et al. developed an SVM model based on routine hematological and biochemical indicators for high-grade CIN risk stratification and reported an AUC of 0.75 [16]. Yuan et al. developed a colposcopic image-based CIN classification model with an AUC of 0.93 [17]. Li et al. combined HPV status, cytology and transformation-zone type to predict HSIL+ risk in patients with LSIL and reported an AUC of 0.936 [18]. Other studies have used cytology images, HSIL-specific risk-stratification variables and HPV genotype information for cervical lesion classification and risk assessment [19,20,21]. AI-assisted cytology studies and systematic reviews also support the diagnostic potential of artificial intelligence in cervical lesions [22,23]. Recent studies have examined occult microinvasive or invasive cervical cancer among patients with HSIL [34,35]. These comparisons should be interpreted carefully because target populations, predictors and validation designs differ. This is also emphasized in current guidance for clinical prediction models using artificial intelligence [36]. Colposcopic segmentation studies provide further context for image-based auxiliary diagnosis [24]. Other localization and deep-learning studies also support model-based cervical lesion stratification [37,38]. Studies on HPV viral load also support model-based risk stratification for cervical lesions [39]. Compared with models based on colposcopic images, HPV genotyping or cytology, our model used routine preoperative laboratory parameters and age. These indicators are obtained in routine clinical practice and are relatively objective. The model should therefore be regarded as a supplement to current clinical workflows. It is not a substitute for histopathology, colposcopy, imaging assessment or clinical judgment. Its value lies in integrating subtle laboratory signals that are difficult to interpret from single parameters. It may provide an objective and quantifiable reference for auxiliary differentiation between HSIL and cervical cancer.

Several limitations should be acknowledged. First, this was a single-center retrospective study. All cases came from the same hospital. Testing platforms and clinical procedures were relatively consistent. This may reduce measurement variability, but it may also limit generalizability to other centers, regions and platforms. For machine learning models, sample diversity directly affects stability and generalization. The current results therefore require validation in larger multicenter cohorts. Future reports should also follow current guidance for clinical prediction models using artificial intelligence. Second, this study mainly included age and routine preoperative laboratory parameters. HPV genotyping, TCT results, colposcopic findings and imaging data were not integrated. The model therefore reflects the classification ability of routine laboratory combinations. It does not represent the full clinical decision-making process. Future studies should integrate HPV, cytology, colposcopy and imaging data. Such models may better reflect real-world practice and may improve preoperative discrimination.

5. Conclusions

This study developed a machine learning model based on routine preoperative laboratory parameters for auxiliary differentiation between HSIL and cervical cancer. The model does not require additional omics testing or complex imaging inputs. Instead, it extracts multimarker information from routinely available clinical data. SHAP analysis provides interpretability support. These findings suggest that routine preoperative test data may contain information useful for distinguishing precancerous lesions from invasive cervical cancer. This approach may supplement current clinical workflows and provide a low-cost auxiliary tool for preoperative risk stratification.

Supplementary Materials

The following supporting information can be downloaded at the website of this paper posted on Preprints.org. Supplementary Table 1: Comparison of all 41 candidate variables.

Author Contributions

Conceptualization, S.Z. and M.X.; methodology, S.Z. and L.D.; software, S.Z. and L.D.; validation, K.W., C.Z. and L.D.; formal analysis, S.Z. and L.D.; investigation, S.Z., D.W., K.W. and C.Z.; data curation, S.Z., D.W., K.W. and C.Z.; writing—original draft preparation, S.Z.; writing—review and editing, K.W., C.Z., L.D. and M.X.; supervision, M.X.; funding acquisition, M.X. All authors have read and agreed to the published version of the manuscript. S.Z. and D.W. contributed equally to this work.

Funding

This research was funded by the Natural Science Foundation of Xinjiang Uygur Autonomous Region, grant number 2022D01C507, and the Open Project of the Institute of Medical Science, Xinjiang Medical University, grant number YXYJ20240302.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Affiliated Cancer Hospital of Xinjiang Medical University (approval No. K-2025095).

Informed Consent Statement

Patient consent was waived because of the retrospective design and the use of anonymized data.

Data Availability Statement

The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021 May;71(3):209-249.
2. Singh D, Vignat J, Lorenzoni V, et al. Global estimates of incidence and mortality of cervical cancer in 2020: a baseline analysis of the WHO Global Cervical Cancer Elimination Initiative. Lancet Glob Health. 2023 Feb;11(2):e197-e206.
3. World Health Organization. WHO guideline for screening and treatment of cervical pre-cancer lesions for cervical cancer prevention. 2nd ed. Geneva: World Health Organization; 2021.
4. McCredie MR, Sharples KJ, Paul C, et al. Natural history of cervical neoplasia and risk of invasive cancer in women with cervical intraepithelial neoplasia 3: a retrospective cohort study. Lancet Oncol. 2008 May;9(5):425-34.
5. Saslow D, Solomon D, Lawson HW, et al. American Cancer Society, American Society for Colposcopy and Cervical Pathology, and American Society for Clinical Pathology screening guidelines for the prevention and early detection of cervical cancer. Am J Clin Pathol. 2012 Apr;137(4):516-42.
6. Massad LS, Einstein MH, Huh WK, et al. 2012 updated consensus guidelines for the management of abnormal cervical cancer screening tests and cancer precursors. J Low Genit Tract Dis. 2013 Apr;17(5 Suppl 1):S1-S27.
7. Petersen Z, Jaca A, Ginindza TG, et al. Barriers to uptake of cervical cancer screening services in low-and-middle-income countries: a systematic review. BMC Womens Health. 2022 Dec 2;22(1):486.
8. Arbyn M, Weiderpass E, Bruni L, et al. Estimates of incidence and mortality of cervical cancer in 2018: a worldwide analysis. Lancet Glob Health. 2020 Feb;8(2):e191-e203.
9. Mantovani A, Allavena P, Sica A, et al. Cancer-related inflammation. Nature. 2008 Jul 24;454(7203):436-44.
10. Khorana AA, Mackman N, Falanga A, et al. Cancer-associated venous thromboembolism. Nat Rev Dis Primers. 2022 Feb 17;8(1):11.
11. Cheng L, Li Z, Zheng Q, et al. Correlation study of serum lipid levels and lipid metabolism-related genes in cervical cancer. Front Oncol. 2024;14:1384778.
12. Snaebjornsson MT, Janaki-Raman S, Schulze A. Greasing the Wheels of the Cancer Machine: The Role of Lipid Metabolism in Cancer. Cell Metab. 2020 Jan 7;31(1):62-76.
13. Qin L, Zhang L. The predictive value of serum inflammatory markers for the severity of cervical lesions. BMC Cancer. 2024 Jun 28;24(1):780.
14. Han S, Zhang J, Sun Y, et al. The Plasma DIA-Based Quantitative Proteomics Reveals the Pathogenic Pathways and New Biomarkers in Cervical Cancer and High Grade Squamous Intraepithelial Lesion. J Clin Med. 2022 Dec 1;11(23).
15. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011 Mar 4;144(5):646-74.
16. Yue C, Liu S, Wang W, et al. Machine learning in early screening for high-grade cervical intraepithelial neoplasia using blood testing. BMC Med Inform Decis Mak. 2025 Dec 18;26(1):25.
17. Yuan C, Yao Y, Cheng B, et al. The application of deep learning based diagnostic system to cervical squamous intraepithelial lesions recognition in colposcopy images. Sci Rep. 2020 Jul 15;10(1):11639.
18. Li D, Wang Z, Liu Y, et al. Assessing the risk of high-grade squamous intraepithelial lesions (HSIL+) in women with LSIL biopsies: a machine learning-based study. Infect Agent Cancer. 2024 Dec 5;19(1):61.
19. Wang CW, Liou YA, Lin YJ, et al. Artificial intelligence-assisted fast screening cervical high grade squamous intraepithelial lesion and squamous cell carcinoma diagnosis and treatment planning. Sci Rep. 2021 Aug 10;11(1):16244.
20. Zhang L, Tian P, Li B, et al. Risk-stratified management of cervical high-grade squamous intraepithelial lesion based on machine learning. J Med Virol. 2024 Oct;96(10):e70016.
21. Xiao T, Wang C, Yang M, et al. Use of Virus Genotypes in Machine Learning Diagnostic Prediction Models for Cervical Cancer in Women With High-Risk Human Papillomavirus Infection. JAMA Netw Open. 2023 Aug 1;6(8):e2326890.
22. Wang J, Yu Y, Tan Y, et al. Artificial intelligence enables precision diagnosis of cervical cytology grades and cervical cancer. Nat Commun. 2024 May 22;15(1):4369.
23. Liu L, Liu J, Su Q, et al. Performance of artificial intelligence for diagnosing cervical intraepithelial neoplasia and cervical cancer: a systematic review and meta-analysis. EClinicalMedicine. 2025 Feb;80:102992.
24. Li Z, Zeng CM, Dong YG, et al. A segmentation model to detect cevical lesions based on machine learning of colposcopic images. Heliyon. 2023 Nov;9(11):e21043.
25. Xie Y, Shi H, Han B. Bioinformatic analysis of underlying mechanisms of Kawasaki disease via Weighted Gene Correlation Network Analysis (WGCNA) and the Least Absolute Shrinkage and Selection Operator method (LASSO) regression model. BMC Pediatr. 2023 Feb 24;23(1):90.
26. Liu B, Mazumder R. Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests. J Mach Learn Res. 2025;26:1-49.
27. Sagi O, Rokach L. Approximating XGBoost with an interpretable decision tree. Inf Sci. 2021 Sep;572:522-542.
28. Hajihosseinlou M, Maghsoudi A, Ghezelbash R. A Novel Scheme for Mapping of MVT-Type Pb-Zn Prospectivity: LightGBM, a Highly Efficient Gradient Boosting Decision Tree Machine Learning Algorithm. Nat Resour Res. 2023 Dec;32(6):2417-2438.
29. Hao PY, Chiang JH, Chen YD. Possibilistic classification by support vector networks. Neural Netw. 2022 May;149:40-56.
30. Wang Y, Pan Z, Pan Y. A Training Data Set Cleaning Method by Classification Ability Ranking for the k-Nearest Neighbor Classifier. IEEE Trans Neural Netw Learn Syst. 2020 May;31(5):1544-1556.
31. Bender DP, Sorosky JI, Buller RE, et al. Serum CA 125 is an independent prognostic factor in cervical adenocarcinoma. Am J Obstet Gynecol. 2003 Jul;189(1):113-7.
32. Hanif H, Ali MJ, Susheela AT, et al. Update on the applications and limitations of alpha-fetoprotein for hepatocellular carcinoma. World J Gastroenterol. 2022 Jan 14;28(2):216-229.
33. Fentiman IS. Gamma-glutamyl transferase: risk and prognosis of cancer. Br J Cancer. 2012 Apr 24;106(9):1467-8.
34. Huang M, Chen X, Lin X, et al. Prediction Models of Microinvasive Cervical Cancer in High-Grade Squamous Intraepithelial Lesion Treatment by Loop Electrosurgical Excision Procedure. Risk Manag Healthc Policy. 2025;18:2921-2934.
35. Liu Q, Yang J, Cheng H, et al. A Clinical Prediction Model for Pathologic Upgrade to Invasive Carcinoma Following Conization of Cervical High-Grade Squamous Intraepithelial Lesions. Cancer Med. 2025 Jan;14(1):e70540.
36. Collins GS, Moons KGM, Dhiman P, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024 Apr 16;385:e078378.
37. Wang L, Chen R, Weng J, et al. Detecting and localizing cervical lesions in colposcopic images with deep semantic feature mining. Front Oncol. 2024;14:1423782.
38. Yu H, Fan Y, Ma H, et al. Segmentation of the cervical lesion region in colposcopic images based on deep learning. Front Oncol. 2022;12:952847.
39. Zhou Y, Shi X, Liu J, et al. Correlation between human papillomavirus viral load and cervical lesions classification: A review of current research. Front Med (Lausanne). 2023;10:1111269.

Table 1. Baseline characteristics and selected variables.

Variable	HSIL (n=370)	Cervical cancer (n=332)	Statistic	P value
Age	36.74±4.98	39.31±4.68	t=-7.055	<0.001
Progesterone (P4)	11.61 (2.08, 22.86)	4.60 (1.47, 8.61)	Z=8.725	<0.001
Thrombin time (TT)	18.08±1.52	17.11±1.28	t=9.216	<0.001
Carbohydrate antigen 125 (CA125)	14.70 (11.80, 18.06)	20.50 (14.30, 27.90)	Z=-8.617	<0.001
Lipoprotein(a) [Lp(a)]	99.00 (50.00, 187.59)	187.50 (101.95, 299.25)	Z=-7.840	<0.001
Prothrombin time (PT)	11.51±0.60	11.87±0.57	t=-8.246	<0.001
Alpha-fetoprotein (AFP)	1.79 (1.65, 2.11)	2.21 (1.76, 3.17)	Z=-7.710	<0.001
High-density lipoprotein cholesterol (HDL-C)	1.25±0.27	1.09±0.21	t=9.045	<0.001
Serum bicarbonate (HCO₃⁻)	22.82±2.01	23.92±2.27	t=-6.800	<0.001
Gamma-glutamyl transferase (GGT)	14.20 (12.00, 21.90)	21.30 (15.07, 35.57)	Z=-8.307	<0.001

Table 2. Performance metrics of five machine learning models in the test set.

Model	AUC	Accuracy	Sensitivity	Specificity	PPV	NPV	F1
KNN	0.920	0.829	0.750	0.912	0.900	0.775	0.818
LightGBM	0.975	0.876	0.778	0.980	0.977	0.806	0.866
RF	0.977	0.886	0.787	0.990	0.988	0.815	0.876
SVM	0.933	0.886	0.898	0.873	0.882	0.890	0.890
XGBoost	0.951	0.895	0.889	0.902	0.906	0.885	0.897

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.