Machine Learning Framework for Ovarian Cancer Diagnostics Using Plasma Lipidomics and Metabolomics

Alisa Tokareva; Mariia Iurova; Natalia Starodubtseva; Vitaliy Chagovets; Anastasia Novoselova; Evgenii Kukaev; Vladimir Frankevich; Gennady Sukhikh

doi:10.20944/preprints202505.2384.v1

Submitted: 29 May 2025

Posted: 30 May 2025

Abstract

Ovarian cancer (OC), the third most common gynecologic malignancy, exhibits distinct metabolic alterations that could enable early detection via liquid biopsy. We developed an advanced machine learning pipeline integrating lipidomics (HPLC-MS, positive/negative ion modes) and NMR-based metabolomics to analyze plasma samples from 229 subjects, including 103 serous OC patients, 107 benign cases, and 19 healthy controls. By systematically evaluating feature selection methods and machine learning architectures, we identified optimal biomarker combinations for OC detection. A Convolutional Neural Network (CNN) model based on Mann-Whitney-selected features demonstrated strong discriminatory power (81% accuracy) in distinguishing malignant from benign cases, while Extreme Gradient Boosting (XGBoost) combined with Support Vector Machine-Recursive Feature Elimination (SVM-RFE) achieved exceptional performance (96% accuracy) in differentiating benign from control samples. For multiclass classification, XGBoost with Kruskal-Wallis-selected features achieved 77% accuracy, while one-versus-one CNN models utilizing Mann-Whitney-selected features attained 78% accuracy, the best performance among the tested approaches. The complementary strengths of deep learning and ensemble methods highlight their potential for tailored diagnostic applications. While clinical implementation requires further standardization, these findings provide both a methodological framework for metabolic biomarker discovery and biological insights into OC pathophysiology, paving the way for integrated multi-omics approaches in gynecologic oncology.

Keywords: ovarian tumor; plasma; metabolome; machine learning; feature selection; MLP; XGBoost; CNN; neural network

Subject: Medicine and Pharmacology - Obstetrics and Gynaecology

1. Introduction

Ovarian cancer (OC) remains a significant global health challenge, with GLOBOCAN reporting 69,472 new cases and 46,232 deaths across Europe in 2022 alone [1]. This malignancy is particularly insidious as it is frequently diagnosed at advanced stages, with approximately 70% of cases detected after regional or distant metastasis has occurred. Current diagnostic protocols typically identify ovarian tumors during routine gynecological examinations, where accurate malignancy determination becomes crucial for clinical decision-making and treatment stratification. The standard diagnostic approach incorporates the Risk of Malignancy Index (RMI) and Risk of Ovarian Malignancy Algorithm (ROMA), which combine ultrasound findings, menopausal status, and serum CA125 levels [2]. While CA125 has served as the gold standard biomarker for decades, its limited sensitivity (50-60% for early-stage disease) and specificity (affected by benign conditions) have driven research into alternative molecular signatures [3].

Recent advances in molecular diagnostics have identified promising alternatives to CA125, including protein panels (particularly CA15-3, CA 19-9, HE4, and hCG [4,5,6,7,8]), circulating miRNAs (such as miR-200 family members [8,9]), and small molecules (notably lysophospholipids and acylcarnitines) [10,11]. The emerging field of metabolomics has shown particular promise, with studies demonstrating that tumor-specific metabolic reprogramming [12,13] produces distinct signatures in both tissue biopsies and biological fluids [14,15,16,17]. These -omics approaches offer potential for earlier detection and more accurate differentiation between benign and malignant states, with some recent studies reporting AUC values exceeding 0.90 in validation cohorts [18,19].

However, -omics research (encompassing metabolomics, proteomics, and lipidomics) faces the persistent challenge of the “curse of dimensionality” [20,21]. This phenomenon, where datasets contain orders of magnitude more features (p) than samples (n), creates significant statistical challenges for robust biomarker discovery [22,23,24,25]. Modern analytical pipelines address this through sophisticated computational strategies, including three primary feature selection approaches: filter methods (employing univariate statistical thresholds), wrapper methods (using iterative classifier performance), and embedded methods (with built-in feature selection like LASSO (least absolute shrinkage and selection operator) regression) [21,26,27]. Recent methodological innovations have demonstrated that hybrid approaches - particularly ensemble feature selection combining multiple methods [28,29,30,31,32] or consensus analysis across different platforms [14] - can improve biomarker reliability. Similarly, integrating multiple classification algorithms (including Random Forest (RF), Support Vector Machines (SVM) with nonlinear kernels, and regularized regression models) has shown promise in overcoming individual method limitations [15,33,34]. Deep learning approaches are also gaining traction, with convolutional neural networks (CNN) achieving notable success in image-based OC diagnostics [26,35].

This study aimed to develop and validate robust classification models capable of distinguishing between healthy controls, benign ovarian tumors, and OC. Our approach incorporates lipidomic (high-performance liquid chromatography-mass spectrometry, HPLC-MS) and metabolomic (nuclear magnetic resonance, NMR) data from 229 participants and applies feature selection and machine learning techniques to address the high dimensionality inherent in -omics datasets.

2. Results

2.1. Clinical Characteristics of Study Participants

Plasma lipid and metabolite profiles were comprehensively analyzed across three well-characterized cohorts: patients with serous ovarian carcinoma (n=103), benign ovarian tumors (n=107), and healthy controls (n=19) to identify potential diagnostic biomarkers. As detailed in Table 1, the OC cohort exhibited distinct histopathological characteristics, with high-grade serous carcinoma representing the predominant subtype (57%, n=59). The remaining cases comprised borderline tumors (27%, n=28) and low-grade carcinomas (16%, n=16). FIGO staging distribution among high-grade cases revealed 5 stage I (8.5%), 6 stage II (10.2%), 44 stage III (74.6%), and 4 stage IV (6.8%) patients.

2.2. Plasma Lipidome/Metabolome Data

Our HPLC-MS analysis identified 280 distinct lipid species, representing diverse biochemical classes. The plasma lipid profile included 49 ether-linked glycerophospholipids (PC O-/P- and PE O-/P-), 45 diacylphosphatidylcholines (PC), and 36 oxidized lipids (OxL), along with 27 sphingomyelins (SM), 13 lysophospholipids (LPC, LPE), 15 monogalactosyldiacylglycerols (MGDG), 9 ceramides (Cer), 7 cholesterol esters (CE), and 54 triglycerides (TG). Detection specificity varied by lipid class: TG and CE were observable only in positive ion mode, while oxidized glycerophospholipids and diacylphosphatidylinositols required negative ion mode analysis. Cer and LPE demonstrated optimal detection in negative mode, in contrast to LPC, PC O-/P-, and SM, which were reliably detected in both ionization modes.

The NMR analysis expanded our metabolic characterization, identifying 33 crucial metabolites spanning several biochemical categories. These included amino acids such as alanine, arginine, and glutamine, alcohols like ethanol and myoinositol, various ketoacids including 2-hydroxybutyrate, and carboxylic acids such as citrate and lactate. Beyond individual metabolites, we derived 36 clinically significant metabolite ratios that provide insights into critical metabolic pathway activities.

Through integrated analysis of both HPLC-MS and NMR datasets, we performed rigorous feature selection followed by advanced multivariate statistical approaches. This comprehensive strategy enabled identification of the most biologically and clinically relevant molecular signatures, establishing a robust foundation for subsequent biomarker discovery and pathway analysis [36].

2.3. Feature Selection

2.3.1. Comparative Performance and Stability of Feature Selection Methods in Binary Classification

The Support Vector Machine-Recursive Feature Elimination (SVM-RFE) method was the most stable for binary comparisons (Table 2). Its high stability is attributable to its large final marker sets: 85 markers for distinguishing benign from malignant ovarian tumors, 13 for benign tumors versus controls, and 14 for malignant tumors versus controls (Table S1). In contrast, LASSO and Boruta exhibited extremely low stability, failing to produce consistent marker sets (Table S1, Table 2).
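
The stability measure itself is not specified in this section. As an illustration only, one common way to quantify selection stability is the mean pairwise Jaccard similarity between the marker sets obtained on repeated resampling runs. The sketch below assumes that measure; the function names and the three example runs are hypothetical (the lipid names are borrowed from the text, but the sets are invented).

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature sets (defined as 1.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(feature_sets):
    """Mean pairwise Jaccard similarity across repeated selection runs."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Invented marker sets from three hypothetical resampling runs.
runs = [
    {"CE 18:3", "CE 20:4", "LPC 18:2"},
    {"CE 18:3", "CE 20:4", "LPC 18:2"},
    {"CE 20:4", "LPC 18:2", "PC 16:0_20:1"},
]
stability = selection_stability(runs)
```

A value of 1 means every run selected the same markers, and a value near 0 means the runs barely overlap, which is the behavior described above for LASSO and Boruta.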

Principal component (PC) space based on Mann-Whitney-selected markers yielded the optimal clustering results, with the lowest values for Hubert-Levin’s C-index and Davies-Bouldin index (Table 2). The SVM-RFE marker space performed worse than both the Mann-Whitney and Welch methods in terms of cluster compactness and separation. However, Orthogonal Projections to Latent Structures-Discriminant Analysis (OPLS-DA) markers achieved the highest Calinski-Harabasz pseudo-F statistic, followed closely by SVM-RFE markers (Table 3).
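
To make the direction of the Davies-Bouldin index concrete (lower values indicate tighter, better-separated clusters), the following pure-Python sketch computes it for labeled points. The coordinates are invented stand-ins for PC scores, and the function names are hypothetical; in practice a library routine such as scikit-learn's `davies_bouldin_score` would normally be used.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean over clusters of the worst
    (scatter_i + scatter_j) / centroid_distance_ij ratio."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    cents = {lab: centroid(pts) for lab, pts in clusters.items()}
    # Scatter = mean distance of a cluster's points to its own centroid.
    scatter = {
        lab: sum(math.dist(p, cents[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


labels = ["benign", "benign", "malignant", "malignant"]
# Invented 2-D scores: same within-class spread, different class separation.
db_separated = davies_bouldin([(0, 0), (0, 2), (10, 0), (10, 2)], labels)
db_overlapping = davies_bouldin([(0, 0), (0, 2), (2, 0), (2, 2)], labels)
```

With identical within-class scatter, moving the class centroids closer together raises the index, so the well-separated configuration scores lower than the overlapping one.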

When evaluating the overall performance of binary feature selection methods, both SVM-RFE and Mann-Whitney achieved the highest total score (23 points). However, SVM-RFE was preferred due to its superior stability. Despite their equal scores, the marker sets from these methods showed minimal overlap. For instance, only five lipids were common to both sets when distinguishing benign from malignant tumors: SM d18:1/22:0, SM d18:2/14:0, TG 10:0_18:2_18:2, TG 16:1_22:4_8:0, and TG 18:0_18:1_18:2 (Figure 1A). Similarly, PC 16:0_20:1 was the sole shared lipid between marker sets for controls versus benign tumors (Figure 1B), while CE 18:3, CE 20:4, and LPC 18:2 were common to both methods for controls versus malignant tumors (Figure 1C).

2.3.2. Comparative Performance and Stability of Feature Selection Methods in Multiclass Analysis

The Kruskal-Wallis and Partial Least Squares Discriminant Analysis (PLS-DA) selection methods demonstrated the highest quality metrics, with the Kruskal-Wallis approach showing slightly greater stability in its feature set (Table 4). In contrast, RF and LASSO-based methods failed to produce stable feature sets across iterations (Table S2).

A significant overlap was observed between the Kruskal-Wallis and PLS-DA-derived marker sets, with more than 25% of features shared between the two (Figure 2(A)). Notably, seven features were consistently identified across all multiclass and binary selection methods: SM d18:1/22:0, SM d18:1/22:1, TG 10:0_18:2_18:2, TG 16:1_22:4_8:0, TG 18:0_18:1_18:2, CE 20:4, and LPC 18:2 (Figure 2(B), Table S3).

2.4. Machine Learning Models in Ovarian Tumor Classification

The machine learning models were optimized using Particle Swarm Optimization (PSO) to determine their ideal hyperparameters. This included three SVM configurations with distinct kernel functions (polynomial, radial basis function, and sigmoid), three neural network architectures (Multilayer Perceptron (MLP), CNN, and Residual Convolutional Neural Network (ResNet)), and an Extreme Gradient Boosting (XGBoost) model. The resulting optimal hyperparameter sets for each algorithm are comprehensively presented in Supplementary Table S4. For distinguishing between benign and malignant tumors, a CNN-based model utilizing the Mann-Whitney marker set achieved the highest performance, with accuracy and mean recall both at 0.81, along with 77% sensitivity and 84% specificity. In classifying benign tumors versus controls, an XGBoost model with SVM-RFE-selected features demonstrated exceptional results, achieving 0.96 accuracy and 0.95 mean recall, with perfect sensitivity (100%) and high specificity (90%). For malignant tumor versus control classification, both RF (SVM-RFE set) and XGBoost (Mann-Whitney set) models performed equally well, each showing 0.92 accuracy and mean recall, 93% sensitivity, and 90% specificity.
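
As a reading aid for the metrics reported above, the following sketch derives accuracy, sensitivity, specificity, and mean recall from confusion-matrix counts. It assumes that "mean recall" is the unweighted mean of the per-class recalls, which for two classes is the mean of sensitivity and specificity. The function name and the counts are hypothetical; the counts are not the study's test-set composition, which is not given in this section.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard binary metrics from confusion-matrix counts.

    tp/fn count the positive class (e.g. benign tumor),
    tn/fp count the negative class (e.g. control).
    """
    sensitivity = tp / (tp + fn)  # recall of the positive class
    specificity = tn / (tn + fp)  # recall of the negative class
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    mean_recall = (sensitivity + specificity) / 2  # assumed definition
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mean_recall": mean_recall,
    }


# Invented counts: 15 positives all detected, 10 negatives with one false alarm.
metrics = binary_metrics(tp=15, fn=0, tn=9, fp=1)
```

These invented counts give 100% sensitivity, 90% specificity, 0.96 accuracy, and 0.95 mean recall, showing how accuracy can exceed mean recall when the larger class is classified perfectly.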

In one-versus-one (OvO) classification, the CNN model with Mann-Whitney markers yielded the best results (accuracy: 0.78, mean recall: 0.77) (Figure 3(A)-(D), Figure 4, Table S5, S6). Overall, models using Mann-Whitney-selected features showed non-significantly higher accuracy (median: 0.78, IQR: 0.67-0.85) and mean recall (0.85, IQR: 0.76-0.90) compared to those using SVM-RFE features (accuracy: 0.74, IQR: 0.67-0.90; mean recall: 0.74, IQR: 0.68-0.90; p = 0.28 for accuracy, p = 0.17 for recall).
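
The OvO scheme can be sketched as follows: one binary classifier is trained per pair of classes, and the final label is the class that collects the most pairwise votes. The threshold "models" below are hypothetical placeholders operating on a single invented score, not the CNNs used in the study, and the tie-breaking rule (first class to reach the top count) is an arbitrary assumption.

```python
from collections import Counter
from itertools import combinations

CLASSES = ("control", "benign", "malignant")


def ovo_predict(sample, pairwise_models):
    """Majority vote over all pairwise (one-versus-one) classifiers."""
    votes = Counter()
    for pair in combinations(CLASSES, 2):
        votes[pairwise_models[pair](sample)] += 1
    # Ties are resolved in favor of the class that received a vote first.
    return votes.most_common(1)[0][0]


# Hypothetical pairwise classifiers: simple thresholds on one invented score.
models = {
    ("control", "benign"): lambda x: "benign" if x >= 1 else "control",
    ("control", "malignant"): lambda x: "malignant" if x >= 2 else "control",
    ("benign", "malignant"): lambda x: "malignant" if x >= 3 else "benign",
}

predictions = [ovo_predict(x, models) for x in (0.5, 1.5, 2.5, 3.5)]
```

With three classes, three pairwise models are needed, and each sample receives three votes in total.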

For multiclass classification, XGBoost with Kruskal-Wallis-selected features achieved the highest performance (accuracy: 0.78, mean recall: 0.77) (Figure 3(E)-(F), Table S7). Models using Kruskal-Wallis markers consistently outperformed others, showing significantly higher accuracy (median 0.72, IQR 0.70-0.74) than PLS-DA-based models (median 0.66, IQR 0.63-0.67, p=0.02) and marginally higher accuracy than models using SVM-RFE (median 0.72, IQR 0.69-0.72, p=0.07) and Mann-Whitney (median 0.72, IQR 0.67-0.73, p=0.05) sets. Similarly, they demonstrated significantly better mean recall (median 0.74, IQR 0.71-0.77) than PLS-DA (median 0.68, IQR 0.63-0.69, p=0.01) and Mann-Whitney sets (median 0.74, IQR 0.68-0.75, p=0.01), and a non-significant improvement over SVM-RFE (median 0.73, IQR 0.70-0.73, p=0.10).

Comparing the top multiclass models, XGBoost (Kruskal-Wallis set) and the OvO CNN (Mann-Whitney set) showed comparable overall accuracy and recall. However, XGBoost had lower malignant tumor recall (60%) and benign tumor precision (69%), but higher benign tumor recall (84%) and malignant tumor precision (90%) relative to the CNN model (87%, 95%, 63%, 68%, respectively) (Table 5). Finally, XGBoost (median accuracy: 0.74, IQR: 0.72-0.89; recall: 0.76, IQR: 0.72-0.89) and CNN models (accuracy: 0.76, IQR: 0.73-0.85; recall: 0.77, IQR: 0.74-0.88) performed similarly (p=0.21 for accuracy, p=0.57 for recall), while CNNs significantly outperformed MLPs in both metrics (median 0.74, IQR 0.69-0.81, p=0.02 for accuracy; median 0.75, IQR 0.69-0.83, p=0.006 for recall).

3. Discussion

The identification of optimal feature sets represents a fundamental step in developing reliable predictive models for tumor classification, with significant implications for diagnostic accuracy and clinical decision-making. Feature selection methods such as Mann-Whitney U tests and SVM-RFE employ distinct statistical approaches yet frequently generate models with similar overall performance metrics. A closer examination reveals important distinctions between these approaches. Feature sets derived from Mann-Whitney testing demonstrate particularly strong performance in cluster separation quality, as quantified by established validation metrics. The lower Davies-Bouldin index values indicate more compact and well-separated clusters, while the lower Hubert-Levin’s C-index values reflect superior between-class discrimination. These properties suggest that Mann-Whitney-selected features may be particularly valuable for applications requiring clear pathological categorization, such as distinguishing between benign and malignant tumor subtypes.

The SVM-RFE approach demonstrates complementary strengths in predictive modeling applications. Lopez and colleagues provided compelling evidence that models built using SVM-RFE selected features surpass those utilizing RF or Relief-based selection in terms of classification accuracy and biomarker [37]. This advantage likely stems from SVM-RFE’s iterative optimization process, which evaluates feature importance within the context of the classifier’s decision boundary rather than relying solely on univariate statistical tests. Such characteristics make SVM-RFE particularly effective for complex discrimination tasks where multiple biomarkers interact in non-linear ways to determine pathological status.

However, as Barbieri’s research team convincingly demonstrated, the performance of any feature selection method remains intrinsically tied to dataset characteristics [38]. Factors including sample size, class imbalance, measurement noise, and biological heterogeneity all substantially influence which selection approach proves most effective. For instance, in datasets with strong effect sizes and minimal confounding variables, simpler univariate methods like Mann-Whitney may suffice. Conversely, in scenarios involving high-dimensional data with numerous correlated features, more sophisticated techniques like SVM-RFE or embedded methods may be necessary to capture complex biomarker interactions.

The choice of feature selection methodology also carries important implications for model translation into clinical practice. While computationally intensive methods may achieve marginally better performance in research settings, simpler approaches often prove more practical for clinical implementation due to easier validation and interpretation. This trade-off between performance and practicality underscores the need for careful method selection aligned with the specific application requirements and implementation constraints. Future research directions should focus on developing adaptive selection frameworks that can automatically adjust to dataset characteristics while maintaining biological interpretability and clinical relevance.

This study represents a significant advancement in ovarian cancer diagnostics by being the first to systematically identify optimal machine learning algorithms for analyzing complex multi-omics data (plasma metabolites and lipids) in a large clinical cohort of 229 participants, including 103 ovarian cancer patients. Our comprehensive evaluation demonstrates that XGBoost and RF models achieve an exceptional balanced accuracy of 92% for differentiation of OC from controls, with 93% sensitivity and 90% specificity, a performance level that compares favorably with established metabolomic approaches [39]. Notably, while Ban et al. found that SVM outperformed Adaptive Boosting and RF [39], our results underscore the critical importance of algorithm selection tailored to specific data characteristics and diagnostic objectives [40]. In classifying benign tumors versus controls, our XGBoost model with SVM-RFE-selected features achieved perfect sensitivity (100%) and high specificity (90%). Similarly, Fei Long et al. (2025) identified plasma extracellular vesicle metabolites as highly discriminative biomarkers, with SVM and RF models achieving an AUC of 0.94 in differentiating OC from benign tumors [11].

While plasma metabolites show great promise, protein-based markers remain clinically relevant, though their performance varies depending on analyte combinations and detection methods. Diagnostic panels incorporating fibrinogen, D-dimer, and the well-established CA-125 marker have achieved notable sensitivity (92%) and specificity (79%) in some studies [41]. More complex protein signatures, such as those combining CA125 with IGFBP2, SPP1, TSP1, and ADI, have demonstrated accuracy comparable to advanced XGBoost models [42,43,44]. Similarly, logistic regression models using IL-8 and TNFα [6], as well as neutrophil gelatinase-associated lipocalin/matrix metallopeptidase-9 complexes [5], have shown diagnostic performance on par with machine learning approaches. However, multi-analyte models exhibit considerable variability; for instance, combinations of CA125, CCL20, and menopausal status yielded reduced accuracy (77%) compared to lipidomic-based XGBoost models (81%) [7]. This underscores the necessity of rigorous biomarker selection and validation.

MLP architecture represents a fundamental yet powerful type of artificial neural network particularly well-suited for clinical research applications. Its versatility stems from the ability to directly process diverse data types including -omics profiles, categorical clinical variables, and continuous numerical measurements [45,46,47,48]. This inherent flexibility has established MLP as a widely adopted approach across various clinical prediction tasks. Most notably, MLPs operate effectively on properly scaled data without necessitating transformation into pseudo-continuous representations. This characteristic significantly reduces preprocessing complexity and minimizes potential error introduction during data conversion steps [33].

The performance of MLPs varies significantly depending on the nature of the classification task, the dataset characteristics, and the comparative machine learning models [49]. In our study, MLPs exhibited lower diagnostic accuracy in binary classification tasks but demonstrated superior performance in multiclass problems. This aligns with findings from Wang et al., where MLPs outperformed SVMs, suggesting that their hierarchical learning structure may be better suited for complex, multi-category discrimination [50].

However, the effectiveness of MLPs is not universally consistent across studies. For instance, Long et al. reported that MLPs were less accurate than both RF and SVM models, contrasting with our observation that MLPs surpass Naive Bayes (NB) classifiers [11]. This discrepancy may stem from differences in dataset composition, feature selection, or model hyperparameter tuning. Interestingly, in the differential diagnosis of inflammatory myopathy subtypes, MLPs ranked below RF and SVM but still exceeded the performance of NB, reinforcing the notion that MLPs occupy an intermediate position among machine learning classifiers in certain biomedical applications [14].

Notably, MLPs exhibit strong diagnostic capabilities in specific clinical contexts. For example, in Parkinson’s disease detection, MLPs and XGBoost models achieve high classification accuracy, whereas RF and SVM underperform [15]. This suggests that neural network-based approaches may be particularly effective for neurodegenerative disorder diagnostics, possibly due to their ability to capture non-linear patterns in heterogeneous biomedical data.

CNNs have emerged as powerful tools for diagnostic applications, consistently outperforming traditional machine learning methods across multiple studies [51,52,53,54,55]. However, CNN efficacy is highly dependent on dataset characteristics, with sample size being a critical limiting factor. Several studies have reported significant performance degradation when CNNs are applied to smaller datasets [56,57,58]. This data-hungry nature of deep learning architectures means that in resource-constrained scenarios with limited sample availability, simpler machine learning methods may offer comparable diagnostic accuracy while providing additional benefits in terms of computational efficiency and interpretability [33].

Notably, our findings indicate that CNNs consistently outperform MLPs, particularly when the input data can be effectively transformed into an image-like representation. This performance gap highlights the importance of proper data structuring for neural network applications. The process of converting conventional tabular data into artificial image formats, while computationally intensive, appears justified by the subsequent improvements in model accuracy [59].
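
The specific tabular-to-image transformation is not described in this section. Purely as an illustration of the idea, the sketch below shows the simplest possible variant: zero-padding a one-dimensional feature vector and reshaping it row by row into a square grid that a 2D convolutional layer could accept. This is an assumption and not necessarily the method of [59]; the function name, the padding value, and the example vector are hypothetical.

```python
import math


def vector_to_image(features, fill=0.0):
    """Pad a 1-D feature vector and reshape it into a square 2-D grid (row-major)."""
    n = len(features)
    if n == 0:
        raise ValueError("empty feature vector")
    side = math.isqrt(n - 1) + 1  # smallest side with side * side >= n
    padded = list(features) + [fill] * (side * side - n)
    return [padded[r * side:(r + 1) * side] for r in range(side)]


# Seven invented, already-scaled feature values become a 3 x 3 "image".
image = vector_to_image([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
```

In such a naive layout the neighborhood structure seen by the convolution depends entirely on feature order, which is why published approaches typically arrange related features next to each other before reshaping.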

This study advances ovarian cancer diagnostics through a large-scale integration of metabolomic and lipidomic data with machine learning. Our approach has several strengths. First, the study leverages a substantial clinical cohort of 229 participants, including carefully matched comparison groups of patients with benign ovarian neoplasms and healthy controls, enabling rigorous evaluation of differential diagnosis. Second, the implementation of strict selection criteria, including sample collection prior to any therapeutic intervention or surgical procedure, minimizes confounding from treatment effects while providing a clear window into disease-specific metabolic alterations.

The use of blood plasma as a biospecimen offers particular clinical advantages, being both minimally invasive and readily accessible for potential diagnostic implementation. Our deep metabolic profiling approach combines complementary analytical platforms: comprehensive lipidomic analysis using HPLC-MS in both ionization modes with MS/MS identification, coupled with NMR-based characterization of the low-molecular-weight metabolome. This dual-platform strategy provides exceptional coverage of both hydrophobic and hydrophilic metabolite fractions, capturing a more complete metabolic signature than single-platform approaches.

From a computational perspective, the study makes three key contributions: (1) an exhaustive systematic comparison of feature selection methods and classification algorithms, revealing context-dependent performance advantages; (2) a sophisticated evaluation of binary versus multiclass classification strategies, including OvO architecture benchmarking; and (3) implementation of PSO to efficiently explore over 400 hyperparameter combinations per method-task pairing, ensuring robust model configuration. The resulting models demonstrated high discriminatory power, with XGBoost achieving 96% accuracy in benign versus control classification and CNNs reaching 81% accuracy in malignant versus benign differentiation. Beyond diagnostic performance, these models provide valuable biological insights into ovarian cancer metabolism through their identified feature signatures. Furthermore, the study establishes a methodological framework that could be extended to other cancers or multi-omics investigations.
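
The PSO step in contribution (3) can be illustrated with a minimal, generic particle swarm sketch. Everything specific here is an assumption made for illustration: the inertia and acceleration coefficients (`w`, `c1`, `c2`), the swarm size, the search bounds, and the toy objective, which merely stands in for a cross-validated error obtained by training a model with the candidate hyperparameters.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer over a box-constrained search space."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]             # personal best positions
    best_val = [objective(p) for p in pos]     # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val


def toy_validation_error(params):
    """Hypothetical error surface with its minimum at log10(C)=1, log10(gamma)=-2."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


BOUNDS = [(-3.0, 3.0), (-5.0, 1.0)]  # assumed search ranges on a log10 scale
best_params, best_error = pso_minimize(toy_validation_error, BOUNDS)
```

Each objective call corresponds to one evaluated hyperparameter combination, so the evaluation budget is the initial swarm plus `n_particles * n_iter` further calls. Searching on a log scale is a common choice for parameters such as an SVM's C and gamma, but the parameterization used in the study is not stated here.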

Several considerations warrant attention when interpreting these findings. The cohort, while substantial, included a relatively small number of healthy controls, which may affect the generalizability of results. The observed variability in selected features across different methods suggests that biomarker robustness requires further validation. Although the machine learning models performed well, the inherent complexity of some algorithms, particularly deep learning approaches, may pose challenges for clinical implementation where model interpretability is valued. Practical barriers to clinical translation, including assay standardization and integration with existing workflows, were not addressed in this study. Additionally, the absence of external validation limits confidence in the models’ performance across diverse populations. Future research incorporating larger, multi-center cohorts and additional -omics layers would strengthen the findings and enhance their translational potential.

4. Materials and Methods

4.1. Study Design

The OC cohort comprised patients who underwent cytoreductive surgery at the V.I. Kulakov National Medical Research Center for Obstetrics, Gynecology, and Perinatology (NMRC for OGP, Moscow, Russia) between November 2019 and July 2020. The study included 229 participants divided into three distinct groups. The OC group consisted of 103 patients with histologically verified serous ovarian tumors, comprising 59 cases of high-grade serous carcinoma (including 10 at FIGO stages IA-IIB and 49 at stages IIC-IVA), 16 low-grade serous carcinomas, and 28 serous borderline tumors. For comparison, 107 patients with benign ovarian pathologies were enrolled, including 30 serous cystadenomas, 56 endometrioid cysts, and 21 mature teratomas. Additionally, 19 healthy women without any ovarian pathology formed the control group, with their status confirmed through comprehensive clinical evaluation involving detailed medical history, pelvic ultrasound examination, complete blood tests (both clinical and biochemical parameters), and assessment of specific tumor markers (CA125 and HE4) accompanied by ROMA and RMI index calculations.

The study adhered to the ethical standards of the institutional research committee, Russian federal laws, and the 1964 Helsinki Declaration (and its later amendments). Written informed consent was obtained from all participants, and the study protocol (No. 10, December 5, 2019) was approved by the NMRC for OGP Ethics Committee.

Patients in the OC group were required to have histologically confirmed serous ovarian carcinoma within FIGO stages I-IV, while those in the comparison group needed histological verification of benign ovarian lesions (serous cystadenomas, endometrioid cysts, or mature teratomas).

Uniform exclusion criteria applied to all study groups included: (1) age <18 years; (2) current or recent (≤6 months) hormonal therapy (oral contraceptives or hormone replacement therapy); (3) confirmed BRCA mutations; and (4) significant comorbidities including diabetes mellitus, active inflammatory/infectious diseases, or current pregnancy. The OC group had additional exclusions for patients with primary multiple malignancies or mixed epithelial ovarian tumor histologies. Both the benign lesion comparison group and healthy controls were excluded for any history of pelvic surgeries or prior malignancy diagnoses.

Blood samples were collected preoperatively in K2EDTA vacutainer tubes prior to administration of any perioperative medications (including antibiotics and analgesics). Whole blood was immediately processed by two-step centrifugation: first at 300 × g for 20 minutes at 4°C to separate cellular components, followed by collection of the supernatant which underwent secondary centrifugation at 12,000 × g for 10 minutes at room temperature to obtain platelet-poor plasma. The final plasma aliquots were transferred to pre-labeled cryovials using wide-bore pipette tips to minimize shear stress, and immediately stored at -80°C in a monitored freezer until analysis.

4.2. Lipidomic Analysis of Blood Plasma Samples (HPLC-MS)

Lipidomic profiling was conducted using an established laboratory protocol [13,60,61]. Plasma lipid extraction was performed via a modified Folch method where 40 μL of plasma was mixed with 480 μL of chloroform:methanol (2:1, v/v) and vortexed in an ultrasonic bath for 10 minutes. After adding 150 μL of deionized water, the mixture was centrifuged at 13,000 × g for 5 minutes at 20°C. The organic phase containing lipids was collected, evaporated under a gentle nitrogen stream, and reconstituted in 200 μL of isopropanol:acetonitrile (2:1, v/v).

To ensure analytical reliability, pooled quality control (QC) samples were prepared by combining equal 50 μL aliquots from all study participants’ samples, creating a representative reference matrix. Blank samples were prepared with the isopropanol:acetonitrile (2:1, v/v) solvent mixture. QC samples were injected after every 10 study samples throughout the HPLC-MS batch runs. In each batch, the first three injections were blanks, and a blank sample was analyzed before each QC sample.

Chromatographic separation was achieved using an Ultimate 3000 HPLC system (Thermo Scientific, Germany) coupled to a Maxis Impact qTOF mass spectrometer (Bruker Daltonics, Germany). Separation was performed on a Zorbax XDB-C18 column (250 × 0.5 mm, 5 μm; Agilent, USA) maintained at 50°C with a flow rate of 35 μL/min. The mobile phase consisted of Eluent A (10 mM ammonium formate with 0.1% formic acid in water:acetonitrile [40:60, v/v]) and Eluent B (10 mM ammonium formate with 0.1% formic acid in isopropanol:acetonitrile:water [90:8:2, v/v/v]). A linear gradient increased Eluent B from 30% to 95% over 25 minutes.

MS analysis of study samples was performed in both positive (400-1500 m/z) and negative (100-1000 m/z) ionization modes with capillary voltages of +4.1 kV and -3.0 kV, respectively. The nebulizer gas pressure was maintained at 0.7 bar with a dry gas flow of 6 L/min at 200°C. For comprehensive lipid identification, data-dependent MS/MS acquisition was performed on QC samples. The instrument dynamically selected the top three most intense precursor ions from each full scan for fragmentation, applying a normalized collision energy of 35 eV. A dynamic exclusion window of 60 seconds was implemented to prevent repeated fragmentation of dominant ions, ensuring broader coverage of lower-abundance species.

Lipid identification was performed using LipidMatch [62] after data preprocessing, with inter-batch normalization by autoscaling [63]. All lipid species are reported according to LIPID MAPS classification [64].
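
Autoscaling centers each feature on its mean and divides by its standard deviation. The Python sketch below applies it to one feature separately within each batch, which is one plausible reading of inter-batch normalization by autoscaling; whether this matches the exact procedure of [63] is an assumption. The function name `autoscale_by_batch` and the example values are hypothetical.

```python
from statistics import mean, stdev


def autoscale_by_batch(values, batches):
    """Autoscale one feature within each batch: (x - batch mean) / batch SD.

    Assumes every batch has at least two samples and a non-zero SD.
    """
    scaled = [0.0] * len(values)
    for batch in set(batches):
        idx = [i for i, b in enumerate(batches) if b == batch]
        batch_values = [values[i] for i in idx]
        mu, sd = mean(batch_values), stdev(batch_values)
        for i in idx:
            scaled[i] = (values[i] - mu) / sd
    return scaled


# Hypothetical intensities of one lipid measured in two batches.
intensities = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
batch_labels = ["A", "A", "A", "B", "B", "B"]
scaled = autoscale_by_batch(intensities, batch_labels)
```

After scaling, both batches share the same center and spread, so a tenfold intensity offset between batches no longer dominates the feature.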

4.3. Metabolomic Analysis by NMR Spectroscopy

Plasma metabolomic profiling was performed using 700 MHz NMR spectroscopy. Two phosphate buffer systems were prepared: Buffer A consisted of 80:20 H₂O/D₂O (v/v) sodium-phosphate buffer (pH 7.4) containing 6.15 mM sodium azide (NaN₃) and 4.64 mM 3-(trimethylsilyl)propionic-2,2,3,3-d₄ acid (TSP, Cambridge Isotope Laboratories Inc., UK) sodium salt as an internal reference. Buffer B contained sodium-phosphate buffer in D₂O (pH 7.4) with 1.5 M K₂HPO₄, 2 mM NaN₃, and 4 mM TSP. For analysis, 120 μL of plasma was mixed with 120 μL of buffer solution, and 190 μL of this mixture was transferred to 5 mm NMR tubes (Bruker Biospin Ltd., Germany) and maintained at 6°C until measurement.

All ¹H-NMR spectra were acquired on a Bruker 700 MHz AVANCE NEO spectrometer (Bruker BioSpin, Germany) equipped with a Prodigy cryogenic probe at 37°C, with temperature calibration performed using d₄-methanol (99.8% purity). The acquisition employed a Carr-Purcell-Meiboom-Gill (CPMG) pulse sequence with presaturation for water suppression, incorporating 128 refocusing pulses (0.6 ms echo delay each) for a total T₂ filtering period of 78 ms. Following 4 dummy scans, spectra were collected with 73,728 data points across a 12,019 Hz spectral width.

Metabolite identification was performed using Bruker Biorefcode (Bruker BioSpin, Germany) by matching both 1D and 2D J-resolved spectra against reference libraries. Semi-automated quantification was conducted using Chenomx NMR Suite 9.0 (Chenomx Inc., Canada), with metabolite concentrations calculated relative to the 0.4 mM TSP reference standard [36].

4.4. Feature Selection and Stability Analysis

Lipidomic and metabolomic data were integrated and processed to reduce feature dimensionality. For binary classification, seven feature selection methods were employed: (1) Wilcoxon-Mann-Whitney test (p<0.05), (2) Welch’s t-test (p<0.05), (3) OPLS-DA with VIP >1 [65], (4) RF (top √n features by Gini index) [66], (5) SVM-RFE based on SVM weights (iterative elimination until model accuracy decreases) [67], (6) LASSO (non-zero coefficients) [68], and (7) Boruta (all-relevant selection) [69]. For multiclass classification, five methods were used: (1) Kruskal-Wallis test (p<0.05), (2) PLS-DA with VIP >1, (3) RF (top √n features) [66], (4) LASSO (non-zero coefficients) [68], and (5) Boruta (all-relevant selection) [69] (Figure 5(A)).
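
To make the simplest of these filters concrete, the following Python sketch implements a Mann-Whitney-based selection at p<0.05. It is a standard-library illustration, not the authors' R code: the p-value uses the normal approximation without continuity or tie correction, and the names `mann_whitney_p` and `select_features`, the feature names, and the values are hypothetical.

```python
import math


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U test p-value via the normal approximation
    (no continuity correction, no tie correction in the variance)."""
    pooled = sorted((v, g) for g, grp in enumerate((x, y)) for v in grp)
    # Assign average ranks to tied values.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def select_features(data_a, data_b, alpha=0.05):
    """Keep features whose p-value is below `alpha`.

    data_a / data_b map a feature name to its values in each group.
    """
    return [f for f in data_a if mann_whitney_p(data_a[f], data_b[f]) < alpha]


# Hypothetical data: lipid_1 differs between groups, lipid_2 does not.
group_a = {"lipid_1": [1, 2, 3, 4, 5, 6, 7, 8],
           "lipid_2": [1, 3, 5, 7, 9, 11, 13, 15]}
group_b = {"lipid_1": [9, 10, 11, 12, 13, 14, 15, 16],
           "lipid_2": [2, 4, 6, 8, 10, 12, 14, 16]}
selected = select_features(group_a, group_b)
```

Only `lipid_1` passes the filter here, because its two groups are completely separated while the `lipid_2` values interleave.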

Method stability was assessed through 100 iterations on randomly selected 70% subsamples. Features consistently selected in all iterations were considered robust group discriminators. The robustness of the feature selection method was further quantified using Koch’s biotic diversity index [70]. A principal component space was constructed from the final selected features, and three cluster validation metrics were computed: (1) Hubert-Levin’s normalized C-index [71], (2) Davies-Bouldin index [72], and (3) Calinski-Harabasz pseudo-F statistic [73] (Figure 5(A)). Missing metric values were imputed as 100 (C-index and Davies-Bouldin index) or -100 (Calinski-Harabasz statistic).
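
The subsampling procedure can be sketched as follows in Python. The index is computed here as (T − S) / (S · (n − 1)), where T is the summed size of the selected sets, S is the number of distinct features selected at least once, and n is the number of iterations. This is our reading of Koch’s index [70]; the authors' exact normalization may differ. The function `stability_analysis`, the toy selector, and the simple (unstratified) random subsampling are assumptions.

```python
import random


def stability_analysis(samples, labels, select, n_iter=100, fraction=0.7, seed=0):
    """Run `select` on random subsamples and summarize its stability.

    select(samples, labels) must return an iterable of feature names.
    Returns (features chosen in every iteration, Koch-style index in [0, 1]).
    """
    rng = random.Random(seed)
    n_keep = round(fraction * len(samples))
    selections = []
    for _ in range(n_iter):
        idx = rng.sample(range(len(samples)), n_keep)
        chosen = set(select([samples[i] for i in idx], [labels[i] for i in idx]))
        selections.append(chosen)
    robust = set.intersection(*selections)
    union = set.union(*selections)
    total = sum(len(s) for s in selections)  # T: summed size of all selected sets
    if not union or n_iter < 2:
        return robust, float("nan")
    koch = (total - len(union)) / (len(union) * (n_iter - 1))
    return robust, koch


def toy_select(samples, labels):
    # Hypothetical selector that always returns the same two features.
    return ["feature_1", "feature_2"]


robust, koch = stability_analysis(list(range(20)), [0] * 10 + [1] * 10, toy_select)
```

A selector that returns identical sets in every iteration scores 1, and one that never repeats a feature scores 0.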

Each method was ranked on every metric (from 7 to 1 for binary tasks and from 5 to 1 for multiclass tasks), and the features of the highest-scoring method advanced to model selection.
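
The ranking step can be illustrated with a short Python sketch in which each method receives a rank on every metric and the ranks are summed. Note that lower values are better for the C-index and the Davies-Bouldin index, whereas higher values are better for the Calinski-Harabasz statistic and for stability. The function `rank_methods`, the tie handling, the method names, and the numbers below are hypothetical and do not come from the study.

```python
def rank_methods(scores, higher_is_better):
    """Sum per-metric ranks for each method.

    scores: {method: {metric: value}}; higher_is_better: {metric: bool}.
    The best method on a metric gets rank len(scores), the worst gets 1.
    Ties are broken by input order in this sketch.
    """
    methods = list(scores)
    totals = {m: 0 for m in methods}
    for metric, hib in higher_is_better.items():
        # Order from worst to best so that enumerate() yields ranks 1..n.
        ordered = sorted(methods, key=lambda m: scores[m][metric], reverse=not hib)
        for rank, method in enumerate(ordered, start=1):
            totals[method] += rank
    return totals


# Made-up values for three hypothetical methods.
example_scores = {
    "method_A": {"stability": 0.5, "davies_bouldin": 1.2},
    "method_B": {"stability": 0.8, "davies_bouldin": 1.1},
    "method_C": {"stability": 0.2, "davies_bouldin": 2.0},
}
totals = rank_methods(example_scores, {"stability": True, "davies_bouldin": False})
best_method = max(totals, key=totals.get)
```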

4.5. Classification Model Selection

To address the significant class imbalance (1:5.6:5.4 ratio of control, benign, and malignant samples) in our dataset, we employed the Synthetic Minority Over-sampling Technique (SMOTE) [74], generating 50 synthetic control samples to improve model training. The balanced dataset was then partitioned into training (70%) and test (30%) sets while preserving the original distribution patterns. For binary classification, eleven methods were evaluated: NB, OPLS-DA, RF, SVM (with linear, polynomial, radial, and sigmoid kernels), XGBoost, MLP, CNN, and ResNet. Multiclass classification was performed using seven selected methods: NB, PLS-DA, RF, XGBoost, MLP, CNN, and ResNet (Figure 5(B), 5(C)). Additionally, we implemented a one-versus-one (OvO) strategy for binary classifiers, where classification accuracy scores from individual binary models were aggregated to enhance multiclass prediction performance.
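
The core idea of SMOTE is to create a synthetic sample on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The Python sketch below shows this basic interpolation only. It is not the safe-level variant of the cited reference and not the smotefamily implementation used in the study; the function name `smote`, the choice of `k`, and the toy data are hypothetical. With 19 controls, adding 50 synthetic samples gives 69 control samples.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic points by interpolating between a minority sample
    and one of its k nearest minority neighbours (basic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features per sample.
controls = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_controls = smote(controls, n_synthetic=50, k=2)
```

Because every synthetic point lies between two real minority samples, the new samples stay inside the region already occupied by the minority class.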

For CNN and ResNet models, input data were transformed into 2D representations using the DeepInsight methodology [75]. These models incorporated a GELU activation layer followed by dropout (rate = 0.1). All neural networks were trained with the Adamax optimizer and an initial learning rate of 0.01; the learning-rate decay was 0.5 for all models except the binary ResNet, which used a decay of 0.9. Hyperparameters of the SVM and XGBoost models and the architectures of the MLP, CNN, and ResNet models were tuned via particle swarm optimization (PSO) [76].
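
To illustrate how PSO searches a hyperparameter space, the Python sketch below implements a minimal swarm optimizer with box constraints. It is a generic textbook-style version, not the authors' implementation: the inertia and acceleration coefficients are commonly used values chosen here as an assumption, and `toy_error` is a hypothetical stand-in for a cross-validated error that would in practice come from training a model.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100, seed=0,
                 w=0.729, c1=1.494, c2=1.494):
    """Minimal particle swarm optimizer for a box-bounded objective."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # personal best positions
    best_val = [objective(p) for p in pos]       # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]   # global best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_error(params):
    # Hypothetical error surface over two log-scaled hyperparameters.
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best_params, best_error = pso_minimize(toy_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
```

Each particle is pulled toward its own best position and the swarm's best position, so the search needs only objective evaluations and no gradients, which suits settings such as kernel parameters or layer sizes.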

All analyses were conducted in R 4.3.3 using the following packages: ropls [77], randomForest [78], e1071 [79], glmnet [80], Boruta [69], xgboost [81], clusterSim [82], smotefamily [83], lsa [84], tsne [85], cxhull [86], caret [87], and keras [88].

5. Conclusions

The integration of multi-omics data with advanced machine learning algorithms represents a transformative approach in ovarian cancer diagnostics. Our large-scale analysis of plasma metabolites and lipids systematically evaluated different feature selection strategies and machine learning models for tumor classification, comparing their performance in both binary and multiclass settings. Among binary classification approaches, SVM-RFE and Mann-Whitney methods demonstrated comparable performance scores. However, SVM-RFE emerged as the preferred choice due to its significantly higher stability (mean stability score: 0.75 vs. 0.40), despite limited biomarker overlap between the two methods.

For multiclass classification, Kruskal-Wallis and PLS-DA-based selection methods demonstrated equivalent performance metrics, with Kruskal-Wallis showing slightly superior feature selection stability (0.47 compared to 0.46). These methods exhibited substantial feature overlap, sharing more than 25% common markers while identifying seven consensus biomarkers across all selection approaches.

The machine learning model analysis yielded several critical findings. CNN architectures utilizing Mann-Whitney selected features achieved optimal performance in malignant versus benign classification, attaining 81% accuracy and mean recall. XGBoost models utilizing SVM-RFE features excelled in benign versus control classification, achieving exceptional 95% accuracy with perfect 100% sensitivity. In multiclass evaluation, XGBoost models incorporating Kruskal-Wallis selected features reached the highest classification accuracy of 78%, representing statistically significant improvement over alternative methods.

The clinical implementation of these advanced diagnostic models will require careful consideration of practical factors such as assay standardization, reproducibility across platforms, and integration with existing clinical workflows. Nevertheless, the demonstrated performance of machine learning-driven multi-omics analysis offers a promising path toward more accurate, earlier, and potentially more accessible ovarian cancer detection, addressing a critical unmet need in women’s health care.

From a translational perspective, these findings open new possibilities for developing liquid biopsy tests that could complement or potentially reduce reliance on current diagnostic methods. The identification of robust metabolic and lipidomic signatures through machine learning approaches may also provide insights into ovarian cancer pathogenesis and reveal new therapeutic targets. As the field progresses, continued refinement of these models through larger multicenter studies and the incorporation of additional -omics layers (such as proteomics and transcriptomics) may further enhance their diagnostic and prognostic utility.

Supplementary Materials

The following supporting information can be downloaded at the website of this paper posted on Preprints.org. Table S1: Potential markers selected by each feature selection method in three binary comparisons; Table S2: Potential markers selected by each feature selection method in multiclass comparisons; Table S3: Molecular features common to each combination of sets selected by feature selection methods in multiclass tasks; Table S4: Hyperparameters of the classification models for each set of markers, optimized by PSO; Table S5: Model accuracy based on binary marker sets and final one-vs-one (OvO) classification; Table S6: Mean recall of models based on binary marker sets and accuracy of final one-vs-one (OvO) classification; Table S7: Performance evaluation (accuracy and mean recall) of machine learning models comparing feature selection strategies for binary and multiclass classification (best results are highlighted in bold). Supplementary dataset.xlsx: omics data included in the study; train dataset; test dataset.

Author Contributions

Conceptualization, A.T., M.I. and N.S.; data curation, M.I., A.N., V.C., E.K. and V.F.; formal analysis, A.T., M.I., N.S., E.K. and V.C.; funding acquisition, M.I., V.F. and G.S.; investigation, M.I., N.S., V.C., A.N. and E.K.; methodology, N.S., A.T., V.F. and G.S.; project administration, M.I., V.F. and G.S.; resources, V.F., V.C., N.S. and G.S.; software, A.T., E.K. and A.N.; supervision, M.I., V.F. and G.S.; visualization, N.S., V.C., A.T., E.K. and A.N.; writing—original draft, N.S., M.I., A.T., V.C., A.N. and E.K.; writing—review and editing, V.F. and G.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Russian Science Foundation (No. 24-25-00407): https://rscf.ru/en/project/24-25-00407/ (accessed on 27 May 2025).

Institutional Review Board Statement

This study was approved by the Ethical Committee of the National Medical Research Center for Obstetrics, Gynecology, and Perinatology named after Academician V.I. Kulakov (protocol No. 10, dated 05 December 2019).

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study.

Data Availability Statement

Data are contained within the Supplementary Materials.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ANN	Artificial Neural Network
CE	Cholesterol esters
Cer(P)	Ceramide (phosphate)
CNN	Convolutional Neural Network
DG	Diacylglycerols
FIGO	International Federation of Gynecology and Obstetrics
LASSO	Least Absolute Shrinkage and Selection Operator
MLP	Multilayer Perceptron
NB	Naive Bayes
OPLS-DA	Orthogonal Projection on Latent Structures-Discriminant Analysis
PLS-DA	Partial Least Squares Discriminant Analysis
PC (P-/O-)	(plasmenyl-/plasmanyl) Phosphatidylcholine
PE (P-/O-)	(plasmenyl-/plasmanyl) Phosphatidylethanolamine
PSO	Particle Swarm Optimization
ResNet	Residual Convolutional Neural Network
RFE	Recursive Feature Elimination
RF	Random Forest
SM	Sphingomyelins
SVM	Support Vector Machine
TG	Triacylglycerol
XGBoost	Extreme Gradient Boosting
VIP	Variable Importance in Projection

References

International Agency for Research on Cancer Global cancer statistics; 2022.
Liest, A.L.; Omran, A.S.; Mikiver, R.; Rosenberg, P.; Uppugunduri, S. RMI and ROMA are equally effective in discriminating between benign and malignant gynecological tumors: A prospective population-based study. Acta Obstet. Gynecol. Scand. 2019, 98, 24–33.
Henderson, J.T.; Webber, E.M.; Sawaya, G.F. Screening for Ovarian Cancer: An Updated Evidence Review for the U.S. Preventive Services Task Force; Agency for Healthcare Research and Quality (US): Rockville (MD), 2018.
Matsas, A.; Stefanoudakis, D.; Troupis, T.; Kontzoglou, K.; Eleftheriades, M.; Christopoulos, P.; Panoskaltsis, T.; Stamoula, E.; Iliopoulos, D.C. Tumor Markers and Their Diagnostic Significance in Ovarian Cancer. Life 2023, 13, 1689.
Gupta, R.K.; Dholariya, S.; Radadiya, M.; Agarwal, P. NGAL/MMP-9 as a Biomarker for Epithelial Ovarian Cancer: A Case–Control Diagnostic Accuracy Study. Saudi J. Med. Med. Sci. 2022, 10, 25–30.
Pawlik, W.; Pawlik, J.; Kozłowski, M.; Łuczkowska, K.; Kwiatkowski, S.; Kwiatkowska, E.; Machaliński, B.; Cymbaluk-Płoska, A. The clinical importance of IL-6, IL-8, and TNF-α in patients with ovarian carcinoma and benign cystic lesions. Diagnostics 2021, 11, 1625.
Sakares, W.; Wongkhattiya, W.; Vichayachaipat, P.; Chaiwut, C.; Yodsurang, V.; Nutthachote, P. Accuracy of CCL20 expression level as a liquid biopsy-based diagnostic biomarker for ovarian carcinoma. Front. Oncol. 2022, 12, 1038835.
De Silva, S.; Alli-Shaik, A.; Gunaratne, J. Machine Learning-Enhanced Extraction of Biomarkers for High-Grade Serous Ovarian Cancer from Proteomics Data. Sci. Data 2024, 11, 685.
Ning, L.; Lang, J.; Wu, L. Plasma circN4BP2L2 is a promising novel diagnostic biomarker for epithelial ovarian cancer. BMC Cancer 2022, 22, 6.
Rong, J.; Sun, G.; Zhu, J.; Zhu, Y.; Chen, Z. Combination of plasma-based lipidomics and machine learning provides a useful diagnostic tool for ovarian cancer. J. Pharm. Biomed. Anal. 2025, 253, 116559.
Long, F.; Pu, X.Y.; Wang, X.; Ma, D.X.; Gao, S.H.; Shi, J.; Zhong, X.C.; Ran, R.; Wang, L.L.; Chen, Z.; et al. A metabolic fingerprint of ovarian cancer: a novel diagnostic strategy employing plasma EV-based metabolomics and machine learning algorithms. J. Ovarian Res. 2025, 18, 26.
Chagovets, V.; Starodubtseva, N.; Tokareva, A.; Novoselova, A.; Patysheva, M.; Larionova, I.; Prostakishina, E.; Rakina, M.; Kazakova, A.; Topolnitskiy, E.; et al. Specific changes in amino acid profiles in monocytes of patients with breast, lung, colorectal and ovarian cancers. Front. Immunol. 2023, 14, 1332043.
Iurova, M. V.; Chagovets, V. V.; Pavlovich, S. V.; Starodubtseva, N.L.; Khabas, G.N.; Chingin, K.S.; Tokareva, A.O.; Sukhikh, G.T.; Frankevich, V.E. Lipid Alterations in Early-Stage High-Grade Serous Ovarian Cancer. Front. Mol. Biosci. 2022, 9, 770983.
Liu, D.; Zhao, L.; Jiang, Y.; Li, L.; Guo, M.; Mu, Y.; Zhu, H. Integrated analysis of plasma and urine reveals unique metabolomic profiles in idiopathic inflammatory myopathies subtypes. J. Cachexia. Sarcopenia Muscle 2022, 13, 2456–2472.
Zhang, J.D.; Xue, C.; Kolachalama, V.B.; Donald, W.A. Interpretable Machine Learning on Metabolomics Data Reveals Biomarkers for Parkinson’s Disease. ACS Cent. Sci. 2023, 9, 1035–1045.
Yan, Q.; He, D.; Walker, D.I.; Uppal, K.; Wang, X.; Orimoloye, H.T.; Jones, D.P.; Ritz, B.R.; Heck, J.E. The neonatal blood spot metabolome in retinoblastoma. EJC Paediatr. Oncol. 2023, 2, 1–16.
Pyragius, C.E.; Fuller, M.; Ricciardelli, C.; Oehler, M.K. Aberrant lipid metabolism: An emerging diagnostic and therapeutic target in ovarian cancer. Int. J. Mol. Sci. 2013, 14, 7742–7756.
Salim, Y.A.-; Galazis, N.; Bracewell-, T.; Phelps, D.L.; Jones, B.P.; Chan, M.; Munoz-, M.D.; Matsuzono, T.; Smith, J.R.; Yazbek, J.; et al. The application of metabolomics in ovarian cancer management: a systematic review. Int. J. Gynecol. Cancer 2021, 31, 754–774.
Fan, L.; Zhang, W.; Yin, M.; Zhang, T.; Wu, X.; Zhang, H.; Sun, M.; Li, Z.; Hou, Y.; Zhou, X.; et al. Identification of metabolic biomarkers to diagnose epithelial ovarian cancer using a UPLC/QTOF/MS platform. Acta Oncol. (Madr). 2012, 51, 473–479.
Wörheide, M.A.; Krumsiek, J.; Kastenmüller, G.; Arnold, M. Multi-omics integration in biomedical research – A metabolomics-centric review. Anal. Chim. Acta 2021, 1141, 144–162.
Papoutsoglou, G.; Tarazona, S.; Lopes, M.B.; Klammsteiner, T.; Ibrahimi, E.; Eckenberger, J.; Novielli, P.; Tonda, A.; Simeon, A.; Shigdel, R.; et al. Machine learning approaches in microbiome research: challenges and best practices. Front. Microbiol. 2023, 14, 1261889.
Brix, F.; Demetrowitsch, T.; Jensen-Kroll, J.; Zacharias, H.U.; Szymczak, S.; Laudes, M.; Schreiber, S.; Schwarz, K. Evaluating the Effect of Data Merging and Postacquisition Normalization on Statistical Analysis of Untargeted High-Resolution Mass Spectrometry Based Urinary Metabolomics Data. Anal. Chem. 2024, 96, 33–40.
Chua, A.E.; Pfeifer, L.D.; Sekera, E.R.; Hummon, A.B.; Desaire, H. Workflow for Evaluating Normalization Tools for Omics Data Using Supervised and Unsupervised Machine Learning. J. Am. Soc. Mass Spectrom. 2023, 34, 2775–2784.
Tokareva, A.; Starodubtseva, N.; Frankevich, V.; Silachev, D. Minimizing Cohort Discrepancies: A Comparative Analysis of Data Normalization Approaches in Biomarker Research. Computation 2024, 12, 137.
Tokareva, A.O.; Chagovets, V. V.; Kononikhin, A.S.; Starodubtseva, N.L.; Nikolaev, E.N.; Frankevich, V.E. Comparison of the effectiveness of variable selection method for creating a diagnostic panel of biomarkers for mass spectrometric lipidome analysis. J. Mass Spectrom. 2021, 56, e4702.
Abd-Elnaby, M.; Alfonse, M.; Roushdy, M. Classification of breast cancer using microarray gene expression data: A survey. J. Biomed. Inform. 2021, 117, 103764.
Pudjihartono, N.; Fadason, T.; Kempa-Liehr, A.W.; O’Sullivan, J.M. A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction. Front. Bioinforma. 2022, 2, 927312.
Wu, Z.; Chen, H.; Ke, S.; Mo, L.; Qiu, M.; Zhu, G.; Zhu, W.; Liu, L. Identifying potential biomarkers of idiopathic pulmonary fibrosis through machine learning analysis. Sci. Rep. 2023, 13, 16559.
Tian, Y.; Tao, K.; Li, S.; Chen, X.; Wang, R.; Zhang, M.; Zhai, Z. Identification of m6A-Related Biomarkers in Systemic Lupus Erythematosus: A Bioinformation-Based Analysis. J. Inflamm. Res. 2024, 17, 507–526.
Zhu, T.; Ma, Y.; Wang, J.; Xiong, W.; Mao, R.; Cui, B.; Min, Z.; Song, Y.; Chen, Z. Serum Metabolomics Reveals Metabolomic Profile and Potential Biomarkers in Asthma. Allergy, Asthma Immunol. Res. 2024, 16, 235–252.
Chardin, D.; Humbert, O.; Bailleux, C.; Burel-Vandenbos, F.; Rigau, V.; Pourcher, T.; Barlaud, M. Primal-dual for classification with rejection (PD-CR): a novel method for classification and feature selection—an application in metabolomics studies. BMC Bioinformatics 2021, 22, 594.
Zhou, D.; Zhu, W.; Sun, T.; Wang, Y.; Chi, Y.; Chen, T.; Lin, J. iMAP: A Web Server for Metabolomics Data Integrative Analysis. Front. Chem. 2021, 9, 1–13.
Alamro, H.; Thafar, M.A.; Albaradei, S.; Gojobori, T.; Essack, M.; Gao, X. Exploiting machine learning models to identify novel Alzheimer’s disease biomarkers and potential targets. Sci. Rep. 2023, 13, 4979.
Wang, D.; Greenwood, P.; Klein, M.S. Deep learning for rapid identification of microbes using metabolomics profiles. Metabolites 2021, 11, 863.
Li, J.; Zhou, Z.; Dong, J.; Fu, Y.; Li, Y.; Luan, Z.; Peng, X. Predicting breast cancer 5-year survival using machine learning: A systematic review. PLoS One 2021, 16, e0250370.
Chagovets, V. V.; Vasil’ev, V.G.; Iurova, M. V.; Khabas, G.N.; Pavlovich, S. V.; Starodubtseva, N.L.; Mayboroda, O.A. Metabolic “footprints” of the circulating cancer mucins: CA125 in the high-grade ovarian cancer. Bull. Russ. State Med. Univ. 2021, 10–16.
López, N.C.; García-Ordás, M.T.; Vitelli-Storelli, F.; Fernández-Navarro, P.; Palazuelos, C.; Alaiz-Rodríguez, R. Evaluation of feature selection techniques for breast cancer risk prediction. Int. J. Environ. Res. Public Health 2021, 18, 10670.
Barbieri, M.C.; Grisci, B.I.; Dorn, M. Analysis and comparison of feature selection methods towards performance and stability. Expert Syst. Appl. 2024, 249, 123667.
Ban, D.; Housley, S.N.; Matyunina, L. V.; McDonald, L.D.E.; Bae-Jump, V.L.; Benigno, B.B.; Skolnick, J.; McDonald, J.F. A personalized probabilistic approach to ovarian cancer diagnostics. Gynecol. Oncol. 2024, 182, 168–175.
Wu, Z.; Zhu, M.; Kang, Y.; Leung, E.L.H.; Lei, T.; Shen, C.; Jiang, D.; Wang, Z.; Cao, D.; Hou, T. Do we need different machine learning algorithms for QSAR modeling? A comprehensive assessment of 16 machine learning algorithms on 14 QSAR data sets. Brief. Bioinform. 2021, 22, 1–17.
Farzaneh, F.; Salimnezhad, M.; Hosseini, M.S.; Ganjoei, T.A.; Arab, M.; Talayeh, M. D-dimer, Fibrinogen and Tumor Marker Levels in Patients with Benign and Malignant Ovarian Tumors. Asian Pacific J. Cancer Prev. 2023, 24, 4263–4268.
Hasenburg, A.; Eichkorn, D.; Vosshagen, F.; Obermayr, E.; Geroldinger, A.; Zeillinger, R.; Bossart, M. Biomarker-based early detection of epithelial ovarian cancer based on a five-protein signature in patient’s plasma – a prospective trial. BMC Cancer 2021, 21, 1037.
Shan, D.; Cheng, S.; Ma, Y.; Peng, H. Serum levels of tumor markers and their clinical significance in epithelial ovarian cancer. J. Cent. South Univ. (Medical Sci.) 2023, 48, 1039–1049.
Periyasamy, A.; Gopisetty, G.; Subramanium, M.J.; Velusamy, S.; Rajkumar, T. Identification and validation of differential plasma proteins levels in epithelial ovarian cancer. J. Proteomics 2020, 226, 103893.
Nazarizadeh, A.; Banirostam, T.; Biglari, T.; Kalantarhormozi, M.; Chichagi, F.; Behnoush, A.H.; Habibi, M.A.; Shahidi, R. Integrated neural network and evolutionary algorithm approach for liver fibrosis staging: Can artificial intelligence reduce patient costs? JGH Open 2024, 8, e13075.
Qaderi, K.; Sharifipour, F.; Dabir, M.; Shams, R.; Behmanesh, A. Artificial intelligence (AI) approaches to male infertility in IVF: a mapping review. Eur. J. Med. Res. 2025, 30, 246.
Nahar, A.; Paul, S.; Saikia, M.J. A systematic review on machine learning approaches in cerebral palsy research. PeerJ 2024, 12, 1–21.
Smiley, A.; Villarreal-Zegarra, D.; Reategui-Rivera, C.M.; Escobar-Agreda, S.; Finkelstein, J. Methodological and reporting quality of machine learning studies on cancer diagnosis, treatment, and prognosis. Front. Oncol. 2025, 15.
Gómez-Pascual, A.; Naccache, T.; Xu, J.; Hooshmand, K.; Wretlind, A.; Gabrielli, M.; Lombardo, M.T.; Shi, L.; Buckley, N.J.; Tijms, B.M.; et al. Paired plasma lipidomics and proteomics analysis in the conversion from mild cognitive impairment to Alzheimer’s disease. Comput. Biol. Med. 2024, 176, 108588.
Wang, K.; Theeke, L.A.; Liao, C.; Wang, N.; Lu, Y.; Xiao, D.; Xu, C. Deep learning analysis of UPLC-MS/MS-based metabolomics data to predict Alzheimer’s disease. J. Neurol. Sci. 2023, 453, 120812.
Zhang, T.H.; Hasib, M.M.; Chiu, Y.C.; Han, Z.F.; Jin, Y.F.; Flores, M.; Chen, Y.; Huang, Y. Transformer for Gene Expression Modeling (T-GEM): An Interpretable Deep Learning Model for Gene Expression-Based Phenotype Predictions. Cancers (Basel). 2022, 14, 4763.
Kalkan, H.; Akkaya, U.M.; Inal-Gültekin, G.; Sanchez-Perez, A.M. Prediction of Alzheimer’s Disease by a Novel Image-Based Representation of Gene Expression. Genes (Basel). 2022, 13, 1406.
El-Melegy, M.; Mamdouh, A.; Ali, S.; Badawy, M.; El-Ghar, M.A.; Alghamdi, N.S.; El-Baz, A. Prostate Cancer Diagnosis via Visual Representation of Tabular Data and Deep Transfer Learning. Bioengineering 2024, 11, 635.
Karim, A.; Su, Z.; West, P.K.; Keon, M.; The NYGC ALS Consortium, J.S.; Brennan, S.; Wong, T.; Milicevic, O.; Teunisse, G.; Rad, H.N.; et al. Molecular Classification and Interpretation of Amyotrophic Lateral Sclerosis Using Deep Convolution Neural Networks and Shapley Values. Genes (Basel). 2021, 12, 1754.
LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
Ng, W.; Minasny, B.; de Sousa Mendes, W.; Melo Demattê, J.A. The influence of training sample size on the accuracy of deep learning models for the prediction of soil properties with near-infrared spectroscopy data. SOIL 2020, 6, 565–578.
Yilmaz, E.O.; Kavzoglu, T. Analysis of the effect of training sample size on the performance of 2D CNN models. Intercont. Geoinf. Days 2021, 2, 241–244.
Kim, D.; Seo, S.B.; Yoo, N.H.; Shin, G. A Study on Sample Size Sensitivity of Factory Manufacturing Dataset for CNN-Based Defective Product Classification. Computation 2022, 10, 142.
Alenizy, H.A.; Berri, J. Transforming tabular data into images via enhanced spatial relationships for CNN processing. Sci. Rep. 2025, 15, 17004.
Starodubtseva, N.L.; Tokareva, A.O.; Rodionov, V. V; Brzhozovskiy, A.G.; Bugrova, A.E.; Chagovets, V. V; Kometova, V. V; Kukaev, E.N.; Soares, N.C.; Kovalev, G.I.; et al. Integrating Proteomics and Lipidomics for Evaluating the Risk of Breast Cancer Progression: A Pilot Study. Biomedicines 2023, 11, 1786.
Tonoyan, N.M.; Chagovets, V. V.; Starodubtseva, N.L.; Tokareva, A.O.; Chingin, K.; Kozachenko, I.F.; Adamyan, L. V.; Frankevich, V.E. Alterations in lipid profile upon uterine fibroids and its recurrence. Sci. Rep. 2021, 11, 11447.
Koelmel, J.P.; Kroeger, N.M.; Ulmer, C.Z.; Bowden, J.A.; Patterson, R.E.; Cochran, J.A.; Beecher, C.W.W.; Garrett, T.J.; Yost, R.A. LipidMatch: An automated workflow for rule-based lipid identification using untargeted high-resolution tandem mass spectrometry data. BMC Bioinformatics 2017, 18, 331.
Tokareva, A.O.; Chagovets, V. V.; Kononikhin, A.S.; Starodubtseva, N.L.; Nikolaev, E.N.; Frankevich, V.E. Normalization methods for reducing interbatch effect without quality control samples in liquid chromatography-mass spectrometry-based studies. Anal. Bioanal. Chem. 2021, 413, 3479–3486.
Sud, M.; Fahy, E.; Cotter, D.; Brown, A.; Dennis, E.A.; Glass, C.K.; Merrill, A.H.; Murphy, R.C.; Raetz, C.R.H.; Russell, D.W.; et al. LMSD: LIPID MAPS structure database. Nucleic Acids Res. 2007, 35, 527–532.
Galindo-Prieto, B.; Eriksson, L.; Trygg, J. Variable influence on projection (VIP) for orthogonal projections to latent structures (OPLS). J. Chemom. 2014, 28, 623–632.
Menze, B.H.; Kelm, B.M.; Masuch, R.; Himmelreich, U.; Bachert, P.; Petrich, W.; Hamprecht, F.A. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinformatics 2009, 10, 1–16.
Guyon, I.; Weston, J.; Barnhill, S. Gene selection for cancer classification using Support Vector Machines. Mach. Learn. 2002, 46, 389–422.
Tibshirani, R. Regression Shrinkage and Selection Via the Lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
Kursa, M.B.; Rudnicki, W.R. Feature selection with the Boruta package. J. Stat. Softw. 2010, 36, 1–13.
Koch, L.F. Index of Biotal Dispersity. Ecology 1957, 38, 145–148.
Hubert, L.J.; Levin, J.R. A general statistical framework for assessing categorical clustering in free recall. Psychol. Bull. 1976, 83, 1072–1080.
Davies, D.L.; Bouldin, D.W. A Cluster Separation Measure. IEEE Trans. Pattern Anal. Mach. Intell. 1979, 1, 224–227.
Caliński, T.; Harabasz, J. A Dendrite Method for Cluster Analysis. Commun. Stat. 1974, 3, 1–27.
Bunkhumpornpat, C.; Sinapiromsaran, K.; Lursinsap, C. Safe-level-SMOTE: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 2009, 5476 LNAI, 475–482.
Sharma, A.; Vans, E.; Shigemizu, D.; Boroevich, K.A.; Tsunoda, T. DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture. Sci. Rep. 2019, 9, 1–7.
Clerc, M.; Kennedy, J. The Particle Swarm—Explosion, Stability, and Convergence in a Multidimensional Complex Space. IEEE Trans. Evol. Comput. 2002, 6, 58–73.
Thévenot, E.A.; Roux, A.; Xu, Y.; Ezan, E.; Junot, C. Analysis of the Human Adult Urinary Metabolome Variations with Age, Body Mass Index, and Gender by Implementing a Comprehensive Workflow for Univariate and OPLS Statistical Analyses. J. Proteome Res. 2015, 14, 3322–3335.
Liaw, A.; Wiener, M. Classification and Regression by randomForest. R News 2002, 2, 18–22.
Meyer, D. Support Vector Machines: The Interface to libsvm in package e1071 2024, 8.
Friedman, J.; Hastie, T.; Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 2010, 33, 1–22.
Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. 2016, 13–17, 785–794.
Dudek, A.; Walesiak, M. The Choice of Variable Normalization Method in Cluster Analysis. Educ. Excell. Innov. Manag. A 2025 Vis. to Sustain Econ. Dev. Dur. Glob. Challenges 2020, 325–340.
Siriseriwan, W. A Collection of Oversampling Techniques for Class Imbalance Problem Based on SMOTE 2024, 16.
Wild, F. Latent Semantic Analysis 2022.
Donaldson, J. T-Distributed Stochastic Neighbor Embedding for R (t-SNE) Version 2022, 4.
Barber, C.B. Convex hull in arbitrary dimension 2018.
Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 2008, 28, 1–26.
Kalinowski, T.; Falbe, D.; Allaire, J.; Chollet, F.; RStudio; Google; Tang, Y.; Bijl, W. Van Der; Studer, M.; Keydana, S. R Interface to “Keras” 2024.

Figure 1. Comparative analysis (Venn diagrams) of biomarker sets identified by Mann-Whitney U test (blue) and SVM-RFE feature selection (yellow) methods in binary classification tasks: (A) Malignant versus benign tumor discrimination. (B) Benign tumor versus healthy control differentiation. (C) Malignant versus control classification.

Figure 2. Comparative biomarker discovery in ovarian cancer/benign tumor/control classification using multivariate feature selection approaches: (A) Kruskal-Wallis and PLS-DA; (B) Kruskal-Wallis, Mann-Whitney, PLS-DA and SVM-RFE.

Figure 3. Performance evaluation (accuracy and mean recall) of machine learning models using different feature selection strategies: SVM-RFE binary feature selection ((A), (B)), and Mann-Whitney feature selection ((C), (D)) for binary classification and multiclass classification ((E), (F)). BvM – benign versus malignant tumor separation, CvB – control versus benign group separation, CvM – control versus malignant group separation, OvO – one-versus-one classification.

Figure 4. Receiver operating characteristic (ROC) analysis of the optimal binary classification models for ovarian cancer detection. AUC – area under the curve. Green indicates XGBoost with SVM-RFE-selected features for control versus benign group separation; black represents RF with SVM-RFE-selected features for control versus malignant group separation; blue corresponds to CNN with Mann-Whitney-selected features for benign versus malignant tumor separation; and red denotes XGBoost with Mann-Whitney-selected features for control versus malignant group separation.

Figure 5. (A) Pipeline of feature selection methods. Selection_B – feature selection methods for binary classification; Selection_M – feature selection methods for multiclass classification. (B) Pipeline of binary classification methods. PSO – methods with particle swarm optimization tuning. (C) Pipeline of multiclass classification methods. PSO – methods with particle swarm optimization tuning.

Table 1. Clinical characteristics of study participants.

Variable	Ovarian cancer (N = 103)	Benign tumor (N = 107)	Control group (N = 19)	P Value (Kruskal–Wallis H test)
Age, years, Median(Q1;Q3)	51.0(39.0;60.0)	38.0(34.0;45.0)	39.5(34.0;60.3)	< 0.001
BMI (kg/m²), Median(Q1;Q3)	25.0(22.0;27.8)	23.5(21.0;27.0)	22.5(20.8;25.3)	0.27
Benign ovarian tumors, n(%)	-	cystadenoma – 30(28%); endometrioid cyst – 56(52%); mature teratoma – 21(20%)	-	-
Borderline tumors, n(%)	28(27%)	-	-	-
Low-grade OC, n(%)	16(16%)	-	-	-
FIGO stage (high-grade OC), n(%)	IA – 5(2.7); IIB – 5(2.7); IIC – 1(5.4); IIIA – 4(5.4); IIIC – 40(54); IVA – 4(5.4)	-	-	-

Table 2. Comparative stability analysis of feature selection methods using Koch’s biotic diversity index for binary classification tasks. OPLS-DA – Orthogonal Projections to Latent Structures-Discriminant Analysis; RF – Random Forest; SVM-RFE – Support Vector Machine-Recursive Feature Elimination; LASSO – Least Absolute Shrinkage and Selection Operator.

Method	Benign vs malignant	Control vs benign	Control vs malignant	mean (score)
Mann-Whitney	0.41	0.31	0.47	0.40 (5)
Welch	0.35	0.27	0.35	0.32 (4)
OPLS-DA	0.50	0.51	0.57	0.53 (6)
RF	0.19	0.20	0.16	0.18 (3)
SVM-RFE	0.94	0.59	0.71	0.75 (7)
LASSO	0.20	0.14	0.06	0.14 (1)
Boruta	0.17	0.14	0.12	0.14 (2)
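
The stability scores in Table 2 can be illustrated with a short sketch. It assumes the commonly cited form of Koch's index, I = (T − S) / (S(n − 1)), where n is the number of feature subsets being compared (for example, one subset per resampling run), T is the sum of the subset sizes, and S is the number of distinct features across all subsets. The function name and the toy feature labels are hypothetical, and the resampling scheme behind the values in Table 2 is not reproduced here.

```python
def koch_index(feature_sets):
    """Koch's index for n >= 2 feature subsets (assumed standard form).

    Returns 1.0 when all subsets are identical and 0.0 when they are
    pairwise disjoint; for n = 2 it reduces to the Jaccard index.
    """
    sets = [set(s) for s in feature_sets]
    n = len(sets)
    if n < 2:
        raise ValueError("need at least two feature subsets")
    total = sum(len(s) for s in sets)      # T: sum of subset sizes
    distinct = len(set().union(*sets))     # S: distinct features overall
    if distinct == 0:
        raise ValueError("all feature subsets are empty")
    return (total - distinct) / (distinct * (n - 1))


# Hypothetical subsets selected in three resampling runs
runs = [
    {"f1", "f2", "f3"},
    {"f1", "f2", "f4"},
    {"f1", "f2", "f3"},
]
stability = koch_index(runs)
print(f"Koch's index: {stability:.3f}")
```

Under this definition, a method that returns the same features in every run scores close to 1, which is the sense in which a higher value in Table 2 indicates a more stable selection.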

Table 3. Cluster separation metrics in PC space for clinical groups using selected features.

Metric	Method	Benign vs malignant	Control vs benign	Control vs malignant	Combined Feature Set	Mean (score)
Hubert-Levin’s C index	Mann-Whitney	0.46	0.42	0.45	0.44	0.44 (7)
	Welch	0.46	0.42	0.46	0.44	0.44 (6)
	OPLS-DA	0.45	0.46	0.47	0.47	0.46 (4)
	RF	100.00	100.00	100.00	0.47	75.12 (1)
	SVM-RFE	0.47	0.44	0.46	0.45	0.45 (5)
	LASSO	0.46	100.00	100.00	0.46	50.23 (2)
	Boruta	0.44	100.00	100.00	0.44	50.22 (3)
Davies-Bouldin’s index	Mann-Whitney	3.35	4.09	3.87	3.95	3.81 (7)
	Welch	3.33	4.37	4.46	4.49	4.16 (6)
	OPLS-DA	4.76	16.44	15.23	13.39	12.45 (4)
	RF	100.00	100.00	100.00	7.55	76.89 (1)
	SVM-RFE	10.75	7.43	7.66	8.77	8.65 (5)
	LASSO	2.80	100.00	100.00	2.80	51.40 (3)
	Boruta	2.72	100.00	100.00	2.72	51.36 (2)
Calinski-Harabasz pseudo-F statistic	Mann-Whitney	-12.79	47.35	-4.53	1.53	7.89 (4)
	Welch	-13.42	45.04	1.10	7.99	10.18 (5)
	OPLS-DA	4.25	30.06	24.15	23.39	20.46 (7)
	RF	-100.00	-100.00	-100.00	15.92	-71.02 (1)
	SVM-RFE	11.66	11.86	20.36	10.89	13.69 (6)
	LASSO	-14.29	-100.00	-100.00	-14.29	-57.14 (2)
	Boruta	8.82	-100.00	-100.00	8.82	-45.59 (3)
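
Two of the metrics in Table 3 can be sketched from their textbook definitions. The example below computes the Davies-Bouldin index (lower means better separation) and the Calinski-Harabasz pseudo-F statistic (higher means better separation) for labeled groups of points, using Euclidean distances. The function names and the toy two-dimensional "PC scores" are hypothetical; the implementation used by the authors may differ in distance measure or normalization.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def davies_bouldin(clusters):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    cents = [centroid(c) for c in clusters]
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)  # mean distance to own centroid
        for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


def calinski_harabasz(clusters):
    """Ratio of between-group to within-group dispersion, each per degree of freedom."""
    all_points = [p for c in clusters for p in c]
    n, k = len(all_points), len(clusters)
    grand = centroid(all_points)
    between = sum(len(c) * math.dist(centroid(c), grand) ** 2 for c in clusters)
    within = sum(math.dist(p, centroid(c)) ** 2 for c in clusters for p in c)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical PC1/PC2 scores for two clinical groups
group_a = [(0.0, 0.0), (0.0, 2.0)]
group_b = [(4.0, 0.0), (4.0, 2.0)]
db = davies_bouldin([group_a, group_b])
ch = calinski_harabasz([group_a, group_b])
print(f"Davies-Bouldin: {db:.2f}, Calinski-Harabasz: {ch:.2f}")
```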

Table 4. Robustness and discriminatory power assessment of selected biomarker panels in multiclass ovarian cancer classification.

Method	Koch’s index (score)	Hubert-Levin’s C index (score)	Davies-Bouldin’s index (score)	Calinski-Harabasz pseudo-F statistic (score)	Sum score (rank)
Kruskal-Wallis	0.47 (5)	0.44 (4)	3.95 (5)	1.37 (4)	18 (1)
PLS-DA	0.46 (4)	0.42 (5)	7.69 (4)	15.19 (5)	18 (1)
RF	0.18 (1)	100.00 (3)	100.00 (3)	-100.00 (3)	10 (2)
LASSO	0.21 (3)	100.00 (2)	100.00 (2)	-100.00 (2)	9 (3)
Boruta	0.18 (2)	100.00 (1)	100.00 (1)	-100.00 (1)	5 (4)
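
The last column of Table 4 can be reproduced from the per-metric scores in the same table. The sketch below sums the four scores for each method and assigns ranks so that tied sums share a rank. The variable and function names are illustrative, and the tie-handling rule is an assumption inferred from the table, where Kruskal-Wallis and PLS-DA both receive rank 1.

```python
# Per-metric scores from Table 4, in column order:
# Koch's index, Hubert-Levin's C index, Davies-Bouldin's index, Calinski-Harabasz
scores = {
    "Kruskal-Wallis": [5, 4, 5, 4],
    "PLS-DA": [4, 5, 4, 5],
    "RF": [1, 3, 3, 3],
    "LASSO": [3, 2, 2, 2],
    "Boruta": [2, 1, 1, 1],
}


def sum_scores(per_metric):
    """Total score per method across all metrics."""
    return {method: sum(values) for method, values in per_metric.items()}


def dense_rank(totals):
    """Rank 1 for the highest total; equal totals share the same rank."""
    ordered = sorted(set(totals.values()), reverse=True)
    return {method: ordered.index(total) + 1 for method, total in totals.items()}


totals = sum_scores(scores)
ranks = dense_rank(totals)
for method in scores:
    print(f"{method}: {totals[method]} ({ranks[method]})")
```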

Table 5. Prognostic performance of the best combinations of classification model and feature selection method across clinical groups.

Model, feature selection method	Predicted outcome	Clinical group
		control (n=20)	benign (n=32)	malignant (n=30)
XGBoost, Kruskal-Wallis set	control	18 (90%)	3	2
	benign	2	27 (84%)	10
	malignant	0	2	18 (60%)
OvO CNN, Mann-Whitney set	control	17 (85%)	5	3
	benign	0	20 (63%)	1
	malignant	3	7	26 (87%)
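
The percentages on the diagonal of Table 5 are per-class recalls: the number of correctly predicted samples divided by the size of the true clinical group. The sketch below recomputes them from the counts in the table, together with overall accuracy and mean recall, the two metrics reported in Figure 3. It assumes that rows are predicted outcomes and columns are true clinical groups, as the table header indicates; the variable and function names are illustrative and not taken from the authors' code.

```python
CLASSES = ("control", "benign", "malignant")

# Rows: predicted outcome; columns: true clinical group (counts from Table 5)
XGB_KW = [[18, 3, 2], [2, 27, 10], [0, 2, 18]]   # XGBoost, Kruskal-Wallis set
CNN_MW = [[17, 5, 3], [0, 20, 1], [3, 7, 26]]    # OvO CNN, Mann-Whitney set


def per_class_recall(matrix):
    """Correct predictions for each class divided by that class's true count."""
    recalls = {}
    for j, name in enumerate(CLASSES):
        column_total = sum(row[j] for row in matrix)
        recalls[name] = matrix[j][j] / column_total
    return recalls


def accuracy(matrix):
    """Share of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def mean_recall(matrix):
    """Unweighted average of the per-class recalls."""
    recalls = per_class_recall(matrix)
    return sum(recalls.values()) / len(recalls)


for label, matrix in (("XGBoost, Kruskal-Wallis", XGB_KW), ("OvO CNN, Mann-Whitney", CNN_MW)):
    recalls = {name: round(value, 3) for name, value in per_class_recall(matrix).items()}
    print(label, recalls, round(accuracy(matrix), 3), round(mean_recall(matrix), 3))
```

Because the control group is much smaller than the other two, mean recall weights each clinical group equally, whereas accuracy is dominated by the benign and malignant groups.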

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.