Preprint
Article

This version is not peer-reviewed.

A First-Trimester Serum Proteomic Signature for Early Prediction of Preeclampsia: Integrated Untargeted and Targeted Mass Spectrometry with Machine Learning

Submitted: 08 June 2026

Posted: 09 June 2026

Abstract
First-trimester prediction of preeclampsia (PE) remains a major clinical challenge, particularly outside specialized fetal medicine centers. This study aimed to identify and validate serum protein biomarkers for early PE prediction using an integrated proteomic approach. A prospective cohort of 64 first-trimester singleton pregnancies (32 future PE cases, 32 matched controls) was analyzed. Untargeted proteomics was performed using DIA-PASEF-MS, followed by targeted validation with MRM-MS. Machine learning classifiers (support vector machines, SVM, and random forest) were trained on differentially abundant proteins (FDR < 0.01, VIP > 1.5). DIA-MS identified 33 protein markers associated with complement activation, IGF transport regulation, and platelet degranulation. An SVM model with a linear kernel achieved 95% accuracy (AUC = 0.95, sensitivity = 95%, specificity = 97%). Four markers (AFM, AHSG, C8A, IGHG1) were validated across platforms, confirming the discovery findings. Cross-platform correlation was high: 71% of overlapping proteins showed r > 0.5 (p < 0.001), with the highest concordance observed for the candidate PE marker AHSG (r = 0.82, p < 0.001). PRSS1 correlated negatively with proteinuria (r = −0.74), and IGHV1-46 correlated positively with gestational age at delivery (r = 0.72), linking the proteomic signature to clinical severity. Integrated DIA-MS and MRM-MS proteomics yields a reproducible, high-performance serum signature for first-trimester PE prediction. The identified markers reflect core pathophysiological pathways and offer potential to augment current FMF-based screening algorithms.

1. Introduction

Preeclampsia (PE) is a leading cause of maternal and perinatal morbidity and mortality globally, particularly when it develops early and leads to indicated preterm delivery [1]. Consequently, the reliable first-trimester identification of high-risk women is a paramount objective in modern obstetrics. This remains challenging, especially for nulliparous patients, in whom traditional risk-factor-based screening performs suboptimally [2,3]. Current guidelines rely primarily on maternal demographic and historical factors, an approach that limits detection rates and leaves a substantial proportion of cases undetected [3,4,5,6,7,8,9,10].
While more sophisticated first-trimester algorithms, such as the Fetal Medicine Foundation (FMF) model—which integrates maternal factors with uterine artery Doppler and biochemical markers—can detect 75–90% of early or preterm preeclampsia at a 10–16% false-positive rate, their performance is imperfect and varies across populations and settings [3,4,11,12,13]. Furthermore, the implementation of these multivariable tools requires standardized measurements, specialized software, and trained personnel [7,14,15]. Critically, they may not fully capture the complex systemic pathophysiological shifts that precede clinical disease [16,17]. Maternal serum offers a compelling matrix for biomarker discovery due to its minimally invasive collection and its reflection of systemic processes underpinning preeclampsia, including placental dysfunction, endothelial activation, and inflammation [16].
Modern mass-spectrometry-based proteomics enables the specific, reproducible, and multiplexed quantification of hundreds of serum proteins, often surpassing conventional immunoassays in analytical performance [16,18,19,20,21,22]. Data-independent acquisition (DIA) provides unbiased, broad-coverage discovery of the serum proteome, while targeted multiple reaction monitoring (MRM) mass spectrometry permits precise validation of candidate proteins across cohorts, establishing a powerful complementary pipeline [16,19,20,21,22,23,24]. The integration of these advanced proteomic approaches in early pregnancy holds promise for uncovering molecular signatures of subsequent preeclampsia, refining individual risk stratification beyond existing algorithms, and informing more effective preventive strategies [17,25,26].
This study employs an integrated DIA-PASEF-MS and targeted MRM-MS strategy on first-trimester serum samples from women who later developed preeclampsia and matched controls. The objective is to identify and validate robust protein biomarkers and to construct predictive models with the potential to augment current FMF-based screening in routine clinical practice.

2. Materials and Methods

2.1. Study Design

This prospective cohort study was conducted at the I. Kulakov National Medical Research Center for Obstetrics, Gynecology, and Perinatology (Moscow, Russia) from January to December 2022. The analytical cohort was selected from an initial screening population of 1869 women aged 18–45 years who underwent routine first-trimester FMF-based screening between 11+2 and 14+2 weeks of gestation. The final cohort comprised 64 women with singleton pregnancies, including 32 who subsequently developed PE and 32 age-matched controls with uncomplicated pregnancies. Exclusion criteria were pre-existing diabetes mellitus, autoimmune disease, a history of solid-organ transplantation, known malignancy, or fetal chromosomal abnormalities. Preeclampsia was defined according to contemporary guidelines as de novo hypertension after 20 weeks of gestation accompanied by proteinuria, maternal organ dysfunction (renal, hepatic, hematologic, or neurologic), or uteroplacental dysfunction including fetal growth restriction [14]. The control group consisted of age-matched (p>0.05) women with uncomplicated singleton pregnancies who conceived spontaneously and did not develop any hypertensive disorder, gestational diabetes, preterm delivery before 37 weeks, or intrauterine growth restriction (estimated fetal weight <10th percentile), and did not have high FMF-based first trimester risk of PE.
Written informed consent was obtained from all participants at enrollment. A standardized first-trimester assessment included measurement of maternal weight, height, and blood pressure [14], along with collection of detailed obstetric and medical history. Ultrasonographic evaluation involved transabdominal color Doppler assessment of the uterine arteries with calculation of the pulsatility index (UtA-PI) [15]. Venous blood was collected into Serum Z/9 tubes (Monovette, Sarstedt, Germany) and processed within 2 hours. After centrifugation at 300×g for 20 minutes at room temperature, the serum supernatant was aliquoted and stored. Serum concentrations of placental growth factor (PlGF) and pregnancy-associated plasma protein-A (PAPP-A) were measured using the Delfia Xpress system (PerkinElmer, USA) per the manufacturer’s instructions. Residual serum was aliquoted and stored at −80°C for subsequent untargeted DIA-MS and targeted MRM-MS proteomic analyses of first-trimester biomarkers [23]. The study protocol was approved by the Institutional Review Board of the I. Kulakov Center (protocol No. 2, 9 March 2017). All procedures followed the principles of the Declaration of Helsinki and Good Clinical Practice guidelines.

2.2. Serum Preparation

Serum samples and a surrogate matrix of bovine serum albumin (BSA, 10 mg/mL in PBS) were subjected to tryptic digestion following standard procedures [21,23,24,27]. Briefly, 10 µL of serum were diluted in denaturation buffer containing 7.2 M urea, 16 mM dithiothreitol, and 240 mM Tris-HCl (pH 8.0) and incubated for 30 minutes at 37°C. Alkylation was performed with iodoacetamide at a final concentration of 40 mM for 30 minutes at room temperature in the dark. Trypsin (Trypsin Gold, Promega, USA) was added at a 1:25 enzyme-to-protein ratio, and digestion proceeded overnight at 37°C. Total protein concentration was determined using a bicinchoninic acid (BCA) assay kit (Thermo Fisher Scientific, USA). Digestion was terminated by acidification with 1.0% formic acid to pH ≤ 2. The resulting peptide mixture was adjusted to 1 µg/µL, kept on ice, and analyzed by mass spectrometry on the same day.
For targeted MRM-MS analysis of 139 serum proteins, an in-house panel based on the BAK125/270 MRM kit (Canada) was employed. This panel, previously developed by Prof. Borchers’ group, enables multiplexed quantification of potential plasma protein biomarkers across various diseases including neurodegenerative, autoimmune, cardiovascular, and renal disorders, as well as diabetes [21,22,23]. Stable isotope-labeled internal standard (SIS) peptides and corresponding native (NAT) peptides were synthesized and characterized at the Omics Laboratory of Skolkovo Institute of Science and Technology (Skoltech) [21,22]. The SIS peptide panel featured isotopically labeled C-terminal lysine (+8 Da) or arginine (+10 Da) residues.
The SIS peptide mixture was dissolved and diluted to a working solution at 10× the lower limit of quantification (LLOQ). For each sample, 40 µL of serum digest were combined with 10 µL of the SIS peptide mixture. Peptide purification was performed via solid-phase extraction using Oasis HLB 96-well plates. After conditioning and sample loading, wells were washed and peptides were eluted with 70% acetonitrile/0.1% formic acid. Eluates were dried in a vacuum concentrator and stored at −80°C until analysis.

2.3. Targeted HPLC-MS Analysis

For HPLC-MRM-MS analysis, dried peptide samples were reconstituted in 0.1% formic acid to a final concentration of 1 µg/µL. Aliquots of 10 µL from each reconstituted serum digest, quality-control (QC) sample, and calibration standard were injected onto a Zorbax Eclipse Plus reversed-phase UHPLC column (Agilent Technologies, USA) connected to an ExionLC™ system (Thermo Fisher Scientific, USA). Peptides were separated at a flow rate of 0.4 mL/min over a 60-minute multistep gradient [23].
For calibration standards and QC samples, 40 µL of surrogate BSA digest were mixed with 10 µL of SIS mixture and 10 µL of native peptide mixture. A lyophilized mixture of native peptides, pre-balanced at the LLOQ for each analyte, was dissolved and serially diluted to generate eight calibration levels: 100×, 40×, 16×, 4×, 2×, 0.5×, 0.25× and 0.1× LLOQ. QC samples at three concentration levels (0.35× LLOQ, QC-A; 3.5× LLOQ, QC-B; and 35× LLOQ, QC-C) were analyzed in triplicate.
MRM data were processed using Skyline software [28]. Quantification adhered to ICH guidelines for Bioanalytical Method Validation [29], employing weighted (1/x²) linear regression of SIS-to-native peak area ratios. Calibration curve performance was assessed in Skyline, with accuracy considered acceptable when measured concentrations for at least 6 of 9 calibration points and at least 1 of 3 QC replicates at each level were within ±20% of nominal values.
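The weighted calibration fit described above can be sketched as follows. This is a minimal illustration and not the Skyline implementation: the response ratios are hypothetical values placed on an exact line, the function names are invented for the example, and only the eight nominal levels and the ±20% acceptance window are taken from the text.

```python
def fit_weighted_line(x, y):
    """Fit y = intercept + slope * x by least squares with weights 1/x**2."""
    w = [1.0 / (xi * xi) for xi in x]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    intercept = (swy - slope * swx) / sw
    return intercept, slope


def back_calculate(response, intercept, slope):
    """Convert a measured response ratio back to a concentration."""
    return (response - intercept) / slope


def within_tolerance(measured, nominal, tol=0.20):
    """True when the measured value lies within +/- tol of the nominal value."""
    return abs(measured - nominal) <= tol * nominal


# Nominal calibration levels as multiples of the LLOQ (from the text).
levels = [100, 40, 16, 4, 2, 0.5, 0.25, 0.1]
# Hypothetical peak area ratios lying exactly on a line (assumption).
ratios = [0.02 + 0.5 * c for c in levels]

intercept, slope = fit_weighted_line(levels, ratios)
n_pass = sum(
    within_tolerance(back_calculate(r, intercept, slope), c)
    for r, c in zip(ratios, levels)
)
print(f"slope={slope:.4f} intercept={intercept:.4f} passing levels={n_pass}")
```

The 1/x² weighting gives the low-concentration standards roughly the same relative influence as the high ones, which is why it is commonly preferred when the calibration range spans several orders of magnitude.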

2.4. Non-Targeted DIA-PASEF-MS Analysis

Tryptic peptide fractions were analyzed on a Dionex Ultimate 3000 nano-LC system (Thermo Fisher Scientific, USA) coupled with a timsTOF Pro mass spectrometer (Bruker Daltonics, Billerica, MA, USA). A 1 µL aliquot of each peptide sample was injected onto a packed emitter C18 column (25 cm × 75 µm, 1.6 µm; Ion Optics, Parkville, Australia). LC separation was carried out at a flow rate of 400 nL/min using a 90-minute linear gradient from 2% to 37% solvent B (0.1% FA in ACN), followed by a column wash step (10-minute isocratic elution with 90% solvent B) and equilibration (15 minutes, isocratic elution with 2% solvent B).
MS data were acquired using the DIA-PASEF method. The electrospray ionization (ESI) source parameters were set as follows: capillary voltage, 1400 V; dry gas flow, 3.0 L/min at 180 °C. MS and MS/MS spectra were recorded over the m/z range 100 to 1700 and the ion mobility (1/K0) range 0.6 to 1.6 V·s/cm². The ramp time was set to 100 ms. Collision energy was varied linearly with ion mobility, from 59 eV at 1/K0 = 1.6 V·s/cm² to 20 eV at 1/K0 = 0.6 V·s/cm².
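The linear dependence of collision energy on ion mobility can be written as a simple interpolation between the two stated end points. The sketch below is illustrative only; the function name is an assumption, and the instrument software may apply the ramp differently in detail.

```python
def collision_energy(inv_k0):
    """Collision energy (eV) interpolated linearly in 1/K0 (V*s/cm^2).

    End points are taken from the text: 20 eV at 0.6 and 59 eV at 1.6.
    """
    low_mobility, high_mobility = 0.6, 1.6
    low_energy, high_energy = 20.0, 59.0
    fraction = (inv_k0 - low_mobility) / (high_mobility - low_mobility)
    return low_energy + (high_energy - low_energy) * fraction


for mobility in (0.6, 1.1, 1.6):
    print(mobility, round(collision_energy(mobility), 2))
```

Under this reading, the slope of the ramp is 39 eV per V·s/cm², so an ion at the midpoint of the mobility range (1/K0 = 1.1) would be fragmented at about 39.5 eV.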
The LC-MS/MS spectra were analyzed using DIA-NN software (Data-Independent Acquisition by Neural Networks, version 1.8.1) in library-free mode with the following parameters: mass accuracy, 20 ppm; MS1 accuracy, 20 ppm; peptide length, 7 to 30 amino acids [7,8,9]. The search was performed against the SwissProt Human database with carbamidomethylation (C) and oxidation (M) specified as variable modifications. The false discovery rate threshold was set at 0.1%. Data filtering and calculation of LFQ values were performed using the R package DIAgui [30].

2.5. Statistical Analysis

Continuous clinical variables were summarized as median (first quartile; third quartile) and categorical variables as counts (percentage). Between-group differences were assessed using the Mann–Whitney U test for continuous variables and Pearson’s chi-square test for categorical variables, with statistical significance defined as p < 0.05.
For untargeted DIA-MS data, only proteins detected in ≥70% of samples were analyzed. Missing values were imputed in Perseus (MaxQuant environment) using a normal distribution-based procedure [31].
For targeted MRM-MS data, normalization was performed using the RobNorm method to correct for batch effects and analytical variability [32]. For each protein, the mean coefficient of variation (CV) was calculated across standard and quality control samples between batches; proteins with a CV exceeding 50% were excluded from further analysis.
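One possible reading of the CV-based exclusion rule is sketched below: a CV is computed per batch from the replicate QC measurements, averaged across batches, and compared with the 50% threshold. The protein names and values are hypothetical, and the exact grouping of standards and QC samples used in the study is not specified in the text.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    return 100.0 * stdev(values) / mean(values)


def mean_cv(batches):
    """Average the per-batch CVs for one protein."""
    return mean(cv_percent(batch) for batch in batches)


# Hypothetical QC measurements: protein -> list of batches -> replicate values.
qc = {
    "PROT_A": [[10.0, 11.0, 9.0], [10.5, 9.5, 10.0]],
    "PROT_B": [[2.0, 8.0, 5.0], [1.0, 9.0, 5.0]],
}

# Keep proteins whose mean CV does not exceed 50%.
kept = [name for name, batches in qc.items() if mean_cv(batches) <= 50.0]
print(kept)
```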
Significant proteins from both targeted and untargeted datasets were identified using the Mann–Whitney U test with Benjamini–Hochberg correction (false discovery rate (FDR)-adjusted p-value < 0.01). Orthogonal projections to latent structures discriminant analysis (OPLS-DA) was performed on each proteomic profile, and features with variable importance in projection (VIP) > 1.5 and FDR < 0.01 were retained as protein markers [33]. For DIA-MS data, additional machine learning classifiers were trained, including support vector machines (SVM) with linear, polynomial, radial and sigmoid kernels, as well as random forest models [34,35]. Hyperparameters of SVM with polynomial, radial and sigmoid kernels were optimized using particle swarm optimization (PSO) [36]. Model performance was evaluated by 10-fold cross-validation, calculating sensitivity, specificity, accuracy, area under the receiver-operating characteristic curve (AUC), positive predictive value (PPV), negative predictive value (NPV) and F-score.
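The marker selection rule (Benjamini–Hochberg adjusted p-value < 0.01 together with VIP > 1.5) can be illustrated with a short sketch. The study performed these steps in R; the Python code below is only a restatement of the logic, and the protein labels, p-values, and VIP scores are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-protein results: (raw p-value, VIP score).
results = {
    "P1": (0.001, 2.1),
    "P2": (0.008, 1.8),
    "P3": (0.039, 1.9),
    "P4": (0.041, 0.9),
    "P5": (0.600, 0.3),
}

names = list(results)
adjusted = benjamini_hochberg([results[n][0] for n in names])

# A protein is retained only if it passes both thresholds.
selected = [
    n for n, q in zip(names, adjusted)
    if q < 0.01 and results[n][1] > 1.5
]
print(dict(zip(names, adjusted)), selected)
```

In this toy example P2 has a raw p-value below 0.01 but an adjusted value of 0.02, so it is dropped, which mirrors how the number of significant proteins shrinks after FDR correction.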
Pathway enrichment analysis was conducted using STRING (FDR < 0.01) [37]. Associations between DIA-MS and MRM-MS protein levels were assessed using Pearson correlation, while correlations between DIA-MS protein levels and clinical parameters were evaluated using Spearman’s rank correlation.
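The difference between the two correlation measures can be shown with a small sketch: Pearson correlation quantifies linear agreement, whereas Spearman correlation is the Pearson correlation of the ranks and therefore captures any monotonic relationship. The data and function names below are hypothetical and serve only to illustrate the definitions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical monotonic but non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Rank-based correlation is a natural choice for clinical parameters, which are often skewed or ordinal, while Pearson correlation suits the comparison of two quantitative measurements of the same protein.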
Statistical analysis was performed using scripts written in R version 4.3.2 [38] in the RStudio environment (version 2023.09.1) [39]. For modeling and machine learning tasks, the packages ropls 1.34.0 [40], effsize 0.8.1 [41], pwr 1.3-0 [42], e1071 1.7-16 [43], caret 7.0-1 [44], dplyr 1.1.4 [45], and randomForest [35] were used. Results were visualized using ggplot2 3.5.2 [46], reshape2 1.4.4 [47], forcats 1.0.0 [48], ggrepel 0.9.6 [49], pheatmap 1.0.13 [50], and pROC 1.18.5 [51].

3. Results

3.1. Clinical Characteristics

The final cohort comprised 32 women who subsequently developed PE and 32 age-matched controls with uncomplicated pregnancies. The clinical characteristics of both groups are summarized in Table 1.
Maternal age, body mass index (BMI), gestational age at sample collection, mean arterial pressure (MAP) multiples of median (MoM), and the use of assisted reproductive technologies (IVF) did not differ significantly between the groups (all p > 0.05). However, women who later developed PE were more frequently nulliparous (72% vs. 34%, p < 0.001), had a higher prevalence of previous PE (22% vs. 0%, p = 0.02), habitual miscarriage (38% vs. 0%, p < 0.001), and previous preterm delivery (25% vs. 0%, p = 0.01) compared to controls.
First-trimester FMF-based screening identified a high risk for PE in 53% of women who subsequently developed the disease, whereas none of the controls were classified as high-risk (p < 0.001). First-trimester placental growth factor (PlGF) levels (MoM) were significantly lower in the PE group compared to controls (median 0.55 vs. 0.74, p = 0.002).
Regarding pregnancy outcomes, women in the PE group delivered earlier than controls (median 37.2 vs. 39.4 weeks, p < 0.001) and had higher maximum systolic and diastolic blood pressures (135 vs. 115 mmHg and 89 vs. 70 mmHg, respectively, both p < 0.001). The PE group also demonstrated higher umbilical artery pulsatility index (0.95 vs. 0.79, p = 0.002) and lower cerebro-placental ratio (1.44 vs. 1.89, p < 0.001).
Markers of disease severity were significantly elevated in the PE group, including 24-hour proteinuria (1.1 vs. 0 g/L, p < 0.001) and serum creatinine (80.8 vs. 66.6 µmol/L, p < 0.001). Intrapartum blood loss was higher in the PE group (700 vs. 300 mL, p < 0.001), and emergency cesarean section was more frequent (59% vs. 3%, p < 0.001). Neonatal outcomes were less favorable in the PE group, with lower birthweight (2745 vs. 3400 g, p < 0.001) and lower Apgar scores at both 1 minute (p = 0.002) and 5 minutes (p = 0.001) compared to controls.

3.2. Non-Targeted DIA-MS Proteome Profile

Non-targeted HPLC-DIA-MS analysis of first-trimester serum identified over 455 protein groups. Statistical analysis was performed on 274 proteins consistently detected in at least 70% of samples (Table S1). Principal component analysis (PCA) revealed clear separation between samples from women who later developed PE and controls. This separation was further supported by an OPLS-DA model demonstrating high explanatory and predictive capability (R2Y = 0.96, Q2Y = 0.78) (Figure 1a,b).
Differential abundance analysis identified 69 proteins with significant unadjusted p-values (p<0.01), of which 48 remained significant following FDR correction (FDR<0.01). From this set, 33 proteins with a VIP score exceeding 1.5 were designated as candidate PE markers (Figure 1c, Table S1). The majority of these proteins were down-regulated in serum from women who later developed PE (CFHR4, PRSS1, IGHV1-46, C1QB, KRT1, NOTUM, AHSG, GPX3, LBP, PRG2, SAA4, ORM2, FBLN1, SERPINC1, SERPINA7, C1S, C9, SERPINA3, C3, TF, GC, ALB, APOA1 and SERPINA1), while a smaller subset was up-regulated (CFD, CPB2, BCHE, IL1RAP, LTBP1, PROC, INHBC, SELENOP and VNN1).
When these 33 DIA-derived markers were used as input features for classification models, SVM models with linear and polynomial kernels and a random forest classifier achieved the highest performance metrics (PPV and NPV of 97% and 95%, respectively) during 10-fold cross-validation (Table 2, Figure 1d). Pathway enrichment analysis demonstrated that this protein set was strongly associated with complement activation pathways and also mapped to biological processes including post-translational protein phosphorylation, regulation of insulin-like growth factor (IGF) transport and uptake, platelet degranulation, and selenium-related micronutrient networks (Figure 2, Table S2).
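For readers less familiar with the reported performance metrics, the sketch below shows how they follow from a confusion matrix. The counts are hypothetical and are not the study's results, which were aggregated over 10 cross-validation folds; the function name is an assumption, and the F-score is computed here as F1 (equal weighting of precision and recall), which the text does not specify.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts for 32 cases and 32 controls (not taken from the study).
metrics = classification_metrics(tp=30, fn=2, tn=31, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that PPV and NPV depend on the case-to-control ratio: in a balanced case-control design such as this one, they will differ from the values expected in a screening population where PE is much less common.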

3.3. Targeted MRM-MS Validation

Targeted HPLC-MRM-MS quantification was successfully performed for 104 plasma proteins across all serum samples (n=64) (Table S4). Unsupervised PCA again resolved two distinct clusters corresponding to future clinical outcomes. A subsequent OPLS-DA model based on the targeted panel demonstrated robust discriminatory power (R2Y = 0.94, Q2Y = 0.74) (Figure 3a, b).
Among the quantified proteins, 12 exhibited significant unadjusted p-values (p < 0.01), with 9 proteins remaining significant after FDR correction (FDR<0.01) and possessing high VIP scores (VIP>1.5) in the OPLS-DA model (Table S3, Figure 3c). Notably, four of these nine validated proteins (AFM, AHSG, C8A, IGHG1) overlapped with candidates identified in the discovery phase, providing cross-platform confirmation.
Correlation analysis between DIA-MS and MRM-MS measurements demonstrated significant concordance for the majority of overlapping proteins: 71% showed Pearson correlation coefficients greater than 0.5 with p < 0.001, indicating good reproducibility between the semi-quantitative and quantitative approaches (Table S4). A correlation heatmap highlighted four coherent clusters of markers across platforms, including a tight subcluster comprising AHSG, C8A and F10 from the MRM panel and AHSG, C1QB and PRSS1 from the DIA panel, all with correlation coefficients above 0.6. AHSG levels measured by DIA-MS and MRM-MS were particularly consistent, with a correlation coefficient of 0.82 (p < 0.001), supporting its role as a robust early marker candidate (Figure 4a).

3.4. Associations Between Protein Markers and Clinical Parameters

Integration of DIA-MS protein markers with clinical data revealed two main clusters of clinical variables on the correlation heatmap (Figure 4b). Gestational age at delivery and birthweight grouped together and showed correlation patterns opposite to those of diastolic and systolic blood pressure before delivery, duration of post-partum hospitalization and proteinuria level, which formed a second cluster.
Several candidate protein markers (GPX3, PRSS1, TF, AHSG, C3, KRT1, GC, C1QB and IGHV1-46) were inversely associated with clinical parameters indicative of disease severity, including umbilical artery pulsatility indices, intrapartum blood loss, and pre-delivery creatinine levels. Conversely, these same proteins demonstrated positive correlations with factors such as parity, first-trimester placental growth factor levels, Apgar scores and the cerebral-placental ratio before delivery. The strongest individual associations were observed for PRSS1, which correlated negatively with proteinuria (r = −0.74), and for IGHV1-46, which correlated positively with gestational age at delivery (r = 0.72), linking the proteomic signature to clinically meaningful manifestations of PE.

4. Discussion

The FMF algorithm detects 80.6% (95% CI 64.0–91.8) of preterm PE cases (<37 weeks) and 31.8% (95% CI 18.6–47.6) of term PE cases (≥37 weeks) and has a long-standing clinical track record [52]. Nevertheless, false-negative results persist in real-world practice. Some women without pronounced anamnestic risk factors who are classified as low-risk subsequently develop severe PE, leading to adverse outcomes [3,4,12,13,52,53,54]. This may be attributed to operator dependence in UtA-PI measurement, insufficient sphygmomanometer quality control, and lack of regular calibration. In the present cohort, the FMF model identified high risk in only 53% of women who later developed PE, underscoring the potential added value of a proteomic approach.
In contrast, integrated untargeted DIA-MS and targeted MRM-MS profiling (n = 64) of first-trimester maternal serum reliably identified women at risk for subsequent PE. In the discovery phase, the analysis identified 33 DIA-MS markers (FDR < 0.01, VIP > 1.5) and achieved excellent predictive performance (AUC 0.95, sensitivity 95%, specificity 97%) using SVM and random forest classifiers. Targeted MRM-MS validated nine proteins, of which four (AFM, AHSG, C8A, and IGHG1) confirmed the discovery findings. High cross-platform concordance (71% of overlapping proteins with r > 0.5, p < 0.001) underscores technical reproducibility. These findings, together with previous first-trimester proteomic studies [16,24,55,56], demonstrate the ability of this approach to detect early molecular changes preceding clinical disease. This capability is particularly important given the limited effectiveness of existing prevention strategies: low-dose aspirin reduces PE risk by only 18–48% [53,54]. This modest efficacy likely reflects biological heterogeneity, encompassing distinct molecular subclasses such as placental, metabolic, maternal anti-fetal, and extracellular matrix-related subtypes [17,57,58].
The biological coherence of the discovered markers strengthens their clinical plausibility. Pathway enrichment analysis prominently featured complement cascade activation, a central mechanism in PE pathogenesis [17,57,59,60]. Altered concentrations of complement-related proteins, including C1QB, C3, CFHR4, and other components of the classical pathway, may reflect their consumption due to early complement activation. Several complement-related proteins, including C1QB, C1S, C3, C8A, and C9, were significantly altered in women who later developed PE, consistent with a chronic low-grade inflammatory and complement-dysregulated state established in the first trimester [16,59,60,61,62]. Interestingly, a prospective study by He YD et al. (2020) demonstrated that by the second and third trimesters, blood levels of complement components in women who develop PE differ from those in women with uncomplicated pregnancies [59]. Furthermore, Matsuyama T. et al. (2021) showed in vitro that an imbalance of pro-angiogenic and anti-angiogenic factors, observed as early as the first trimester in pregnancies subsequently complicated by PE, inhibits the synthesis of complement factor H by placental endothelial cells, leading to complement activation and endothelial dysfunction [60].
Similarly, several liver-derived and transport proteins—such as albumin (ALB), alpha-1-antitrypsin (SERPINA1), alpha-1-antichymotrypsin (SERPINA3), serotransferrin (TF), and vitamin D-binding protein (GC)—may reflect early changes in hepatic synthetic function, systemic inflammation, or redistribution processes previously associated with adverse pregnancy outcomes [16,24]. Reduced first-trimester ALB levels have demonstrated moderate prognostic value for PE; however, its diagnostic performance improves substantially when incorporated into combined predictive models [16,24]. SERPINA1 has been proposed as a potential component of proteomic panels for early PE prediction, demonstrating high diagnostic accuracy within multi-protein models [24]. Experimental evidence indicates its involvement in regulating trophoblast invasion through endoplasmic reticulum stress pathways [63]. Elevated levels of non-tryptic SERPINA1 peptides in urine are associated with the clinical course of preeclampsia and may represent a promising non-invasive marker of disease severity [61,64,65]. By contrast, SERPINA3—belonging to the same family of serine protease inhibitors—has not shown a significant association between genetic variants and PE risk [66], and its role in disease pathogenesis remains incompletely understood.
Thus, early disturbances in hepatic synthetic and metabolic function may play an important role in the pathogenesis of preeclampsia as early as the first trimester. Elevated first-trimester hepatic steatosis index (HSI) values were recently associated with an increased risk of gestational hypertension and PE in a study by Zhang et al. (2025) [67], supporting a role for early structural liver changes closely linked to metabolic and functional dysfunction. The down-regulation of AHSG, a liver-synthesized protein involved in metabolic regulation, observed on both the DIA-MS and MRM-MS platforms (Pearson r = 0.82, p < 0.001) aligns with previous reports linking low AHSG to preterm PE and to metabolic dysfunction-associated steatotic liver disease (MASLD) [68], which is itself associated with adverse pregnancy outcomes [69,70]. A longitudinal study by Chaemsaithong P. et al. (2015) found that AHSG levels increased from the first trimester to 26 weeks of gestation and then declined, being significantly lower in the group that developed early-onset PE [71]. However, the direction of AHSG changes varies across studies: both elevated levels in manifest PE and reduced levels in early-onset disease and longitudinal observations have been reported, reflecting disease heterogeneity, differences in gestational timing, and population characteristics [71,72]. Reduced levels of several proteins, including AHSG, may be attributable to multiple mechanisms: redistribution and local accumulation in placental tissue (as shown for SERPINA1) [61,73], consumption under conditions of immune cascade activation [16,24,62], as well as differences in disease stage and phenotypic heterogeneity [17,57].
Accumulating evidence points to a possible role of hemostasis imbalance in the development of PE. Pei-Pei Jin et al. (2023) conducted a prospective study consistent with the present findings, evaluating first-trimester antithrombin III (SERPINC1) levels in 853 pregnant women (322 who subsequently developed PE and 531 controls). That study demonstrated that reduced SERPINC1 levels were significantly associated with disease development after adjustment for age and BMI [74]. Evidence on the role of other coagulation proteins, such as PROC and CPB2, in early PE prediction remains limited; nevertheless, reported hemostatic disturbances support their involvement in disease pathogenesis [62]. Furthermore, the down-regulation of GPX3, SERPINC1, and SERPINA1 points to impaired antioxidant capacity and coagulation regulation, both implicated in PE progression [63,66,72,73,74].
Proteins involved in extracellular matrix (ECM) organization and vascular remodeling, including fibulin-1 (FBLN1), may reflect impaired vascular adaptation and defective placentation. This aligns with evidence implicating ECM dysregulation in PE pathogenesis [75]. First-trimester proteomic studies have demonstrated the potential of ECM-related proteins as early disease biomarkers, and experimental data indicate that alterations in collagen composition may disrupt trophoblast function and contribute to defective placentation [24,75].
An important translational insight emerges from the correlation analysis between protein markers and clinical parameters (Figure 4b). PRSS1 showed a strong negative correlation with proteinuria (r = −0.74), and IGHV1-46 correlated positively with gestational age at delivery (r = 0.72), directly linking the proteomic signature to clinically meaningful endophenotypes of disease severity. Moreover, the inverse associations of PRSS1, IGHV1-46, AHSG, GPX3, C1QB and C3 with umbilical artery pulsatility indices and pre-delivery creatinine levels suggest that these markers capture not only the risk of developing PE but also the anticipated trajectory of placental and maternal organ dysfunction. This dual prognostic capacity—predicting both the occurrence and the likely severity of PE—represents an advance over many existing single-analyte or purely demographic models [3,60,72].
From a methodological perspective, the combination of DIA-PASEF-MS for broad discovery and MRM-MS for multiplexed, absolute quantification proved highly synergistic. The DIA-MS platform enabled unbiased interrogation of >450 protein groups in blood serum, while the targeted MRM panel—originally designed for other disease contexts but applied here to PE—permitted rapid, quantitative validation of candidate markers without requiring de novo assay development [19,21,22,23,24,25,27]. The high cross-platform correlation for more than 70% of overlapping proteins indicates that semi-quantitative DIA-MS data can be reliably used for initial screening, with MRM-MS serving as a confirmatory or clinical-grade quantification method. Of particular interest is the identification of coherent protein clusters demonstrating similar patterns of change across both methods. Specifically, a tightly connected subcluster was identified, including AHSG, C8A, and F10 (by MRM-MS) as well as AHSG, C1QB, and PRSS1 (by DIA-MS), all of which exhibited high correlation coefficients. This indicates coordinated changes in these proteins and likely reflects their involvement in common pathophysiological processes, such as complement activation, inflammatory response, and hemostatic disturbances. The highest cross-platform concordance was observed for AHSG, confirming its robustness to analytical variability and positioning it as one of the most reliable candidate early biomarkers of preeclampsia.
Several limitations warrant consideration. First, the sample size (n = 64), while adequately powered for discovery and for training machine learning models with cross-validation, requires external validation in larger, multi-center cohorts before clinical implementation. Second, the study was conducted at a single tertiary referral center with a predominantly nulliparous and Caucasian population; generalizability to other ethnic groups, multiparous women, and lower-resource settings remains to be established. Third, the MRM panel, though multiplexed, was not specifically optimized for PE; a purpose-built PE panel might yield even higher performance. Fourth, proteomic analysis was performed at a single time point in the first trimester, which does not allow assessment of the dynamics of molecular changes throughout pregnancy. Finally, while association with clinical severity parameters was assessed, the study was not designed to distinguish early- vs. late-onset PE or preterm vs. term PE; these molecular subtypes may have distinct proteomic profiles [16,17,57,58].
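The sample-size limitation can be made concrete with a short worked example. The sketch below computes a Wilson score interval for a proportion such as sensitivity; it is a generic statistical illustration in Python, not the authors' analysis, and the counts used (30 of 32, and 300 of 320) are hypothetical values chosen only to show how the interval narrows as the cohort grows.

```python
import math

def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion (z ~ 95% by default)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    p_hat = successes / total
    z2 = z * z
    denom = 1.0 + z2 / total
    centre = (p_hat + z2 / (2.0 * total)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / total + z2 / (4.0 * total * total)
    ) / denom
    return centre - half_width, centre + half_width

# Hypothetical counts, not taken from the study: 30 of 32 cases detected.
low_32, high_32 = wilson_interval(30, 32)
# Same observed proportion in a tenfold larger hypothetical cohort.
low_320, high_320 = wilson_interval(300, 320)
```

With these hypothetical counts the interval for 30 of 32 runs from roughly 0.80 to 0.98, and it narrows to roughly 0.91 to 0.96 for 300 of 320, which is the practical reason external validation in larger cohorts matters.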
Despite these limitations, the present findings align with and extend a growing body of literature implicating complement, coagulation, and metabolic regulation pathways in early PE pathogenesis [16,17,55]. The four cross-platform validated markers (AFM, AHSG, C8A, IGHG1) represent particularly robust candidates for further development. Importantly, the ability to predict PE from first-trimester serum using quantitative MS—without requiring specialized ultrasonographic equipment or operator training—could democratize risk assessment, particularly in settings where Doppler expertise is limited. The high NPV (95%) of the SVM and random forest models also suggests that such an approach could safely reassure a large proportion of nulliparous women, reducing unnecessary surveillance and anxiety.
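When interpreting the reported NPV, it may help to recall that predictive values depend on disease prevalence, and a 1:1 matched design fixes the prevalence at 50%. The following Python sketch applies Bayes' rule to the sensitivity and specificity quoted in this manuscript (95% and 97%). The 3% screening prevalence is an assumption introduced here for illustration, not a figure from the study, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for given test characteristics and prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive value undefined for these inputs")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Balanced case-control setting (prevalence fixed at 50% by design).
ppv_balanced, npv_balanced = predictive_values(0.95, 0.97, 0.50)
# Assumed screening prevalence of 3% (illustrative assumption only).
ppv_screen, npv_screen = predictive_values(0.95, 0.97, 0.03)
```

Under the balanced design the NPV comes out near 0.95, consistent with the value reported here. At the assumed 3% prevalence the NPV rises above 0.99 while the PPV falls to about 0.49, so the reassurance value of a negative result would, if anything, strengthen in a screening population, whereas positive results would need confirmation.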

5. Conclusions

Integrated untargeted DIA-MS and targeted MRM-MS profiling of first-trimester maternal serum reliably identifies women at risk for subsequent preeclampsia. In a discovery-validation cohort (n = 64), the analysis identified 33 DIA-MS markers (FDR < 0.01, VIP > 1.5), validated four by MRM-MS (AFM, AHSG, C8A, IGHG1), and achieved excellent predictive performance (AUC 0.95, sensitivity 95%, specificity 97%) using SVM and random forest classifiers. High cross-platform concordance (71% of overlapping proteins with r > 0.5, p < 0.001) underscores technical reproducibility.
Pathway enrichment featured complement activation, IGF transport dysregulation, and platelet degranulation. PRSS1 correlated negatively with proteinuria (r = −0.74), and IGHV1-46 correlated positively with gestational age at delivery (r = 0.72), directly linking the proteomic signature to clinical severity.
The four cross-platform validated markers represent robust candidates for further development. The ability to predict PE from first-trimester serum using quantitative mass spectrometry—without specialized ultrasound equipment or operator training—could democratize risk assessment, particularly in settings where Doppler expertise is limited. The high negative predictive value (95%) of the SVM and random forest models suggests that such an approach could safely reassure a large proportion of nulliparous women, reducing unnecessary surveillance and anxiety.
Future directions include external validation in diverse populations, direct comparison with the FMF model, development of a clinically validated targeted panel, and investigation of whether the proteomic signature can guide aspirin prophylaxis. Integrating quantitative proteomics into first-trimester screening may improve early identification of women at risk for PE, enabling timely surveillance and preventive therapy.

Supplementary Materials

The following supporting information can be downloaded at the website of this paper posted on Preprints.org.

Author Contributions

Conceptualization, N.S., A.K., Z.Kh., E.N., and G.S.; data curation, N.S., A.P., A.T., E.K., and A.B. (Alexander Brzhozovskiy); formal analysis, A.P., A.T., Z.Kh., and E.K.; funding acquisition, E.N., N.S., and G.S.; investigation, A.K., A.B. (Anna Bugrova), A.B. (Alexander Brzhozovskiy), E.K., and N.S.; methodology, N.S., A.K., A.B. (Anna Bugrova), A.T., and Z.Kh.; project administration, E.N., N.S., A.K., Z.Kh., and G.S.; resources, N.S., G.S., E.N., and A.K.; software, A.T., A.P., A.B. (Anna Bugrova), E.K., and A.B. (Alexander Brzhozovskiy); supervision, N.S., A.K., E.N., Z.Kh. and G.S.; validation, A.T., A.B. (Alexander Brzhozovskiy), A.P., and E.K.; visualization, A.T., E.K., N.S., and A.B. (Anna Bugrova); writing—original draft, N.S., A.T., A.B. (Alexander Brzhozovskiy), A.B. (Anna Bugrova), A.P., and E.K.; writing—review and editing, A.K., E.N., Z.Kh., and G.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Russian Science Foundation (grant No. 24-14-00140).

Institutional Review Board Statement

The study was approved by the Ethical Committee of the National Medical Research Center for Obstetrics, Gynecology and Perinatology named after Academician V.I. Kulakov (protocol No. 2, dated 9 March 2017).

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Materials. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors are grateful to Maria I. Indeykina for assistance with the analysis of raw proteomics data and acknowledge the support of the Laboratory of Mass Spectrometry at the Skolkovo Institute of Science and Technology for the targeted proteomic analysis of patient blood samples.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AUC Area under the receiver operating characteristic curve
BMI Body mass index
MAP Mean arterial pressure
SBP Systolic blood pressure
DBP Diastolic blood pressure
DIA Data-independent acquisition
FDR False discovery rate
HPLC High-performance liquid chromatography
LLOQ Lower limit of quantification
MoM Multiple of the median
MRM Multiple reaction monitoring
MS Mass spectrometry
NAT Natural synthetic proteotypic peptides
OPLS-DA Orthogonal projections to latent structures discriminant analysis
QC Quality control
PASEF Parallel accumulation-serial fragmentation
PCA Principal component analysis
PE Preeclampsia
CS Caesarian section
CPR Cerebroplacental ratio
UtA-PI Uterine artery pulsatility index
UA-PI Umbilical artery pulsatility index
PlGF Placental growth factor
ROC Receiver operating characteristic
SIS Stable isotope labeled standard
SVM Support vector machine
VIP Variable importance in projection

References

  1. Magee, L.A.; Brown, M.A.; Hall, D.R.; Gupte, S.; Hennessy, A.; Karumanchi, S.A.; Kenny, L.C.; McCarthy, F.; Myers, J.; Poon, L.C.; et al. The 2021 International Society for the Study of Hypertension in Pregnancy Classification, Diagnosis & Management Recommendations for International Practice. Pregnancy Hypertens. 2022, 27, 148–169. [Google Scholar] [CrossRef]
  2. Wright, D.; Syngelaki, A.; Akolekar, R.; Poon, L.C.; Nicolaides, K.H. Competing Risks Model in Screening for Preeclampsia by Maternal Characteristics and Medical History. Am. J. Obstet. Gynecol. 2015, 213, 62.e1–62.e10. [Google Scholar] [CrossRef] [PubMed]
  3. O’Gorman, N.; Wright, D.; Poon, L.C.; Rolnik, D.L.; Syngelaki, A.; de Alvarado, M.; Carbone, I.F.; Dutemeyer, V.; Fiolna, M.; Frick, A.; et al. Multicenter Screening for Pre-Eclampsia by Maternal Factors and Biomarkers at 11–13 Weeks’ Gestation: Comparison with NICE Guidelines and ACOG Recommendations. Ultrasound Obstet. Gynecol. 2017, 49, 756–760. [Google Scholar] [CrossRef] [PubMed]
  4. Rezende, K.B. de C.; Bornia, R.G.; Rolnik, D.L.; Amim, J.; Ladeira, L.P.; Teixeira, V.M.G.; da Cunha, A.J.L.A. Performance of the First-Trimester Fetal Medicine Foundation Competing Risks Model for Preeclampsia Prediction: An External Validation Study in Brazil. AJOG Glob. Rep. 2024, 4, 100346. [Google Scholar] [CrossRef] [PubMed]
  5. Yang, Y.; Xie, Y.; Li, M.; Mu, Y.; Chen, P.; Liu, Z.; Wang, Y.; Li, Q.; Li, X.; Dai, L.; et al. Characteristics and Fetal Outcomes of Pregnant Women with Hypertensive Disorders in China: A 9-Year National Hospital-Based Cohort Study. BMC Pregnancy Childbirth 2022, 22, 924. [Google Scholar] [CrossRef]
  6. Leonard, S.A.; Siadat, S.; Main, E.K.; Huybrechts, K.F.; El-Sayed, Y.Y.; Hlatky, M.A.; Atkinson, J.; Sujan, A.; Bateman, B.T. Chronic Hypertension during Pregnancy: Prevalence and Treatment in the United States, 2008-2021. Hypertension 2024, 81, 1716–1723. [Google Scholar] [CrossRef]
  7. Pembe, A.B.; Dwarkanath, P.; Kikula, A.; Raj, J.M.; Perumal, N.; Paulo, H.A.; Rajalakshmi, M.; Duggan, C.P.; Masanja, H.M.; Chopra, N.; et al. Hypertensive Disorders of Pregnancy and Perinatal Outcomes: Two Prospective Cohort Studies of Nulliparous Women in India and Tanzania. BMJ Glob. Heal. 2025, 10, e016339. [Google Scholar] [CrossRef]
  8. Antwi, E.; Amoakoh-Coleman, M.; Vieira, D.L.; Madhavaram, S.; Koram, K.A.; Grobbee, D.E.; Agyepong, I.A.; Klipstein-Grobusch, K. Systematic Review of Prediction Models for Gestational Hypertension and Preeclampsia. PLoS ONE 2020, 15, e0230955. [Google Scholar] [CrossRef]
  9. Akolekar, R.; Syngelaki, A.; Poon, L.; Wright, D.; Nicolaides, K.H. Competing Risks Model in Early Screening for Preeclampsia by Biophysical and Biochemical Markers. Fetal Diagn. Ther. 2013, 33, 8–15. [Google Scholar] [CrossRef]
  10. Rolnik, D.L.; Wright, D.; Poon, L.C.Y.; Syngelaki, A.; O’Gorman, N.; de Paco Matallana, C.; Akolekar, R.; Cicero, S.; Janga, D.; Singh, M.; et al. ASPRE Trial: Performance of Screening for Preterm Pre-Eclampsia. Ultrasound Obstet. Gynecol. Off. J. Int. Soc. Ultrasound Obstet. Gynecol. 2017, 50, 492–495. [Google Scholar] [CrossRef]
  11. Corrigendum. Ultrasound Obstet. Gynecol. Off. J. Int. Soc. Ultrasound Obstet. Gynecol. 2017, 50, 807. [CrossRef]
  12. Guerby, P.; Audibert, F.; Johnson, J.-A.; Okun, N.; Giguère, Y.; Forest, J.-C.; Chaillet, N.; Mâsse, B.; Wright, D.; Ghesquiere, L.; et al. Prospective Validation of First-Trimester Screening for Preterm Preeclampsia in Nulliparous Women (PREDICTION Study). Hypertension 2024, 81, 1574–1582. [Google Scholar] [CrossRef] [PubMed]
  13. Riishede, I.; Rode, L.; Sperling, L.; Overgaard, M.; Ravn, J.D.; Sandager, P.; Skov, H.; Wagner, S.R.; Nørgaard, P.; Clausen, T.D.; et al. Pre-Eclampsia Screening in Denmark (PRESIDE): National Validation Study. Ultrasound Obstet. Gynecol. 2023, 61, 682–690. [Google Scholar] [CrossRef] [PubMed]
  14. Poon, L.C.Y.; Zymeri, N.A.; Zamprakou, A.; Syngelaki, A.; Nicolaides, K.H. Protocol for Measurement of Mean Arterial Pressure at 11-13 Weeks’ Gestation. Fetal Diagn. Ther. 2012, 31, 42–48. [Google Scholar] [CrossRef]
  15. Plasencia, W.; Maiz, N.; Bonino, S.; Kaihura, C.; Nicolaides, K.H. Uterine Artery Doppler at 11 + 0 to 13 + 6 Weeks in the Prediction of Preeclampsia. Ultrasound Obstet. Gynecol. 2007, 30, 742–749. [Google Scholar]
  16. Starodubtseva, N.; Poluektova, A.; Tokareva, A.; Kukaev, E.; Avdeeva, A.; Rimskaya, E.; Khodzayeva, Z. Proteome-Based Maternal Plasma and Serum Biomarkers for Preeclampsia: A Systematic Review and Meta-Analysis. Life 2025, 15, 776. [Google Scholar] [CrossRef]
  17. Than, N.G.; Posta, M.; Györffy, D.; Orosz, L.; Orosz, G.; Rossi, S.W.; Ambrus-Aikelin, G.; Szilágyi, A.; Nagy, S.; Hupuczi, P.; et al. Early Pathways, Biomarkers, and Four Distinct Molecular Subclasses of Preeclampsia: The Intersection of Clinical, Pathological, and High-Dimensional Biology Studies. Placenta 2022, 125, 10–19. [Google Scholar] [CrossRef] [PubMed]
  18. Anderson, N.L. The Clinical Plasma Proteome: A Survey of Clinical Assays for Proteins in Plasma and Serum. Clin. Chem. 2010, 56, 177–185. [Google Scholar] [CrossRef]
  19. Demichev, V.; Szyrwiel, L.; Yu, F.; Teo, G.C.; Rosenberger, G.; Niewienda, A.; Ludwig, D.; Decker, J.; Kaspar-Schoenefeld, S.; Lilley, K.S.; et al. Dia-PASEF Data Analysis Using FragPipe and DIA-NN for Deep Proteomics of Low Sample Amounts. Nat. Commun. 2022, 13, 3944. [Google Scholar] [CrossRef]
  20. Lange, V.; Picotti, P.; Domon, B.; Aebersold, R. Selected Reaction Monitoring for Quantitative Proteomics: A Tutorial. Mol. Syst. Biol. 2008, 4, 222. [Google Scholar] [CrossRef]
  21. Gaither, C.; Popp, R.; Mohammed, Y.; Borchers, C.H. Determination of the Concentration Range for 267 Proteins from 21 Lots of Commercial Human Plasma Using Highly Multiplexed Multiple Reaction Monitoring Mass Spectrometry. Analyst 2020, 145, 3634–3644. [Google Scholar] [CrossRef]
  22. Gaither, C.; Popp, R.; Borchers, S.P.; Skarphedinsson, K.; Eiriksson, F.F.; Thorsteinsdóttir, M.; Mohammed, Y.; Borchers, C.H. Performance Assessment of a 125 Human Plasma Peptide Mixture Stored at Room Temperature for Multiple Reaction Monitoring-Mass Spectrometry. J. Proteome Res. 2021, 20, 4292–4302. [Google Scholar] [CrossRef]
  23. Starodubtseva, N.; Tokareva, A.; Kononikhin, A.; Brzhozovskiy, A.; Bugrova, A.; Kukaev, E.; Poluektova, A.; Frankevich, V.; Nikolaev, E.; Sukhikh, G. Multiplexed Quantification of First-Trimester Serum Biomarkers in Healthy Pregnancy. Int. J. Mol. Sci. 2025, 26, 7970. [Google Scholar] [CrossRef]
  24. Starodubtseva, N.; Tokareva, A.; Kononikhin, A.; Brzhozovskiy, A.; Bugrova, A.; Kukaev, E.; Muminova, K.; Nakhabina, A.; Frankevich, V.E.; Nikolaev, E.; et al. First-Trimester Preeclampsia-Induced Disturbance in Maternal Blood Serum Proteome: A Pilot Study. Int. J. Mol. Sci. 2024, 25, 10653. [Google Scholar] [CrossRef] [PubMed]
  25. Camacho-Carrasco, A.; Montenegro-Martínez, J.; Miranda-Guisado, M.L.; Muñoz-Hernández, R.; Salsoso, R.; Fatela-Cantillo, D.; García-Díaz, L.; Stiefel García-Junco, P.; Mate, A.; Vázquez, C.M.; et al. Association of First-Trimester Maternal Biomarkers with Preeclampsia and Related Maternal and Fetal Severe Adverse Events. Int. J. Mol. Sci. 2025, 26, 6684. [Google Scholar] [CrossRef]
  26. Wu, P.; Van Den Berg, C.; Alfirevic, Z.; O’brien, S.; Röthlisberger, M.; Baker, P.N.; Kenny, L.C.; Kublickiene, K.; Duvekot, J.J. Early Pregnancy Biomarkers in Pre-Eclampsia: A Systematic Review and Meta-Analysis. Int. J. Mol. Sci. 2015, 16, 23035–23056. [Google Scholar] [CrossRef]
  27. Kononikhin, A.S.; Starodubtseva, N.L.; Brzhozovskiy, A.G.; Tokareva, A.O.; Kashirina, D.N.; Zakharova, N.V.; Bugrova, A.E.; Indeykina, M.I.; Pastushkova, L.K.; Larina, I.M.; et al. Absolute Quantitative Targeted Monitoring of Potential Plasma Protein Biomarkers: A Pilot Study on Healthy Individuals. Biomedicines 2024, 12, 2403. [Google Scholar] [CrossRef]
  28. MacLean, B.X.; Pratt, B.S.; Egertson, J.D.; MacCoss, M.J.; Smith, R.D.; Baker, E.S. Using Skyline to Analyze Data-Containing Liquid Chromatography, Ion Mobility Spectrometry, and Mass Spectrometry Dimensions. J. Am. Soc. Mass Spectrom. 2018, 29, 2182–2188. [Google Scholar] [CrossRef]
  29. European Medicines Agency. ICH Guideline M10 on Bioanalytical Method Validation and Study Sample Analysis; 2022; Vol. 44.
  30. Gerault, M.-A.; Camoin, L.; Granjeaud, S. DIAgui: A Shiny Application to Process the Output from DIA-NN. Bioinforma. Adv. 2024, 00, vbae001. [Google Scholar] [CrossRef] [PubMed]
  31. Tyanova, S.; Temu, T.; Sinitcyn, P.; Carlson, A.; Hein, M.Y.; Geiger, T.; Mann, M.; Cox, J. The Perseus Computational Platform for Comprehensive Analysis of (Prote)Omics Data. Nat. Methods 2016, 13, 731–740. [Google Scholar] [CrossRef] [PubMed]
  32. Wang, M.; Jiang, L.; Jian, R.; Chan, J.Y.; Liu, Q.; Snyder, M.P.; Tang, H. RobNorm: Model-Based Robust Normalization Method for Labeled Quantitative Mass Spectrometry Proteomics Data. Bioinformatics 2021, 37, 815–821. [Google Scholar] [CrossRef]
  33. Galindo-Prieto, B.; Eriksson, L.; Trygg, J. Variable Influence on Projection (VIP) for Orthogonal Projections to Latent Structures (OPLS). J. Chemom. 2014, 28, 623–632. [Google Scholar] [CrossRef]
  34. Burges, C.J.C. A Tutorial on Support Vector Machines for Pattern Recognition. Data Min. Knowl. Discov. 1998, 2, 121–167. [Google Scholar] [CrossRef]
  35. Kam Ho, T. Random Decision Forests, 2002.
  36. Clerc, M.; Kennedy, J. The Particle Swarm - Explosion, Stability, and Convergence in a Multidimensional Complex Space. IEEE Trans. Evol. Comput. 2002, 6, 58–73. [Google Scholar] [CrossRef]
  37. Szklarczyk, D.; Kirsch, R.; Koutrouli, M.; Nastou, K.; Mehryary, F.; Hachilif, R.; Gable, A.L.; Fang, T.; Doncheva, N.T.; Pyysalo, S.; et al. The STRING Database in 2023: Protein-Protein Association Networks and Functional Enrichment Analyses for Any Sequenced Genome of Interest. Nucleic Acids Res. 2023, 51, D638–D646. [Google Scholar] [CrossRef] [PubMed]
  38. R Core Team. R: A Language and Environment for Statistical Computing. Available online: https://www.r-project.org.
  39. RStudio Team. RStudio: Integrated Development for R; RStudio, Inc.: Boston, MA, USA, 2016.
  40. Thévenot, E.A.; Roux, A.; Xu, Y.; Ezan, E.; Junot, C. Analysis of the Human Adult Urinary Metabolome Variations with Age, Body Mass Index, and Gender by Implementing a Comprehensive Workflow for Univariate and OPLS Statistical Analyses. J. Proteome Res. 2015, 14, 3322–3335. [Google Scholar] [CrossRef] [PubMed]
  41. Torchiano, M. Effsize: Efficient Effect Size Computation. Available online: https://cran.r-project.org/package=effsize.
  42. Champely, S. Pwr: Basic Functions for Power Analysis. Available online: https://github.com/heliosdrm/pwr.
  43. Meyer, D. Support Vector Machines. The Interface to Libsvm in Package E1071 2024, 8.
  44. Kuhn, M. Building Predictive Models in R Using the Caret Package. J. Stat. Softw. 2008, 28, 1–26. [Google Scholar]
  45. Wickham, H.; François, R.; Henry, L.; Müller, K.; Vaughan, D. Dplyr: A Grammar of Data Manipulation 2025.
  46. Wickham, H. Elegant Graphics for Data Analysis: Ggplot2; 2008; ISBN 978-0-387-78170-9.
  47. Cai, C.; Zhang, Z.; Morales, M.; Wang, Y.; Khafipour, E.; Friel, J. Feeding Practice Influences Gut Microbiome Composition in Very Low Birth Weight Preterm Infants and the Association with Oxidative Stress: A Prospective Cohort Study. Free Radic. Biol. Med. 2019, 142, 146–154. [Google Scholar]
  48. Wickham, H. Forcats: Tools for Working with Categorical Variables (Factors) 2023.
  49. Slowikowski, K. Ggrepel: Automatically Position Non-Overlapping Text Labels with “Ggplot2”. Available online: https://github.com/slowkow/ggrepel.
  50. Kolde, R. Pheatmap: Pretty Heatmaps. Available online: https://github.com/raivokolde/pheatmap.
  51. Robin, X.; Turck, N.; Hainard, A.; Tiberti, N.; Lisacek, F.; Sanchez, J.-C.; Müller, M. pROC: An Open-Source Package for R and S+ to Analyze and Compare ROC Curves. BMC Bioinform. 2011, 12, 77. [Google Scholar]
  52. O’Gorman, N.; Wright, D.; Syngelaki, A.; Akolekar, R.; Wright, A.; Poon, L.C.; Nicolaides, K.H. Competing Risks Model in Screening for Preeclampsia by Maternal Factors and Biomarkers at 11-13 Weeks Gestation. Am. J. Obstet. Gynecol. 2016, 214, 103.e1–103.e12. [Google Scholar] [CrossRef]
  53. Rolnik, D.L.; Wright, D.; Poon, L.C.; O’Gorman, N.; Syngelaki, A.; de Paco Matallana, C.; Akolekar, R.; Cicero, S.; Janga, D.; Singh, M.; et al. Aspirin versus Placebo in Pregnancies at High Risk for Preterm Preeclampsia. N. Engl. J. Med. 2017, 377, 613–622. [Google Scholar] [CrossRef] [PubMed]
  54. Duley, L.; Meher, S.; Hunter, K.E.; Seidler, A.L.; Askie, L.M. Antiplatelet Agents for Preventing Pre-Eclampsia and Its Complications. Cochrane Database Syst. Rev. 2019, 2019, CD004659. [Google Scholar] [CrossRef]
  55. Starodubtseva, N.; Tokareva, A.; Kononikhin, A.; Bugrova, A.; Indeykina, M.; Kukaev, E.; Poluektova, A.; Brzhozovskiy, A.; Nikolaev, E.; Sukhikh, G. Machine Learning and Blood-Targeted Proteomics Enable Early Prediction and Etiological Discrimination of Hypertensive Pregnancy Disorders. Int. J. Mol. Sci. 2026, 27, 1402. [Google Scholar] [CrossRef]
  56. Starodubtseva, N.; Tokareva, A.; Frankevich, N.; Kononikhin, A.; Bugrova, A.; Indeykina, M.; Kukaev, E.; Derenko, A.; Frankevich, V.; Nikolaev, E.; et al. Integrated Clinical and Molecular Profiling of Fetal Growth Disorders in the First Trimester. Int. J. Mol. Sci. 2026, 27, 4192. [Google Scholar] [CrossRef]
  57. Than, N.G.; Romero, R.; Györffy, D.; Posta, M.; Bhatti, G.; Done, B.; Chaemsaithong, P.; Jung, E.; Suksai, M.; Gotsch, F.; et al. Molecular Subclasses of Preeclampsia Characterized by a Longitudinal Maternal Proteomics Study: Distinct Biomarkers, Disease Pathways and Options for Prevention. J. Perinat. Med. 2023, 51, 51–68. [Google Scholar] [CrossRef]
  58. Than, N.G.; Romero, R.; Posta, M.; Györffy, D.; Szalai, G.; Rossi, S.W.; Szilágyi, A.; Hupuczi, P.; Nagy, S.; Török, O.; et al. Classification of Preeclampsia According to Molecular Clusters with the Goal of Achieving Personalized Prevention. J. Reprod. Immunol. 2024, 161, 104172. [Google Scholar] [CrossRef]
  59. He, Y. dong; Xu, B. ning; Wang, M. lu; Wang, Y. qin; Yu, F.; Chen, Q.; Zhao, M. hui Dysregulation of Complement System during Pregnancy in Patients with Preeclampsia: A Prospective Study. Mol. Immunol. 2020, 122, 69–79. [Google Scholar] [CrossRef]
  60. Matsuyama, T.; Tomimatsu, T.; Mimura, K.; Yagi, K.; Kawanishi, Y.; Kakigano, A.; Nakamura, H.; Endo, M.; Kimura, T. Complement Activation by an Angiogenic Imbalance Leads to Systemic Vascular Endothelial Dysfunction: A New Proposal for the Pathophysiology of Preeclampsia. J. Reprod. Immunol. 2021, 145, 103322. [Google Scholar] [CrossRef]
  61. Starodubtseva, N.; Nizyaeva, N.; Baev, O.; Bugrova, A.; Gapaeva, M.; Muminova, K.; Kononikhin, A.; Frankevich, V.; Nikolaev, E.; Sukhikh, G. SERPINA1 Peptides in Urine as a Potential Marker of Preeclampsia Severity. Int. J. Mol. Sci. 2020, 21, 914. [Google Scholar] [CrossRef]
  62. Chen, H.; Aneman, I.; Nikolic, V.; Karadzov Orlic, N.; Mikovic, Z.; Stefanovic, M.; Cakic, Z.; Jovanovic, H.; Town, S.E.L.; Padula, M.P.; et al. Maternal Plasma Proteome Profiling of Biomarkers and Pathogenic Mechanisms of Early-Onset and Late-Onset Preeclampsia. Sci. Rep. 2022, 12, 19099. [Google Scholar] [CrossRef] [PubMed]
  63. Yoshida, K.; Kusama, K.; Tamura, K.; Fukushima, Y.; Ohmaru-Nakanishi, T.; Kato, K. Alpha-1 Antitrypsin-Induced Endoplasmic Reticulum Stress Promotes Invasion by Extravillous Trophoblasts. Int. J. Mol. Sci. 2021, 22, 3683. [Google Scholar] [CrossRef] [PubMed]
  64. Rood, K.M.; Buhimschi, C.S.; Dible, T.; Webster, S.; Zhao, G.; Samuels, P.; Buhimschi, I.A. Congo Red Dot Paper Test for Antenatal Triage and Rapid Identification of Preeclampsia. EClinicalMedicine 2019, 8, 47–56. [Google Scholar] [CrossRef]
  65. Buhimschi, I.A.; Nayeri, U.A.; Zhao, G.; Shook, L.L.; Pensalfini, A.; Funai, E.F.; Bernstein, I.M.; Glabe, C.G.; Buhimschi, C.S. Protein Misfolding, Congophilia, Oligomerization, and Defective Amyloid Processing in Preeclampsia. Sci. Transl. Med. 2014, 6, 245ra92. [Google Scholar] [CrossRef] [PubMed]
  66. Yang, H.H.; Baldauf, C.; Pickering, T.A.; Gjessing, H.K.; Ingles, S.A.; Wilson, M.L. Maternal and Fetal SERPINA3 Polymorphisms and Risk of Preeclampsia: A Dyad and Triad Based Case-Control Study. Curr. Issues Mol. Biol. 2025, 47, 952. [Google Scholar] [CrossRef]
  67. Zhang, L.; Gao, S.; Luan, Y.; Su, S.; Zhang, E.; Liu, J.; Xie, S.; Zhang, Y.; Yue, W.; Liu, R.; et al. Predictivity of Hepatic Steatosis Index for Gestational Hypertension and Preeclampsia: A Prospective Cohort Study. Int. J. Med. Sci. 2025, 22, 834–844. [Google Scholar] [CrossRef]
  68. Elhoseeny, M.M.; Abdulaziz, B.A.; Mohamed, M.A.; Elsharaby, R.M.; Rashad, G.M.; Othman, A.A.A. Fetuin-A: A Relevant Novel Serum Biomarker for Non-Invasive Diagnosis of Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD): A Retrospective Case-Control Study. BMC Gastroenterol. 2024, 24, 226. [Google Scholar] [CrossRef]
  69. Sarkar, M.; Grab, J.; Dodge, J.L.; Gunderson, E.P.; Rubin, J.; Irani, R.A.; Cedars, M.; Terrault, N. Non-Alcoholic Fatty Liver Disease in Pregnancy Is Associated with Adverse Maternal and Perinatal Outcomes. J. Hepatol. 2020, 73, 516–522. [Google Scholar] [CrossRef]
  70. El Jamaly, H.; Eslick, G.D.; Weltman, M. Systematic Review with Meta-Analysis: Non-Alcoholic Fatty Liver Disease and the Association with Pregnancy Outcomes. Clin. Mol. Hepatol. 2022, 28, 52–66. [Google Scholar] [CrossRef]
  71. Chaemsaithong, P.; Romero, R.; Tarca, A.L.; Korzeniewski, S.J.; Schwartz, A.G.; Miranda, J.; Ahmed, A.I.; Dong, Z.; Hassan, S.S.; Yeo, L.; et al. Maternal Plasma Fetuin-A Concentration Is Lower in Patients Who Subsequently Developed Preterm Preeclampsia than in Uncomplicated Pregnancy: A Longitudinal Study. J. Matern. Neonatal Med. 2015, 28, 1260–1269. [Google Scholar] [CrossRef]
  72. Kolialexi, A.; Tsangaris, G.T.; Sifakis, S.; Gourgiotis, D.; Katsafadou, A.; Lykoudi, A.; Marmarinos, A.; Mavreli, D.; Pergialiotis, V.; Fexi, D.; et al. Plasma Biomarkers for the Identification of Women at Risk for Early-Onset Preeclampsia. Expert Rev. Proteom. 2017, 14, 269–276. [Google Scholar] [CrossRef] [PubMed]
  73. Tiensuu, H.; Haapalainen, A.M.; Tissarinen, P.; Pasanen, A.; Määttä, T.A.; Huusko, J.M.; Ohlmeier, S.; Bergmann, U.; Ojaniemi, M.; Muglia, L.J.; et al. Human Placental Proteomics and Exon Variant Studies Link AAT/SERPINA1 with Spontaneous Preterm Birth. BMC Med. 2022, 20, 141. [Google Scholar] [CrossRef] [PubMed]
  74. Jin, P.P.; Ding, N.; Dai, J.; Liu, X.Y.; Mao, P.M. Investigation of the Relationship between Changes in Maternal Coagulation Profile in the First Trimester and the Risk of Developing Preeclampsia. Heliyon 2023, 9, e17983. [Google Scholar] [CrossRef]
  75. Parameshwar, P.K.; Sagrillo-Fagundes, L.; Fournier, C.; Girard, S.; Vaillancourt, C.; Moraes, C. Disease-Specific Extracellular Matrix Composition Regulates Placental Trophoblast Fusion Efficiency. Biomater. Sci. 2021, 9, 7247–7256. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Overview of serum proteomic data analysis and model performance for first-trimester prediction of PE using DIA-MS. (a) PCA score plot showing sample distribution in principal component space. Red circles represent samples from patients with PE (n = 32), while blue circles represent samples from the control group (n = 32). (b) OPLS-DA score plot illustrating group separation between PE patients (red) and controls (blue). The model effectively distinguishes the two groups based on first-trimester serum protein profiles. (c) Volcano plot of all proteins quantified by DIA-MS. The x-axis represents the log₂(fold change), calculated as the ratio of the median protein level in the PE group to the median level in the control group. The y-axis shows the −log₁₀(FDR-adjusted p-value). Proteins with FDR < 0.01 are labeled in regular text; proteins meeting both FDR < 0.01 and VIP > 1.5 are highlighted in bold text. (d) Receiver operating characteristic (ROC) curves generated from 10-fold cross-validation of machine learning models. Each curve represents the diagnostic performance of a classifier in distinguishing future PE patients from controls. Corresponding AUC values are reported.
Figure 2. The 10 most enriched pathways among DIA-MS-detected protein markers of PE. Pathway enrichment analysis (STRING, FDR < 0.01) identified complement activation as the dominant pathway, along with significant enrichment of protein phosphorylation, IGF transport regulation, platelet degranulation, and selenium-related micronutrient networks.
Figure 3. Overview of serum proteomic data analysis and predictive model performance for first-trimester prediction of PE using MRM-MS. (a) PCA of serum proteomic data. Samples are projected onto the first two principal components (PC1 and PC2) based on MRM-MS–derived proteomic profiles. Red dots represent patients who later developed PE (n = 32), and blue dots represent control subjects with uncomplicated pregnancies (n = 32). The PCA plot illustrates the natural variance and clustering tendency between the two groups prior to supervised modeling. (b) OPLS-DA score plot. Supervised classification of the same serum samples highlights the separation between future PE cases (red) and controls (blue). OPLS-DA maximizes covariance between proteomic data and group membership, enabling identification of latent variables driving group discrimination. (c) Volcano plot of individual proteins quantified by MRM-MS. The x-axis represents the log₂(fold change), calculated as the ratio of the median protein level in the PE group to the median level in the control group. The y-axis shows the −log₁₀(FDR-adjusted p-value). Among these, proteins with VIP scores > 1.5 (from OPLS-DA) are shown in bold, indicating strong discriminative and predictive potential for first-trimester PE risk assessment.
Figure 4. Integrated correlation analysis of proteomic markers and clinical characteristics for PE: heatmaps of statistically significant associations (p < 0.05) between key variables. Cells with non-zero correlation coefficients are color-coded according to the strength and direction of the correlation (red for positive, blue for negative). Each significant cell is annotated with the exact correlation coefficient value (e.g., r = 0.45) to facilitate precise interpretation. (a) Correlation heatmap between DIA-MS markers (rows) and MRM-MS markers (columns) of PE. This panel highlights the concordance and complementarity between protein markers identified via semi-quantitative, non-targeted DIA-MS and quantitative, targeted MRM-MS. Significant correlations suggest technical reproducibility and biological consistency across the two proteomic platforms. (b) Correlation heatmap between DIA-MS markers of PE and clinical characteristics of the women. Statistically significant correlations indicate potential links between molecular signatures and clinically measurable phenotypes, supporting the biological relevance of the identified markers.
Table 1. Clinical characteristics of the control and PE groups. Continuous data are presented as median (Q1; Q3); categorical data are presented as n (%). P-values were calculated using the Mann–Whitney U test for continuous variables and Pearson’s chi-square test for categorical variables. Abbreviations: BMI, body mass index; IVF, in vitro fertilization; MAP, mean arterial pressure; MoM, multiple of the median; SBP, systolic blood pressure; DBP, diastolic blood pressure; PlGF, placental growth factor; UA-PI, umbilical artery pulsatility index; CPR, cerebroplacental ratio; CS, cesarean section.
| Feature | Control (n = 32) | PE (n = 32) | p-value |
| --- | --- | --- | --- |
| Age, years, median (Q1; Q3) | 31.6 (29.4; 34.75) | 33.35 (29.08; 37.45) | 0.08 |
| BMI, median (Q1; Q3) | 21.49 (19.72; 22.89) | 22.74 (20.36; 24.52) | 0.07 |
| Previous PE, n (%) | 0 (0%) | 7 (22%) | 0.02 |
| Nulliparous, n (%) | 11 (34%) | 24 (72%) | <0.001 |
| IVF, n (%) | 0 (0%) | 2 (6%) | 0.49 |
| Habitual miscarriage, n (%) | 0 (0%) | 12 (38%) | <0.001 |
| Previous preterm delivery, n (%) | 0 (0%) | 8 (25%) | 0.01 |
| Gestational age at sample collection, weeks, median (Q1; Q3) | 12.14 (11.86; 12.93) | 12.29 (12.11; 12.57) | 0.73 |
| MAP, MoM, median (Q1; Q3) | 1 (0.95; 1.04) | 1.04 (0.97; 1.12) | 0.1 |
| PlGF (first-trimester prenatal screening), MoM, median (Q1; Q3) | 0.74 (0.54; 1.06) | 0.55 (0.36; 0.69) | 0.002 |
| FMF first-trimester high PE risk, n (%) | 0 (0%) | 17 (53%) | <0.001 |
| Max. SBP, mmHg, median (Q1; Q3) | 115 (110; 120) | 135 (125; 149.75) | <0.001 |
| Max. DBP, mmHg, median (Q1; Q3) | 70 (70; 74.5) | 89 (80; 99.25) | <0.001 |
| UA-PI, median (Q1; Q3) | 0.79 (0.73; 0.88) | 0.95 (0.82; 1.13) | 0.002 |
| CPR, median (Q1; Q3) | 1.89 (1.58; 2.2) | 1.44 (1.31; 1.75) | <0.001 |
| 24-hour proteinuria, g/L, median (Q1; Q3) | 0 (0; 0) | 1.1 (0.61; 2.18) | <0.001 |
| Creatinine, µmol/L, median (Q1; Q3) | 66.6 (63.3; 69.45) | 80.8 (70.85; 87.77) | <0.001 |
| Gestational age at delivery, weeks, median (Q1; Q3) | 39.4 (38.55; 40) | 37.2 (34.8; 38) | <0.001 |
| Blood loss at delivery, mL, median (Q1; Q3) | 300 (250; 350) | 700 (475; 700) | <0.001 |
| Emergency CS, n (%) | 1 (3%) | 19 (59%) | <0.001 |
| Apgar score at 1 min | 8 (8; 8) | 8 (7; 8) | 0.002 |
| Apgar score at 5 min | 9 (9; 9) | 8.5 (8; 9) | 0.001 |
| Newborn weight, g, median (Q1; Q3) | 3400 (3215; 3613) | 2745 (1691; 3097.5) | <0.001 |
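As a worked illustration of the categorical comparisons in Table 1, the sketch below computes Pearson's chi-square for a 2 × 2 table using the emergency CS row (1 of 32 controls versus 19 of 32 PE cases). The function name is hypothetical, and whether the authors applied a continuity correction or an exact test for sparse rows is not stated, so this uncorrected form is an assumption.

```python
import math


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p_value). Raises ZeroDivisionError if any margin is zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Emergency CS: controls 1 yes / 31 no, PE 19 yes / 13 no (counts from Table 1).
chi2, p = pearson_chi_square_2x2(1, 31, 19, 13)
print(f"chi2 = {chi2:.2f}, p < 0.001: {p < 0.001}")
```

The resulting statistic is about 23.6, which agrees with the p < 0.001 reported for that row. Rows with very small expected counts, such as IVF, are usually better served by an exact test.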
Table 2. Quality metrics of machine learning models for predicting PE from first-trimester serum protein markers detected by DIA-MS. Models included OPLS-DA, SVM (linear, polynomial, radial and sigmoid kernels) with PSO-optimized hyperparameters, and random forest. Performance was evaluated by 10-fold cross-validation, reporting sensitivity, specificity, accuracy, AUC, PPV, NPV, and F-score. Protein markers were selected via VIP > 1.5 and FDR < 0.01, with pathway enrichment (STRING, FDR < 0.01) confirming biological relevance.
| Model | Accuracy, % | Sensitivity, % | Specificity, % | AUC | PPV | NPV | F-score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OPLS-DA | 94 | 94 | 94 | 0.94 | 0.94 | 0.94 | 0.94 |
| SVM, linear kernel | 95 | 94 | 97 | 0.95 | 0.97 | 0.94 | 0.95 |
| SVM, polynomial kernel | 95 | 94 | 97 | 0.95 | 0.97 | 0.94 | 0.95 |
| SVM, radial kernel | 73 | 75 | 72 | 0.73 | 0.73 | 0.74 | 0.74 |
| SVM, sigmoid kernel | 89 | 88 | 91 | 0.89 | 0.90 | 0.88 | 0.89 |
| Random Forest | 95 | 94 | 97 | 0.95 | 0.97 | 0.94 | 0.95 |
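The threshold-based metrics in Table 2 all follow from a confusion matrix. The sketch below shows the arithmetic with hypothetical counts (30 true positives, 2 false negatives, 31 true negatives, 1 false positive) chosen because they are consistent with the linear-kernel SVM row for 32 cases and 32 controls; the actual per-fold counts are not reported. Treating the F-score as F1 is likewise an assumption. AUC is not included because it requires continuous scores rather than counts.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    # F1: harmonic mean of precision (PPV) and recall (sensitivity).
    f_score = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_score": f_score,
    }


# Hypothetical counts, not taken from the source.
metrics = classification_metrics(tp=30, fn=2, tn=31, fp=1)
rounded = {name: round(value, 2) for name, value in metrics.items()}
print(rounded)
```

With these counts the rounded values are 0.95 accuracy, 0.94 sensitivity, 0.97 specificity, 0.97 PPV, 0.94 NPV, and 0.95 F-score, matching the tabulated row.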
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.