4. Discussion
In this study of 1,902 postmenopausal women from the NHANES dataset, we developed a machine learning model to identify key predictors of the P4:E2 ratio, an emerging risk factor for endometrial cancer. The model achieved an R² of 0.298 on the test set, indicating that approximately 30% of the variance in the log-transformed P4:E2 ratio could be explained by the selected predictors. We found that FSH, waist circumference, and CRP were the most influential predictors, followed by total cholesterol, LH, and intake of dietary fat, protein, and sugar. Additional models revealed that FSH and waist circumference primarily predicted estradiol levels, while progesterone was more strongly influenced by cholesterol and LH. These findings offer new insights into the hormonal, metabolic, and lifestyle correlates of the P4:E2 ratio and provide a foundation for future work aimed at understanding its role in postmenopausal health and disease risk.
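As a reminder of what the reported R² measures, the following minimal sketch computes the coefficient of determination on a log-transformed ratio. The hormone values are hypothetical and in arbitrary units (they are not study data), the natural logarithm is an assumption, and the helper name `r_squared` is introduced here only for illustration.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical progesterone (P4) and estradiol (E2) values, arbitrary units.
p4 = [0.05, 0.10, 0.08, 0.20, 0.12]
e2 = [5.0, 4.0, 10.0, 6.0, 3.0]

# Outcome on the log scale (natural log assumed for this sketch).
log_ratio = [math.log(p / e) for p, e in zip(p4, e2)]

# A model that always predicts the mean explains none of the variance,
# whereas perfect predictions explain all of it.
baseline = [sum(log_ratio) / len(log_ratio)] * len(log_ratio)
print(r_squared(log_ratio, baseline))
print(r_squared(log_ratio, log_ratio))
```

On this scale, an R² of 0.298 sits between these two extremes: the predictions remove roughly 30% of the squared deviation around the mean log ratio.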
The observed positive association between FSH and the P4:E2 ratio is biologically consistent with known endocrine adaptations to menopause [8]. As ovarian function declines, circulating levels of both progesterone and estradiol decrease, resulting in diminished negative feedback on the hypothalamic–pituitary–gonadal axis. Notably, progesterone production stabilizes at low levels by the onset of menopause, whereas estradiol continues to be produced in more variable quantities, with progressively lower and more stable levels as menopause progresses. As a result, the P4:E2 ratio increases as menopause progresses, accompanied by compensatory increases in FSH. This relationship is visualized in the SHAP dependence plot (Figure 2), which reveals a strong, nonlinear positive association between FSH and the predicted log-transformed P4:E2 ratio. SHAP values increase sharply with FSH concentrations up to approximately 75 mIU/mL, after which the curve plateaus, indicating a threshold beyond which additional increases in FSH contribute minimally to the model’s output. This plateau may reflect a biological “ceiling effect,” wherein estradiol levels have already reached minimal postmenopausal values, thereby limiting the predictive utility of further increases in FSH.
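One simple way to make a plateau of this kind explicit is to bin the feature, average the SHAP values within each bin, and find the point after which successive bin means stop changing. The sketch below assumes the binned means have already been computed. The numbers are invented to mimic a curve that flattens near 75 mIU/mL and are not values from this study, and the function name `plateau_onset` and the tolerance are illustrative choices.

```python
def plateau_onset(centers, mean_shap, tol=0.02):
    """Return the first bin center after which every step in mean SHAP is < tol.

    Returns None if the right-hand end of the curve never flattens.
    """
    onset = None
    # Walk leftward from the highest bin while successive changes stay small.
    for i in range(len(centers) - 1, 0, -1):
        if abs(mean_shap[i] - mean_shap[i - 1]) < tol:
            onset = centers[i - 1]
        else:
            break
    return onset

# Invented bin centers (FSH, mIU/mL) and binned mean SHAP values.
centers = [15, 30, 45, 60, 75, 90, 105, 120]
mean_shap = [-0.40, -0.20, 0.00, 0.15, 0.25, 0.26, 0.26, 0.27]

print(plateau_onset(centers, mean_shap))
```

The result depends on the bin width and the tolerance, so a threshold obtained this way is best read as approximate, in the same spirit as the "approximately 75 mIU/mL" reading taken from the plot.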
Waist circumference emerged as a key feature influencing the P4:E2 ratio. SHAP dependence plots suggested that this effect was driven primarily by adiposity-associated elevations in estradiol, whereas progesterone contributed only at higher levels of central adiposity. In the estradiol model, SHAP values increased steadily with waist circumference, reflecting the well-established role of adipose tissue as a site of peripheral estrogen biosynthesis via aromatization [9]. This adiposity-related rise in estradiol exerts downward pressure on the P4:E2 ratio by disproportionately elevating estradiol relative to progesterone. Notably, starting at waist circumferences of approximately 100 cm, a modest progesterone decline also emerged, suggesting that at this level of adiposity the influence of central adiposity may extend to both hormones, with the progesterone decline reinforcing the downward slope of the ratio.
C-reactive protein (CRP) emerged as one of the strongest non-hormonal predictors of the P4:E2 ratio, with SHAP dependence plots revealing a sharp decline in the ratio at lower CRP concentrations, followed by a plateau beyond approximately 5 mg/L. In the estradiol model, SHAP values increased steeply below this threshold and then stabilized, indicating a nonlinear relationship. The effects of estradiol on inflammation are context-dependent and can be pro- or anti-inflammatory depending on the cytokine profile, immune cell type, and estrogen receptor expression patterns [10]. Pro-inflammatory actions of estradiol, as suggested in the present study assessing the hormone’s association with CRP, are mediated through estrogen receptor signaling pathways that activate transcription factors, particularly in immune and endothelial cells [11]. The pattern observed in the present study underscores the importance of accounting for the concentration-sensitive interactions between estrogenic activity and inflammatory signaling in postmenopausal physiology, especially considering the altered distribution and function of estrogen receptor subtypes that occur with aging. In fact, prior studies examining CRP–estradiol associations in postmenopausal women yielded mixed results, with some reporting a positive association and others finding no significant relationship (reviewed in [12]). The use of machine learning in the present analysis enabled detection of threshold-dependent, positive nonlinear associations that help to reconcile these discrepancies and offer a more nuanced understanding of inflammation–estradiol dynamics.
Total cholesterol exhibited a nonlinear, predominantly positive association in both the P4:E2 and progesterone models. The relationship between estradiol and total cholesterol was relatively weak and more linear, suggesting a limited role for cholesterol in estradiol regulation. The divergence between the two reproductive hormones (i.e., estradiol and progesterone) in their association with total cholesterol likely reflects differences in their positions within the steroid biosynthesis pathway, with progesterone situated upstream and closer to cholesterol than estradiol. In the P4:E2 ratio model, SHAP values increased notably between approximately 140 and 200 mg/dL, with a plateau observed at higher concentrations, indicating a threshold-dependent effect on the ratio. The progesterone model revealed a pronounced positive association, with SHAP values rising steeply between 120 and 220 mg/dL before stabilizing, highlighting cholesterol as a key metabolic predictor of progesterone levels (and, in turn, of a higher P4:E2 ratio) in postmenopausal women.
Although ovulatory cycles cease after menopause, the pulsatile release of LH often mirrors that of FSH, although its secretory amplitude is smaller [13]. The nonlinear relationship between LH and the P4:E2 ratio appears to reflect distinct, and at times opposing, contributions from estradiol and progesterone. In the present study, the stimulatory effect of LH on progesterone output plateaued at approximately 40 mIU/mL, contributing to an increase in the ratio within this range. Conversely, estradiol demonstrated a continuous positive association with LH across the entire range of values, exerting a countervailing influence that tempered the rise in the ratio up to 40 mIU/mL. Beyond this threshold, the persistent rise in estradiol, combined with the plateauing of progesterone, drove the ratio downward.
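The rise-then-fall pattern described for LH follows from how a log-transformed ratio decomposes. The identity below is a general property of logarithms applied to the hormone concentrations. Treating each term as a smooth function of LH is a simplifying assumption made here for illustration, and SHAP values from separately fitted models need not sum in exactly this way.

```latex
\[
\log\!\left(\frac{\mathrm{P4}}{\mathrm{E2}}\right) = \log \mathrm{P4} - \log \mathrm{E2},
\qquad
\frac{\partial}{\partial\,\mathrm{LH}} \log\!\left(\frac{\mathrm{P4}}{\mathrm{E2}}\right)
= \frac{\partial \log \mathrm{P4}}{\partial\,\mathrm{LH}}
- \frac{\partial \log \mathrm{E2}}{\partial\,\mathrm{LH}}
\]
```

Under this reading, the ratio rises while the progesterone term exceeds the estradiol term (below roughly 40 mIU/mL) and falls once the progesterone term approaches zero while the estradiol term remains positive.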
Carbohydrate intake emerged as the most consistent and meaningful dietary contributor across the three models. It showed a positive association with estradiol, particularly within the ~100 to ~250 g/day range, beyond which the effect plateaued. A similar but more attenuated pattern was observed in the P4:E2 ratio model, with SHAP values increasing gradually and leveling off beyond ~200 g/day. In the progesterone model, the association with carbohydrate intake was relatively flat, with only minor positive effects observed at lower intake levels, indicating a less consistent relationship. The remaining dietary measures exhibited weaker, inconsistent, or minimal effects on hormonal outcomes, with nonlinear but shallow SHAP profiles that often centered near zero or fluctuated without clear thresholds or sustained impact across models.
The SHAP dependence plot for age at menarche in the P4:E2 ratio model revealed a U-shaped pattern, with a modest decline in the ratio observed between approximately ages 10 and 13, followed by a steady increase beyond this range. This shape appears to reflect contrasting associations in the component hormone models: in the estradiol model, earlier age at menarche is associated with higher estradiol levels, while in the progesterone model, a positive association emerges at later menarche ages. These opposing trends result in a biphasic effect on the ratio, where the influence of estradiol predominates at younger menarcheal ages, lowering the ratio, and progesterone’s influence becomes more apparent at later ages, pushing the ratio upward. This composite pattern underscores how developmental timing may impart lasting effects on postmenopausal hormonal balance through divergent trajectories of individual steroid hormones. The SHAP dependence plots for age across the three models—estradiol, progesterone, and the P4:E2 ratio—show largely modest and inconsistent effects, suggesting limited explanatory value of chronological age alone in postmenopausal hormone variability.
Estrone sulfate serves as a circulating estrogen reservoir that can be converted to bioactive estradiol via estrone as an intermediate. This process appears to be limited in postmenopausal women, as the SHAP dependence plots for the estrone sulfate:estrone ratio reveal only subtle associations in the estradiol and P4:E2 models. In the estradiol model, there is a mild nonlinear relationship, with SHAP values increasing slightly at low values of the estrone sulfate:estrone ratio, followed by a plateau, suggesting a modest positive influence of a higher sulfate-to-parent hormone balance on estradiol levels. Similarly, the P4:E2 ratio model exhibits minimal SHAP variation across the estrone sulfate:estrone ratio range, indicating limited influence on the ratio itself. These findings suggest that while estrone sulfate may contribute to estradiol availability, its impact is not strong enough to meaningfully affect the balance between progesterone and estradiol in a postmenopausal context.
Taken together, the individual SHAP dependence patterns described above support a more integrated understanding of the interplay between global feature importance and context-specific hormonal dynamics. Estradiol consistently emerged as the dominant hormonal driver across the examined features, demonstrating the highest SHAP magnitudes and most pronounced associations in both the individual estradiol model and the P4:E2 ratio model (Table 2). However, although the overall SHAP magnitude for the progesterone model was modest, the hormone nonetheless exerted a meaningful influence on the P4:E2 ratio in specific contexts. This was particularly evident for features such as total cholesterol, LH, and waist circumference, where SHAP dependence plots showed that progesterone altered the shape and direction of the ratio’s response. These findings underscore the importance of considering biological relevance alongside global model performance metrics. Progesterone’s sensitivity to upstream metabolic and gonadotropic signals, even if less predictive in isolation, can meaningfully modulate hormonal balance, particularly in systems modeled as ratios. Thus, while estradiol was the dominant driver of SHAP variance in most cases, progesterone’s context-specific contributions add interpretive depth to mechanistic inferences.
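The contrast between global SHAP magnitude and context-specific effects can be illustrated with a common summary of global importance, the mean absolute SHAP value per feature. The sketch below is a generic illustration and not necessarily how Table 2 was computed. The SHAP values are made up, and the helper name `mean_abs_shap` is hypothetical.

```python
def mean_abs_shap(shap_matrix, feature_names):
    """Rank features by mean |SHAP| across samples.

    shap_matrix: list of rows (one per sample), one SHAP value per feature.
    """
    n = len(shap_matrix)
    scores = {
        name: sum(abs(row[j]) for row in shap_matrix) / n
        for j, name in enumerate(feature_names)
    }
    # Largest average magnitude first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up SHAP values for three samples and two features (not study results).
toy = [[0.30, -0.05], [-0.20, 0.10], [0.10, 0.00]]
ranking = mean_abs_shap(toy, ["FSH", "LH"])
print(ranking)
```

Because this summary averages over all samples, a feature whose effect is concentrated in a narrow range can rank low globally while still changing the shape of a dependence curve, which is why the dependence plots are read alongside the global ranking.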
The mechanistic insights presented here align with epidemiological evidence linking distinct patterns of progesterone and estradiol concentrations to hormone-sensitive cancer risk in postmenopausal women. Notably, endogenous progesterone appears to play divergent roles in relation to estradiol, reducing risk in the endometrium but potentially increasing it in the breast. In a case-cohort study nested within the Breast and Bone Follow-up to the Fracture Intervention Trial, which examined endometrial cancer incidence in relation to the progesterone-to-estradiol ratio in postmenopausal women over a 12-year follow-up, Trabert et al. (2021) [5] reported that postmenopausal women with high estradiol and low progesterone had the highest risk of developing endometrial cancer, while those with higher progesterone levels exhibited reduced risk. In contrast, in the same cohort, analysis of 405 incident breast cancer cases revealed that elevated progesterone concentrations were associated with an increased risk of invasive breast cancer, particularly when estradiol levels were also high [14]. Together, these data underscore the complex and tissue-specific roles of progesterone in hormone-sensitive cancers: protective effects in the endometrium but potential promotion of tumorigenesis in the breast. Indeed, although both clinical and epidemiological studies support a synergistic role of estradiol and progesterone in elevating breast cancer risk, disentangling their individual contributions remains challenging due to the partial dependence of progesterone receptor transcription on estrogen receptor α–mediated signaling [15]. Thus, evaluating their combined hormonal interaction may be more informative than attempting to isolate independent effects [15].
The divergent role of progesterone in endometrial versus breast cancer risk underscores the importance of contextualizing hormonal balance within specific biological outcomes and disease pathways, reinforcing the need for mechanistically informed, tissue-targeted research in postmenopausal women. In this regard, the present study’s feature-level SHAP modeling offers a framework for disentangling the nuanced, context-dependent effects of individual hormones. As an example, the analysis of waist circumference revealed how central adiposity may elevate estradiol and reduce progesterone in a threshold-dependent manner (i.e., waist circumference >100 cm) to promote endometrial proliferation while potentially mitigating breast carcinogenesis.
Although this study offers valuable insights into the determinants of the P4:E2 ratio, several limitations should be noted. The cross-sectional nature of the NHANES dataset limits causal inference, as the temporal ordering between predictors and hormone levels cannot be established. Although the P4:E2 model explained a moderate proportion of variance (R² = 0.298), the progesterone model demonstrated low predictive performance (R² = 0.022), suggesting that relevant biological or behavioral variables may be unmeasured or inadequately captured. Finally, residual confounding remains a concern, as unmeasured factors such as stress and circadian timing could influence hormonal dynamics.