Submitted: 28 April 2025
Posted: 29 April 2025
Abstract
Keywords:
1. Introduction
2. Methods
2.1. Study Design and Setting
2.2. Study Population
2.3. Inclusion and Exclusion Criteria
- Cases: Women with a histopathologically confirmed diagnosis of breast cancer. Women with a history of other malignancies or with incomplete medical records were excluded.
- Controls: Women without a history of cancer. Women with any malignancy or with reproductive health conditions that could confound the analysis were excluded.
2.4. Data Collection
3. Variables Assessed
3.1. Primary Outcome Variable
3.2. Independent Variables
- Socio-Demographic Factors: Age, education level (0–5 years, 6–12 years, 13–20 years), residential area (rural, urban, metropolitan), and employment status (unemployed, employed).
- Reproductive Factors: Age at menarche, age at first marriage, age at first childbirth, the gap between menarche and first childbirth, and the gap between first marriage and first childbirth; parity (categorized as 0, 1, or >1 child), type of delivery (vaginal, cesarean, or both), menstrual regularity (regular or irregular), menopausal status (premenopausal or postmenopausal), and abortion history (categorized as no abortion, 1, or ≥2 abortions).
- Anthropometric Measures: Body mass index (BMI) was calculated using height and weight measurements and categorized as underweight (<18.5 kg/m²), normal (18.5–24.9 kg/m²), and obese (≥25 kg/m²) based on WHO guidelines.
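
The derived variables in the list above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the record layout and function names are hypothetical, the two gap variables are assumed to be simple differences between the corresponding ages, and BMI values between 24.9 and 25 kg/m² are assumed to fall in the normal category. The category labels follow the wording of this section.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical record layout; field names are illustrative only.
    age_at_menarche: float
    age_at_first_marriage: float
    age_at_first_childbirth: float
    weight_kg: float
    height_m: float


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Apply the three cut-offs listed in the text (<18.5, 18.5-24.9, >=25)."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:  # assumption: 24.9-25 is treated as normal
        return "normal"
    return "obese"


def derived_features(p: Participant) -> dict:
    """Compute the two gap variables and the BMI category for one record."""
    return {
        "gap_menarche_to_first_childbirth": p.age_at_first_childbirth - p.age_at_menarche,
        "gap_marriage_to_first_childbirth": p.age_at_first_childbirth - p.age_at_first_marriage,
        "bmi_category": bmi_category(bmi(p.weight_kg, p.height_m)),
    }


# Made-up example record, not study data.
example = Participant(13, 18, 20, weight_kg=60, height_m=1.60)
print(derived_features(example))
```

Keeping the cut-offs in one function makes it easy to swap in a different classification if the categories are revised.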
3.3. Experimental Design
3.4. Data Preprocessing and Model Training
3.5. Explainability of ML Models
- Global Feature Importance with Confidence Intervals: To obtain a comprehensive view of how individual features contributed to each cancer subtype in our dataset, Shapley values were averaged across all data points to compute the global feature importance for each subtype. These averages were plotted with their 95% confidence intervals to identify the features that significantly influenced the prediction of HR+ and TNBC cases. Values below zero indicate a negative contribution to the prediction, whereas values above zero indicate a positive contribution. Significance was assessed using the Wilcoxon signed-rank test, and features whose Shapley values were significantly greater than zero (p-value < 0.05) for each subtype were highlighted.
- Comparative Analysis of Feature Importance: To further evaluate the relative contributions of features, we conducted pairwise comparisons of Shapley values for HR+ and TNBC cases. Features were grouped into three categories based on their importance: those with higher predictive relevance for TNBC (μ_TN > μ_HR+), those with higher predictive relevance for HR+ (μ_TN < μ_HR+), and those with equal predictive relevance for both subtypes (μ_TN = μ_HR+).
- Visualization of Trends in Numeric Features: Shapley values for numeric features were plotted against their feature values to capture trends and patterns of predictive importance. For each numeric feature, scatterplots were created, with individual Shapley values represented as points and smoothed trend lines indicating the average impact of the feature values on the predictions.
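
The summary statistics described in this list can be illustrated with a short sketch. It is not the authors' pipeline, and several choices are assumptions: the confidence interval uses a normal approximation, the one-sided Wilcoxon signed-rank p-value uses the large-sample normal approximation without continuity or tie correction, the subtype comparison is a plain comparison of mean Shapley values with a small tolerance, and the Shapley values are made-up numbers for a single hypothetical feature.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_with_ci(values, level=0.95):
    """Return (mean, lower, upper) using a normal-approximation interval."""
    m = mean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


def wilcoxon_greater(values):
    """One-sided signed-rank p-value for H1: values are centred above zero.

    Large-sample normal approximation, no continuity or tie correction.
    """
    nonzero = [v for v in values if v != 0]  # zeros carry no sign information
    n = len(nonzero)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(nonzero[order[end + 1]]) == abs(nonzero[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1  # tied values share the mean rank
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    w_plus = sum(r for r, v in zip(ranks, nonzero) if v > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return 1 - NormalDist().cdf((w_plus - mu) / sigma)


def dominant_subtype(mu_tn, mu_hr, tol=1e-9):
    """Group a feature by which subtype has the larger mean Shapley value."""
    if mu_tn > mu_hr + tol:
        return "TNBC"
    if mu_hr > mu_tn + tol:
        return "HR+"
    return "equal"


# Hypothetical per-sample Shapley values for one feature (not study data).
shap_tn = [0.12, 0.08, 0.15, -0.02, 0.10, 0.09, 0.11, 0.07, 0.13, 0.05]
shap_hr = [0.03, -0.01, 0.04, 0.02, -0.03, 0.01, 0.05, -0.02, 0.02, 0.00]

for name, values in (("TNBC", shap_tn), ("HR+", shap_hr)):
    m, lo, hi = mean_with_ci(values)
    p = wilcoxon_greater(values)
    print(f"{name}: mean={m:.3f}, 95% CI=({lo:.3f}, {hi:.3f}), p={p:.4f}")

print("Higher relevance for:", dominant_subtype(mean(shap_tn), mean(shap_hr)))
```

For small samples an exact signed-rank distribution would be preferable to the normal approximation used here; a statistics library routine is the usual choice in practice.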
3.6. Ethical Considerations
4. Results
| Features | No Cancer | HR+ | TNBC | P-value |
| --- | --- | --- | --- | --- |
| Participants by subtype, n (%) | 443 (100%) | 246 (100%) | 240 (100%) | |
| Residential Area, n (%) | | | | |
| Rural | 98 (22.12%) | 188 (76.42%) | 146 (60.83%) | <0.001 |
| Urban | 107 (24.15%) | 48 (19.51%) | 80 (33.33%) | <0.001 |
| Metropolitan | 238 (53.72%) | 10 (4.07%) | 14 (5.83%) | <0.001 |
| Employment Status, n (%) | | | | |
| Unemployed | 351 (79.23%) | 241 (97.97%) | 224 (93.33%) | <0.001 |
| Employed | 92 (20.77%) | 5 (2.03%) | 16 (6.67%) | <0.001 |
| Education Level, n (%) | | | | |
| 0–5 Years | 89 (20.09%) | 198 (80.49%) | 190 (79.17%) | <0.001 |
| 6–12 Years | 243 (54.85%) | 37 (15.04%) | 37 (15.42%) | <0.001 |
| 13–20 Years | 111 (25.06%) | 11 (4.47%) | 13 (5.42%) | <0.001 |
| Parity, n (%) | | | | |
| ≤1 | 254 (57.34%) | 131 (53.25%) | 119 (49.58%) | 0.142 |
| >1 | 189 (42.66%) | 115 (46.75%) | 121 (50.42%) | 0.142 |
| Type of Delivery, n (%) | | | | |
| Normal Vaginal Delivery (NVD) | 258 (58.24%) | 188 (76.42%) | 163 (67.92%) | <0.001 |
| Cesarean Section (CS) | 144 (32.51%) | 31 (12.60%) | 25 (10.42%) | <0.001 |
| Both Types (CS + NVD) | 41 (9.26%) | 27 (10.98%) | 52 (21.67%) | <0.001 |
| Type of Menstruation, n (%) | | | | |
| Regular | 369 (83.30%) | 171 (69.51%) | 161 (67.08%) | <0.001 |
| Irregular | 74 (16.70%) | 75 (30.49%) | 79 (32.92%) | <0.001 |
| Menstrual Status, n (%) | | | | |
| Premenopausal | 276 (62.30%) | 130 (52.85%) | 98 (40.83%) | <0.001 |
| Postmenopausal | 167 (37.70%) | 116 (47.15%) | 142 (59.17%) | <0.001 |
| Abortion, n (%) | | | | |
| No | 339 (76.52%) | 148 (60.16%) | 177 (73.75%) | <0.001 |
| Yes, 1 | 89 (20.09%) | 84 (34.15%) | 41 (17.08%) | <0.001 |
| Yes, ≥2 | 15 (3.39%) | 14 (5.69%) | 22 (9.17%) | <0.001 |
| Body Mass Index (BMI), n (%) | | | | |
| Healthy | 138 (31.15%) | 107 (43.50%) | 98 (40.83%) | <0.001 |
| Obese | 298 (67.27%) | 127 (51.63%) | 127 (52.92%) | <0.001 |
| Undernutrition | 7 (1.58%) | 12 (4.88%) | 15 (6.25%) | <0.001 |
| Age (years), mean (SD) | 41.05 (11.4) | 43.00 (10.4) | 45.04 (10.2) | <0.001 |
| Age at First Marriage (years), mean (SD) | 17.73 (3.54) | 17.46 (3.00) | 18.55 (3.19) | 0.001 |
| Age at First Baby (years), mean (SD) | 19.80 (3.82) | 19.57 (3.29) | 20.91 (3.65) | <0.001 |
| Age at Menarche (years), mean (SD) | 12.64 (1.09) | 12.05 (1.21) | 13.03 (0.37) | <0.001 |
| Gap Between Menarche and First Baby (years), mean (SD) | 7.17 (3.84) | 7.52 (3.26) | 7.88 (3.59) | 0.048 |
| Gap Between First Marriage and First Baby (years), mean (SD) | 2.07 (1.71) | 2.11 (1.42) | 2.36 (1.78) | 0.082 |
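
The categorical p-values in the table can be checked with a short sketch. The test used is not named in this section, so the following assumes a Pearson chi-square test of independence across the three groups. Under that assumption the parity rows should give a p-value close to the reported 0.142. The function name is hypothetical, and the closed-form p-value works only for an even number of degrees of freedom, which always holds for a table with three columns.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom, and p-value.

    `table` is a list of rows of observed counts. The p-value uses the
    closed-form chi-square survival function, valid for even df only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = sum(
        (obs - rt * ct / n) ** 2 / (rt * ct / n)  # (observed - expected)^2 / expected
        for row, rt in zip(table, row_totals)
        for obs, ct in zip(row, col_totals)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    if df % 2:
        raise ValueError("closed-form p-value requires an even df")
    half = stat / 2
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return stat, df, p


# Counts copied from the table; columns are No Cancer, HR+, TNBC.
residential_area = [[98, 188, 146], [107, 48, 80], [238, 10, 14]]
parity = [[254, 131, 119], [189, 115, 121]]

for label, counts in (("Residential area", residential_area), ("Parity", parity)):
    stat, df, p = chi_square_independence(counts)
    print(f"{label}: chi2={stat:.2f}, df={df}, p={p:.3g}")
```

A statistics library routine would handle arbitrary degrees of freedom and small expected counts; the pure-Python version here only keeps the example self-contained.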
4.1. Study Population Characteristics
4.2. Model Comparison

4.3. Feature Importance and Comparative Analysis

5. Discussion
5.1. Socioeconomic Disparities and Breast Cancer Risk
5.2. Reproductive Health and Breast Cancer Subtypes
6. Challenges and Policy Recommendations
7. Future Research Directions
8. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).