Submitted: 04 June 2025
Posted: 04 June 2025
Abstract
Keywords:
1. Introduction
2. Methods and Materials
2.1. Data Source
2.2. Data Preprocessing
- (1) Terminology standardization: Clinical descriptions and diagnoses were normalized to a consistent vocabulary. In the raw EMRs, the same concept could be documented with varying terms or abbreviations. For example, “Polycystic Ovary Syndrome” was recorded as “Polycystic ovarian syndrome”, “PCOS”, or “Poly-ovary syndrome”, depending on the clinician’s style. To address this, all symptom and diagnosis labels were mapped to a unified terminology, harmonizing synonyms and abbreviations into one standard descriptor. This normalization ensured that each clinical concept, such as hyperprolactinemia, was represented uniformly across all records, preventing duplicate features arising from naming variations.
- (2) Record filtering and de-duplication: Duplicate entries and records with substantial missing or incomplete data were removed. It is common for secondary-use healthcare data to contain duplicate records or omissions, which can bias the results. Any repeated patient entries, as well as cases lacking critical fields, such as outcome or key diagnoses, were identified and removed. This step improved data integrity and reliability by restricting the analysis to unique, complete cases.
- (3) One-hot encoding: One-hot encoding is a standard data transformation technique for converting categorical data into a numeric matrix for machine learning [9]. In practice, each distinct feature value, such as primary infertility, secondary infertility, or a particular stimulation protocol, was turned into its own binary column, where “1” indicates the presence of that attribute in a given record and “0” indicates its absence. This encoding produced a Boolean feature matrix in which each row corresponds to a patient’s IVF cycle and each column corresponds to the presence of a specific condition or attribute. Continuous variables such as age, body mass index (BMI), and duration of infertility were discretized into clinically meaningful ranges before encoding, following World Health Organization standards [10,11].
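
The three preprocessing steps above can be illustrated with a minimal, dependency-free sketch. It is not the study's pipeline (which used pandas and scikit-learn): the synonym map, the field names (`patient_cycle_id`, `age`, `bmi`, `years_infertile`, `diagnoses`, `outcome`), and the toy records are all hypothetical, and only the cut-offs (age > 35, BMI > 24, infertility > 5 years) are taken from the variable table.

```python
# Hypothetical synonym map; the study's actual vocabulary is not given in the text.
SYNONYMS = {
    "polycystic ovarian syndrome": "Polycystic Ovary Syndrome",
    "pcos": "Polycystic Ovary Syndrome",
    "poly-ovary syndrome": "Polycystic Ovary Syndrome",
}

def normalize(term):
    """Step 1: map a raw diagnosis string to its standard descriptor."""
    cleaned = term.strip()
    return SYNONYMS.get(cleaned.lower(), cleaned)

def discretize(record):
    """Bin continuous variables with the cut-offs listed in the variable table."""
    return {
        "Age > 35": record["age"] > 35,
        "BMI > 24": record["bmi"] > 24,
        "Years of Infertility > 5": record["years_infertile"] > 5,
    }

def preprocess(records):
    """Steps 2 and 3: drop duplicates/incomplete rows, then one-hot encode."""
    seen, rows = set(), []
    for rec in records:
        # Skip records without an outcome and repeated patient-cycle entries.
        if rec.get("outcome") is None or rec["patient_cycle_id"] in seen:
            continue
        seen.add(rec["patient_cycle_id"])
        items = {normalize(d) for d in rec["diagnoses"]}
        items |= {name for name, flag in discretize(rec).items() if flag}
        rows.append((items, rec["outcome"]))
    # One Boolean column per distinct attribute, one row per retained cycle.
    columns = sorted(set().union(*(items for items, _ in rows)))
    matrix = [[col in items for col in columns] for items, _ in rows]
    return columns, matrix, [outcome for _, outcome in rows]

# Toy records (invented for illustration only).
records = [
    {"patient_cycle_id": 1, "age": 37, "bmi": 22.0, "years_infertile": 6,
     "diagnoses": ["PCOS"], "outcome": 0},
    {"patient_cycle_id": 1, "age": 37, "bmi": 22.0, "years_infertile": 6,
     "diagnoses": ["PCOS"], "outcome": 0},                  # duplicate entry
    {"patient_cycle_id": 2, "age": 30, "bmi": 26.5, "years_infertile": 2,
     "diagnoses": ["Polycystic ovarian syndrome"], "outcome": 1},
    {"patient_cycle_id": 3, "age": 29, "bmi": 21.0, "years_infertile": 1,
     "diagnoses": [], "outcome": None},                     # missing outcome
]
columns, matrix, outcomes = preprocess(records)
print(columns)
print(matrix)
```

In this toy run the two PCOS spellings collapse into a single column, and the duplicate and the record without an outcome are dropped before encoding.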
2.3. Feature Selection
2.4. Association Rule Mining
- (1) Frequent itemset generation: The Apriori algorithm was applied to the one-hot-encoded dataset produced in the data preprocessing stage to compute the frequent itemsets. Apriori systematically explores combinations of items (features) and counts their occurrences in the dataset. A minimum support threshold of 0.05 was set, meaning that an itemset had to appear in at least 5% of patient records to be considered frequent. This threshold balances the need to find non-trivial patterns against the desire to ignore extremely rare combinations, and it ensures that any reported association involves a patient subset of meaningful size in the cohort.
- (2) Association rule extraction: Once the frequent itemsets had been identified, rule metrics were calculated with the association rules function. A minimum confidence threshold of 0.60 was applied to filter the rules. Confidence measures the conditional probability of the consequent given the antecedent and was used to assess the likelihood of the pregnancy outcome under specific conditions. A rule’s confidence was required to meet or exceed 60% for consideration in this study, ensuring that, among patients exhibiting the antecedent feature set, at least 60% had the outcome named in the consequent. Furthermore, a criterion of lift > 1.0 was imposed for all rules. Lift, defined as the ratio of the rule’s confidence to the baseline probability of the consequent, indicates the degree to which the antecedent and outcome co-occur more frequently than expected by chance. The lift > 1 requirement guaranteed that any reported rule represented a positive association, signifying an improvement over random chance in predicting the outcome.
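
To make the two stages concrete, the following is a small pure-Python sketch of level-wise Apriori followed by rule filtering. The study itself used the mlxtend implementation; this stand-in only mirrors the thresholds stated above (support ≥ 0.05, confidence ≥ 0.60, lift > 1.0). The function names `apriori` and `rules_for`, the item labels, and the ten toy transactions are assumptions made for illustration, and the toy call raises the support threshold to 0.2 because the example is so small.

```python
from itertools import combinations

def apriori(transactions, min_support=0.05):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while level:
        current = {c: s for c in level if (s := support(c)) >= min_support}
        frequent.update(current)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent

def rules_for(frequent, consequent, min_confidence=0.60, min_lift=1.0):
    """Rules 'antecedent -> consequent' with confidence >= threshold and lift > min_lift."""
    target = frozenset([consequent])
    baseline = frequent.get(target)
    if baseline is None:
        return []
    rules = []
    for itemset, supp in frequent.items():
        if consequent in itemset and len(itemset) > 1:
            antecedent = itemset - target
            confidence = supp / frequent[antecedent]  # P(consequent | antecedent)
            lift = confidence / baseline              # confidence / P(consequent)
            if confidence >= min_confidence and lift > min_lift:
                rules.append((antecedent, supp, confidence, lift))
    return sorted(rules, key=lambda r: r[3], reverse=True)

# Ten invented transactions; "failure" marks an unsuccessful cycle.
transactions = [frozenset(t) for t in [
    {"tubal", "failure"}, {"tubal", "failure"}, {"tubal", "failure"}, {"tubal"},
    {"lufs", "failure"}, {"lufs"}, {"lufs", "failure"}, {"failure"}, {"lufs"},
    {"tubal", "lufs", "failure"},
]]
frequent = apriori(transactions, min_support=0.2)
rules = rules_for(frequent, "failure")
for antecedent, supp, confidence, lift in rules:
    print(sorted(antecedent), round(supp, 2), round(confidence, 2), round(lift, 3))
```

In the toy data the rule from `lufs` reaches the 0.60 confidence threshold but has a lift below 1, so only the `tubal` rule survives, which shows why the lift criterion is applied in addition to confidence.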
2.5. Computational Environment
3. Results
3.1. Descriptive Statistics
3.1.1. Clinical Pregnancy Outcomes
3.1.2. Demographic and Clinical Characteristics
3.1.3. Comparison of Pregnancy Outcomes by Demographic and Clinical Groups
3.1.4. Correlation of Features with Pregnancy Outcome
- (1) Bilateral tubal obstruction exhibited the strongest negative correlation (−0.064), aligning with the clinical understanding of impaired fertility associated with significant tubal pathology.
- (2) Pelvic adhesion (0.048) and adnexal mass (undiagnosed) (0.045) demonstrated a small positive correlation with pregnancy success, an unexpected finding possibly influenced by confounding clinical management practices.
- (3) Habitual abortion (−0.033) and secondary infertility (−0.024) were negatively correlated with pregnancy success, consistent with their role as adverse prognostic factors.
- (4) Other factors, including BMI, insulin resistance, hypertension, and age, exhibited negligible correlations, suggesting limited predictive value when assessed individually in this cohort.
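
The text does not state which correlation coefficient was used. Assuming a Pearson correlation on the binary-encoded columns, the calculation for two 0/1 variables reduces to the phi coefficient, which can be sketched as follows. The function name and the toy vectors are hypothetical and are not drawn from the study data.

```python
import math

def phi_coefficient(feature, outcome):
    """Pearson correlation of two equal-length 0/1 sequences (phi coefficient)."""
    pairs = list(zip(feature, outcome))
    n11 = sum(1 for f, o in pairs if f and o)          # feature present, pregnant
    n10 = sum(1 for f, o in pairs if f and not o)      # feature present, not pregnant
    n01 = sum(1 for f, o in pairs if not f and o)      # feature absent, pregnant
    n00 = sum(1 for f, o in pairs if not f and not o)  # feature absent, not pregnant
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else float("nan")

# Invented example: 1 = feature present / clinical pregnancy achieved.
feature = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
outcome = [0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
phi = phi_coefficient(feature, outcome)
print(round(phi, 3))
```

A negative value means the feature is seen more often in unsuccessful cycles; values as close to zero as those reported above indicate that each feature on its own separates outcomes only weakly.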
3.2. Association Rule Mining Outcomes
3.2.1. Overall Rule Summary
- (1) The most frequent clinical features associated with IVF failure were luteinized unruptured follicle syndrome (LUFS) and secondary infertility, each identified in 13 instances.
- (2) LUFS had a high support (79.67%), indicating its high prevalence, although it demonstrated a relatively weak lift (1.0017), suggesting limited discriminative power when considered alone.
- (3) Bilateral tubal obstruction exhibited the strongest association with clinical pregnancy failure, showing the highest lift value (1.0422) and a confidence of 85.25%, emphasizing its significance as a risk factor.
- (4) Other important features, such as years of infertility > 5 and BMI > 24, demonstrated moderate frequencies and lifts (1.0104 and 1.0196, respectively), indicating their meaningful, though more limited, contributions as individual predictors.
- (5) Pelvic adhesion lacked calculated metrics in this analysis, limiting the interpretation of its independent predictive strength.
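
Because lift is defined in Section 2.4 as confidence divided by the baseline probability of the consequent, the reported confidence and lift values can be cross-checked by dividing one by the other. The short sketch below does this with the figures from the summary table; the variable names are illustrative only.

```python
# (confidence, lift) pairs copied from the single-feature summary table.
reported = {
    "LUFS": (0.8194, 1.0017),
    "Secondary Infertility": (0.8251, 1.0086),
    "Years of Infertility > 5": (0.8266, 1.0104),
    "Bilateral Tubal Obstruction": (0.8525, 1.0422),
    "BMI > 24": (0.8341, 1.0196),
    "Age > 35": (0.8234, 1.0065),
}

# lift = confidence / P(failure)  =>  P(failure) = confidence / lift
implied_baseline = {name: conf / lift for name, (conf, lift) in reported.items()}

for name, baseline in implied_baseline.items():
    print(f"{name}: implied baseline failure rate = {baseline:.4f}")
```

Every row implies a baseline failure rate of about 0.818, so the table is internally consistent. It also explains why confidences above 80% correspond to lifts only slightly above 1: the outcome is already that common in the cohort as a whole.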
3.2.2. Top-Ranked Association Rules
- (1) Bilateral tubal obstruction appears in 7 of the top 10 rules, underscoring its centrality as a structural impediment to successful implantation or embryo transport.
- (2) LUFS frequently co-occurs with both tubal factors and metabolic risks (e.g., BMI > 24), suggesting that ovarian dysfunction and metabolic dysregulation synergistically compromise reproductive outcomes.
- (3) Secondary infertility and prolonged infertility (>5 years) repeatedly emerge in multifactorial rule sets, reflecting the compounded difficulty of achieving pregnancy in patients with a history of prior conception failure.
4. Discussion
4.1. Comparative Analysis
4.1.1. Comparison with Previous Literature or Known Clinical Evidence
4.1.2. Unexpected Findings in the Current Analysis
4.2. The Strengths and Innovation of This Study
4.3. Limitations of This Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Azziz, R.; Carmina, E.; Chen, Z.; Dunaif, A.; Laven, J.S.; Legro, R.S.; Lizneva, D.; Natterson-Horowitz, B.; Teede, H.J.; Yildiz, B.O. Polycystic ovary syndrome. Nat. Rev. Dis. Primers 2016, 2, 1–18.
- Bozdag, G.; Mumusoglu, S.; Zengin, D.; Karabulut, E.; Yildiz, B.O. The prevalence and phenotypic features of polycystic ovary syndrome: a systematic review and meta-analysis. Hum. Reprod. 2016, 31, 2841–2855. [CrossRef]
- Sunkara, S.K.; Rittenberg, V.; Raine-Fenning, N.; Bhattacharya, S.; Zamora, J.; Coomarasamy, A. Association between the number of eggs and live birth in IVF treatment: an analysis of 400 135 treatment cycles. Hum. Reprod. 2011, 26, 1768–1774. [CrossRef]
- McGee, E.A.; Hsueh, A.J. Initial and cyclic recruitment of ovarian follicles. Endocr. Rev. 2000, 21, 200–214. [CrossRef]
- Dumesic, D.A.; Oberfield, S.E.; Stener-Victorin, E.; Marshall, J.C.; Laven, J.S.; Legro, R.S. Scientific statement on the diagnostic criteria, epidemiology, pathophysiology, and molecular genetics of polycystic ovary syndrome. Endocr. Rev. 2015, 36, 487–525. [CrossRef]
- Kavakiotis, I.; Tsave, O.; Salifoglou, A.; Maglaveras, N.; Vlahavas, I.; Chouvarda, I. Machine learning and data mining methods in diabetes research. Comput. Struct. Biotechnol. J. 2017, 15, 104–116. [CrossRef]
- Agrawal, R.; Imieliński, T.; Swami, A. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data; ACM: Washington, D.C., USA, 1993; pp. 207–216.
- Wu, W.-T.; Li, Y.-J.; Feng, A.-Z.; Li, L.; Huang, T.; Xu, A.-D.; Lyu, J. Data mining in clinical big data: the frequently used databases, steps, and methodological models. Mil. Med. Res. 2021, 8, 44. [CrossRef]
- Samuels, J. One-Hot Encoding and Two-Hot Encoding: An Introduction; 2024.
- World Health Organization Standards for Maternal and Neonatal Care 2007.
- World Health Organization WHO recommendations on antenatal care for a positive pregnancy experience 2016.
- Ombelet, W. WHO fact sheet on infertility gives hope to millions of infertile couples worldwide. Facts Views Vis. ObGyn 2020, 12, 249.
- World Health Organization Infertility prevalence estimates, 1990–2021; World Health Organization, 2023; ISBN 92-4-006831-7.
- Coussa, A.; Hasan, H.A.; Barber, T.M. Impact of contraception and IVF hormones on metabolic, endocrine, and inflammatory status. J. Assist. Reprod. Genet. 2020, 37, 1267–1272. [CrossRef]
- Herman, T.; Csehely, S.; Orosz, M.; Bhattoa, H.P.; Deli, T.; Torok, P.; Lagana, A.S.; Chiantera, V.; Jakab, A. Impact of Endocrine Disorders on IVF Outcomes: Results from a Large, Single-Centre, Prospective Study. Reprod. Sci. 2023, 30, 1878–1890. [CrossRef]
- Vannuccini, S.; Clifton, V.L.; Fraser, I.S.; Taylor, H.S.; Critchley, H.; Giudice, L.C.; Petraglia, F. Infertility and reproductive disorders: impact of hormonal and inflammatory mechanisms on pregnancy outcome. Hum. Reprod. Update 2016, 22, 104–115. [CrossRef]
- Wang, L.; Yu, X.; Xiong, D.; Leng, M.; Liang, M.; Li, R.; He, L.; Yan, H.; Zhou, X.; Jike, E.; et al. Hormonal and metabolic influences on outcomes in PCOS undergoing assisted reproduction: the role of BMI in fresh embryo transfers. BMC Pregnancy Childbirth 2025, 25, 368. [CrossRef]
- Harrison, R.F.; Bonnar, J.; Thompson, W. Diagnosis and Management of Tubo-Uterine Factors in Infertility; Springer Science & Business Media, 2012; Vol. 4.
- Ozgur, K.; Bulut, H.; Berkkanoglu, M.; Coetzee, K.; Kaya, G. ICSI pregnancy outcomes following hysteroscopic placement of Essure devices for hydrosalpinx in laparoscopic contraindicated patients. Reprod. Biomed. Online 2014, 29, 113–118. [CrossRef]
- Qiu, J.; Du, T.; Chen, C.; Lyu, Q.; Mol, B.W.; Zhao, M.; Kuang, Y. Impact of uterine malformations on pregnancy and neonatal outcomes of IVF/ICSI–frozen embryo transfer. Hum. Reprod. 2022, 37, 428–446. [CrossRef]
- Tournaye, H. Male factor infertility and ART. Asian J. Androl. 2011, 14, 103. [CrossRef]
- Solanki, S.K.; Patel, J.T. A survey on association rule mining. In Proceedings of the 2015 fifth international conference on advanced computing & communication technologies; IEEE, 2015; pp. 212–216.
- Altaf, W.; Shahbaz, M.; Guergachi, A. Applications of association rule mining in health informatics: a survey. Artif. Intell. Rev. 2017, 47, 313–340. [CrossRef]
- Pradhan, G.N.; Prabhakaran, B. Association Rule Mining in Multiple, Multidimensional Time Series Medical Data. J. Healthc. Inform. Res. 2017, 1, 92–118. [CrossRef]
- Al-Maolegi, M.; Arkok, B. An improved Apriori algorithm for association rules. arXiv 2014, arXiv:1403.3948. [CrossRef]
- Hegland, M. The Apriori Algorithm – A Tutorial. In Lecture Notes Series, Institute for Mathematical Sciences, National University of Singapore; World Scientific, 2007; Vol. 11, pp. 209–262; ISBN 978-981-270-905-9.
- Melo, A.S.; Ferriani, R.A.; Navarro, P.A. Treatment of infertility in women with polycystic ovary syndrome: approach to clinical practice. Clinics 2015, 70, 765–769. [CrossRef]
- Ou, H.; Sun, J.; Lin, L.; Ma, X. Ovarian Response, Pregnancy Outcomes, and Complications Between Salpingectomy and Proximal Tubal Occlusion in Hydrosalpinx Patients Before in vitro Fertilization: A Meta-Analysis. Front. Surg. 2022, 9, 830612. [CrossRef]
- Capmas, P.; Suarthana, E.; Tulandi, T. Management of Hydrosalpinx in the Era of Assisted Reproductive Technology: A Systematic Review and Meta-analysis. J. Minim. Invasive Gynecol. 2021, 28, 418–441. [CrossRef]
- Xu, B.; Zhang, Q.; Zhao, J.; Wang, Y.; Xu, D.; Li, Y. Pregnancy outcome of in vitro fertilization after Essure and laparoscopic management of hydrosalpinx: a systematic review and meta-analysis. Fertil. Steril. 2017, 108, 84-95.e5. [CrossRef]
- Zhang, L.; Cai, H.; Li, W.; Tian, L.; Shi, J. Duration of infertility and assisted reproductive outcomes in non-male factor infertility: can use of ICSI turn the tide? BMC Womens Health 2022, 22, 480. [CrossRef]
- Huang, C.; Shi, Q.; Xing, J.; Yan, Y.; Shen, X.; Shan, H.; Sun, H.; Mei, J. The relationship between duration of infertility and clinical outcomes of intrauterine insemination for younger women: a retrospective clinical study. BMC Pregnancy Childbirth 2024, 24, 199. [CrossRef]
- Wang, X.; Tian, P.; Zhao, Y.; Lu, J.; Dong, C.; Zhang, C. The association between female age and pregnancy outcomes in patients receiving first elective single embryo transfer cycle: a retrospective cohort study. Sci. Rep. 2024, 14, 19216. [CrossRef]
- Alenezi, S.A.; Khan, R.; Amer, S. The Impact of High BMI on Pregnancy Outcomes and Complications in Women with PCOS Undergoing IVF—A Systematic Review and Meta-Analysis. J. Clin. Med. 2024, 13, 1578. [CrossRef]
- Azmoodeh, A.; Pejman Manesh, M.; Akbari Asbagh, F.; Ghaseminejad, A.; Hamzehgardeshi, Z. Effects of Letrozole-HMG and Clomiphene-HMG on Incidence of Luteinized Unruptured Follicle Syndrome in Infertile Women Undergoing Induction Ovulation and Intrauterine Insemination: A Randomised Trial. Glob. J. Health Sci. 2015, 8, 244. [CrossRef]
- Li, S.; Liu, L.; Meng, T.; Miao, B.; Sun, M.; Zhou, C.; Xu, Y. Impact of luteinized unruptured follicles on clinical outcomes of natural cycles for frozen/thawed blastocyst transfer. Front. Endocrinol. 2021, 12, 738005. [CrossRef]
- Liu, S.; Mo, M.; Xiao, S.; Li, L.; Hu, X.; Hong, L.; Wang, L.; Lian, R.; Huang, C.; Zeng, Y.; et al. Pregnancy Outcomes of Women With Polycystic Ovary Syndrome for the First In Vitro Fertilization Treatment: A Retrospective Cohort Study With 7678 Patients. Front. Endocrinol. 2020, 11, 575337. [CrossRef]
- Dybciak, P.; Humeniuk, E.; Raczkiewicz, D.; Krakowiak, J.; Wdowiak, A.; Bojar, I. Anxiety and Depression in Women with Polycystic Ovary Syndrome. Medicina 2022, 58, 942. [CrossRef]
- Rakic, D.; Joksimovic Jovic, J.; Jakovljevic, V.; Zivkovic, V.; Nikolic, M.; Sretenovic, J.; Nikolic, M.; Jovic, N.; Bicanin Ilic, M.; Arsenijevic, P.; et al. High Fat Diet Exaggerate Metabolic and Reproductive PCOS Features by Promoting Oxidative Stress: An Improved EV Model in Rats. Medicina 2023, 59, 1104. [CrossRef]
- Kusuhara, S.; Kishimoto-Kishi, M.; Matsumiya, W.; Miki, A.; Imai, H.; Nakamura, M. Short-Term Outcomes of Intravitreal Faricimab Injection for Diabetic Macular Edema. Medicina 2023, 59, 665. [CrossRef]







| Domain | Extracted Variables | Reference |
|---|---|---|
| Demographics | Female age: discretized ≤ 35, >35 | [10,11] |
| Anthropometry | Female BMI (kg m⁻²): discretized ≤ 24, >24 | |
| Reproductive history | Years of infertility: discretized ≤ 5, >5; infertility type: primary/secondary | [12,13] |
| Hormonal and metabolic disorders | Hyperprolactinemia, sub-clinical hypothyroidism, insulin resistance | [14,15,16,17] |
| Tubo-uterine factors | Unilateral tubal obstruction, bilateral tubal obstruction, hydrosalpinx, pelvic adhesion, intra-uterine adhesion | [18,19] |
| Ovarian and endocrine factors | Luteinized unruptured follicle syndrome (LUFS), ovarian cysts | [18,19] |
| Uterine malformations | Septate uterus, uterine fibroids (leiomyomas), adenomyosis | [20] |
| ART treatment data | Stimulation protocol (categorical), number of oocytes retrieved (binned), fertilization method (IVF vs. ICSI) | [21] |
| Outcome | Clinical pregnancy: 1 = success, 0 = failure | |

| Category | Specification | Purpose |
|---|---|---|
| Programming language | Python 3.12 | Core scripting/analysis |
| Interactive IDE | Jupyter Notebook | Reproducible, stepwise workflow |
| Key libraries | pandas, scikit-learn, mlxtend | Data wrangling; preprocessing; Apriori and rule generation |
| Plotting | matplotlib, seaborn | Descriptive and network visualization |
| Hardware | 2 × Intel Xeon (48 physical cores), 128 GB RAM | Parallel support-counting and rule filtering |
| Operating system | Ubuntu 22.04 LTS | Stable Linux environment for multi-threaded tasks |

| Itemset | Frequency | Support | Confidence | Lift |
|---|---|---|---|---|
| Luteinized Unruptured Follicle Syndrome (LUFS) | 13 | 0.7967 | 0.8194 | 1.0017 |
| Secondary Infertility | 13 | 0.2505 | 0.8251 | 1.0086 |
| Pelvic Adhesion | 7 | - | - | - |
| Years of Infertility > 5 | 7 | 0.1751 | 0.8266 | 1.0104 |
| Bilateral Tubal Obstruction | 6 | 0.3122 | 0.8525 | 1.0422 |
| BMI > 24 | 4 | 0.1711 | 0.8341 | 1.0196 |
| Age > 35 | 4 | 0.1502 | 0.8234 | 1.0065 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).