Submitted:
23 October 2024
Posted:
24 October 2024
Abstract
The objective was to explore the contributions and limitations of structure learning methods within an epidemiological analysis of real-world data. The specific aim was to use the networks learned by these methods to identify determinants of access to healthcare among various social factors. We analyzed data from the 2010 wave of the SIRS cohort, which included a sample of 3,006 adults from the Paris region, France. Healthcare utilization, encompassing both direct and indirect access, was the primary outcome. Candidate determinants included health status, demographic characteristics, and socio-cultural and economic positions. We employed a dual approach: a non-automated epidemiological method (initial expert-knowledge network and logistic regression models) and structure-learning techniques based on several algorithms, with and without knowledge constraints. We compared the results based on the presence, direction, and strength of specific links within the produced networks. Although the interdependencies and relative strengths identified by the two approaches were similar, the structure-learning algorithms detected fewer associations with the outcome than the non-automated method. Relationships between variables were sometimes incorrectly oriented when using a purely data-driven approach. Structure-learning algorithms can be valuable in exploratory stages, helping to generate new hypotheses or mine novel databases. However, results should be validated against prior knowledge and supplemented with additional confirmatory analyses.
Keywords:
1. Introduction
2. Materials and Methods
2.1. Population
2.2. Measures
2.3. Statistical analysis
2.3.1. “Non-Automated Approach”
2.3.2. Structure Learning Approaches
- Score-based algorithms: These search for the network that maximizes a score function reflecting how well the network fits the data [22]. From this category, we used the Hill Climbing algorithm with the BIC (Bayesian Information Criterion) score.
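To make the score-based idea concrete, the following is a minimal sketch of greedy hill climbing over discrete data with a BIC score. It is not the implementation used in the study: the function names (`bic_node`, `has_cycle`, `hill_climb`), the toy variables (`gender`, `dac`, `noise`), and the synthetic rows are all assumptions made for illustration, and arc reversal moves are omitted for brevity. The optional `blacklist` argument illustrates one simple way a knowledge constraint can forbid an arc.

```python
import math
from collections import Counter
from itertools import permutations

def bic_node(data, node, parents):
    """BIC contribution of one discrete node given a list of parent names."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[node]) for r in data)
    pa_counts = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / pa_counts[pa]) for (pa, _), c in joint.items())
    configs = 1
    for p in parents:
        configs *= len({r[p] for r in data})
    n_params = (len({r[node] for r in data}) - 1) * configs
    return loglik - 0.5 * math.log(n) * n_params

def has_cycle(parents):
    """Depth-first search for a directed cycle in {node: set_of_parents}."""
    state = {}
    def visit(v):
        if state.get(v) == 1:      # v is on the current path: cycle found
            return True
        if state.get(v) == 2:      # v already fully explored
            return False
        state[v] = 1
        if any(visit(p) for p in parents[v]):
            return True
        state[v] = 2
        return False
    return any(visit(v) for v in parents)

def hill_climb(data, blacklist=frozenset()):
    """Greedy search: apply the single arc addition or deletion that most
    improves the total BIC, and stop when no move improves it."""
    nodes = list(data[0])
    parents = {v: set() for v in nodes}
    score = {v: bic_node(data, v, []) for v in nodes}
    while True:
        best_gain, best_move = 1e-9, None
        for a, b in permutations(nodes, 2):
            cand = parents[b] ^ {a}            # toggle the arc a -> b
            adding = a in cand
            if adding and (a, b) in blacklist:
                continue                       # forbidden by prior knowledge
            trial = dict(parents)
            trial[b] = cand
            if adding and has_cycle(trial):
                continue                       # result must stay acyclic
            gain = bic_node(data, b, sorted(cand)) - score[b]
            if gain > best_gain:
                best_gain, best_move = gain, (b, cand)
        if best_move is None:
            return {(p, c) for c in nodes for p in parents[c]}
        b, cand = best_move
        parents[b] = cand
        score[b] = bic_node(data, b, sorted(cand))

# Synthetic rows (illustrative only): dac agrees with gender in 80% of rows,
# and noise is unrelated to both.
rows = [{"gender": i % 2,
         "dac": (i % 2) if i % 10 > 1 else 1 - (i % 2),
         "noise": (i // 2) % 2} for i in range(200)]
arcs_free = hill_climb(rows)
arcs_constrained = hill_climb(rows, blacklist={("dac", "gender")})
print(arcs_free, arcs_constrained)
```

Because BIC assigns the same score to both orientations of an isolated arc, the unconstrained search cannot tell `gender -> dac` from `dac -> gender` on these rows; only the blacklist fixes the direction.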
2.3.3. Comparison of Approaches
2.4. Ethical Approval
3. Results
3.1. Description of Population
3.2. Non-automated Epidemiological Approach
3.2.1. Direct Access to Care
3.2.2. Indirect Access to Care
3.3. Structure Learning Approaches
3.3.1. Direct Access to Care
With knowledge constraints
3.3.2. Indirect Access to Care
| Variable | Link¹ (data-driven only) | Direction² (data-driven only) | Strength³ (data-driven only) | Link¹ (constrained) | Direction² (constrained) | Strength³ (constrained) |
|---|---|---|---|---|---|---|
| Hill-climbing | | | | | | |
| Age | No | - | - | No | - | - |
| Gender | Yes | From DAC (70%) | 100% | Yes | To DAC (100%) | 100% |
| Origin | No | - | - | No | - | - |
| Education level | No | - | - | No | - | - |
| Employment status | No | - | - | No | - | - |
| Income | No | - | - | No | - | - |
| Health insurance status | Yes | To DAC (86%) | 86% | Yes | To DAC (72%) | 69% |
| Health relatives | No | - | - | No | - | - |
| Social integration | No | - | - | No | - | - |
| Chronic Disease | Yes | To DAC (98%) | 98% | Yes | To DAC (99%) | 99% |
| Perceived health status | Yes | To DAC (88%) | 47% | Yes | To DAC (73%) | 26% |
| Interleaved Incremental Association | | | | | | |
| Age | No | - | - | No | - | - |
| Gender | Yes | From DAC (74%) | 95% | Yes | To DAC (100%) | 95% |
| Origin | No | - | - | No | - | - |
| Education level | No | - | - | No | - | - |
| Employment status | No | - | - | No | - | - |
| Income | No | - | - | No | - | - |
| Health insurance status | Yes | To DAC (77%) | 24% | Yes | To DAC (69%) | 24% |
| Health relatives | No | - | - | No | - | - |
| Social integration | No | - | - | No | - | - |
| Chronic Disease | Yes | From DAC (83%) | 59% | Yes | From DAC (81%) | 59% |
| Perceived health status | No | - | - | No | - | - |
| ARACNE | | | | | | |
| Age | No | - | - | | | |
| Gender | Yes | - | 100% | | | |
| Origin | No | - | - | | | |
| Education level | No | - | - | | | |
| Employment status | No | - | - | | | |
| Income | No | - | - | | | |
| Health insurance status | Yes | - | 40% | | | |
| Health relatives | No | - | - | | | |
| Social integration | No | - | - | | | |
| Chronic Disease | Yes | - | 97% | | | |
| Perceived health status | No | - | - | | | |
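The table does not state how the strength and direction percentages were obtained. One common convention, assumed here purely for illustration, is to learn a network on many resampled datasets and then report, for each pair of variables, the share of networks that contain a link between them (strength) and, among those, the share pointing each way (direction). The sketch below implements that convention on hand-written toy networks; the function name `arc_strength` and the ten example networks are hypothetical and are not the study's data.

```python
def arc_strength(networks, a, b):
    """Summarise the link between a and b over a list of learned networks.

    Each network is a set of directed arcs (parent, child).
    strength  = share of networks with an arc between a and b, either way
    direction = share of those networks in which the arc points a -> b
    """
    forward = sum((a, b) in net for net in networks)
    backward = sum((b, a) in net for net in networks)
    present = forward + backward
    strength = present / len(networks)
    direction = forward / present if present else float("nan")
    return strength, direction

# Ten toy networks: the Gender-DAC link is always present, but seven of the
# ten replicates orient it away from DAC, which prior knowledge would reject.
toy_networks = [{("DAC", "Gender")}] * 7 + [{("Gender", "DAC")}] * 3
strength, from_dac = arc_strength(toy_networks, "DAC", "Gender")
print(f"strength={strength:.0%}, from DAC={from_dac:.0%}")
```

Under this reading, a row such as "From DAC (70%), 100%" would describe a link found in every replicate but oriented away from the outcome in most of them, which is the kind of orientation a knowledge constraint is meant to correct.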
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Bi, Q.; Goodman, K.E.; Kaminsky, J.; Lessler, J. What is Machine Learning? A Primer for the Epidemiologist. Am. J. Epidemiology 2019, 188, 2222–2239.
- Kino, S.; Hsu, Y.-T.; Shiba, K.; Chien, Y.-S.; Mita, C.; Kawachi, I.; Daoud, A. A scoping review on the use of machine learning in research on social determinants of health: Trends and research prospects. SSM - Popul. Heal. 2021, 15, 100836.
- Hernán, M.A.; Hernández-Díaz, S.; Werler, M.M. Causal Knowledge as a Prerequisite for Confounding Evaluation: An Application to Birth Defects Epidemiology. Am. J. Epidemiology 2002, 155, 176–184.
- Glymour, C.; Zhang, K.; Spirtes, P. Review of Causal Discovery Methods Based on Graphical Models. Front. Genet. 2019, 10, 524.
- Peters, J.; Janzing, D.; Schölkopf, B. Elements of Causal Inference: Foundations and Learning Algorithms. The MIT Press, 2017.
- van der Laan, M.J.; Rose, S. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Science & Business Media; 2011.
- Ahern, J.; Karasek, D.; Luedtke, A.R.; Bruckner, T.A.; van der Laan, M.J. Racial/Ethnic Differences in the Role of Childhood Adversities for Mental Disorders Among a Nationally Representative Sample of Adolescents. Epidemiology 2016, 27, 697–704.
- Butcher, B.; Huang, V.S.; Robinson, C.; Reffin, J.; Sgaier, S.K.; Charles, G.; Quadrianto, N. Causal Datasheet for Datasets: An Evaluation Guide for Real-World Data Analysis and Data Collection Design Using Bayesian Networks. Front. Artif. Intell. 2021, 4, 612551.
- Arora, P.; Boyne, D.; Slater, J.J.; Gupta, A.; Brenner, D.R.; Druzdzel, M.J. Bayesian Networks for Risk Prediction Using Real-World Data: A Tool for Precision Medicine. Value Heal. 2019, 22, 439–445.
- Sgaier, S.K.; Huang, V.; Charles, G. The Case for Causal AI. Stanf. Soc. Innov. Rev. 2020, 18, 50–55.
- Kyrimi, E.; McLachlan, S.; Dube, K.; Fenton, N. Bayesian Networks in Healthcare: the chasm between research enthusiasm and clinical adoption. 2020.
- Martin-Fernandez, J.; Grillo, F.; Parizot, I.; Caillavet, F.; Chauvin, P. Prevalence and socioeconomic and geographical inequalities of household food insecurity in the Paris region, France, 2010. BMC Public Heal. 2013, 13, 486.
- Vallée, J.; Chauvin, P. Investigating the effects of medical density on health-seeking behaviours using a multiscale approach to residential and activity spaces: Results from a prospective cohort study in the Paris metropolitan area, France. Int. J. Heal. Geogr. 2012, 11, 54.
- Chauvin, P.; Parizot, I. Les inégalités sociales et territoriales de santé dans l’agglomération parisienne. Une analyse de la cohorte Sirs (2005). Délégation interministérielle à la Ville; 2009.
- Vallée, J.; Cadot, E.; Grillo, F.; Parizot, I.; Chauvin, P. The combined effects of activity space and neighbourhood of residence on participation in preventive health-care activities: The case of cervical screening in the Paris metropolitan area (France). Heal. Place 2010, 16, 838–852.
- Lefèvre, T.; Rondet, C.; Parizot, I.; Chauvin, P. Applying Multivariate Clustering Techniques to Health Data: The 4 Types of Healthcare Utilization in the Paris Metropolitan Area. PLOS ONE 2014, 9, e115064.
- Rondet, C.; Soler, M.; Ringa, V.; Parizot, I.; Chauvin, P. The role of a lack of social integration in never having undergone breast cancer screening: Results from a population-based, representative survey in the Paris metropolitan area in 2010. Prev. Med. 2013, 57, 386–391.
- Trohel, G.; Bertaud-Gounot, V.; Soler, M.; Chauvin, P.; Grimaud, O. Socio-Economic Determinants of the Need for Dental Care in Adults. PLoS ONE 2016, 11, e0158842.
- Chevreul, K.; Durand-Zaleski, I.; Bahrami, S.B.; Hernández-Quevedo, C.; Mladovsky, P. France: Health system review. Health Syst. Transit. 2010, 12, 1–291, xxi–xxii.
- Tsamardinos, I.; Brown, L.E.; Aliferis, C.F. The max-min hill-climbing Bayesian network structure learning algorithm. Mach. Learn. 2006, 65, 31–78.
- Scutari, M. Learning Bayesian Networks with the bnlearn R Package. J. Stat. Softw. 2010, 35, 1–22.
- Cooper, G.F.; Herskovits, E. A Bayesian method for the induction of probabilistic networks from data. Mach. Learn. 1992, 9, 309–347.
- Spirtes, P.; Glymour, C.; Scheines, R. Causation, Prediction, and Search, 2nd ed. Cambridge, Mass: A Bradford Book; 2001.
- Verma, T.; Pearl, J. Equivalence and synthesis of causal models. UCLA, Computer Science Department; 1991.
- Yaramakala, S.; Margaritis, D. Speculative Markov Blanket Discovery for Optimal Feature Selection. Fifth IEEE International Conference on Data Mining (ICDM'05); 4 pp.
- Margolin, A.A.; Nemenman, I.; Basso, K.; Wiggins, C.; Stolovitzky, G.; Favera, R.D.; Califano, A. ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context. BMC Bioinform. 2006, 7, 1–15.
- R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: 2005.
- Kitson, N.K.; Constantinou, A.C. Learning Bayesian networks from demographic and health survey data. J. Biomed. Informatics 2020, 113, 103588.
- la Bastide-van Gemert, S.; Stolk, R.P.; van den Heuvel, E.R.; Fidler, V. Causal inference algorithms can be useful in life course epidemiology. J. Clin. Epidemiology 2014, 67, 190–198.
- Kitson, N.K.; Constantinou, A.C.; Guo, Z.; Liu, Y.; Chobtham, K. A survey of Bayesian Network structure learning. Artif. Intell. Rev. 2023, 56, 8721–8814.
- Constantinou, A.C.; Fenton, N. Things to know about Bayesian Networks: Decisions under Uncertainty, Part 2. Significance 2018, 15, 19–23.
- Lewis, F.I.; McCormick, B.J.J. Revealing the Complexity of Health Determinants in Resource-poor Settings. Am. J. Epidemiology 2012, 176, 1051–1059.
- Requejo Castro, D.; Giné Garriga, R.; Pérez Foguet, A. Exploring the interlinkages of water and sanitation across the 2030 Agenda: a Bayesian Network approach. ISDRS 2018 24th Int. Sustain. Dev. Res. Soc. Conf., Messina, Italy, 13–15 June 2018; Book Pap., 2018, pp. 121–135.
- Spirtes, P.; Zhang, K. Causal discovery and inference: concepts and recent methodological advances. Appl. Informatics 2016, 3, 1–28.
- Weinberger, N. Faithfulness, Coordination and Causal Coincidences. Erkenntnis 2017, 83, 113–133.
- Scheines, R. An Introduction to Causal Inference. n.d.
- Shen, X.; Ma, S.; Vemuri, P.; Simon, G. Challenges and Opportunities with Causal Discovery Algorithms: Application to Alzheimer’s Pathophysiology. Sci. Rep. 2020, 10, 1–12.
- Linardatos, P.; Papastefanopoulos, V.; Kotsiantis, S. Explainable AI: A Review of Machine Learning Interpretability Methods. Entropy 2020, 23, 18.
- Belle, V.; Papantonis, I. Principles and Practice of Explainable Machine Learning. Front. Big Data 2021, 4.
- Pearl, J. Causality. Cambridge University Press, 2009.

| Variable | Category | Link to DAC¹ | Direction² | Strength³ |
|---|---|---|---|---|
| Age | 18-29 | Yes | To DAC | Ref |
| | 30-44 | | | 1.39 (0.98 to 1.98) |
| | 45-59 | | | 1.20 (0.83 to 1.72) |
| | 60-74 | | | 1.72 (1.12 to 2.64) |
| | 75+ | | | 2.23 (1.22 to 4.31) |
| Gender | Men | Yes | To DAC | Ref |
| | Women | | | 3.13 (2.45 to 4.02) |
| Origin | French, French parents | No | - | - |
| | French, foreign parents | | | - |
| | Migrant | | | |
| Education level | Primary or none | No | - | |
| | Secondary | | | - |
| | Tertiary | | | - |
| Employment status | Employed | No | - | |
| | Unemployed | | | - |
| | Inactive | | | - |
| Income | 1st quintile | No | - | |
| | 2nd quintile | | | - |
| | 3rd quintile | | | - |
| | 4th quintile | | | - |
| | 5th quintile | | | - |
| Health insurance status | None or SHI only | Yes | To DAC | Ref |
| | SHI and VHI | | | 2.38 (1.75 to 3.23) |
| Health relatives | No | No | - | |
| | Yes | | | - |
| Social integration | 1st quartile | No | - | |
| | 2nd quartile | | | - |
| | 3rd quartile | | | - |
| | 4th quartile | | | - |
| Chronic Disease | No | Yes | To DAC | Ref |
| | Yes | | | 2.94 (2.12 to 4.16) |
| Perceived health status | Good | Yes | To DAC | Ref |
| | Bad-Average | | | 2.16 (1.50 to 3.20) |
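The table does not say what the Strength column contains. Since the non-automated approach relies on logistic regression, one plausible reading, assumed here and not confirmed by the table itself, is an odds ratio with a Wald-type confidence interval relative to the reference category. Under that assumption, the sketch below shows how such a value relates to a regression coefficient and its standard error; the function names and the choice of z = 1.96 (a 95% interval) are illustrative.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald interval from a logistic coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def coefficient_from_ci(lower, upper, z=1.96):
    """Back out (beta, se) from an interval that is symmetric on the log scale."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

# Interval shown in the table for Women versus Men (assumed to be an OR interval).
beta, se = coefficient_from_ci(2.45, 4.02)
odds_ratio, lower, upper = odds_ratio_with_ci(beta, se)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

The point estimate recovered from the interval alone is about 3.14, close to the tabulated 3.13, which is consistent with (though not proof of) the odds-ratio reading.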
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).