Subject:
Medicine And Pharmacology,
Other
Keywords:
directed acyclic graph; DAG; causal inference; bias; inferential statistics; reproducibility
Online: 8 October 2022 (02:57:44 CEST)
The origins of directed acyclic graphs (DAGs) date back to the emergence of ‘graph theory’ in the early 1700s (Biggs et al., 1986). DAGs are conceptual or literal, diagrammatic representations of causal paths between variables which are constructed – as their name suggests – on the basis of two over-riding principles: first, that all causal paths are ‘directed’ (i.e. for each pair of variables, only one can represent the cause, while the other must be its consequence); and second, that no direct cyclical paths, or indirect cyclical pathways (comprising sequences of consecutive paths), are allowed, such that no consequence can be considered its own direct or indirect cause (hence ‘acyclic’; Law et al., 2012). As such, DAGs reflect the knowledge, presumptions, assumptions and/or speculation of the analyst(s) concerned regarding the causal relationships between each of the variables included therein. Current convention dictates that variables are represented as nodes/vertices, and that any causal paths between variables are represented as directed arcs/edges/lines, often in the form of arrows (see Figure 1). Although each arc indicates the presence and direction of a known/presumed/assumed/speculative causal relationship between the two variables concerned, drawing an arc does not require the sign, magnitude, precision or shape of the relationship to be known or declared (Tennant et al., 2021). In this respect, DAGs provide a simple, uncomplicated, accessible and entirely nonparametric approach for postulating causal relationships amongst any variables of interest, even when these are uncertain, unknown or entirely speculative (Ellison, 2020).
Nonetheless, as a result of the parametric constraints imposed by the presence/absence of possible arcs within any given DAG, these also reflect and support a number of more sophisticated statistical applications which make it possible to use DAGs to inform the design of multivariable statistical models that reflect the causal structure(s) involved – albeit without the need to know or understand the mathematical technicalities on which these are based (Lewis and Kuerbis, 2016). These features make DAGs attractive cognitive, educational and analytical tools for strengthening the epistemological, theoretical and empirical basis of causal inference, and there has been a recent proliferation in the use of DAGs across a range of applied scientific disciplines (e.g. Knight and Winship, 2013), and an associated upsurge in analytical methods training (e.g. Elwert, 2011; Gilthorpe, 2017; Hernán, 2018; Roy, 2021; Hünermund, 2021). This chapter reflects on a decade of delivering medical statistics training to undergraduate medical students at the University of Leeds between 2012 and 2021, in which the third year research, evaluation and special studies module (‘RESS3’) has used DAGs to support the development of applied statistical skills relevant to the extended student-selected research and evaluation projects (ESREP) students undertake in their fourth and final years (Ellison, 2021; Ellison et al., 2014a,b). Based on successive iterations of the structure and content of the RESS3 module, together with notes made during formal and informal planning and review meetings with module leads, lecturers, tutors and students, we draw on the claims and criticisms made of DAGs in the epidemiological literature to identify a number of explicit strengths (and associated, often implicit, weaknesses) that are central to their use in prediction and causal inference modelling.
While using DAGs requires (and benefits from) a clear understanding of their non-parametric nature and parametric implications, the weaknesses of DAGs seem likely to reflect both: the challenges inherent in the modelling of data generating processes when these are imperfectly understood; and troublesome cognitive and heuristic tendencies common to all analytical tools – in which the tool facilitates the task in hand by reducing the necessity (and benefits) of exploring uncertainties and identifying assumptions. These more epistemological considerations appear particularly challenging for medical undergraduates to grasp (Ellison, 2021), but also appear poorly understood by many established analysts and clinical epidemiologists (Ellison, 2020).
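The two principles described in this abstract (every arc is directed from cause to consequence, and no variable may be its own direct or indirect cause) can be checked mechanically. The following is a minimal sketch, not a method taken from the chapter: the function name `is_acyclic` and the variables `age`, `smoking` and `lung_function` are hypothetical and chosen only for illustration.

```python
from collections import deque


def is_acyclic(arcs):
    """Return True if the directed arcs (cause, consequence) contain no cycle."""
    nodes = {v for arc in arcs for v in arc}
    indegree = {v: 0 for v in nodes}   # number of direct causes of each variable
    children = {v: [] for v in nodes}  # direct consequences of each variable
    for cause, consequence in arcs:
        children[cause].append(consequence)
        indegree[consequence] += 1

    # Kahn's algorithm: repeatedly remove variables with no remaining causes.
    queue = deque(v for v in nodes if indegree[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for child in children[v]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # If every variable could be removed, no consequence is its own cause.
    return removed == len(nodes)


# Hypothetical example: each tuple is one directed arc (cause, consequence).
dag = [("age", "smoking"), ("age", "lung_function"), ("smoking", "lung_function")]
# Adding an arc from the outcome back to 'age' creates an indirect cycle.
cyclic = dag + [("lung_function", "age")]

print(is_acyclic(dag))     # True
print(is_acyclic(cyclic))  # False
```

Note that the arcs carry no sign, magnitude or functional form, which mirrors the nonparametric character of DAGs described above.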
Subject:
Computer Science And Mathematics,
Probability And Statistics
Keywords:
Directed Acyclic Graph; DAG; confounding; collider bias; epistemology; inferential statistics
Online: 8 October 2022 (02:59:34 CEST)
Directed acyclic graphs (DAGs) are nonparametric causal path diagrams that have substantial utility as principled representations of disease and healthcare pathways, and of the underlying ‘data generating mechanisms’ these pathways involve. As such, DAGs provide a valuable bridge between: the aetiological knowledge, operational insight and professional experience on which clinical training and practice depend; and the more abstract epistemological and analytical considerations required to extract robust statistical insight from health and healthcare data. DAGs are nonetheless vulnerable to imperfect biomedical paradigms, partial clinical knowledge and limited empirical data. DAGs drawn under such circumstances offer limited scope for statistical insight free from cognitive, analytical or inferential bias if: they misrepresent the data generating mechanisms involved; or ignore the important role that omitted variables (whether measured, unmeasured or unacknowledged) might play therein. To address these weaknesses and broaden the appeal and application of DAGs, this chapter provides ten simple steps that educators can use to improve the analytical competence and statistical confidence of the healthcare students, qualified practitioners and experienced researchers they support. These steps use temporal logic to draw DAGs so as to: reduce reliance on uncertain knowledge, incomplete information, flawed assumptions or guesswork; and avoid, mitigate or acknowledge the errors and biases that each of these incur. The chapter comprises an accessible, non-technical overview of the perspective and thoughtfulness required to generate temporally coherent DAGs as objective representations of the probabilistic causal paths involved in context-specific data generating mechanisms. 
It encourages a focus on those variables operating as potential sources of analytical or inferential bias when estimating the plausible, probabilistic causal relationship between two pre-specified variables; and specifically addresses the challenges posed by: omitted; time-variant; non-asynchronous; and temporally obscure variables. The chapter includes a worked example based on a published clinical study to demonstrate how each of the steps required to generate temporally informed DAGs can be applied to: critically appraise the analytical decisions made during applied healthcare research; and inform the decisions required when designing, undertaking and analysing primary and secondary, prospective and retrospective research. The appendices include a summary of ten recommendations for improving the reporting and interrogability of DAGs and DAG-informed analyses.
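One way to picture the use of temporal logic described above is sketched below. It is not the chapter's ten-step procedure. It assumes, purely for illustration, that each variable can be given a single time point, that arcs may only run from earlier to later variables, and that covariates are labelled by where they fall relative to the two pre-specified variables. The function names, role labels, variable names and time values are all hypothetical.

```python
def temporal_arcs(times):
    """List every arc permitted by temporal order (earlier -> later only)."""
    return [(a, b) for a in times for b in times if times[a] < times[b]]


def classify(times, exposure, outcome):
    """Label each covariate by its timing relative to exposure and outcome."""
    if times[exposure] >= times[outcome]:
        raise ValueError("the exposure must precede the outcome")
    roles = {}
    for variable, t in times.items():
        if variable in (exposure, outcome):
            continue
        if t < times[exposure]:
            roles[variable] = "potential confounder"
        elif times[exposure] < t < times[outcome]:
            roles[variable] = "potential mediator"
        elif t > times[outcome]:
            roles[variable] = "potential consequence of the outcome"
        else:
            # Same time point as the exposure or outcome: order is unclear.
            roles[variable] = "temporally ambiguous"
    return roles


# Hypothetical time points (arbitrary units) at which each variable occurs.
times = {
    "age": 0,
    "smoking": 1,
    "tar_exposure": 2,
    "lung_function": 3,
    "hospital_admission": 4,
}

arcs = temporal_arcs(times)
roles = classify(times, exposure="smoking", outcome="lung_function")
for variable, role in roles.items():
    print(f"{variable}: {role}")
```

Because arcs can only point forward in time, any DAG drawn this way is acyclic by construction. Variables that share a time point with the exposure or outcome are flagged rather than forced into a role, since their position in the causal sequence cannot be settled by timing alone.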