Subject: Medicine And Pharmacology, Other Keywords: directed acyclic graph; DAG; causal inference; bias; inferential statistics; reproducibility
Online: 8 October 2022 (02:57:44 CEST)
The origins of directed acyclic graphs (DAGs) date back to the emergence of ‘graph theory’ in the early 1700s (Biggs et al., 1986). DAGs are conceptual or literal, diagrammatic representations of causal paths between variables which are constructed – as their name suggests – on the basis of two overriding principles: first, that all causal paths are ‘directed’ (i.e. for each pair of variables, only one can represent the cause, while the other must be its consequence); and second, that no direct cyclical paths, or indirect cyclical pathways (comprising sequences of consecutive paths), are allowed, such that no consequence can be considered its own direct or indirect cause (hence ‘acyclic’; Law et al., 2012). As such, DAGs reflect the knowledge, presumptions, assumptions and/or speculation of the analyst(s) concerned regarding the causal relationships between each of the variables included therein. Current convention dictates that variables are represented as nodes/vertices, and that any causal paths between variables are represented as directed arcs/edges/lines, often in the form of arrows (see Figure 1). Although each arc indicates the presence and direction of a known/presumed/assumed/speculative causal relationship between the two variables concerned, drawing an arc does not require the sign, magnitude, precision or shape of the relationship to be known or declared (Tennant et al., 2021). In this respect, DAGs provide a simple, uncomplicated, accessible and entirely nonparametric approach for postulating causal relationships amongst any variables of interest even when these are uncertain, unknown or entirely speculative (Ellison, 2020). 
Nonetheless, as a result of the parametric constraints imposed by the presence/absence of possible arcs within any given DAG, these also reflect and support a number of more sophisticated statistical applications which make it possible to use DAGs to inform the design of multivariable statistical models that reflect the causal structure(s) involved – albeit without the need to know or understand the mathematical technicalities on which these are based (Lewis and Kuerbis, 2016). These features make DAGs attractive cognitive, educational and analytical tools for strengthening the epistemological, theoretical and empirical basis of causal inference, and there has been a recent proliferation in the use of DAGs across a range of applied scientific disciplines (e.g. Knight and Winship, 2013), and an associated upsurge in analytical methods training (e.g. Elwert, 2011; Gilthorpe, 2017; Hernán, 2018; Roy, 2021; Hünermund, 2021). This chapter reflects on a decade of delivering medical statistics training to undergraduate medical students at the University of Leeds between 2012 and 2021, in which the third year research, evaluation and special studies module (‘RESS3’) has used DAGs to support the development of applied statistical skills relevant to the extended student-selected research and evaluation projects (ESREP) students undertake in their fourth and final years (Ellison, 2021; Ellison et al., 2014a,b). Based on successive iterations of the structure and content of the RESS3 module, together with notes made during formal and informal planning and review meetings with module leads, lecturers, tutors and students, we draw on the claims and criticisms made of DAGs in the epidemiological literature to identify a number of explicit strengths (and associated, often implicit, weaknesses) that are central to their use in prediction and causal inference modelling. 
While using DAGs requires (and benefits from) a clear understanding of their non-parametric nature and parametric implications, the weaknesses of DAGs seem likely to reflect both: the challenges inherent in the modelling of data generating processes when these are imperfectly understood; and troublesome cognitive and heuristic tendencies common to all analytical tools – in which the tool facilitates the task in hand by reducing the necessity (and benefits) of exploring uncertainties and identifying assumptions. These more epistemological considerations appear particularly challenging for medical undergraduates to grasp (Ellison, 2021), but also appear poorly understood by many established analysts and clinical epidemiologists (Ellison, 2020).
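The two defining principles described in this abstract (every arc is directed, and no variable can be its own direct or indirect cause) can be checked mechanically. The following is a minimal sketch, not taken from the chapter: the function name `is_acyclic` and the variable names are hypothetical, and the check uses Kahn's topological-sort algorithm as one convenient way to detect cycles.

```python
from collections import deque


def is_acyclic(arcs):
    """Return True if the directed arcs (cause, effect) contain no cycle."""
    nodes = {n for arc in arcs for n in arc}
    children = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for cause, effect in arcs:
        children[cause].append(effect)
        indegree[effect] += 1

    # Kahn's algorithm: repeatedly remove variables that have no remaining causes.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # If some variables could never be removed, they sit on a cycle.
    return removed == len(nodes)


# Hypothetical variables, for illustration only.
dag = [("age", "exposure"), ("age", "outcome"), ("exposure", "outcome")]
cyclic = dag + [("outcome", "age")]  # makes 'age' its own indirect cause

print(is_acyclic(dag))
print(is_acyclic(cyclic))
```

Note that the check says nothing about the sign, magnitude or shape of any relationship, which matches the nonparametric character of DAGs described above.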
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Directed Acyclic Graph; DAG; confounding; collider bias; epistemology; inferential statistics
Online: 8 October 2022 (02:59:34 CEST)
Directed acyclic graphs (DAGs) are nonparametric causal path diagrams that have substantial utility as principled representations of disease and healthcare pathways, and of the underlying ‘data generating mechanisms’ these pathways involve. As such, DAGs provide a valuable bridge between: the aetiological knowledge, operational insight and professional experience on which clinical training and practice depend; and the more abstract epistemological and analytical considerations required to extract robust statistical insight from health and healthcare data. DAGs are nonetheless vulnerable to imperfect biomedical paradigms, partial clinical knowledge and limited empirical data. DAGs drawn under such circumstances offer limited scope for statistical insight free from cognitive, analytical or inferential bias if: they misrepresent the data generating mechanisms involved; or ignore the important role that omitted variables (whether measured, unmeasured or unacknowledged) might play therein. To address these weaknesses and broaden the appeal and application of DAGs, this chapter provides ten simple steps that educators can use to improve the analytical competence and statistical confidence of the healthcare students, qualified practitioners and experienced researchers they support. These steps use temporal logic to draw DAGs so as to: reduce reliance on uncertain knowledge, incomplete information, flawed assumptions or guesswork; and avoid, mitigate or acknowledge the errors and biases that each of these incur. The chapter comprises an accessible, non-technical overview of the perspective and thoughtfulness required to generate temporally coherent DAGs as objective representations of the probabilistic causal paths involved in context-specific data generating mechanisms. 
It encourages a focus on those variables operating as potential sources of analytical or inferential bias when estimating the plausible, probabilistic causal relationship between two pre-specified variables; and specifically addresses the challenges posed by: omitted; time-variant; non-asynchronous; and temporally obscure variables. The chapter includes a worked example based on a published clinical study to demonstrate how each of the steps required to generate temporally-informed DAGs can be applied to: critically appraise the analytical decisions made during applied healthcare research; and inform the decisions required when designing, undertaking and analysing primary and secondary, prospective and retrospective research. The appendices include a summary of ten recommendations for improving the reporting and interrogability of DAGs and DAG-informed analyses.
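One way to picture the temporal logic this abstract refers to is sketched below. It is an illustration under stated assumptions, not the chapter's ten steps: it assumes every variable can be assigned a single time point, and the function name `temporal_arcs`, the variable names and the time values are all hypothetical. Arcs are permitted only from earlier variables to later ones, so the resulting graph is acyclic by construction.

```python
def temporal_arcs(times):
    """List every arc allowed by temporal order: earlier variable -> later variable.

    `times` maps each variable name to the time point at which it occurs.
    Variables that share a time point get no arc between them in this sketch.
    """
    return [
        (cause, effect)
        for cause in times
        for effect in times
        if times[cause] < times[effect]
    ]


# Hypothetical variables and time points, for illustration only.
times = {"age": 0, "treatment": 1, "outcome": 2}
arcs = temporal_arcs(times)
print(arcs)
```

Because each arc points strictly forward in time, no sequence of arcs can return to its starting variable. Handling of time-variant or temporally obscure variables, which the chapter addresses, is outside the scope of this sketch.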
ARTICLE | doi:10.20944/preprints202305.0974.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: Adaptive routing; dynamic programming; shortest paths; acyclic directed graphs
Online: 15 May 2023 (04:33:40 CEST)
Routing a person through a traffic network presents a tension between selecting a fixed route that is easy to navigate and selecting an aggressively adaptive route that minimizes the expected travel time. We propose to create non-aggressive adaptive routes in the middle ground, seeking the best of both these extremes. Specifically, these routes still adapt to changing traffic conditions; however, we limit the number of adjustments made to the route. This improves the user experience by providing a continuum of options between saving travel time and minimizing navigation. We design strategies to model single and multiple route adjustments, and investigate enumerative techniques to solve these models. To alleviate the intractability of handling real-life traffic data, we develop efficient algorithms with easily computable lower and upper bounds. We finally present computational experiments highlighting the benefits of limited adaptability in terms of reducing the expected travel time.
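The keywords for this article mention dynamic programming and shortest paths on acyclic directed graphs. As background only, the sketch below shows the textbook dynamic-programming recursion for a fixed (non-adaptive) shortest path on a DAG. It is not the authors' adaptive routing method; the function name `dag_shortest_path`, the node labels and the travel times are hypothetical, and the caller is assumed to supply the nodes in a valid topological order.

```python
def dag_shortest_path(order, edges, source, target):
    """Shortest path on a DAG by dynamic programming.

    order:  nodes listed in topological order (assumed valid).
    edges:  {node: [(next_node, travel_time), ...]}.
    Returns (total_time, path); path is empty if target is unreachable.
    """
    inf = float("inf")
    dist = {node: inf for node in order}
    prev = {}
    dist[source] = 0.0

    # Each node is finalised before any of its successors is visited.
    for node in order:
        if dist[node] == inf:
            continue
        for nxt, cost in edges.get(node, []):
            if dist[node] + cost < dist[nxt]:
                dist[nxt] = dist[node] + cost
                prev[nxt] = node

    if dist[target] == inf:
        return inf, []

    # Walk the predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical network and travel times, for illustration only.
order = ["A", "C", "B", "D"]
edges = {
    "A": [("B", 4.0), ("C", 2.0)],
    "C": [("B", 1.0), ("D", 5.0)],
    "B": [("D", 1.0)],
}
cost, path = dag_shortest_path(order, edges, "A", "D")
print(cost, path)
```

A single pass over the topological order suffices because a node's distance cannot change once all of its predecessors have been processed.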
ARTICLE | doi:10.20944/preprints201607.0007.v1
Subject: Business, Economics And Management, Econometrics And Statistics Keywords: Bayesian networks; directed acyclic graphs; employee loyalty; employment arrangements; flexi-time; job satisfaction; teleworking; workplace employment relations survey
Online: 7 July 2016 (12:12:14 CEST)
This study explores the relationship between job satisfaction, employee loyalty and two types of flexible employment arrangements: teleworking and flexi-time. The analysis relies on data derived from the Workplace Employment Relations Survey (WERS) in 2004 and 2011. Propensity score matching and least squares regressions are applied. Furthermore, Bayesian Networks (BN) and Directed Acyclic Graphs (DAGs) are employed in order to confirm the causality between the employment types explored and the outcomes of interest. Finally, an instrumental variables (IV) approach based on the BN framework is proposed and applied in this study. The results support a positive causal effect of these employment arrangements on job satisfaction and employee loyalty.