Preprint Essay Version 1 Preserved in Portico This version is not peer-reviewed

COVID-19 and the Epistemology of Epidemiological Models at the Dawn of AI

Version 1: Received: 9 August 2020 / Approved: 10 August 2020 / Online: 10 August 2020 (10:44:46 CEST)

How to cite: Ellison, G. COVID-19 and the Epistemology of Epidemiological Models at the Dawn of AI. Preprints 2020, 2020080245.


The models used to estimate disease transmission, susceptibility and severity determine what epidemiology can (and cannot) tell us about COVID-19. These include: ‘model organisms’ chosen for their phylogenetic/aetiological similarities; multivariable statistical models to estimate the strength/direction of (potentially causal) relationships between variables (through ‘causal inference’), and the (past/future) value of unmeasured variables (through ‘classification/prediction’); and a range of modelling techniques to predict beyond the available data (through ‘extrapolation’), compare different hypothetical scenarios (through ‘simulation’), and estimate key features of dynamic processes (through ‘projection’). Each of these models: addresses different questions using different techniques; involves assumptions that require careful assessment; and is vulnerable to generic and specific biases that can undermine the validity and interpretation of its findings. It is therefore necessary that the models used: can actually address the questions posed; and have been competently applied. In this regard, it is important to stress that extrapolation, simulation and projection cannot offer accurate predictions of future events when the underlying mechanisms (and the contexts involved) are poorly understood and subject to change. Given the importance of understanding such mechanisms/contexts, and the limited opportunity for experimentation during outbreaks of novel diseases, the use of multivariable statistical models to estimate the strength/direction of potentially causal relationships between two variables (and the biases incurred through their misapplication/misinterpretation) warrants particular attention.
Such models must be carefully designed to address: ‘selection-collider bias’, ‘unadjusted confounding bias’ and ‘inferential mediator adjustment bias’ – all of which can introduce effects capable of enhancing, masking or reversing the estimated (true) causal relationship between the two variables examined. Selection-collider bias occurs when these two variables independently cause a third (the ‘collider’), and when this collider determines/reflects the basis for selection in the analysis. It is likely to affect all incompletely representative samples, although its effects will be most pronounced wherever selection is constrained (e.g. analyses focusing on infected/hospitalised individuals). Unadjusted confounding bias disrupts the estimated (true) causal relationship between two variables when: these share one (or more) common cause(s); and when the effects of these causes have not been adjusted for in the analyses (e.g. whenever confounders are unknown/unmeasured). Inferentially similar biases can occur when: one (or more) variable(s) (or ‘mediators’) fall on the causal path between the two variables examined (i.e. when such mediators are caused by one of the variables and are causes of the other); and when these mediators are adjusted for in the analysis. Such adjustment is commonplace when: mediators are mistaken for confounders; prediction models are mistakenly repurposed for causal inference; or mediator adjustment is used to estimate direct and indirect causal relationships (in a mistaken attempt at ‘mediation analysis’). These three biases are central to ongoing and unresolved epistemological tensions within epidemiology. All have substantive implications for our understanding of COVID-19, and the future application of artificial intelligence to ‘data-driven’ modelling of similar phenomena. 
Nonetheless, competently applied and carefully interpreted, multivariable statistical models may yet provide sufficient insight into mechanisms and contexts to permit more accurate projections of future disease outbreaks.
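
The three biases named above can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the essay: the variable names, effect sizes, selection threshold and sample size are all assumptions chosen for clarity, and the data are synthetic. It uses ordinary least squares on simulated Gaussian variables to show how selecting on a collider, leaving a confounder unadjusted, or adjusting for a mediator can each move the estimated coefficient away from the causal effect that was built into the data.

```python
import random


def slope(x, y):
    """OLS slope from a simple linear regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, z):
    """Residuals of y after a simple linear regression of y on z."""
    b = slope(z, y)
    mz, my = sum(z) / len(z), sum(y) / len(y)
    return [(yi - my) - b * (zi - mz) for yi, zi in zip(y, z)]


def adjusted_slope(x, y, z):
    """Coefficient of x in a regression of y on x and z.

    Uses the Frisch-Waugh-Lovell result: regress the part of y not
    explained by z on the part of x not explained by z.
    """
    return slope(residuals(x, z), residuals(y, z))


def simulate(n=20000, seed=1):
    """Return estimated x -> y coefficients under three assumed scenarios."""
    rng = random.Random(seed)

    def noise():
        return [rng.gauss(0, 1) for _ in range(n)]

    out = {}

    # 1. Selection-collider bias (assumed true effect of x on y: 0).
    #    x and y are independent, both cause c, and only c > 1 is analysed.
    x, y = noise(), noise()
    c = [a + b + e for a, b, e in zip(x, y, noise())]
    kept = [(a, b) for a, b, k in zip(x, y, c) if k > 1.0]
    out["collider_full"] = slope(x, y)
    out["collider_selected"] = slope([a for a, _ in kept], [b for _, b in kept])

    # 2. Unadjusted confounding bias (assumed true effect of x on y: 0.5).
    #    z is a common cause of x and y.
    z = noise()
    x = [zi + e for zi, e in zip(z, noise())]
    y = [0.5 * xi + zi + e for xi, zi, e in zip(x, z, noise())]
    out["confounded_unadjusted"] = slope(x, y)       # analytically 1.0
    out["confounded_adjusted"] = adjusted_slope(x, y, z)  # analytically 0.5

    # 3. Mediator adjustment bias (assumed total effect of x on y: 0.5).
    #    x causes m, m causes y, and there is no other path from x to y.
    x = noise()
    m = [xi + e for xi, e in zip(x, noise())]
    y = [0.5 * mi + e for mi, e in zip(m, noise())]
    out["mediator_total"] = slope(x, y)               # analytically 0.5
    out["mediator_adjusted"] = adjusted_slope(x, y, m)  # analytically 0.0

    return out


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>22}: {value:+.2f}")
```

Under these assumptions, the selected sample in the first scenario yields a clearly negative coefficient despite there being no causal relationship (a spurious effect); the unadjusted estimate in the second is roughly double the built-in effect (enhancement); and adjusting for the mediator in the third drives the estimate towards zero (masking). The linear, Gaussian set-up is a deliberate simplification: real analyses of COVID-19 data involve unknown causal structures, which is precisely why the choice of adjustment set cannot be left to the data alone.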


Keywords: COVID-19; description; prediction; causal inference; extrapolation; simulation; projection

Comments (2)

Comment 1
Received: 10 August 2020
Commenter: George Ellison (author)
Commenter's Conflict of Interests: I am the author of this essay.
Comment: This essay was written as an invited Commentary (currently under review) for publication in the Annals of Human Biology. I would welcome any suggestions for improvements and/or clarifications (or indeed, any errors that might need to be corrected). Many thanks, G
Response 1 to Comment 1
Received: 26 September 2020
Commenter: George Ellison
The commenter has declared there is no conflict of interests.
Comment: Many thanks to those of you who have offered comments and suggestions on this pre-publication version of the manuscript. This has now been accepted for publication in the Annals of Human Biology and will be published shortly. Any last minute comments/suggestions can be submitted directly to me at: - Many thanks, George
