Preprint Article. Version 1. Preserved in Portico. This version is not peer-reviewed.

Analysis of Missingness Scenarios for Observational Health Data

Version 1 : Received: 4 April 2024 / Approved: 5 April 2024 / Online: 5 April 2024 (10:45:36 CEST)

How to cite: Zamanian, A.; von Kleist, H.; Ciora, O.A.; Piperno, M.; Lancho, G.; Ahmidi, N. Analysis of Missingness Scenarios for Observational Health Data. Preprints 2024, 2024040429. https://doi.org/10.20944/preprints202404.0429.v1

Abstract

Despite the extensive literature on missing data theory and cautionary articles emphasizing the importance of realistic analysis for healthcare data, a critical gap persists in incorporating domain knowledge into missing data problem formulation, assumption specification, and method development. In this paper, we highlight this gap, particularly for observational data from healthcare facilities. We address it by identifying ten fundamental missingness scenarios that arise during data measurement, recording, and pre-processing in observational health data and that are influenced by physicians, patients, healthcare facilities, and data scientists. We analyze the effect of these scenarios on estimand formulation, missing data identification, estimation, and sensitivity analysis. To emphasize how domain-informed analysis can improve method reliability, we conduct simulation studies under various missingness scenarios. We compare the results of three methods common in medical data analysis (complete-case analysis, MissForest imputation, and inverse probability weighting estimation) for two estimands (variable mean estimation and classification accuracy). We advocate for our analysis approach as a reference for the analysis of observational health data. Furthermore, we posit that the proposed analysis framework is applicable to other medical domains, including medical wearable data analysis.
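As a rough illustration of why the choice of method matters for the variable-mean estimand, the following minimal sketch contrasts complete-case analysis with inverse probability weighting on synthetic data. It is not the simulation design of the paper: the data-generating model, the missingness mechanism (observation probability depends only on a fully observed covariate `x`), the coefficients, and the function name `simulate` are all assumptions made for illustration. The true observation probability is assumed known, whereas in practice it would have to be estimated. MissForest imputation is omitted.

```python
import math
import random


def simulate(n=50_000, seed=0):
    """Compare complete-case and IPW estimates of E[Y] on synthetic data.

    Hypothetical setup (not taken from the paper):
      x ~ N(0, 1) is always observed,
      y = 1 + 2x + noise is observed with probability sigmoid(0.5 - 1.5x),
      so missingness in y depends on x only.
    """
    rng = random.Random(seed)
    full_sum = 0.0            # sum of y over all units (unavailable in practice)
    cc_sum, cc_n = 0.0, 0     # complete-case accumulators
    ipw_num, ipw_den = 0.0, 0.0  # weighted sum of y and sum of weights

    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        # Assumed-known probability that y is observed given x.
        p_obs = 1.0 / (1.0 + math.exp(-(0.5 - 1.5 * x)))
        observed = rng.random() < p_obs

        full_sum += y
        if observed:
            cc_sum += y
            cc_n += 1
            # Weight each observed unit by the inverse of its observation probability.
            ipw_num += y / p_obs
            ipw_den += 1.0 / p_obs

    full_mean = full_sum / n
    cc_mean = cc_sum / cc_n
    ipw_mean = ipw_num / ipw_den  # normalized (Hajek-style) IPW estimate
    return full_mean, cc_mean, ipw_mean


full_mean, cc_mean, ipw_mean = simulate()
print(f"full-data mean: {full_mean:.3f}")
print(f"complete-case:  {cc_mean:.3f}")
print(f"IPW:            {ipw_mean:.3f}")
```

Under this assumed mechanism, units with large `x` (and therefore large `y`) are observed less often, so the complete-case mean is pulled downward, while the weighted estimate stays close to the full-data mean. Under a different mechanism, for example one in which missingness depends on `y` itself, the same weighting would no longer be justified.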

Keywords

Missing Data Analysis; Observational Health Data; Missingness Scenarios; Missing Data Assumptions; Missingness Distribution Shift

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
