Zamanian, A.; von Kleist, H.; Ciora, O.-A.; Piperno, M.; Lancho, G.; Ahmidi, N. Analysis of Missingness Scenarios for Observational Health Data. J. Pers. Med. 2024, 14, 514.
Abstract
Despite the extensive literature on missing data theory and cautionary articles emphasizing the importance of realistic analysis for healthcare data, a critical gap persists in incorporating domain knowledge into missing data problem formulation, assumption specification, and method development. In this paper, we highlight this gap, particularly for observational data from healthcare facilities. We address it by identifying ten fundamental missingness scenarios that arise during data measurement, recording, and pre-processing in observational health data and that are influenced by physicians, patients, healthcare facilities, and data scientists. We analyze the effect of these scenarios on estimand formulation, missing data identification, estimation, and sensitivity analysis. To emphasize how domain-informed analysis can improve method reliability, we conduct simulation studies under the influence of various missingness scenarios. We compare the results of three common methods in medical data analysis (complete-case analysis, MissForest imputation, and inverse probability weighting estimation) for two estimands (variable mean estimation and classification accuracy). We advocate for our analysis approach as a reference for the analysis of observational health data. Furthermore, we posit that the proposed analysis framework is applicable to other medical domains, including medical wearable data analysis.
Keywords
Missing Data Analysis; Observational Health Data; Missingness Scenarios; Missing Data Assumptions; Missingness Distribution Shift
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.