Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Enhancing Race and Ethnicity using Bayesian Imputation in an All Payer Claims Database

Version 1 : Received: 14 January 2022 / Approved: 17 January 2022 / Online: 17 January 2022 (12:40:15 CET)

How to cite: El Ibrahimi, S.; Hendricks, M.A.; Hallvik, S.E.; Dameshghi, N.; Hildebran, C.; Fischer, M.A.; Weiner, S.G. Enhancing Race and Ethnicity using Bayesian Imputation in an All Payer Claims Database. Preprints 2022, 2022010227 (doi: 10.20944/preprints202201.0227.v1).

Abstract

Background: All Payer Claims Databases (APCDs) are a rich source of health information, but race and ethnicity (R&E) data are largely missing. Bayesian Improved Surname Geocoding (BISG) is a common R&E imputation method, yet validation of BISG in APCDs is lacking. We used BISG to impute missing R&E in the Oregon APCD. Methods: BISG-imputed R&E for Asian Pacific Islander (API), Black, Hispanic, and White individuals was compared against the gold standard (vital statistics), and improvements in sensitivity and specificity were assessed. Logistic regression examined whether missingness of R&E was random across patient characteristics. Results: Among the 85,857 individuals in the study, 32.1% (n=27,594) had missing R&E. Missing R&E was not randomly distributed: the odds of missingness were higher among males, White individuals, those aged 65 and older, and commercially insured individuals. The percentage missing also differed by comorbid condition and cause of death. Imputing missing R&E with BISG improved the sensitivity for identifying White, Black, API, and Hispanic individuals. Conclusions: APCDs can benefit from filling in missing R&E with BISG imputation, which supports more robust population-level health analyses and the identification of inequities by R&E without losing statistical power or dropping records whose R&E is missing non-randomly.
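The abstract does not describe the implementation, so the following is only a minimal sketch of the two calculations it names: the standard BISG Bayes update, P(race | surname, geography) ∝ P(race | surname) × P(geography | race), and per-group sensitivity and specificity against a gold-standard label. The function names, the four-category list, and all probabilities are hypothetical illustrations, not values or code from the study.

```python
from typing import Dict, Sequence, Tuple

# The four groups named in the abstract; the labels are illustrative.
RACES = ("API", "Black", "Hispanic", "White")


def bisg_posterior(
    p_race_given_surname: Dict[str, float],
    p_geo_given_race: Dict[str, float],
) -> Dict[str, float]:
    """Combine a surname-based prior with a geography likelihood via Bayes' rule."""
    unnormalized = {
        r: p_race_given_surname[r] * p_geo_given_race[r] for r in RACES
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all joint probabilities are zero; cannot normalize")
    return {r: v / total for r, v in unnormalized.items()}


def sensitivity_specificity(
    truth: Sequence[str], predicted: Sequence[str], group: str
) -> Tuple[float, float]:
    """Treat `group` as the positive class and score predictions against truth."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == group and p == group for t, p in pairs)
    fn = sum(t == group and p != group for t, p in pairs)
    tn = sum(t != group and p != group for t, p in pairs)
    fp = sum(t != group and p == group for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative record")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical inputs: P(race | surname) and P(this census area | race).
surname_prior = {"API": 0.02, "Black": 0.05, "Hispanic": 0.85, "White": 0.08}
geo_likelihood = {"API": 0.001, "Black": 0.002, "Hispanic": 0.004, "White": 0.001}

posterior = bisg_posterior(surname_prior, geo_likelihood)
most_likely = max(posterior, key=posterior.get)
```

In a validation like the one described, `truth` would come from the vital statistics records and `predicted` from the imputed category, with the metrics computed once per group.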

Keywords

Bayesian inference; race and ethnicity imputation; All Payer Claims Database; vital statistics death records; validation

Subject

BEHAVIORAL SCIENCES, Other
