Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Opening the Black-box of Imputation Software to Study the Impact of Reference Panel Composition on Performance

Version 1: Received: 23 December 2022 / Approved: 26 December 2022 / Online: 26 December 2022 (09:52:08 CET)

A peer-reviewed article of this Preprint also exists.

Dekeyser, T.; Génin, E.; Herzig, A.F. Opening the Black Box of Imputation Software to Study the Impact of Reference Panel Composition on Performance. Genes 2023, 14, 410.

Abstract

Genotype imputation is widely used to enrich genetic datasets. The operation relies on panels of known reference haplotypes, typically with whole-genome sequencing data. How to choose a reference panel has been widely studied, and it is essential to have a panel that is well matched to the individuals who require imputation of missing genotypes. However, it is broadly accepted that such an imputation panel will perform better when it includes diversity, that is, haplotypes from many different populations. In this work, we investigate this observation by examining in fine detail exactly which reference haplotypes contribute at different regions of the genome. This is achieved using a novel method of inserting synthetic genetic variation into the reference panel in order to track the performance of leading imputation algorithms. We show that while diversity may globally improve imputation accuracy, there can be occasions where incorrect genotypes are imputed following the inclusion of more diverse haplotypes in the reference panel. However, we demonstrate a technique for retaining and benefiting from the diversity in the reference panel whilst avoiding the occasional adverse effects on imputation accuracy. Moreover, our results elucidate the role of diversity in a reference panel more clearly than previous studies have.

Keywords

genotype imputation; population genetics; rare variants; reference panel; admixture

Subject

Biology and Life Sciences, Biochemistry and Molecular Biology
