Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Integrating External Controls by Regression Calibration for Genome-Wide Association Study

Version 1 : Received: 14 December 2023 / Approved: 15 December 2023 / Online: 15 December 2023 (12:15:41 CET)

A peer-reviewed article of this Preprint also exists.

Zhu, L.; Yan, S.; Cao, X.; Zhang, S.; Sha, Q. Integrating External Controls by Regression Calibration for Genome-Wide Association Study. Genes 2024, 15, 67.

Abstract

Genome-wide association studies (GWAS) have successfully revealed many disease-associated genetic variants. In a case-control study, adequate power for an association test can be achieved with a large sample size, but genotyping large samples is expensive. A cost-effective strategy to boost power is to integrate external control samples whose genotype data are publicly available. However, naïvely integrating external controls may inflate type I error rates if the systematic differences between studies (batch effects) are ignored, such as differences in sequencing platforms, genotype calling procedures, and population stratification. To account for batch effects, we propose iECAT-RC, an approach that integrates External Controls into the Association Test by Regression Calibration in case-control association studies. Extensive simulation studies show that iECAT-RC not only controls type I error rates but also boosts statistical power in all models. We also apply iECAT-RC to UK Biobank data for M72 Fibroblastic disorders, treating genotype calling as the batch effect. Four SNPs associated with Fibroblastic disorders are detected by iECAT-RC and by the two comparison methods. However, our method has a higher probability of identifying these significant SNPs when the case-control study is unbalanced.
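The type I error inflation described in the abstract can be illustrated with a small simulation. The sketch below is not iECAT-RC and does not reproduce the authors' simulations; it only shows, under assumed values, what can happen when external controls are pooled naïvely. The sample sizes, the allele frequencies, the size of the batch shift, and all function names are hypothetical choices made for illustration, and the test used is a plain 1-df allelic chi-square test.

```python
import math
import random


def allelic_chi2_pvalue(case_alt, case_n, ctrl_alt, ctrl_n):
    """P-value of the 1-df chi-square test on a 2x2 allele-count table."""
    a, b = case_alt, case_n - case_alt
    c, d = ctrl_alt, ctrl_n - ctrl_alt
    n = case_n + ctrl_n
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


def draw_alt_alleles(rng, n_alleles, freq):
    """Number of alternate alleles among n_alleles independent draws."""
    return sum(rng.random() < freq for _ in range(n_alleles))


def rejection_rates(n_rep=400, alpha=0.05, seed=1):
    """Empirical rejection rates under the null (no true association)."""
    rng = random.Random(seed)
    # Hypothetical allele counts (2 alleles per person).
    n_case, n_internal, n_external = 1000, 1000, 4000
    true_freq = 0.30      # assumed frequency in cases and internal controls
    external_freq = 0.36  # assumed apparent frequency after a batch shift
    reject_internal = reject_naive = 0
    for _ in range(n_rep):
        case = draw_alt_alleles(rng, n_case, true_freq)
        internal = draw_alt_alleles(rng, n_internal, true_freq)
        external = draw_alt_alleles(rng, n_external, external_freq)
        # Test 1: cases against internal controls only.
        if allelic_chi2_pvalue(case, n_case, internal, n_internal) < alpha:
            reject_internal += 1
        # Test 2: cases against internal and external controls pooled naively.
        pooled = internal + external
        pooled_n = n_internal + n_external
        if allelic_chi2_pvalue(case, n_case, pooled, pooled_n) < alpha:
            reject_naive += 1
    return reject_internal / n_rep, reject_naive / n_rep


internal_rate, naive_rate = rejection_rates()
print(f"internal controls only: {internal_rate:.3f}")
print(f"naive pooling:          {naive_rate:.3f}")
```

Because no association is simulated, every rejection is a false positive. With these assumed values, the internal-only test should reject at close to the nominal 5% level, while the naïvely pooled test should reject far more often. A calibration step such as the one the abstract proposes is meant to address this gap.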

Keywords

Genome-wide association test; case-control study; batch effect; data integration

Subject

Computer Science and Mathematics, Probability and Statistics
