Initial review and analysis of COVID-19 host genetics and associated phenotypes

The global COVID-19 pandemic has accounted for more than 14,000 deaths worldwide. However, little is known about how host genetics interacts with infection and COVID-19 progression. To better understand the role of host genetics, we review the current literature, aggregate readily available genetic resources, and provide updated analyses relevant to COVID-19 and associated phenotypes. Using the unrelated individuals in UK Biobank (total n = 377,579 across 5 populations), we aggregate human leukocyte antigen (HLA) and ABO blood type frequencies. We find a significant risk reduction for blood group O, consistent with that reported by Zhao et al., and encourage broad sharing of readily accessible ABO blood type frequencies among COVID-19 patients with mild, moderate, and severe/critical symptoms at https://tinyurl.com/abo-covid19 to enable robust inferences. In addition, we generate polygenic risk score (PRS) weights for 29 blood measurements, including haematological measurements clinically relevant to COVID-19, such as lymphocyte count and percentage. Focusing on the 8 blood measurements most clinically relevant to COVID-19, we perform PRS-PheWAS analysis across 44 disease antigen measurements (n = 6,643 unrelated individuals in the White British group), infectious diseases and acute respiratory infections (n = 20,928 cases and 349,000 controls across 3 population groups), and deaths (n = 1,846 cases and 368,082 controls), recorded in hospital inpatient records and death registry data, respectively, in UK Biobank, and find associations between host genetic PRSs and disease risk. Taken together, we anticipate that these resources (https://github.com/rivas-lab/covid19) will aid in improving our understanding of host genetic risk factors that play a role in SARS-CoV-2 infection and COVID-19 disease severity.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 24 March 2020 doi:10.20944/preprints202003.0356.v1 © 2020 by the author(s). Distributed under a Creative Commons CC BY license.


Introduction
Here, we explore the literature available on COVID-19 and aggregate reference datasets such as UK Biobank to better understand the role host genetics may play in predisposition to viral infection and in COVID-19 progression. Furthermore, we provide polygenic risk score weights for haematological measurements related to the immune system, which helps the body fight viral infection.

HLA frequencies
Our work was motivated by two studies. The International HIV Controllers Study, which sought to define host genetic effects on the outcome of a viral infection, analyzed the effects of individual amino acids in the human leukocyte antigen (HLA) region of the genome and identified 300 associated single nucleotide polymorphisms (SNPs) within the region and none elsewhere (The International HIV Controllers Study, 2010). A 23andMe study found that the HLA region is significantly associated with 13 of the 23 common infectious diseases studied (Tian et al., 2017). We therefore sought to aggregate all available HLA allelotype references, including across five population groups in UK Biobank (Bycroft et al., 2018): South Asian (n = 7,885), East Asian (n = 1,154), African (n = 6,497), White British (n = 337,138), and non-British European (n = 24,905). Our aim is to provide more fine-scale estimates of allelotype frequencies in specific populations by further integrating country-of-origin data and external reference data from the 17th International HLA & Immunogenetics Workshop (http://17ihiw.org/17th-ihiw-ngs-hla-data/). We have made these data publicly available at https://github.com/rivas-lab/covid19/tree/master/UKB_HLA_freq.
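To illustrate the frequency aggregation described above, the following minimal Python sketch estimates per-population allele frequencies from per-individual HLA allele calls. The input format (one `(population, allele_1, allele_2)` tuple per unrelated individual) and the function name `hla_allele_frequencies` are hypothetical and do not reproduce the UK Biobank HLA data format. Each diploid individual contributes two allele copies, so each frequency is an allele count divided by 2N.

```python
from collections import Counter, defaultdict


def hla_allele_frequencies(calls):
    """Estimate HLA allele frequencies within each population group.

    calls: iterable of (population, allele_1, allele_2) tuples, one per
           unrelated individual (hypothetical input format).
    Each diploid individual contributes two allele copies, so each
    frequency is the allele count divided by 2N for that population.
    """
    allele_counts = defaultdict(Counter)
    n_individuals = Counter()
    for population, allele_1, allele_2 in calls:
        allele_counts[population][allele_1] += 1
        allele_counts[population][allele_2] += 1
        n_individuals[population] += 1
    return {
        population: {
            allele: count / (2 * n_individuals[population])
            for allele, count in counts.items()
        }
        for population, counts in allele_counts.items()
    }


# Hypothetical HLA-A calls for three individuals in two population groups.
example_calls = [
    ("East Asian", "A*02:01", "A*24:02"),
    ("East Asian", "A*24:02", "A*24:02"),
    ("African", "A*02:01", "A*30:01"),
]
freqs = hla_allele_frequencies(example_calls)
```

In this hypothetical example, A*24:02 is carried on 3 of the 4 East Asian chromosomes, giving a frequency of 0.75 in that group.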

ABO blood type
Recently, Zhao et al. investigated the relationship between ABO blood group and SARS-CoV-2 infection in 1,775 infected patients, comparing their ABO distribution to surveys of 3,694 control individuals from Wuhan and 23,386 individuals from Shenzhen (Zhao et al., 2020). They found that blood group A had a significantly higher risk for COVID-19 than non-A blood groups (albeit with a modest effect size, OR = 1.20, p = 0.02), whereas blood group O had a significantly lower risk than non-O blood groups (OR = 0.67, p < 0.001). Given this result, we sought to aggregate ABO blood group frequencies across the population groups for which we have genotype data, using the combination of alleles at three SNPs (rs8176746, rs687289, rs507666) that represent the four major ABO antigens, as clearly demonstrated by the Uppsala team (Johansson et al., 2015) (we thank Mike Inouye (@minouye271) for pointing us to this reference). The data for rs8176746 appear to be reported on the minus (-) strand, since gnomAD (https://gnomad.broadinstitute.org/variant/9-136131322-G-T?dataset=gnomad_r2_1) lists it as a G/T variant rather than C/A. First, we generated additional population definitions at the finer resolution provided by UK Biobank Data Field 21000 (Ethnic background) (https://github.com/rivas-lab/covid19/blob/master/ABO/sample_qc_v3.2.self_reported_pop_def.ipynb), and compared the frequencies observed in UK Biobank to those of the control groups and COVID-19 patients in both Shenzhen and Wuhan. Overall, the ABO blood type frequencies inferred from the three-marker haplotype analysis were similar between the UK Biobank Chinese group and Shenzhen.
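Two pieces of the logic above can be illustrated in a short Python sketch: re-expressing alleles on the opposite strand (so that C/A on the minus strand corresponds to G/T on the plus strand, as for rs8176746 in gnomAD), and mapping a pair of ABO haplotypes to a blood group, using the fact that A and B are codominant and O is recessive. This is a minimal sketch under stated assumptions: haplotypes are assumed to be already phased and decoded into "A", "B" or "O" calls; the decoding rules for rs8176746, rs687289 and rs507666 from Johansson et al. are not reproduced here; and A subgroups are ignored. The function names `flip_strand` and `abo_phenotype` are hypothetical.

```python
# Watson-Crick complements, used to re-express alleles on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(alleles):
    """Return the given alleles as they would be reported on the opposite strand."""
    return tuple(COMPLEMENT[allele] for allele in alleles)


def abo_phenotype(haplotype_1, haplotype_2):
    """Map two decoded ABO haplotypes ("A", "B" or "O") to a blood group.

    A and B are codominant and O is recessive, so:
    A/B -> AB, A/A or A/O -> A, B/B or B/O -> B, O/O -> O.
    """
    pair = {haplotype_1, haplotype_2}
    if not pair <= {"A", "B", "O"}:
        raise ValueError(f"unexpected haplotype call(s): {pair}")
    if pair == {"A", "B"}:
        return "AB"
    if "A" in pair:
        return "A"
    if "B" in pair:
        return "B"
    return "O"


# C/A on the minus strand corresponds to G/T on the plus strand.
print(flip_strand(("C", "A")))  # ('G', 'T')
print(abo_phenotype("A", "O"))  # A
```

Counting the resulting blood groups within each population definition then gives the frequencies that are compared with the Shenzhen and Wuhan data.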
When we compared blood group O frequencies between Shenzhen controls and the UK Biobank Chinese group, we did not find a difference (p = 0.978; OR = 1.003, 95% CI: 0.898-1.122), but we did find a difference between Shenzhen 3rd Hospital patients and the UK Biobank Chinese group (p = 9.76 × 10⁻⁴; OR = 0.629, 95% CI: 0.470-0.837), consistent with the observation of Zhao et al. Nonetheless, the frequency of blood group O did differ between the UK Biobank Chinese group and Wuhan controls (p = 0.00121), suggesting that inferences regarding ABO blood group differences should be made with care; we hope that such data will be made available immediately.
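The comparisons above are summarized as odds ratios with 95% confidence intervals. The text does not state which test produced these p-values, so the following Python sketch uses one standard choice as an assumption: the odds ratio of a 2x2 table with a Woolf (log-scale) confidence interval and a two-sided Wald p-value from the normal approximation. Results may differ slightly from Fisher's exact or chi-squared tests. The function name `odds_ratio_woolf` and the table layout are hypothetical, and the counts below are illustrative rather than study data.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.959964):
    """Odds ratio of a 2x2 table with a Woolf 95% CI and a two-sided Wald p-value.

    Hypothetical table layout:
                    blood group O   other blood groups
        group 1           a                 b
        group 2           c                 d
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell count; a continuity correction would be needed")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    ci = (math.exp(log_or - z * se), math.exp(log_or + z * se))
    # Two-sided p-value 2 * (1 - Phi(|log_or| / se)), written via erfc.
    p_value = math.erfc(abs(log_or) / (se * math.sqrt(2)))
    return math.exp(log_or), ci, p_value


# Hypothetical counts, for illustration only.
or_, (lower, upper), p = odds_ratio_woolf(20, 10, 10, 20)
```

For this hypothetical table the function returns an odds ratio of 4.0 with a p-value of roughly 0.011; because the interval is symmetric on the log scale, the product of its bounds equals the squared odds ratio.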

Polygenic risk scores
Lower lymphocyte count has been associated with more severe disease in COVID-19 (Yang

Blood count polygenic risk score associations with infectious diseases
As proof of principle, we used the polygenic risk score weights trained with the snpnet package across the eight laboratory biomarkers in Table 1

Blood count polygenic risk score associations with respiratory infections, acute respiratory distress syndrome, influenza and pneumonia, and death as a result
Finally, we assessed the association between blood count polygenic risk scores and respiratory infections, acute respiratory distress syndrome, influenza and pneumonia, and death as a result. We aggregated hospital inpatient and death register data from over 337,000 individuals in UK Biobank for ICD codes J00-J06, J09-J18, J80, and J20-J22 (UK Biobank Fields 41202, 41204, 40001, 40002, 41201, and 41270) (Figure 4), with consistent effects in the non-British European and South Asian groups in UK Biobank (Supplementary Data 3). These results suggest that the PRSs are well positioned to assess associations with the immune response to SARS-CoV-2.
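The PRS-based analyses above rely on scoring each individual with a set of trained weights. As a minimal Python sketch (not the snpnet implementation used to train the weights), a PRS can be computed as the weighted sum of allele dosages and then standardized within a cohort before association testing. The variant identifiers, weights and dosages are hypothetical; missing variants raise an error rather than being imputed; and the downstream association test (for example, a regression of case status on the standardized PRS with covariates) is not shown.

```python
from statistics import mean, pstdev


def polygenic_risk_scores(dosages, weights):
    """Compute one PRS per individual as the weighted sum of allele dosages.

    dosages: {individual_id: {variant_id: dosage}}, where a dosage is the
             (possibly imputed, fractional) count of effect alleles, 0 to 2.
    weights: {variant_id: effect_weight}, e.g. from a previously fitted model.
    A variant missing from an individual's dosages raises KeyError
    (a simplifying assumption; real pipelines often mean-impute instead).
    """
    return {
        iid: sum(w * genotype[vid] for vid, w in weights.items())
        for iid, genotype in dosages.items()
    }


def standardize(scores):
    """Center and scale scores to mean 0 and population SD 1 within the cohort."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    return {iid: (s - mu) / sd for iid, s in scores.items()}


# Hypothetical weights and dosages for three individuals.
weights = {"var1": 0.5, "var2": -0.2}
dosages = {
    "id1": {"var1": 2, "var2": 0},
    "id2": {"var1": 0, "var2": 1},
    "id3": {"var1": 1, "var2": 1},
}
prs = polygenic_risk_scores(dosages, weights)
prs_z = standardize(prs)
```

Standardizing puts the PRS on a per-standard-deviation scale, so effect estimates can be compared across the different blood measurement scores.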

Discussion
We present a review and analysis of COVID-19 host genetics and related phenotypes. We also present reference datasets that we hope will aid ongoing studies of the role that host germline genetics plays in infection predisposition and COVID-19 disease progression, including 1) HLA allelotype frequencies and 2) ABO blood group frequencies across different population groups, and 3) polygenic risk score weights for haematological measurements.
Based on our review and analysis, we find some support for a protective effect of blood group O.
However, we caution that further data aggregation across independent populations will be needed to make robust inferences. One such effort is the International COVID-19 Host Genetics Initiative (https://covid-19genehostinitiative.net/), in which we are closely involved. These efforts will include the generation of host genotype data via genotyping arrays, exome sequencing, and whole-genome sequencing, combined with phenotype analysis. Because blood group information is readily accessible, we encourage researchers around the globe to make these data available. Simple summary statistics like those provided by Zhao et al. (the distribution of blood group frequencies across all COVID-19 patients, mild COVID-19 patients, and severe COVID-19 patients) should expedite the integration of host genetics. We have made a publicly accessible Google Spreadsheet available for researchers to enter ABO blood group summary-level data at https://tinyurl.com/abo-covid19.
Insights gained from the PRS analysis include associations between blood count PRSs and infectious diseases, respiratory infections, acute respiratory distress syndrome, influenza and pneumonia, and death as a result. These results suggest that PRS analysis of blood count measurements should be considered in the context of SARS-CoV-2 infection and COVID-19 disease severity.