Asians do not exhibit elevated expression or unique genetic polymorphisms for ACE2, the cell-entry receptor of SARS-CoV-2

The recurrent coronavirus outbreaks in China (SARS-CoV and its relative, SARS-CoV-2) have raised speculation that Asians may be more susceptible to these coronaviruses. Here, we test this possibility based on an analysis of the lung-specific expression of ACE2, which encodes the known cell-entry receptor of both SARS-CoV and SARS-CoV-2. We show that ACE2 expression is not affected during tumorigenesis, which supports using the abundant transcriptomes from cancer genomics studies to study ACE2 expression among diverse cancer-free individuals. We find that ACE2 expression in the lung increases with age, but is not associated with sex. Further, Asians do not differ from other populations in ACE2 expression and do not harbor unique genetic polymorphisms in the ACE2 locus. Thus, beyond illustrating an innovative method for assessing the potential impacts of demographic factors on non-cancer diseases using large-scale cancer sample datasets, our statistically robust findings emphasize that individuals of all races require the same level of personal protection against SARS-CoV-2.


INTRODUCTION
The outbreak of coronavirus disease 2019, which is caused by SARS-CoV-2, has led to significant illness and death and has been designated a global health emergency by the World Health Organization. It is clear that SARS-CoV-2 is a close relative of SARS-CoV (Xu et al., 2020), which caused the well-studied severe acute respiratory syndrome in 2003. However, any demographic factors which may predict differential susceptibility to these coronaviruses remain poorly understood. In particular, since both SARS-CoV and SARS-CoV-2 epidemics broke out in China, it has been speculated that East Asians may be relatively more susceptible to these coronaviruses (Zhao et al., 2020).
Biologically, research from around the time of the SARS epidemic in 2003 revealed that SARS-CoV enters cells through a protein known as angiotensin-converting enzyme 2 (ACE2) (Kuba et al., 2005; Li et al., 2003), whose native function is in the renin-angiotensin system, which regulates blood pressure (Donoghue et al., 2000). Susceptibility to SARS-CoV was positively associated with the ACE2 expression level among cells in the lung, as well as among nine diverse cell lines (Hofmann et al., 2004; Jia et al., 2005). Another study showed that overexpressing ACE2 in cell lines promoted efficient replication of SARS-CoV, whereas neutralizing antibodies against ACE2 inhibited viral replication in a dose-dependent manner (Li et al., 2003). In humans, profiling revealed ACE2 expression in alveolar epithelial cells (Hamming et al., 2004), which are understood to be the primary site of SARS-CoV infection in the lung (Kuiken et al., 2003). In addition, a study using an ACE2 knockout mouse model showed that mice lacking functional ACE2 exhibited reduced SARS-CoV levels in the lung (Kuba et al., 2005).
Very recent work has indicated that ACE2 is also used by SARS-CoV-2 to enter cells, and structural analyses revealed apparently strong binding affinity between the ACE2 protein and the spike protein of SARS-CoV-2 (Wrapp et al., 2020; Xu et al., 2020; Yan et al., 2020). Furthermore, it was demonstrated that HeLa cells were only susceptible to infection by SARS-CoV-2 when ACE2 was expressed. Thus, ACE2 expression in lung cells is understood to be one of the most promising indicators of susceptibility to infection by SARS-CoV-2.
An analysis using the data of the genotype-tissue expression (GTEx) project failed to identify any quantitative trait loci associated with ACE2 expression in the lung (https://gtexportal.org/home/gene/ACE2); Asian-biased expression for any genes was not detected (Mele et al., 2015), potentially owing to the small number of samples from Asians (1.3% of all samples) in the GTEx project. In contrast, we reasoned that data for thousands of lung samples from a great diversity of individuals of various ages, sexes, and races are present in the database of The Cancer Genome Atlas (TCGA) and other cancer genomics studies. Pursuing this, we here investigated potential demographic factors affecting ACE2 expression based on cancer transcriptome data. We show that ACE2 expression is not affected during tumorigenesis, and find that Asians do not have higher ACE2 expression than other populations in a number of organs, including the lung. Furthermore, we did not detect any genetic polymorphisms in the ACE2 locus specifically enriched amongst Asians.
Thus, beyond helping to better understand the basic biology of coronavirus infection, our study can inform global efforts to develop efficient protection strategies against SARS-CoV-2.

RESULTS AND DISCUSSION
TCGA includes data for lung transcriptomes of two cancer types: lung adenocarcinoma (LUAD) and lung squamous cell carcinoma. We focused on the LUAD samples in this study because LUAD is more likely to develop from alveolar cells (Lin et al., 2012; Xu et al., 2012). The expression level of ACE2 was similar between primary LUAD tumors and adjacent normal tissues (Figure S1A-B) and was not associated with survival probability or pathologic stage (Figure S1C-D), indicating that ACE2 expression is not apparently affected during tumorigenesis.
Pursuing this further with additional cancer types, we determined that the expression level of ACE2 in primary tumor samples genuinely reflected the levels in adjacent normal tissues across 19 cancer types in the TCGA (Pearson's correlation coefficient R = 0.8, P = 3×10⁻⁵, Figure S1E and Table S1). These findings support the use of LUAD transcriptome data to study ACE2 expression among cancer-free individuals. However, there were only seven Asian LUAD samples in the TCGA; to address this paucity of relevant samples, we expanded the sample size for Asians by including 260 Chinese LUAD transcriptomes (Chen et al., 2020a).
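As a minimal sketch of the tumor-versus-normal comparison above, Pearson's correlation coefficient can be computed across cancer types as follows. The function name `pearson_r` and all expression values are hypothetical and purely illustrative; they are not taken from Table S1.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical log2 ACE2 expression, one value per cancer type
# (made-up numbers for illustration, not data from this study).
tumor_log2 = [14.1, 16.8, 12.3, 18.0, 15.2, 13.4]
normal_log2 = [13.7, 17.1, 12.9, 17.5, 15.9, 13.0]

r = pearson_r(tumor_log2, normal_log2)
print(f"R = {r:.2f}")
```

A coefficient close to 1 across cancer types would indicate that tumor expression tracks the expression of the matched normal tissue.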
To determine if cancer transcriptome data can detect differential gene expression with sufficient sensitivity among cancer-free individuals, we used the 38 genes previously reported from the GTEx project that showed sex- or race-specific expression in the lung (Mele et al., 2015) as the gold standard. Among them, 37 genes are expressed in the LUAD samples, and all of these (37 out of 37) showed the same trends for significant differential expression as reported using the GTEx data (Figure S2), highlighting that our approach is highly sensitive for detecting demographic factors affecting gene expression in lungs. The average expression level of ACE2 among Asians was not significantly different from that among individuals of African or European ancestry (hereafter referred to as African or European; P = 0.42 and 0.96, respectively, two-tailed Mann-Whitney U tests, Figure 1C). Nor was any race-specific difference detected by a linear model that predicted ACE2 expression levels based on multiple demographic factors including age, sex, and race (Table S2); the variance in the log2-transformed ACE2 expression level among all LUAD samples was 3.53 (i.e., standard deviation = 1.88), and race explained only ~0.1% of this variance (P = 0.64, analysis of variance). These findings refute a previously reported conclusion about biased ACE2 expression that was based on a single Asian sample (Zhao et al., 2020).
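The notion of "variance explained" can be illustrated with a simplified one-way decomposition (between-group sum of squares over total sum of squares). This is only a sketch under stated assumptions: the analysis above used a linear model with several demographic factors, whereas this example considers a single grouping factor, and the function name `variance_explained` and all values are hypothetical.

```python
import math

def variance_explained(groups):
    """Fraction of total variance explained by group membership (eta squared).

    `groups` maps a group label to a list of log2 expression values.
    """
    values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    return ss_between / ss_total

# Hypothetical log2 ACE2 values per group (made-up numbers for illustration).
example = {
    "African": [14.0, 16.5, 12.8, 15.9],
    "Asian": [13.6, 16.9, 15.1, 13.0],
    "European": [15.4, 12.5, 16.2, 14.3],
}
eta_sq = variance_explained(example)
print(f"variance explained by group: {eta_sq:.1%}")

# A variance of 3.53 corresponds to a standard deviation of about 1.88.
print(f"sd = {math.sqrt(3.53):.2f}")
```

When group means are nearly identical relative to the spread within groups, the explained fraction is close to zero, which is the pattern reported for race.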

Analysis in LUAD samples revealed that ACE2 expression increased with age but was not associated with sex (Figure 1A-B).
There are some caveats to bear in mind with the present study. First, we focused on ACE2 expression in the lung, but ACE2 is expressed in other tissues/organs as well (Figure S3), with especially strong expression in the gastrointestinal tract and kidney.
To address this, we further examined the potential Asian-biased expression of ACE2 in other cancer types. No such Asian bias was detected for any cancer type with higher ACE2 expression levels than LUAD, including stomach, colon, and kidney cancers (Figure 2). Nevertheless, we cannot rule out the possibility that ACE2 is expressed at higher levels in Asians in some as-yet-unexamined tissues.
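The group comparisons in this study rely on two-tailed Mann-Whitney U tests. The following is a minimal sketch of such a test using the normal approximation with tie correction and no continuity correction; it is not necessarily the exact procedure or software used here. The function name `mann_whitney_u` and the sample values are hypothetical, and the approximation is best suited to larger samples than in this toy example.

```python
import math

def mann_whitney_u(x, y):
    """Two-tailed Mann-Whitney U test using the normal approximation.

    Returns (U statistic for x, p-value). Tied values receive average ranks
    and the variance is tie-corrected; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group (0 = x, 1 = y) each came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical log2 ACE2 values for two groups (made-up numbers).
asian = [15.2, 16.1, 14.8, 17.0, 15.9, 16.4]
others = [15.5, 16.0, 14.9, 16.8, 15.7, 16.6]
u_stat, p_value = mann_whitney_u(asian, others)
print(f"U = {u_stat}, P = {p_value:.2f}")
```

A rank-based test is a natural choice for expression data because it does not assume normally distributed values and is insensitive to outliers.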
A second caveat relates to our exclusive focus on the gene expression level; it is possible that some Asian-specific genetic variations in the coding sequence of ACE2 may also affect the cell-entry efficiency of viruses. To explore the potential impact of population-specific genetic variations, we retrieved all of the genetic variation data available for the ACE2 locus from the 1000 Genomes Project (The 1000 Genomes Project Consortium, 2015), which includes samples from Africans, (native) Americans, East Asians, Europeans, and South Asians. We focused on the polymorphisms in East Asians, since both coronavirus epidemics broke out in East Asia. All 22 of the missense or stop-gained variations were present at a low frequency among East Asians (< 1%, Figure 3). Other variants (N = 690, Table S3) are mostly located in introns, and the frequencies of these variants among East Asians were highly correlated with those among all populations (Pearson's correlation coefficient R = 0.996, P < 2.2×10⁻¹⁶, Figure 3). For instance, among the 37 detected polymorphisms having an alternate allele frequency > 10% in East Asians, none (0 out of 37) reached a frequency more than two times that in all populations (Figure 3).
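The enrichment criterion described above (alternate allele frequency > 10% in East Asians and more than twice the frequency in all populations) can be sketched as a simple filter. The function name `enriched_in_population`, the record layout, and the variant identifiers and frequencies below are all hypothetical.

```python
def enriched_in_population(variants, min_freq=0.10, min_fold=2.0):
    """Return variants whose population frequency exceeds `min_freq` and is
    more than `min_fold` times the frequency across all populations."""
    return [
        v for v in variants
        if v["pop_af"] > min_freq and v["pop_af"] > min_fold * v["all_af"]
    ]

# Hypothetical variant records (identifiers and frequencies are made up).
variants = [
    {"id": "var1", "pop_af": 0.45, "all_af": 0.40},
    {"id": "var2", "pop_af": 0.12, "all_af": 0.09},
    {"id": "var3", "pop_af": 0.004, "all_af": 0.001},  # rare, fails min_freq
]
hits = enriched_in_population(variants)
print(len(hits))
```

Requiring a minimum frequency before applying the fold-change cutoff avoids flagging very rare variants, whose frequency ratios are unstable.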
We reached a similar conclusion about the lack of any East-Asian-specific enrichment for ACE2 genetic variants based on analyzing data from the Genome Aggregation Database (gnomAD, Figure S4) (Karczewski et al., 2019).
Another possibility is that other layers of regulation (e.g., translational efficiency, protein modification, folding, or subcellular localization) exist in addition to regulation at the mRNA level. While ACE2 protein levels are largely correlated with mRNA levels among tissues (https://www.proteinatlas.org/ENSG00000130234-ACE2/tissue) (Uhlen et al., 2015), whether these ACE2 proteins are located on the cell membrane remains unclear. Therefore, future investigations should examine potential race-specific bias for the accumulation of functional ACE2 on the plasma membrane. It would also be informative to investigate if co-receptors of SARS-CoV-2 exist and, if so, to determine if their abundances vary among populations. The identification of host cell features that affect the efficiency of viral amplification will also help improve accuracy for predicting susceptibility to coronaviruses in the future.
Based on the available data, we conclude that Asians do not express ACE2 at a higher level than other populations and do not bear unique genetic polymorphisms in ACE2. The recurrent coronavirus outbreaks in China may be better explained by the high diversity of coronaviruses and their animal hosts, or perhaps by Chinese food culture (Fan et al., 2019). Our study, therefore, cautions against any use of race to predict susceptibility to SARS-CoV-2 among individuals; that is, individuals of all races require the same level of personal protection against SARS-CoV-2.

METHODS
All available open RNA-seq data and clinical data for LUAD samples in TCGA were retrieved from https://www.cancer.gov/tcga. The RNA-seq data were obtained in the FPKM-UQ (fragments per kilobase of transcript per million mapped reads of the upper quartile gene) format, which is known to perform better in cross-sample comparisons and differential expression analyses. Transcriptomes of additional East Asian samples (Chen et al., 2020a) were retrieved from OncoSG (https://src.gisapps.org/OncoSG/) under the dataset "Lung Adenocarcinoma" (GIS, 2019); FPKM-UQ values for these samples were calculated from the numbers of reads mapped to individual genes. Notably, the FPKM-UQ values of expressed genes were globally higher in the data from OncoSG than from the TCGA data (Figure S5A); therefore, median normalization was further performed to help ensure comparability of the expression levels from these two data sources (Figure S5B). The allele frequencies for ACE2 in East Asians and in all populations were also retrieved from gnomAD (https://gnomad.broadinstitute.org/).

Figure 1. ACE2 expression levels are associated with age but not with sex or race, among LUAD samples. (A) Relationship between age and ACE2 expression level. Pearson's correlation coefficient (R) and the corresponding P-value are shown. (B) Comparison of ACE2 expression between sexes. P-value was calculated with a two-tailed Mann-Whitney U test. (C) Comparison of ACE2 expression among races. P-values were calculated with two-tailed Mann-Whitney U tests. The single sample of American Indian or Alaska Native in TCGA was not shown.
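The median normalization mentioned in METHODS could look like the following sketch. The exact procedure is not specified in the text, so this example assumes that one dataset is rescaled by a single factor chosen so that its median log2 expression matches that of the reference dataset; the function name `median_normalize` and all values are hypothetical.

```python
import math
import statistics

def median_normalize(target, reference):
    """Rescale `target` so that its median log2 value matches `reference`.

    Both arguments are lists of positive expression values (for example,
    FPKM-UQ values of expressed genes). Returns the rescaled target values
    on the linear scale.
    """
    ref_median = statistics.median(math.log2(v) for v in reference)
    tgt_median = statistics.median(math.log2(v) for v in target)
    scale = 2 ** (ref_median - tgt_median)
    return [v * scale for v in target]

# Hypothetical expression values: the second source is globally 4x higher.
reference_values = [100.0, 200.0, 400.0, 800.0, 1600.0]
shifted_values = [400.0, 800.0, 1600.0, 3200.0, 6400.0]

normalized = median_normalize(shifted_values, reference_values)
print(normalized)
```

A single global scaling factor preserves the relative expression of genes within each sample while removing a systematic offset between data sources.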

Figure 2. Comparison of ACE2 expression levels between Asians and others (Africans and Europeans) in additional tissues.
TCGA data for six cancer types in which the average ACE2 expression level is higher than in LUAD samples are shown. Rectum adenocarcinoma also has ACE2 expression higher than LUAD but is not shown, as there is only one available Asian sample. P-values were calculated with two-tailed Mann-Whitney U tests. Outliers are not shown for the boxplots.

Figure 3. Each dot represents a polymorphic site, and the frequency of the alternate allele is shown. Only dimorphic genetic variations are shown. To avoid negative infinity during the logarithm transformation, we calculated the frequencies of alternate alleles that were not detected in East Asian samples as if 0.5 alleles had been found. Detailed information is provided in Table S3.
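The pseudo-count used to avoid taking the logarithm of zero can be sketched as follows. The function name `log10_alt_frequency`, the choice of base 10, and the allele counts are assumptions made for illustration; only the 0.5 pseudo-count comes from the caption.

```python
import math

def log10_alt_frequency(alt_count, total_alleles, pseudo_count=0.5):
    """log10 of the alternate-allele frequency.

    An allele that was never observed is treated as if `pseudo_count`
    copies had been found, which avoids log10(0).
    """
    count = alt_count if alt_count > 0 else pseudo_count
    return math.log10(count / total_alleles)

# Hypothetical counts: 1000 sampled alleles in total.
observed = log10_alt_frequency(10, 1000)
undetected = log10_alt_frequency(0, 1000)
print(f"observed: {observed:.2f}, undetected: {undetected:.2f}")
```

With this convention, undetected alleles are placed just below the lowest observable frequency (one allele) on the log scale rather than being dropped from the plot.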