A Hint on the COVID-19 Risk: Population Disparities in Gene Expression of Three Receptors of SARS-CoV

The currently spreading novel coronavirus SARS-CoV-2 is highly infectious and pathogenic and has attracted global attention. Recent studies have found that SARS-CoV-2 and SARS-CoV share around 80% genome sequence identity and use the same cell entry receptor, ACE2. These findings prompted us to study other receptors of SARS-CoV, which may be used for SARS-CoV-2 binding as well. In this study, we screened the gene expression of three receptors (ACE2, DC-SIGN and L-SIGN) in four datasets of normal lung tissue from lung adenocarcinoma patients and in two single-cell RNA sequencing datasets from normal lung and bronchial epithelial cells, respectively. No significant difference in gene expression of these three receptors was found between gender groups (male vs female). We found higher gene expression of DC-SIGN in the elderly (age >60) and higher gene expression of L-SIGN in Caucasians than in Asians. Similar to ACE2, we observed significantly higher DC-SIGN gene expression in the lungs of smokers, especially former smokers. However, ACE2 and DC-SIGN are upregulated in different cell types in smokers. In the whole lung, ACE2 is actively expressed in remodeled Alveolar Type II cells of former smokers, while DC-SIGN is largely expressed in monocytes of former smokers and dendritic cells of current smokers. In bronchial epithelium, no obvious gene expression of DC-SIGN and L-SIGN was observed, while ACE2 was found to be actively expressed in goblet cells of current smokers and club cells of non-smokers. In conclusion, our findings may indicate that smokers, especially former smokers, and people over 60 are at higher risk of and more susceptible to SARS-CoV-2 infection. This study also provides hints on possible SARS-CoV-2 pathogenicity mechanisms in lung infection.


Introduction
A novel coronavirus, SARS-CoV-2, from Wuhan has been spreading as a pandemic infection. SARS-CoV-2 is highly infectious and pathogenic through animal-to-human and human-to-human transmission and causes severe coronavirus disease 2019 (COVID-19) 1 . Coronavirus is a single-stranded RNA virus that can be divided into four main genera: the alpha, beta, gamma and delta coronaviruses 4 . Recent studies showed that SARS-CoV-2 is closely related to SARS-CoV, sharing around 80% genome identity 5 , belonging to the same genus (betacoronavirus) 6 and causing similar symptoms such as fever, malaise, dry cough and acute respiratory response 7 . Moreover, Lu et al. found that SARS-CoV-2 and SARS-CoV have similar receptor-binding domain (RBD) structures 4 , and further studies confirmed that they use the same cell entry receptor, ACE2 8 . These findings indicate that SARS-CoV-2 and SARS-CoV may largely share pathogenicity mechanisms.
Many viruses use multiple alternative receptors to enter host cells. Three receptors, ACE2, DC-SIGN and L-SIGN (gene symbols ACE2, CD209 and CLEC4M, respectively), have been found to be involved in the pathogenicity of SARS-CoV 9-12 . DC-SIGN and L-SIGN are homologous C-type lectin receptors, which can identify carbohydrate structures of viral glycoproteins and play crucial roles in anchoring enveloped viruses 13 . In vivo, L-SIGN is largely expressed on endothelial cells in liver sinusoids and lymph nodes, whereas DC-SIGN is expressed on dendritic cells (DCs) 14 . Both capture virus and play important roles in virus transmission within the host 15,16 . Studies showed that DC-SIGN is an independent receptor and works synergistically with ACE2 in SARS-CoV viral entry, which is mediated by pH-dependent endocytosis 10,17,18 . Also, Jeffers et al. found that L-SIGN is a potential portal of entry for SARS-CoV, similar to Ebola and Sindbis viruses 12 . Paradoxically, L-SIGN can also internalize the virus and promote virus degradation in a proteasome-dependent manner 19 . Indeed, Chan et al. found that homozygosity for the L-SIGN 69-nucleotide tandem repeats in exon 4 may play a protective role during SARS infection 11 .
Based on the above knowledge of SARS-CoV, we hypothesize that ACE2 may not be the only receptor of SARS-CoV-2. Here, we studied differences related to race, age, gender and smoking status in the gene expression of three putative SARS-CoV-2 receptors (ACE2, DC-SIGN and L-SIGN) by analyzing four large-scale datasets of normal lung tissue. We also investigated the distribution of their gene expression among cell types by analyzing two single-cell transcriptomic datasets from lung tissue and bronchial epithelium. This study helps in understanding the pathogenicity of and susceptibility to SARS-CoV-2 infection.

Bulk transcriptomics
We used two RNA-seq datasets and two DNA microarray datasets of normal lung tissues from lung adenocarcinoma patients, including a Caucasian RNA-seq dataset from TCGA (54 samples), an Asian RNA-seq dataset GSE40419 (77 samples), a Caucasian microarray dataset GSE10072 (33 samples) and an Asian microarray dataset GSE19804 (60 samples). The details and processing of data were described in our previous study 3 .
Simple linear regressions were used to test the association of ACE2, CD209 and CLEC4M gene expression with each single variable (age, gender, race and smoking status). Multiple linear regression was used to test the association of their gene expression with multiple factors simultaneously (age, gender, race, smoking status and data platform). P-values and fold changes from group comparisons were visualized in dot plots. In addition, ordinal regression was performed to investigate the trend between ACE2, CD209 and CLEC4M expression and ordinal categorical smoking history (current smokers, former smokers and non-smokers). All data management, statistical analyses and visualizations were performed using R 3.6.3.
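As an illustration of the single-variable comparison described above, the following minimal Python sketch (not the R code used in this study) fits a simple linear regression of expression on a binary group indicator. With a 0/1 predictor, the fitted slope equals the difference in group means, which is the log2 fold change if expression is on a log2 scale. The function name, the toy values and the log2-scale assumption are hypothetical and serve only as an example.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical log2 expression values (NOT data from this study).
non_smokers = [2.0, 2.5, 3.0]
smokers = [3.5, 4.0, 4.5]

# Encode the group as a 0/1 indicator: 0 = non-smoker, 1 = smoker.
x = [0] * len(non_smokers) + [1] * len(smokers)
y = non_smokers + smokers

intercept, slope = simple_linear_regression(x, y)

# With a binary predictor, the intercept is the mean of group 0 and the
# slope is the difference in group means (log2 fold change here).
log2_fold_change = slope
fold_change = 2 ** log2_fold_change
print(f"intercept={intercept:.2f}, log2FC={log2_fold_change:.2f}, FC={fold_change:.2f}")
```

A multivariable model extends the same idea by adding further columns (for example age group, gender, race and platform) to the design matrix, so that the smoking coefficient is estimated with the other factors held fixed.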

Single-cell transcriptomics
We analyzed two single-cell RNA sequencing datasets, GSE122960 and GSE131391. The GSE122960 dataset was from lung tissue of 8 lung transplant donors, including 5 African American non-smokers, 1 Asian former smoker and 2 Caucasian current smokers. The GSE131391 dataset focused on bronchial epithelial cells from 6 never-smokers and 6 current smokers.
Data normalization, highly variable feature selection, data scaling, data visualization, cell type identification and other analyses were performed in the same way as in our previous study 3 .
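To make the normalization step concrete, the sketch below shows one widely used scheme for single-cell count data: each cell's counts are divided by the cell's total count, multiplied by a scale factor and log-transformed. This is an assumption for illustration only; the text does not state which normalization was used, and the function name, the scale factor of 10,000 and the toy counts are hypothetical.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Library-size normalize one cell's counts, then apply log(1 + x).

    `counts` maps gene symbol -> raw count for a single cell.
    The scale factor of 10,000 is an assumed, commonly used default.
    """
    total = sum(counts.values())
    if total == 0:
        # An empty cell has no signal; return zeros instead of dividing by zero.
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }


# Hypothetical raw counts for one cell (NOT data from this study).
cell = {"ACE2": 1, "CD209": 0, "CLEC4M": 0, "SFTPC": 999}
normalized = log_normalize(cell)

for gene, value in normalized.items():
    print(f"{gene}: {value:.3f}")
```

Normalizing by each cell's total count makes expression comparable between cells sequenced at different depths, which matters when comparing a lowly expressed gene such as ACE2 across cell types.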

ACE2 and CD209 are overexpressed in lungs of smokers, especially former smokers
As in our previous study, we found that smokers (including current and former smokers) had significantly higher gene expression of ACE2 than non-smokers in the GSE40419 (p-value<0.01, Fig. 1C) and TCGA (p-value=0.05, Fig. 1C) datasets. The GSE19804 microarray study, which focused on female non-smokers, was not included in this analysis. Further, we performed a multivariate analysis of smoking status to adjust for the effects of other factors (platform, age, race and gender) and found that smoking still shows a significant disparity in gene expression of ACE2 (p-value<0.01, Table A). We also found that CD209 was significantly upregulated in smokers after multi-factor adjustment (p-value<0.01, Table A), although the difference in each individual dataset did not reach the threshold of statistical significance. No significant difference in CLEC4M gene expression between smokers and non-smokers was found in either simple or multiple regression analysis. Further, we studied the expression profiles of non-smokers, former smokers and current smokers and found no significant trend in gene expression of CD209 in the three separate datasets. The small sample size in each group might not provide enough power to detect a trend. However, we found higher mean expression of CD209 in former smokers compared with non-smokers and current smokers in both the GSE10072 and GSE40419 datasets, which is similar to our previous observation on ACE2 gene expression (Fig. 2). Consistently, the TCGA dataset showed higher average expression of CD209 in recent quitters (<=15 years) compared with non-smokers, current smokers and smokers who had quit for longer durations (>15 years). Although these differences are not statistically significant, which might be due to the small sample size, these findings provide hypotheses for further investigation.

Caucasians have higher CLEC4M lung gene expression than Asians
We observed higher gene expression of CLEC4M in Caucasian lung tissue samples compared with Asian lung tissue samples in both the microarray datasets (p-value<0.01) and the RNA-seq datasets (p-value<0.01). Multivariate analysis also showed a significant disparity in CLEC4M gene expression (p-value=3.75E-12, Table A). In contrast, we did not observe such a disparity in CD209 or ACE2 expression in multivariate analysis with adjustment for platform, age, gender and smoking status.

CD209 is upregulated in the elderly; no gender disparity was observed
We found higher expression of CD209 in the age>60 group than in the age<60 group by single-variable analysis of the GSE40419 dataset (p-value=0.05, Fig. 1A) and by multivariate analysis of all four datasets (p-value=0.03, Table 1). No significant differences between age groups were found in ACE2 and CLEC4M gene expression. We also did not find any significant difference in the expression of any of the three genes of interest between gender groups.

Smokers have upregulated ACE2 and CD209 gene expression in different cell types in lung
In our previous study, we analyzed two single-cell datasets and found that ACE2 was expressed in goblet cells of smokers and club cells of non-smokers, and was upregulated in remodeled Alveolar Type II (AT2) cells of former smokers. Applying the same method, we identified 13 different cell populations from single-cell RNA sequencing of whole-lung tissue and found that CD209 was highly expressed in monocytes and dendritic cells (DCs), which is consistent with previous reports 15 . We found that CD209 was actively expressed in monocytes of the former smoker but not distinctly expressed in those of current smokers and non-smokers. In addition, we observed active CD209 expression in DCs from current smokers but not in DCs from the former smoker or non-smokers (Fig. 3). Given the close lineage relationship between monocytes and DCs, this may indicate that smokers have upregulated CD209 in the monocyte-DC lineage. In summary, similarly to ACE2, CD209 expression may be associated with smoking history, but in different cell types.
We did not observe expression of CLEC4M in smokers, quitters or non-smokers in the currently analyzed dataset. In addition, CD209 and CLEC4M were not obviously expressed in human bronchial epithelial cells (Fig. S1).

Discussion
In this study, we investigated disparities of four factors, including gender, age, race and tobacco use, in the gene expression of three receptors of SARS-CoV. We observed significantly higher gene expression of DC-SIGN in the population aged 60 and above. Together with the aging immune system and declining organ health, this may lead to the higher COVID-19 severity in elderly patients 20,29 . No significant disparities in the gene expression of these three receptors were found between gender groups (male vs female).
Similar to our previous study on ACE2 3  In this study, we observed disparities by age, race and tobacco use in the gene expression of three receptors of SARS-CoV. This provides hints on possible SARS-CoV-2 infection pathogenicity and risk factors. Developing conspiracy theories based on a one-sided interpretation of this study is wrong and unwise.

Ethical oversight
There is no direct involvement of human subjects in this study. All the data come from existing de-identified biological samples and data from prior studies. Therefore, ethical oversight and patient consent were not required for this project.

Fig. 1. Panels A, B and C show group comparisons by age (>60 vs <60), gender (male vs female) and smoking status (non-smoking vs smoking). The color shows the fold change between groups, while the dot size indicates the -log10(p-value). Significant differences are indicated by red circles surrounding the dots.

Fig. 2. The figure shows the groups of never-smokers/non-smokers, reformed/former smokers and current smokers. The TCGA dataset has more categories of smoking history, including never-smoker, smoker reformed for more than 15 years, smoker reformed for less than 15 years and current smoker.