Single cell analysis of ACE2 expression reveals the potential targets for 2019-nCoV

ACE2, the putative receptor for the novel coronavirus (2019-nCoV), plays an important role in cell entry of 2019-nCoV. However, it is not yet clear which cell types within the human body express ACE2. Here, a systematic analysis was undertaken using published single cell datasets. In total, our study analyzed 229652 cells, from five different organs, derived from 88 donors. The top ACE2 expressing cells include proximal tubule cells in the kidney and enterocytes in the intestine. Other major ACE2 expressing cells in the kidney include podocytes, intercalated cells and endothelial cells. Our results offer a comprehensive atlas of ACE2 expression at the single cell level and reveal numerous potential targets of 2019-nCoV infection beyond the lung.


Introduction
The recent outbreak of a novel coronavirus (2019-nCoV) had caused around 40261 cases of infection and 909 deaths at the time this manuscript was prepared. Spike proteins of the SARS coronaviruses associate with angiotensin-converting enzyme 2 (ACE2) to mediate cell entry [1]. The putative progenitor virus of SARS coronavirus was isolated from bat fecal samples and was reported to use ACE2 from human, civet and Chinese horseshoe bat for cell entry [2]. Analysis of the sequence of the 2019-nCoV receptor-binding domain (RBD) revealed that 2019-nCoV and SARS share sequence similarity in the RBD, suggesting that ACE2 might function as the receptor for cell entry by 2019-nCoV [3]. Isolation and co-localization analysis also suggested that 2019-nCoV uses ACE2 for cell entry [4]. However, a systematic analysis of ACE2 expression in different organs has not yet been performed.
In the past decade, we have witnessed the transformation of biomedical sciences by single cell analysis [5,6]. One notable example is the application of single cell RNA-seq to revisit century-old questions in developmental biology and cancer biology. To understand the cellular composition of the human body at the single cell level, the Human Cell Atlas Project was initiated in October 2016 [7]. Numerous single cell experiments have been performed to profile cells from different organs of the human body, and those massive datasets have been made publicly available. Thus, the time is now ripe to ask which cell types express a gene of interest and to what extent this gene is expressed.
Our study aims to undertake a comprehensive analysis of ACE2 expression at the single cell level using publicly available datasets and to shed light on the potential mechanisms for the cell entry and pathogenicity of 2019-nCoV. In total, our study analyzed 229652 single cells, from five different organs, derived from 88 donors. Our results offer a comprehensive atlas of ACE2 expression at the single cell level and reveal numerous potential targets of 2019-nCoV infection beyond the lung.

Results
First, ACE2 expression was analyzed in the lung, the major organ affected by 2019-nCoV. In total, 57020 cells from five donors were used for the analysis [8]. Three compartments were recovered in the dataset: the epithelial, immune and stromal compartments (Figure 1A). ACE2 was mainly expressed in the epithelial compartment, with around 1.7% of type II alveolar cells expressing ACE2 (Figure 1B). Very few fibroblasts (13/4624) also expressed ACE2 (Supplementary Figure 1).
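The per-cell-type proportions reported here (for example 13/4624 fibroblasts, roughly 0.28%) can be illustrated with a minimal sketch. It assumes that a cell counts as ACE2-positive when its ACE2 count exceeds a threshold of zero; the text does not state the threshold used, and the function name and toy data below are hypothetical.

```python
from collections import defaultdict


def positive_fraction_by_type(cell_types, counts, threshold=0):
    """Return {cell_type: (n_positive, n_total, fraction_positive)}.

    A cell is called positive when its count is strictly greater than
    `threshold` (assumption: the study's actual cutoff is not stated).
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for cell_type, count in zip(cell_types, counts):
        totals[cell_type] += 1
        if count > threshold:
            positives[cell_type] += 1
    return {
        t: (positives[t], totals[t], positives[t] / totals[t])
        for t in totals
    }


# Toy data for illustration only, not taken from the study.
cell_types = ["AT2", "AT2", "AT2", "AT2", "fibroblast", "fibroblast"]
ace2_counts = [0, 2, 0, 0, 0, 0]
summary = positive_fraction_by_type(cell_types, ace2_counts)

# The fibroblast proportion quoted in the text, as a fraction.
fibroblast_fraction = 13 / 4624
```

With real data, `cell_types` would hold the published annotation of each cell and `ace2_counts` the ACE2 value of the same cells in the same order.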
The expression of ACE2 was investigated in the liver using a dataset generated with mCEL-Seq2 [9]. In total, eight distinct cell types were recovered, including hepatocyte, cholangiocyte, B cell, NK+T cell, Kupffer cell, monocyte, hepatic stellate cell, liver macrovascular endothelial cell (LVEC) and liver sinusoidal endothelial cell (LSEC). All three distinct compartments were represented: the epithelial, immune and stromal compartments (Figure 2A). ACE2 was detected in less than 1% of hepatocytes and cholangiocytes (Figure 2B).
The intestine is an important organ in the digestive system [10]. A dataset consisting of cells from the ileum, colon and rectum was employed for analysis. In total, 14537 cells were detected, including TA cells, stem cells, progenitor cells, Paneth-like cells, goblet cells, enterocytes and enteroendocrine cells (Figure 2C). The cell annotation from the original study was used. ACE2 was expressed in enterocytes, Paneth-like cells, enteroendocrine cells and goblet cells. Using publicly available data from the Human Protein Atlas [11,12], the expression of ACE2 at the protein level was confirmed (Supplementary Figure 4).
To obtain a glimpse of ACE2 expression in the central nervous system, a dataset of single cell transcriptomes derived from human prefrontal cortex was used [13]. In total, 80660 cells from 48 donors with varying degrees of Alzheimer's disease (AD) pathology were analyzed. The cell types were annotated with classical cell markers (Figure 3A). Eight cell types were identified: excitatory neuron, inhibitory neuron, oligodendrocyte progenitor, oligodendrocyte, microglia, astrocyte, pericyte and endothelial cell. ACE2 was detected in around 0.2% of cells in the prefrontal cortex (Figure 3B, 3C).
Next, the expression of ACE2 was analyzed in the kidney using a recently generated human kidney cell atlas [14]. In total, 40268 cells from mature kidneys and 27203 cells from fetal kidneys were included in the atlas.
We then asked what distinguished ACE2 positive cells from ACE2 negative cells. The human kidney cell atlas data were used for this analysis because ACE2 positive cells constituted a significant portion of the total population in both mature and fetal kidneys.
Differentially expressed genes were identified. The top genes significantly upregulated in ACE2 positive cells in the mature kidney were enriched for regulated exocytosis and protein processing in the endoplasmic reticulum (Figure 5A). The protein-protein interaction network was constructed and MCODE components were identified, revealing the enriched biological functions: antigen processing and presentation of endogenous antigen, transport to the Golgi and subsequent modification, response to ER stress and protein folding (Figure 5B, 5C). These results suggest potential mechanisms by which the coronavirus could hijack ACE2 positive cells for viral infection and propagation.

Datasets
The lung cell atlas dataset was available through the Human Cell Atlas Data Coordination Platform and NCBI BioProject (accession code PRJEB31843). The liver cell atlas dataset was downloaded from GEO (GSE124395). The prefrontal cortex cell atlas dataset was downloaded from Synapse (syn18485175). The intestine cell atlas dataset was downloaded from GEO (GSE125970). The kidney cell atlas dataset was downloaded from the data portal of the Human Cell Atlas project (www.kidneycellatlas.org).

Data pre-processing
Seurat and Scanpy were used to analyze the single cell datasets. In general, a quality control step was first undertaken to remove low-quality cells and rarely detected genes, using thresholds on the minimal and maximal number of genes detected per cell and on the minimal number of cells in which a gene was detected. Second, a normalization step was performed to scale the gene-cell matrix.
The top 2000 highly variable genes were identified by the FindVariableGenes function.
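The quality control and normalization steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the threshold values, the order of the two filters, and the log-normalization to 10,000 counts per cell are all assumptions, and the function name `filter_and_normalize` is hypothetical. Selection of highly variable genes is not covered.

```python
import math


def filter_and_normalize(matrix, min_genes, max_genes, min_cells, scale=10_000):
    """Filter a cell-by-gene count matrix, then log-normalize it.

    matrix: list of rows (one per cell), each a list of raw counts per gene.
    Returns (normalized_rows, kept_gene_indices).
    """
    # 1. Keep cells whose number of detected genes lies in [min_genes, max_genes].
    kept_cells = [
        row for row in matrix
        if min_genes <= sum(c > 0 for c in row) <= max_genes
    ]
    # 2. Keep genes detected in at least min_cells of the remaining cells.
    n_genes = len(matrix[0]) if matrix else 0
    kept_genes = [
        g for g in range(n_genes)
        if sum(row[g] > 0 for row in kept_cells) >= min_cells
    ]
    # 3. Scale each cell to `scale` total counts and apply log(1 + x)
    #    (assumed normalization; the text only says the matrix was scaled).
    normalized = []
    for row in kept_cells:
        sub = [row[g] for g in kept_genes]
        total = sum(sub)
        normalized.append(
            [math.log1p(c / total * scale) if total else 0.0 for c in sub]
        )
    return normalized, kept_genes


# Toy count matrix (4 cells x 4 genes), for illustration only.
toy_matrix = [
    [1, 0, 3, 0],  # 2 genes detected
    [0, 0, 0, 1],  # 1 gene detected, removed when min_genes=2
    [2, 2, 4, 0],  # 3 genes detected
    [5, 0, 5, 0],  # 2 genes detected
]
normalized, kept_genes = filter_and_normalize(
    toy_matrix, min_genes=2, max_genes=3, min_cells=2
)
```

In practice both Seurat and Scanpy provide their own functions for these steps; the sketch only makes the logic of the thresholds explicit.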

Cell clustering and visualization
In most cases, the original cell type annotations were used as previously published.

Gene list analysis
Gene list analysis was performed with Metascape [15]. The list of differentially expressed genes was submitted to the online tool and analyzed using default parameters. Statistically enriched terms were identified and filtered. The remaining significant terms were then hierarchically clustered into a tree based on Kappa-statistical similarities among their gene memberships.
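To make the notion of Kappa-statistical similarity between gene memberships concrete, the sketch below computes Cohen's kappa for two enriched terms over a shared gene list. This is one common formulation; whether Metascape computes the statistic in exactly this way is an assumption, and the function name and gene identifiers are hypothetical placeholders.

```python
def kappa_similarity(term_a, term_b, universe):
    """Cohen's kappa between two terms' binary gene memberships.

    term_a, term_b: iterables of gene identifiers annotated to each term.
    universe: iterable of all genes considered (e.g. the input gene list).
    """
    universe = set(universe)
    if not universe:
        raise ValueError("universe must contain at least one gene")
    a_set = set(term_a) & universe
    b_set = set(term_b) & universe
    n = len(universe)

    both = len(a_set & b_set)          # genes in both terms
    only_a = len(a_set - b_set)        # genes only in term A
    only_b = len(b_set - a_set)        # genes only in term B
    neither = n - both - only_a - only_b

    observed = (both + neither) / n
    expected = (
        (both + only_a) * (both + only_b)
        + (neither + only_b) * (neither + only_a)
    ) / n ** 2
    if expected == 1:
        # Both terms contain all genes or no genes: agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical gene identifiers, for illustration only.
universe = [f"gene{i}" for i in range(1, 11)]
term_a = {"gene1", "gene2", "gene3", "gene4"}
term_b = {"gene1", "gene2", "gene3", "gene5"}
term_c = {"gene6", "gene7", "gene8", "gene9"}

kappa_overlapping = kappa_similarity(term_a, term_b, universe)
kappa_identical = kappa_similarity(term_a, term_a, universe)
kappa_disjoint = kappa_similarity(term_a, term_c, universe)
```

Terms with largely shared genes obtain a kappa close to 1 and would be grouped together by hierarchical clustering, whereas terms with little overlap obtain values near or below 0.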

Protein-protein interaction network
All protein-protein interactions (PPI) among the genes in the input list were extracted from PPI databases to form a PPI network. MCODE components were identified from the merged network, and each MCODE network was assigned a unique color. GO enrichment analysis was applied to the original PPI network and its MCODE network components to assign biological "meanings", where the three terms with the best p-values were retained. All input gene lists were also merged into one list, which resulted in a PPI network.

Discussion
To our knowledge, this study is the first comprehensive profiling of ACE2 expression in the human body across different systems. The expression of ACE2 was analyzed in the respiratory system, the digestive system, the urinary system and the nervous system. Our results suggest that various cell types express ACE2 at different levels, rendering them potential targets of 2019-nCoV infection. Our study further extends previous attempts to analyze ACE2 expression at the single cell level using publicly available datasets [16,17].
Besides fever and cough, less common symptoms of 2019-nCoV associated pneumonia also included sputum production, headache, haemoptysis and diarrhoea [18]. The first patient diagnosed with 2019-nCoV infection in the United States presented with dry cough, nausea, vomiting and loose stool very early in the disease progression, suggesting that the digestive system might be affected directly by viral infection [19]. The finding that ACE2 is highly expressed in several cell types within the kidney suggests that the kidney might also be an important target of 2019-nCoV infection.
Cell-to-cell variability is a common biological phenomenon. On the one hand, single cell analysis provides unprecedented resolution for analyzing gene expression and thus reveals information that could be masked by population-averaged measurements. On the other hand, viral infection might be influenced by population context, leading to a more complex picture of viral infection in vivo [20]. Patient-to-patient variability was also observed in the pneumonia associated with 2019-nCoV. Around 10% of patients were critically ill, with failure of multiple organs. It is yet to be determined whether these organ failures are associated with viral infection of the affected organs.
Our study serves as a proof of concept that mining the publicly available datasets generated by the Human Cell Atlas is a powerful means to deepen our understanding of the human body under both physiological and pathological circumstances.