The small intestine, an underestimated site of SARS-CoV-2 infection: from Red Queen effect to probiotics

Acknowledgment statement: This study was supported by the National Natural Science Foundation of China (81903875). The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The data used for the analyses described in this manuscript were obtained from the GTEx Portal on 3/3/2020.

Abstract
Understanding how coronaviruses invade the body is essential, and the expression profile of coronavirus receptors may help to identify where these viruses infect the body. We found that the coronavirus receptors, including angiotensin-converting enzyme 2 (ACE2) for SARS-CoV and SARS-CoV-2, are digestion-related enzymes in human enterocytes. Coronaviruses have continually altered their receptors and binding modes during evolution, yet their potential target cell in the small intestine has remained constant, whereas their target cells in the lung have varied. Enterocytes may act as a conserved cell reservoir for coronaviruses, which may be partially explained by the Red Queen hypothesis. We also found that the expression of coronavirus receptors could be elevated in the presence of both invasive bacteria and their counterpart, probiotics. We demonstrate here that enterocytes act as a conserved cell reservoir for coronaviruses during their evolution, which should not be ignored when investigating diagnosis and treatment strategies for coronavirus infection.


Introduction
An outbreak of 2019 novel coronavirus disease (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is becoming a global public health crisis. Understanding how the coronavirus invades the body is essential to confronting this crisis. Identifying susceptible cells that carry a receptor to which the virus can bind may help to locate the sites of infection. These receptors provide the binding sites through which viruses attach to and fuse with the cell membrane [1]. If receptor-virus binding is inhibited, the virus fails to infect host cells. Therefore, identifying the host receptors to which coronaviruses bind, as well as the distribution of these receptors, may benefit the investigation of diagnosis and treatment strategies. For example, severe acute respiratory syndrome coronavirus (SARS-CoV) was reported to attach to angiotensin-converting enzyme 2 (ACE2) and then enter host cells. The neutralizing antibodies m396 and S230.15, which compete with ACE2, were under development as anti-SARS-CoV drugs [2]. Since SARS-CoV-2 shares sequence similarity with SARS-CoV, it is not surprising that SARS-CoV-2 also uses ACE2 to enter host cells [3]. ACE2, the receptor for NL63-CoV, SARS-CoV and SARS-CoV-2, is highly expressed in human airway epithelia as well as lung parenchyma [4] [5]. However, we recently found that ACE2 is highly expressed in enterocytes of the mouse small intestine, indicating that the small intestine is an underestimated site of SARS-CoV-2 infection [6]. Interestingly, two receptors for other coronaviruses, dipeptidyl peptidase-4 (DPP4) for MERS-CoV and alanine aminopeptidase (ANPEP) for HCoV-229E, are also highly expressed in the same cell type, enterocytes, in the mouse small intestine. We therefore asked whether these coronaviruses target and infect the same host cells in humans.
Using scRNA-seq analysis, we found that the coronavirus receptors, angiotensin-converting enzyme 2 (ACE2) for SARS-CoV and SARS-CoV-2, dipeptidyl peptidase-4 (DPP4) for MERS-CoV, and alanine aminopeptidase (ANPEP) for HCoV-229E, are highly expressed in human enterocytes. These coronaviruses have common ancestors. Different ancient host environments might have initiated the divergent evolution of the ancestral coronavirus; however, the current coronaviruses still target the same host cells in the small intestine. Do these coronaviruses select enterocytes or specific receptors? Moreover, are these receptors highly expressed in enterocytes by coincidence? The binding modes between virus receptor-binding domains (RBDs) and their corresponding receptors can potentially answer these questions. In our structural analysis, the binding modes vary with the RBD sequence, which argues against the idea that the attraction of enterocytes results from structurally convergent evolution.
Given that the small intestine is closely associated with the microbiome, the microbiota may also change the expression of coronavirus receptors. In the small intestine infected with Salmonella enterica, the number of enterocytes with high expression of coronavirus receptors was increased, which may make enterocytes more accessible to coronaviruses. In COVID-19 treatment, probiotics were widely used for patients with diarrhea. However, our preliminary analysis indicated that probiotics such as Segmented Filamentous Bacteria may elevate the expression of coronavirus receptors in the murine small intestine.

The potential target cells of coronaviruses are conserved in the small intestine
We analyzed public single-cell sequencing data from the mouse cell atlas, the human ileum and the human lung, together with mRNA expression profiles of different human organs from the Genotype-Tissue Expression (GTEx) database. Intriguingly, we found that the "co-expression" phenomenon exists only in the small intestine, and not in the lung or other organs. Both the scRNA-seq data of the mouse cell atlas and the mRNA expression profiles of the GTEx database consistently showed that ACE2, DPP4 and ANPEP are all highly expressed in the small intestine (Extended Data Fig. 1, Extended Data Fig. 2) [7].
Reanalysis of public scRNA-seq data from the lung (GSE135893) and the small intestine (GSE125970 and GSE134809) indicated that the potential target cells of coronaviruses vary in the lung, where ACE2 is highly expressed in AT2 cells, DPP4 in AT2 cells and T cells, and ANPEP in macrophages (Fig. 1c), but are conserved in the small intestine. Consistent with the murine single-cell RNA-seq expression profiles in our previous study, the expression profiles of the coronavirus receptors in human ileum epithelial cells also overlap extensively (Fig. 1a). This overlap was observed in both epithelial cells and immune cells of the small intestine. The majority of receptors for coronavirus entry were highly expressed in enterocytes among epithelial cells, and in mononuclear phagocytes, stromal cells and glial cells in the immune cell data (Fig. 1b) [8]. Taking the ileum cells as a whole, we conclude that the majority of coronaviruses are likely to infect enterocytes, which highly express the specific receptors for coronavirus binding.
To a certain extent, our finding is consistent with current knowledge of the tissue and cellular tropism of coronaviruses. In 2004, Dr KF To and coworkers investigated six fatal cases of SARS using in-situ hybridization (ISH) [9]. SARS-CoV was detected in pneumocytes of the lung and in surface enterocytes of the small intestine. Of the six fatal cases, only three were positive in pneumocytes, while four were positive in enterocytes. Furthermore, in the recent work of Dr F Xiao and coworkers, SARS-CoV-2 was detected in the mucosal tissue of the digestive tract in endoscopic biopsy samples from a patient with SARS-CoV-2 infection [10]. However, only gastric, rectal and duodenal samples were collected in that work; detection of the novel virus in samples from the small intestine may require further work based on the study of fatal cases.

Binding modes of virus receptor-binding domains (RBDs) and their corresponding receptors share little similarity
To determine whether this attraction of enterocytes results from structurally convergent evolution, we analyzed the binding modes of coronavirus RBDs and their corresponding receptors. According to the alignment of RBDs (Fig. 2a), SARS-CoV and SARS-CoV-2 unsurprisingly share high similarity in their RBDs (Fig. 2b), which is consistent with their sharing the same receptor, ACE2. Although SARS-CoV and SARS-CoV-2 share a particular tyrosine-rich pattern, five critical residues in the SARS-CoV RBD that provide its favorable interaction with ACE2 are changed in the SARS-CoV-2 RBD (Fig. 2c) [11] [12]. These five critical residues, two in a β-sheet and three in flexible loops, contact the 'virus-binding hotspot' at two helices and a short loop between two β-sheets in ACE2 (Fig. 2d).
Interestingly, NL63-CoV does not share a highly similar sequence with SARS-CoV, but it also targets a similar 'virus-binding hotspot' in ACE2. Unlike the SARS-CoV RBD, the NL63-CoV RBD presents three loops as its receptor-binding motif (RBM) for ACE2 binding (Fig. 2e) [13]. In comparison, MERS-CoV and HCoV-229E share low sequence similarity in their RBDs and target different receptors. Unlike the SARS-CoV RBD, the MERS-CoV RBD contacts its receptor DPP4 through two patches. The patch 1 loop contacts blade 4 of DPP4, whereas the patch 2 loop contacts a short helix between blades 4 and 5 of DPP4 (Fig. 2f) [14]. HCoV-229E, another α-coronavirus, forms three loops that contact a long β-sheet and a short helix in its receptor ANPEP (Fig. 2g) [15]. Taken together, regardless of whether they share the same or different target receptors, the interactions between these co-expressed receptors and their corresponding RBDs share little similarity in their binding modes. Thus, in infecting the same enterocytes, the coronaviruses may have undergone functionally convergent evolution. The ancestral coronaviruses likely diverged independently, but the attraction of enterocytes might have acted as the driving force for the subsequent convergent evolution.

The expression of coronavirus receptors in the presence of Salmonella enterica or Segmented Filamentous Bacteria
The expression levels of ACE2, DPP4 and ANPEP may change under pathophysiological conditions such as enteritis, which could alter the ability of the virus to enter cells. Through reanalysis of public scRNA-seq data (GSE92332) from murine intestine treated with Salmonella enterica (Fig. 3a, Fig. 3b) [16], we found that the number of enterocytes highly expressing ACE2, DPP4 and ANPEP was dramatically elevated (Fig. 3c). At the level of individual enterocytes, the expression of these exopeptidases was also elevated, but only mildly. The increased number of enterocytes with high expression of these exopeptidases may make enterocytes more accessible to coronaviruses.
Probiotics may be a treatment for enteritis and diarrhea. In Wuhan, probiotics were widely used for patients with SARS-CoV-2 infection and diarrhea. Using public RNA-seq data (GSE71734), we explored the change in coronavirus receptor expression in the presence of probiotics [17]. Segmented Filamentous Bacteria (SFB) is a probiotic widely investigated in mice and has even been reported to prevent and cure rotavirus infection [18]. However, we found that SFB increased, rather than decreased, the expression of the coronavirus receptors Ace2 and Anpep (Fig. 4). For Ace2 normalized by Gapdh, the expression in the murine small intestine was 0.7164 in the presence of mouse-SFB, 0.6023 in the presence of rat-SFB, and 0.3115 in the germ-free mouse, a statistically significant difference (p=0.0286, Kruskal-Wallis test). In other public data [19], Lactobacillus acidophilus and Bacillus clausii also failed to decrease coronavirus receptor expression in the murine small intestine, compared with control and post-Salmonella-infection samples (Extended Data Fig. 5).
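To make the size of the SFB effect easier to read, the Gapdh-normalized Ace2 values quoted above can be expressed relative to the germ-free condition. The short Python sketch below does only this arithmetic; the function and variable names are illustrative assumptions, and the three input numbers are the ones reported in the text.

```python
# Gapdh-normalized Ace2 expression values quoted in the text.
ace2_normalized = {
    "mouse-SFB": 0.7164,
    "rat-SFB": 0.6023,
    "germ-free": 0.3115,
}


def fold_change_vs_baseline(values, baseline):
    """Divide every condition by the baseline condition (hypothetical helper)."""
    base = values[baseline]
    return {condition: value / base for condition, value in values.items()}


fold_change = fold_change_vs_baseline(ace2_normalized, "germ-free")
for condition, fc in fold_change.items():
    print(f"{condition}: {fc:.2f}-fold relative to germ-free")
```

Under this reading, Ace2 is roughly 2.3-fold higher with mouse-SFB and roughly 1.9-fold higher with rat-SFB than in the germ-free mouse.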

Discussion
Enterocytes are epithelial cells that line the inner surface of the small intestine. Although these coronaviruses, including SARS-CoV-2, bind to different receptors, they consistently rely mainly on enterocytes to enter the small intestine. In other words, enterocytes are likely to act as a conserved cell reservoir for coronaviruses during evolution, whereas the cells that these coronaviruses infect in the lung have changed dramatically. Moreover, the function of these coronavirus receptors is conserved: intriguingly, ACE2, DPP4 and ANPEP are all exopeptidases enriched in the same cell type, enterocytes. This may indicate an underestimated role of the small intestine in coronavirus infection, in terms of potential fecal-related spread and participation in the generation of a systemic inflammatory storm.
Given that the host cells that viruses infect make them particularly susceptible to the genetic changes that help to drive their evolution, we postulate that enterocytes play an essential role in coronavirus evolution. The arms race between coronaviruses and the mammalian immune system could be partially explained by the Red Queen hypothesis, an evolutionary theory in which an organism gains an evolutionary advantage by continuously adapting and evolving [20]. However, the version of the Red Queen effect for coronavirus infection could differ between the lung and the small intestine. In the lung, the spike protein and the target cells are continually changing. This may be driven by the harsh environment of the lung, the original site of the inflammatory storm. Under this kind of viral evolution, only viruses that can infect enterocytes survive. This means that an evolved coronavirus should have a receptor-binding domain (RBD) with high binding affinity for a receptor on the enterocyte membrane. It is also consistent with our hypothesis above that the attraction of enterocytes might act as the driving force for the subsequent convergent evolution.
The version of the Red Queen effect driven by the small intestine may follow a different mechanism. The selection pressure in the small intestine could be milder than that in the lung. For instance, research on SARS-CoV showed that the gut was intact while diffuse alveolar damage was found in fatal cases [9]. In this milder environment, coronaviruses can "shuffle" their genes with other viruses in enterocytes. The shuffled RBD genes are likely to express RBDs that bind to receptors on enterocytes. In this way, the RBD genes and the binding modes of the spike protein change continuously, but this happens in the same cell type, enterocytes. Taken together, we believe that enterocytes provide directional selection for coronavirus evolution.
The small intestine is closely associated with the microbiome. We analyzed public murine data on the small intestine treated with invasive bacteria and with their counterpart, probiotics. The murine model infected with Salmonella enterica mimics inflammatory diarrhea. It was not surprising that the proportion of enterocytes was dramatically elevated, as the cell composition rapidly shifted from stem and progenitor cells to mature enterocytes. Intriguingly, however, Segmented Filamentous Bacteria (SFB), a probiotic reported to prevent and cure rotavirus-induced diarrhea, elevated the expression of Ace2 and Anpep without promoting the proliferation of enterocytes, which may theoretically make enterocytes more accessible to coronaviruses. The composition of the enteric microbiome is complicated, and public data related to probiotics are limited. The function of SFB and other probiotics still needs further investigation. Nevertheless, we raise a concern about the use of probiotics for patients infected with SARS-CoV-2.

Conclusion
Our study should remind healthcare workers of an underestimated site of SARS-CoV-2 infection, the enterocytes of the small intestine. In COVID-19 treatment, the use of probiotics may need to be supported by more evidence. We also found that enterocytes act as a conserved cell reservoir for coronaviruses, and the attraction of enterocytes might provide directional selection for their evolution. This raises the question of why enterocytes act as a conserved cell reservoir for coronaviruses. Answering this question may facilitate research in the related field.

Single-cell sequencing analysis of human ileum epithelial cells
All the scRNA-seq data in our project were analyzed with the R package Seurat (version 3.1.2). The public scRNA-seq data (GSE125970) were downloaded from the GEO database. Only cells from the ileum were subjected to analysis. Cells with over 25% of counts in mitochondrial genes were filtered out. After principal component analysis (PCA) with the Seurat function RunPCA, the first 15 principal components were used in the Seurat function FindNeighbors, with a k.param of 30. Clusters were identified with the Seurat function FindClusters, at a resolution of 1.5. Two-dimensional visualization was performed with t-distributed stochastic neighbor embedding (tSNE). Cluster identities were assigned according to the barcode names, which contained the cell identities defined by the authors of the original paper. We used the functions FeaturePlot and VlnPlot to illustrate the expression of the coronavirus receptor genes ACE2, DPP4, and ANPEP.
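The mitochondrial quality-control step described above can be illustrated with a minimal Python sketch. It is not the Seurat code used in this study: the function names, the "MT-" gene-symbol prefix used to recognize human mitochondrial genes, and the toy count tables are all assumptions made for illustration.

```python
def mito_fraction(counts, mito_prefix="MT-"):
    """Fraction of a cell's counts that fall in mitochondrial genes.

    `counts` maps gene symbol -> count for one cell. The "MT-" prefix is an
    assumed naming convention for human mitochondrial genes.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(c for gene, c in counts.items() if gene.startswith(mito_prefix))
    return mito / total


def filter_cells(cells, max_mito_fraction=0.25):
    """Keep cells whose mitochondrial fraction does not exceed the threshold."""
    return {
        barcode: counts
        for barcode, counts in cells.items()
        if mito_fraction(counts) <= max_mito_fraction
    }


# Toy example with hypothetical barcodes and counts.
toy_cells = {
    "cell_a": {"ACE2": 6, "DPP4": 3, "MT-CO1": 1},    # 10% mitochondrial
    "cell_b": {"ACE2": 2, "MT-CO1": 5, "MT-ND1": 3},  # 80% mitochondrial
}
kept_cells = filter_cells(toy_cells)
print(sorted(kept_cells))
```

In this toy input, only the cell with 10% mitochondrial counts passes the 25% threshold.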

Single-cell sequencing analysis of human ileum immune cells
The public scRNA-seq data of human ileum immune cells (GSE134809) were downloaded. The data were generated from Crohn's disease lesions. Only the uninflamed samples were subjected to our analysis. To remove the batch effect, data integration was performed following the Seurat integration procedure. The first 15 principal components were used for two-dimensional t-distributed stochastic neighbor embedding (tSNE). A k.param of 10 was used in the function FindNeighbors, and a resolution of 1.5 was used in the function FindClusters. The identity of each cluster was defined by the marker genes for each cell type.

Single-cell sequencing analysis of human lung cells
The public scRNA-seq data of human lung cells (GSE135893) were downloaded. The dataset comes from research on pulmonary fibrosis and contains healthy controls. Only the data from healthy controls were subjected to the subsequent analysis. The first 15 principal components were used for tSNE. We used the cell types provided in the metadata to define the cell identities in our analysis.

Single-cell sequencing analysis of the small intestine infected by Salmonella enterica
The public scRNA-seq data of the small intestine infected by Salmonella enterica (GSE92332) were downloaded. The barcode-gene matrix "GSE92332_SalmHelm_UMIcounts.txt" was used for the following analysis. The data of the enteric epithelial cells from Salmonella enterica-infected (48 hours) C57BL/6 mice and from the controls were then subsetted. Data integration of the Salmonella enterica-infected cells and the control cells was performed following the Seurat integration procedure. A k.param of 20 was used in the function FindNeighbors, and a resolution of 1.5 was used in the function FindClusters. The first 15 principal components were used for tSNE. Cluster identities were assigned according to the original barcode names. The Wilcoxon rank-sum test was used to calculate the significance of the differential expression of Ace2, Dpp4, and Anpep between enterocytes with and without Salmonella enterica treatment.
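As an illustration of the rank-based comparison named above, the following Python sketch computes a two-sided Wilcoxon rank-sum (Mann-Whitney) test using the normal approximation with a tie correction and no continuity correction. It is a simplified stand-in, not the implementation used in this study, and statistical packages may apply different corrections; the function names and the toy expression values are assumptions.

```python
import math


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank-sum test; returns (U statistic of x, approximate p)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of t^3 - t over groups of tied values.
    tie_counts = {}
    for value in pooled:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u, p_value


# Hypothetical per-cell expression values, for illustration only.
control = [0.0, 0.0, 1.2, 0.8, 0.0, 1.5]
infected = [1.1, 2.0, 0.0, 1.9, 2.4, 1.7]
u_stat, p_val = rank_sum_test(control, infected)
print(u_stat, p_val)
```

The normal approximation is most appropriate when each group contains many cells, as is typical for scRNA-seq clusters.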

Bulk sequencing analysis of the small intestine treated by Segmented Filamentous Bacteria (SFB)
The public bulk sequencing data of the small intestine treated by SFB (GSE71734) were downloaded. Fragments Per Kilobase of exon model per Million mapped fragments (FPKM) values of the coronavirus receptors Ace2, Dpp4, and Anpep, the enterocyte markers Mep1a and Ephx2, and the internal control Gapdh were extracted. The data were then analyzed and plotted with Prism (version 8.3.1). The FPKM values of Ace2, Dpp4, Anpep, Mep1a, and Ephx2 were normalized by the FPKM of Gapdh, and the Kruskal-Wallis test was performed.
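The normalization and test described above can be sketched in Python as follows. This is an illustrative reimplementation, not the Prism workflow used in this study: the function names and FPKM values are hypothetical, and the p-value uses the chi-square approximation for exactly three groups (two degrees of freedom, where the upper tail equals exp(-H/2)). Software may instead use an exact calculation for small samples, so this sketch is not expected to reproduce the reported p-value.

```python
import math


def normalize_by_reference(target_fpkm, reference_fpkm):
    """Divide each sample's target-gene FPKM by its reference-gene FPKM."""
    return [t / r for t, r in zip(target_fpkm, reference_fpkm)]


def kruskal_wallis_three_groups(groups):
    """Return (H, approximate p) for exactly three groups, with tie correction."""
    if len(groups) != 3:
        raise ValueError("this sketch only handles three groups")
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)
    rank_of = {}   # value -> average (mid) rank
    tie_term = 0   # sum of t^3 - t over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    rank_sums_sq = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = (12 / (n * (n + 1)) * rank_sums_sq - 3 * (n + 1)) / correction
    p_value = math.exp(-h / 2)  # chi-square upper tail with 2 degrees of freedom
    return h, p_value


# Hypothetical FPKM values (three samples per condition), for illustration only.
ace2 = {"mouse-SFB": [70.0, 75.0, 68.0], "rat-SFB": [61.0, 58.0, 63.0],
        "germ-free": [30.0, 33.0, 31.0]}
gapdh = {"mouse-SFB": [100.0, 102.0, 98.0], "rat-SFB": [101.0, 99.0, 100.0],
         "germ-free": [100.0, 103.0, 97.0]}
normalized = [normalize_by_reference(ace2[c], gapdh[c]) for c in ace2]
h_stat, p_approx = kruskal_wallis_three_groups(normalized)
print(h_stat, p_approx)
```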

Structural analysis
All of the protein structures were obtained from the Protein Data Bank. The binding results were graphically presented using the PyMOL Molecular Graphics System, version 1.3 (Schrödinger, LLC).

Fig.3). These receptors were highly expressed in epithelial cells, mainly in enterocytes. A small proportion of mononuclear phagocytes, stromal cells, and glial cells also highly expressed coronavirus receptors. c, however, the expression profile of these virus receptors was inconsistent in the lung (Extended Data Fig. 4). AT2 cells and transitional AT2 cells highly expressed ACE2 and DPP4, and a small proportion of T cells also highly expressed DPP4. However, ANPEP was highly expressed in macrophages and monocytes rather than in AT2 cells or transitional AT2 cells.

Red font refers to helix; yellow font refers to sheet; and green font refers to loops. The receptor-binding residues are highlighted with a colored background. b, phylogram of the spike RBDs of representative coronaviruses. c, RBDs are shown as a molecular surface, with the ACE2-binding residues highlighted with blue mesh. d, structural analysis of human ACE2 recognition by the SARS-CoV RBD, shown as a cartoon; e, structural analysis of human ACE2 recognition by the NL63-CoV RBD, shown as a cartoon; f, structural analysis of human DPP4 recognition by the MERS-CoV RBD; and g, structural analysis of human ANPEP recognition by the HCoV-229E RBD.

Fig. 3 | The number of enterocytes with highly expressed coronavirus receptors was boosted after Salmonella treatment.
We reanalyzed public scRNA-seq data (GSE92332) in which C57BL/6J mice were treated with Salmonella enterica for 48 hours, alongside controls. a, b, different cell subpopulations were identified by their marker genes. c, the number of enterocytes with high expression of coronavirus receptors was increased. However, the average expression of Ace2, Dpp4, and Anpep in enterocytes was only mildly elevated. The Wilcoxon rank-sum test was used to calculate the significance of the differential expression of Ace2, Dpp4, and Anpep between enterocytes with and without Salmonella enterica treatment. The p-value of the Wilcoxon test is 0.06102 for Ace2 and less than 0.001 for Dpp4 and Anpep.

performed. We found that Ace2 and Anpep were elevated in the small intestine with a significant p-value.

Fig. 1 | Coronavirus receptor expression in the mouse cell atlas.
a, cells of different organs were identified in the mouse cell atlas. b, ACE2, DPP4 and ANPEP coincided in their high expression in the mouse intestine. Expression levels in fetal organs were also lower than in adult organs, which may indicate that vertical transmission of coronaviruses is restricted. c, the small intestine cells and lung cells were subsetted. Cells with high expression of coronavirus receptors were co-localized in the mouse intestine. However, this colocalization was not observed in lung cells.

The public RNA-seq data (GSE98353) of murine gut wall tissue from the research of Dr. Palok Aich and co-workers were analyzed for the gene expression of Ace2, Dpp4, and Anpep. In some cases, probiotics elevated the expression of coronavirus receptors. The published work showed that Lactobacillus acidophilus could attenuate the microbial dysbiosis and inflammation induced by Salmonella infection in C57BL/6 and BALB/c mice. However, the expression of the coronavirus receptors Ace2, Dpp4, and Anpep in the small intestine was consistently elevated in the Salmonella-infected C57BL/6 mice that were rescued by Lactobacillus acidophilus. We cannot exclude the effect of the small sample size, but the data still indicate that Ace2, Dpp4, and Anpep may remain at high expression levels even when the luminal inflammation has been rescued by Lactobacillus acidophilus.