2019-nCoV: A possible progenitor for SARS-CoV with bat origin?

Yongchang Xu1,2,4, Leyi Wang3, Xu Jia2, Youjun Feng1,2,4*

1 Department of Pathogen Biology & Microbiology, and Department of General Intensive Care Unit of the Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310058, China
2 Non-coding RNA and Drug Discovery Key Laboratory of Sichuan Province, Chengdu Medical College, Chengdu, Sichuan 610500, China
3 Department of Veterinary Clinical Medicine and the Veterinary Diagnostic Laboratory, College of Veterinary Medicine, University of Illinois, Urbana, IL, 61802, USA
4 College of Animal Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China

# These authors contribute equally to this work.
*Correspondence: Youjun Feng (fengyj@zju.edu.cn)

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 10 March 2020 doi:10.20944/preprints202003.0159.v1

It seems likely that the novel CoV first identified in Wuhan City, Hubei Province, China, in 2019 (initially called 2019-nCoV; the disease it causes was later formally named COVID-19 by WHO) represents the 7th human CoV pathogen 15,16. Patients infected with 2019-nCoV present with atypical viral pneumonia and abnormal findings on chest computed tomography (CT), implicating acute respiratory distress syndrome 17,18. Epidemiological studies reveal that a number of inpatients with pneumonia of unknown cause shared a history of exposure to the Huanan seafood wholesale wet market in Wuhan City, China 15,16,19, implying possible wild animal-to-human transmission at the initial stage. Subsequently, a large-scale retrospective study of 425 confirmed human cases, including familial clusters of pneumonia 18,20, underlined that 2019-nCoV had also acquired the ability of person-to-person transmission 15 [...] Wuhan City alone, as of Jan 25, 2020. Therefore, this unexpected outbreak of acute respiratory illness is due to the emergence of a novel CoV, 2019-nCoV, of global health concern 16,19,22. Preventing its further spread worldwide requires close monitoring, effective intervention strategies, and the development of promising vaccines (and/or anti-viral therapies) against this ongoing and rapidly escalating challenge 19,22.
Although the clinical and genomic aspects of the 2019-nCoV outbreak are increasingly clear, the evolutionary relationship of 2019-nCoV to SARS-CoV remains only fragmentarily understood. In this paper, we aim to close this knowledge gap. We report integrated evidence (ranging from viral genomics and phylogeny to structural biology) that 2019-nCoV is a possible progenitor of SARS-CoV, the causative agent of the 2002-03 pandemic of acute respiratory disease worldwide.

Structural Modeling and Analysis
In addition to the cellular ACE2 protein 23, the SARS-CoV viral proteins with known structures used in this study included the S protein 23, the fusion core 24, and 3CL 25,26.

Genomic Insights into 2019-nCoV
The availability of viral genomes makes it possible to investigate the origin, evolution, and pathogenesis of 2019-nCoV, the deadly agent of this atypical pneumonia.

Phylogeny of 2019-nCoV
To address its evolutionary origin and placement, we constructed a phylogenetic tree of 2019-nCoV isolates using complete genome sequences within the β-CoVs.
For 2019-nCoV, 44 representative isolates are included: 12 are from sporadic cases exported abroad from Wuhan City, China, and the remaining 32 were collected from patients in different cities of China. Maximum likelihood-based phylogeny reveals that i) all isolates from the 2019-nCoV outbreak are almost identical and cluster into the same subclade, termed Subclade I (Fig. 2); ii) three bat SARS-like CoVs are closely related cousins of Subclade I, namely β-CoV/bat/Yunnan/RaTG13/2013, bat-SL-CoVZC45, and bat-SL-CoVZXC21 (Fig. 2); and iii) Subclade II, a neighboring cluster of Subclade I, consists mainly of human/civet SARS-CoVs and bat SARS-like CoVs (Fig. 2). Intriguingly, the aforementioned three bat SARS-like CoVs bridge Subclade I and Subclade II. This raises the possibility that Subclade I is phylogenetically placed in the position of an ancestor/progenitor of Subclade II, rather than having evolved in parallel.
To further test this hypothesis, we also performed phylogenetic analyses with two evolutionarily conserved proteins, the S protein and the nucleoprotein (N). As expected, similar scenarios are seen in the phylogenies of both the S protein (Fig. S3) and the N protein (Fig. S4).
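The notion of one group of sequences "bridging" two others can be illustrated with a much simpler measure than the maximum likelihood phylogeny used in this study. The following is a minimal sketch that computes pairwise p-distances (the proportion of differing sites) on an aligned set of sequences. The sequence names and the short toy alignment are hypothetical placeholders, not data from this work, and p-distance is a stand-in technique chosen only for illustration.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def closest_relative(name: str, alignment: dict) -> str:
    """Return the name of the sequence with the smallest p-distance to `name`."""
    others = [n for n in alignment if n != name]
    return min(others, key=lambda n: p_distance(alignment[name], alignment[n]))


# Hypothetical toy alignment (NOT real viral sequences).
toy_alignment = {
    "isolate_A": "ATGGCTAGCTAGGCTA",
    "isolate_B": "ATGGCTAGCTAGGCTA",
    "bat_like_X": "ATGGCTAGTTAGGCTT",
    "sars_like_Y": "ATGACTCGTTAGACTT",
}

if __name__ == "__main__":
    names = list(toy_alignment)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            d = p_distance(toy_alignment[first], toy_alignment[second])
            print(f"{first} vs {second}: {d:.4f}")
```

In this toy set, `bat_like_X` lies closer to both `isolate_A` and `sars_like_Y` than those two lie to each other, which is the pattern meant by "bridging". A distance table of this kind does not establish ancestry on its own.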

Entry of 2019-nCoV by S Protein
The surface S glycoprotein of 2019-nCoV plays a critical role in entry into host cells. Entry depends on efficient binding of 2019-nCoV S to the angiotensin-converting enzyme 2 (ACE2) receptor, followed by membrane fusion between the virus and the host cell (Fig. S5). In general, the S protein of 2019-nCoV consists of two distinct domains (S1 and S2, Fig. S2). As predicted, a putative receptor-binding domain (RBD, 331-583 aa) is detected in the C-terminus of the S1 protein (its SARS-CoV counterpart is located at positions 318-569 aa).
Re-analysis of the cryo-EM structure of the SARS-CoV RBD in complex with its ACE2 receptor (PDB: 6ACG) allows us to illustrate a binding interface comprising two motifs (motifs I & II, Fig. 3A). Obviously, the binding residues of the SARS-CoV receptor-binding motif (RBM) we propose here [...] Fig. 3C]. Structural modeling of the 2019-nCoV RBD/ACE2 complex also gives two unique binding motifs (Fig. 3D): i) N487 of the RBM interacts with Q24 and Y83 of ACE2 in motif I (Fig. 3E); ii) although the contact residues of ACE2 remain unchanged, the RBM residues are changed to Y421, G496, T500 & Q506 (Fig. 3F). Of note, two binding residues of ACE2 (D38 and K353) are consistently revealed in our analysis and in other reports 31,32. The differences among the snapshots of the RBM/ACE2 interface are due in part (if not entirely) to the different structural templates used 23,31,32. Nevertheless, together they may constitute a relatively comprehensive picture of receptor recognition by the 2019-nCoV S protein.
In addition, the S2 region of the S protein contains two typical heptad repeat motifs, HR1 (920-970 aa) and HR2 (1163-1202 aa), upstream of its transmembrane region (Fig. S2). This organization of motifs is well conserved across bat SARS-like CoVs (Figs 4A and S2). The two HR motifs of CoVs participate in forming the membrane fusion core, which has a coiled-coil structure (Figs 4B-F). Not surprisingly, the fusion core of 2019-nCoV (Fig. 4C) is structurally similar to those of SARS-CoV (Fig. 4B) and the other bat SARS-like CoVs (Figs 4D-F). Thus, the S protein of 2019-nCoV is a paradigmatic type I viral fusion protein, supporting its efficient entry into host cells.
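The residue ranges quoted in this section (RBD 331-583, HR1 920-970, HR2 1163-1202) are read here as 1-based and inclusive, which is the usual convention for protein coordinates but is an assumption on our part. The sketch below shows how such ranges map onto Python's 0-based, end-exclusive slicing. The placeholder sequence and the helper names `slice_domain` and `domain_length` are hypothetical and serve only to exercise the arithmetic.

```python
# Residue ranges quoted in the text, assumed to be 1-based and inclusive.
DOMAINS = {
    "RBD": (331, 583),
    "HR1": (920, 970),
    "HR2": (1163, 1202),
}


def domain_length(start: int, end: int) -> int:
    """Number of residues in an inclusive 1-based range."""
    return end - start + 1


def slice_domain(sequence: str, start: int, end: int) -> str:
    """Extract residues start..end (1-based, inclusive) from a sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"range {start}-{end} is outside the sequence")
    # Shift the start by one for 0-based indexing; the end stays as-is
    # because Python slices exclude the final index.
    return sequence[start - 1:end]


# Hypothetical placeholder sequence (NOT the real S protein), long enough
# to cover every range above.
placeholder_s = "".join("ACDEFGHIKLMNPQRSTVWY"[i % 20] for i in range(1300))

if __name__ == "__main__":
    for name, (start, end) in DOMAINS.items():
        fragment = slice_domain(placeholder_s, start, end)
        print(f"{name}: residues {start}-{end}, {len(fragment)} aa")
```

Under this reading, the quoted ranges span 253, 51, and 40 residues, respectively. With a real S protein sequence in place of the placeholder, the same helper would return the corresponding fragments.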

Structural Similarity of SARS-CoV and 2019-nCoV 3CL
The chymotrypsin-like protease (3CLpro), also called the main protease (Mpro), is a widespread nonstructural protein that mediates the proteolytic processing of the viral replicase polyproteins pp1a (486 kDa) and pp1ab (790 kDa) across different species of CoV. Whereas the papain-like protease 2 (PL2pro) cleaves three sites 33-37, the paradigmatic 3CL (~34 kDa) of SARS-CoV recognizes 11 cleavage sites, releasing a number of functional enzymes, such as the RNA-dependent RNA polymerase 25,26. Maturation of 3CL depends on its auto-processing at both the N-terminal and C-terminal sites 25. As in SARS-CoV (Fig. 5A), a 3CLpro homologue is predicted within the ORF1a/b polyprotein of 2019-nCoV (Fig. S2) and is presumably excised by its own auto-cleavage activity 38. Sequence alignment indicates that the 3CL enzymes share 98%~99% similarity with each other, with only ~10 substituted residues (Fig. 5A). As in the prototypical SARS-CoV enzyme 26, an evolutionarily conserved catalytic dyad (H41 and C145) is found in 2019-nCoV 3CL (Fig. 5A). The X-ray crystal structure (PDB: 1Z1I) shows that SARS-CoV 3CL is composed of three distinct domains 24. The first two (Domains I & II) feature anti-parallel β-barrels and are reminiscent of trypsin-like serine proteases (Fig. 5B). By contrast, the last one (Domain III), which is linked to Domain II, displays a globular structure comprising five α-helices (Fig. 5B). A similar arrangement is seen in the overall structure of 2019-nCoV 3CL (PDB: 6LU7). More importantly, fusion inhibitors targeting the HR1 domain are a promising therapy against 2019-nCoV infections 39. Such almost-identical structures of the 3CL proteases support their functional identity (Fig. 5C), highlighting the bat origin shared by SARS-CoV and 2019-nCoV in the context of viral evolution.
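Statements such as "only ~10 substituted residues" can be made concrete by counting mismatches between two aligned sequences. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length with '-' marking gaps. It reports percent identity and lists substitutions in the conventional form (reference residue, reference position, query residue). The two short peptides are hypothetical placeholders rather than real 3CL sequences, and percent identity is not necessarily the same measure as the "similarity" reported above.

```python
def compare_aligned(ref: str, query: str):
    """Compare two aligned protein sequences.

    Returns (percent_identity, substitutions), where substitutions are
    written like 'R4K' using 1-based positions in the ungapped reference.
    Columns containing a gap in either sequence are not counted.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    compared = 0
    ref_pos = 0
    for r, q in zip(ref, query):
        if r != "-":
            ref_pos += 1  # advance only on real reference residues
        if r == "-" or q == "-":
            continue
        compared += 1
        if r != q:
            substitutions.append(f"{r}{ref_pos}{q}")
    if compared == 0:
        raise ValueError("no comparable positions")
    identity = 100.0 * (compared - len(substitutions)) / compared
    return identity, substitutions


# Hypothetical toy peptides (NOT real 3CL sequences).
toy_ref = "SGFRKMAFPS"
toy_query = "SGFKKMAYPS"

if __name__ == "__main__":
    identity, subs = compare_aligned(toy_ref, toy_query)
    print(f"identity: {identity:.1f}%")
    print("substitutions:", ", ".join(subs))
```

For the toy pair, two of ten positions differ, giving 80.0% identity with substitutions R4K and F8Y. Applied to a real alignment, the same function would also show whether the catalytic dyad positions (H41 and C145) appear among the substitutions.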

Conclusion
As of preparing this manuscript, WHO (https://www.who.int/) renamed the [...]

The putative residues of ACE2, indicated with blue letters, are implicated in RBD binding. The three ACE2 proteins used here are from bats, civets, and humans (Homo sapiens).