Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Evolutionary Origin of SARS-CoV-2 (COVID-19 Virus) and SARS Viruses through the Identification of Novel Protein/DNA Sequence Features Specific for Different Clades of Sarbecoviruses

Version 1 : Received: 13 June 2020 / Approved: 14 June 2020 / Online: 14 June 2020 (04:09:39 CEST)
Version 2 : Received: 25 August 2020 / Approved: 26 August 2020 / Online: 26 August 2020 (10:17:16 CEST)

A peer-reviewed article of this Preprint also exists.

Journal reference: PeerJ 9 2021
DOI: 10.7717/peerj.12434


Both SARS-CoV-2 (COVID-19) and SARS coronaviruses (CoVs) are members of the subgenus Sarbecovirus. To understand the origin of SARS-CoV-2 and its relation to other viruses, protein sequences from sarbecoviruses were analyzed to identify conserved inserts or deletions (termed CSIs) that either demarcate particular clusters/lineages of sarbecoviruses or are shared by specific lineages, shedding light on their interrelationships. We report several clade-specific CSIs in the spike (S) and nucleocapsid (N) proteins that reliably demarcate distinct sarbecovirus clades, providing important insights into the origin and evolution of SARS-CoV-2. Two CSIs in the N-terminal domain (NTD) of the S-protein are uniquely shared by SARS-CoV-2, BatCoV-RaTG13 and most pangolin CoVs (SARS-CoV-2r cluster); another CSI supports a closer relationship of SARS-CoV-2 to BatCoV-RaTG13. Three additional CSIs in the NTD are specific for two Bat-SARS-like CoVs (viz. CoVZXC21 and CoVZC45; CoVZC cluster), which form an outgroup of the SARS-CoV-2r cluster. Interestingly, one of the pangolin CoVs, pangolin-CoV-MP789, also shares these CSIs but lacks the CSIs specific for the SARS-CoV-2r cluster. The N-terminal sequence (aa 1-320) of the S-protein of pangolin-CoV-MP789 shows the highest similarity (85.94%) to the CoVZC cluster, while its C-terminal region, including the receptor binding domain (RBD), is most similar (97-98% identity) to the SARS-CoV-2 virus. These observations indicate that the spike protein sequence of strain MP789 is of chimeric origin. Multiple CSIs described here also distinguish two bat SARS-CoV strains (BM48-31/BGR/2008 and SARS_BtKY72) from all others. Our work also clarifies that two large CSIs (5 aa and 13 aa) found in the RBD of the S-protein are mainly specific for the SARS and SARS-CoV-2r clusters of CoVs. The surface loops formed by these CSIs are predicted to be important in the binding of the S-protein to the human ACE-2 receptor.
Lastly, we have mapped the locations of the different CSIs in the structure of the S-protein. These studies reveal that the three CSIs specific for the SARS-CoV-2r cluster form distinct surface-exposed loops/patches on the S-protein. As surface-exposed loops play important roles in mediating novel interactions, the novel lobes/patches formed by the SARS-CoV-2-specific CSIs in the spike protein are predicted to play important roles in the interaction of this protein with other surface-exposed components of host cells, thereby enhancing the binding and infectivity of this virus in humans.
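
The two kinds of evidence described above (indels shared exclusively by one clade, and region-wise sequence identity pointing to a chimeric spike sequence) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which the abstract does not specify. The function names, the toy sequences, the flank criterion (flanking columns must be gap-free and identical in every sequence) and the identity definition (gapped columns ignored) are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Toy, made-up aligned sequences (NOT real sarbecovirus data).
alignment: Dict[str, str] = {
    "clade_A1": "MKTLLNGTSAVVQ",
    "clade_A2": "MKTLLNGTSAVVQ",
    "other_B1": "MKTLL----AVVQ",
    "other_B2": "MKTLL----AVVQ",
}
clade: Set[str] = {"clade_A1", "clade_A2"}


def find_clade_specific_indels(
    alignment: Dict[str, str], clade: Set[str], flank: int = 3
) -> List[Tuple[int, int, str]]:
    """Return (start, end, kind) column runs (0-based, end exclusive) whose
    gap pattern separates `clade` from all other sequences and that are
    flanked on both sides by `flank` fully conserved columns."""
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    inside = [name for name in alignment if name in clade]
    outside = [name for name in alignment if name not in clade]
    if not inside or not outside:
        raise ValueError("need at least one sequence inside and outside the clade")

    def state(col: int):
        # Gap status must be uniform within each group and differ between them.
        in_gaps = {alignment[name][col] == "-" for name in inside}
        out_gaps = {alignment[name][col] == "-" for name in outside}
        if len(in_gaps) == 1 and len(out_gaps) == 1 and in_gaps != out_gaps:
            return "deletion" if True in in_gaps else "insertion"
        return None

    def conserved(col: int) -> bool:
        # Assumed (strict) criterion: one residue, no gaps, in every sequence.
        residues = {seq[col] for seq in alignment.values()}
        return len(residues) == 1 and "-" not in residues

    hits: List[Tuple[int, int, str]] = []
    col = 0
    while col < length:
        kind = state(col)
        if kind is None:
            col += 1
            continue
        start = col
        while col < length and state(col) == kind:
            col += 1
        flanks = list(range(start - flank, start)) + list(range(col, col + flank))
        if start - flank >= 0 and col + flank <= length and all(map(conserved, flanks)):
            hits.append((start, col, kind))
    return hits


def percent_identity(a: str, b: str, start: int, end: int) -> float:
    """Percent identity of two aligned sequences over columns [start, end),
    ignoring columns where either sequence has a gap (an assumed convention)."""
    pairs = [(x, y) for x, y in zip(a[start:end], b[start:end]) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Toy chimera: the first half matches ref_x, the second half matches ref_y.
ref_x, ref_y, query = "AAAAACCCCC", "GGGGGTTTTT", "AAAAATTTTT"

print(find_clade_specific_indels(alignment, clade))
print(percent_identity(query, ref_x, 0, 5), percent_identity(query, ref_y, 0, 5))
print(percent_identity(query, ref_x, 5, 10), percent_identity(query, ref_y, 5, 10))
```

On the toy alignment, the four-residue block present only in the two clade members is reported as a single insertion spanning columns 5 to 8. In the toy chimera, the query is identical to one reference over the first half and to the other over the second half, the same qualitative pattern the abstract reports for the MP789 spike sequence. A real analysis would need a proper multiple sequence alignment and a less strict definition of flank conservation.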


conserved signature indels specific for SARS and SARS-CoV-2 viruses; DNA and Protein markers distinguishing different clades of Sarbecoviruses; evolutionary origin of SARS and SARS-CoV-2 viruses


