Preprint | Article | Version 1 | Preserved in Portico | This version is not peer-reviewed
Evolutionary Origin of SARS-CoV-2 (COVID-19 Virus) and SARS Viruses through the Identification of Novel Protein/DNA Sequence Features Specific for Different Clades of Sarbecoviruses
Version 1: Received: 13 June 2020 / Approved: 14 June 2020 / Online: 14 June 2020 (04:09:39 CEST)
Version 2: Received: 25 August 2020 / Approved: 26 August 2020 / Online: 26 August 2020 (10:17:16 CEST)
Khadka, B.; Gupta, R. S. Conserved Molecular Signatures in the Spike Protein Provide Evidence Indicating the Origin of SARS-CoV-2 and a Pangolin-CoV (MP789) by Recombination(s) between Specific Lineages of Sarbecoviruses. PeerJ, 2021, 9, e12434. https://doi.org/10.7717/peerj.12434.
Abstract
Both SARS-CoV-2 (the COVID-19 virus) and SARS coronaviruses (CoVs) are members of the subgenus Sarbecovirus. To understand the origin of SARS-CoV-2 and its relationship to other viruses, protein sequences from sarbecoviruses were analyzed to identify conserved inserts or deletions (conserved signature indels, termed CSIs) that either demarcate particular clusters/lineages of sarbecoviruses or are shared by specific lineages, shedding light on their interrelationships. We report several clade-specific CSIs in the spike (S) and nucleocapsid (N) proteins that reliably demarcate distinct sarbecovirus clades and provide important insights into the origin and evolution of SARS-CoV-2. Two CSIs in the N-terminal domain (NTD) of the S-protein are uniquely shared by SARS-CoV-2, BatCoV-RaTG13 and most pangolin CoVs (SARS-CoV-2r cluster); another CSI supports a closer relationship of SARS-CoV-2 to BatCoV-RaTG13. Three additional CSIs in the NTD are specific for two bat SARS-like CoVs (viz. CoVZXC21 and CoVZC45; CoVZC cluster), which form an outgroup of the SARS-CoV-2r cluster. Interestingly, one pangolin CoV (pangolin-CoV-MP789) also shares these CSIs but lacks the CSIs specific for the SARS-CoV-2r cluster. The N-terminal sequence (aa 1-320) of the S-protein of pangolin-CoV-MP789 shows the highest similarity (85.94%) to the CoVZC cluster, while its C-terminal region, including the receptor binding domain (RBD), is most similar (97-98% identity) to SARS-CoV-2. These observations indicate that the spike protein sequence of strain MP789 is of chimeric origin. Multiple CSIs described here also distinguish two bat SARS-CoV strains (BM48-31/BGR/2008 and SARS_BtKY72) from all others. Our work also clarifies that two large CSIs (5 aa and 13 aa) found in the RBD of the S-protein are mainly specific for the SARS and SARS-CoV-2r clusters of CoVs. The surface loops formed by these CSIs are predicted to be important in the binding of the S-protein to the human ACE-2 receptor.
Lastly, we have mapped the locations of the different CSIs in the structure of the S-protein. These studies reveal that the three CSIs specific for the SARS-CoV-2r cluster form distinct surface-exposed loops/patches on the S-protein. As surface-exposed loops play important roles in mediating novel interactions, the novel loops/patches formed by the SARS-CoV-2-specific CSIs in the spike protein are predicted to play important roles in the interaction of this protein with other surface-exposed components of host cells, thereby enhancing the binding and infectivity of this virus in humans.
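The two kinds of evidence described above, indels shared exclusively by one group of sequences and region-wise sequence identity that changes along the spike protein, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy alignment, and the simplified criterion (a gap run present in every group member and absent from all other sequences, without checking that the flanking regions are conserved) are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

GAP = "-"


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def region_identity(query: str, ref: str, boundary: int) -> Tuple[float, float]:
    """Identity of query vs. ref before and after an alignment column.

    A query that is closest to one reference upstream of the boundary and
    to a different reference downstream of it is a hint of chimeric origin.
    """
    return (
        percent_identity(query[:boundary], ref[:boundary]),
        percent_identity(query[boundary:], ref[boundary:]),
    )


def shared_gap_runs(alignment: Dict[str, str], group: List[str]) -> List[Tuple[int, int]]:
    """Column ranges [start, end) gapped in every group member and in no other sequence.

    Simplified stand-in for a CSI search: it does not test whether the
    regions flanking the gap run are conserved.
    """
    length = len(next(iter(alignment.values())))
    others = [name for name in alignment if name not in group]
    runs: List[Tuple[int, int]] = []
    start = None
    for col in range(length):
        is_signature = all(alignment[n][col] == GAP for n in group) and all(
            alignment[n][col] != GAP for n in others
        )
        if is_signature and start is None:
            start = col
        elif not is_signature and start is not None:
            runs.append((start, col))
            start = None
    if start is not None:
        runs.append((start, length))
    return runs


if __name__ == "__main__":
    # Toy, made-up sequences; not real spike protein data.
    query = "ACDEFGHIKLMN"
    ref_a = "ACDEFGHAKAMA"  # resembles the query in the first half
    ref_b = "AQDQFQHIKLMN"  # resembles the query in the second half
    print(region_identity(query, ref_a, 6))
    print(region_identity(query, ref_b, 6))

    toy_alignment = {"g1": "ACD--FGH", "g2": "ACD--FGH", "o1": "ACDKLFGH", "o2": "ACDKMFGH"}
    print(shared_gap_runs(toy_alignment, ["g1", "g2"]))
```

With real data, the inputs would be rows of a multiple sequence alignment of S-protein sequences, and the boundary would be the alignment column corresponding to the position of interest (for example, around aa 320).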
Keywords
conserved signature indels specific for SARS and SARS-CoV-2 viruses; DNA and Protein markers distinguishing different clades of Sarbecoviruses; evolutionary origin of SARS and SARS-CoV-2 viruses
Subject
Biology and Life Sciences, Virology
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.