Preserved in Portico. This version is not peer-reviewed.
Evolutionary Origin of SARS-CoV-2 (COVID-19 Virus) and SARS Viruses through the Identification of Novel Protein/DNA Sequence Features Specific for Different Clades of Sarbecoviruses
Received: 13 June 2020 / Approved: 14 June 2020 / Online: 14 June 2020 (04:09:39 CEST)
Received: 25 August 2020 / Approved: 26 August 2020 / Online: 26 August 2020 (10:17:16 CEST)
A peer-reviewed article of this Preprint also exists.
Journal reference: PeerJ 9 2021
Both SARS-CoV-2 (COVID-19) and SARS coronaviruses (CoVs) are members of the subgenus Sarbecovirus. To understand the origin of SARS-CoV-2 and its relation to other viruses, protein sequences from sarbecoviruses were analyzed to identify conserved inserts or deletions (conserved signature indels, termed CSIs) that either demarcate particular clusters/lineages of sarbecoviruses or are shared by specific lineages, shedding light on their interrelationships. We report several clade-specific CSIs in the spike (S) and nucleocapsid (N) proteins that reliably demarcate distinct sarbecovirus clades, providing important insights into the origin and evolution of SARS-CoV-2. Two CSIs in the N-terminal domain (NTD) of the S-protein are uniquely shared by SARS-CoV-2, BatCoV-RaTG13 and most pangolin CoVs (SARS-CoV-2r cluster); another CSI supports a closer relationship of SARS-CoV-2 to BatCoV-RaTG13. Three additional CSIs in the NTD are specific for two Bat-SARS-like CoVs (viz. CoVZXC21 and CoVZC45; CoVZC cluster), which form an outgroup of the SARS-CoV-2r cluster. Interestingly, one of the pangolin CoVs, pangolin-CoV-MP789, also shares these CSIs but lacks the CSIs specific for the SARS-CoV-2r cluster. The N-terminal sequence (aa 1-320) of the S-protein from pangolin-CoV-MP789 shows the highest similarity (85.94%) to the CoVZC cluster, while its C-terminal region, including the receptor binding domain (RBD), is most similar (97-98% identity) to the SARS-CoV-2 virus. These observations indicate that the spike protein sequence for strain MP789 is of chimeric origin. Multiple CSIs described here also distinguish two bat SARS-CoV strains (BM48-31/BGR/2008 and SARS_BtKY72) from all others. Our work also clarifies that two large CSIs (5 aa and 13 aa) found in the RBD of the S-protein are mainly specific for the SARS and SARS-CoV-2r clusters of CoVs. The surface loops formed by these CSIs are predicted to be important in the binding of the S-protein to the human ACE-2 receptor.
Lastly, we have mapped the locations of different CSIs in the structure of the S-protein. These studies reveal that the three CSIs specific for the SARS-CoV-2r cluster form distinct surface-exposed loops/patches on the S-protein. As surface-exposed loops play important roles in mediating novel interactions, the novel lobes/patches formed by the SARS-CoV-2-specific CSIs in the spike protein are predicted to play important roles in the interaction of this protein with other surface-exposed components in the host cells, thereby enhancing the binding and infectivity of this virus in humans.
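The two kinds of comparison summarized in the abstract, finding indels shared by a subset of sequences and comparing percent identity over a region, can be illustrated with a small sketch. This is not the authors' pipeline: the function names (`gap_runs`, `find_csis`, `percent_identity`), the `flank` parameter, and the toy alignment with its `virus_A` to `virus_D` labels are all hypothetical, and the sketch assumes the sequences have already been aligned to equal length with `-` marking gaps. Here a gap run counts as a candidate CSI only if it is present in some but not all sequences, absent as a gap from the rest, and flanked on both sides by fully conserved columns.

```python
from typing import Dict, List, Set, Tuple

# Toy, invented alignment (NOT real sarbecovirus data); '-' marks a gap.
TOY_ALIGNMENT: Dict[str, str] = {
    "virus_A": "MKTLLAG--SVNCT",
    "virus_B": "MKTLLAG--SVNCT",
    "virus_C": "MKTLLAGPQSVNCT",
    "virus_D": "MKTLLAGPESVNCT",
}


def gap_runs(seq: str) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges of gap runs in an aligned sequence."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, ch in enumerate(seq):
        if ch == "-":
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(seq)))
    return runs


def is_conserved(alignment: Dict[str, str], col: int) -> bool:
    """True if every sequence has the same non-gap residue in this column."""
    residues = {seq[col] for seq in alignment.values()}
    return len(residues) == 1 and "-" not in residues


def find_csis(alignment: Dict[str, str], flank: int = 3) -> List[Tuple[int, int, List[str]]]:
    """Report gap runs shared by a subset of sequences and flanked by conserved columns."""
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")

    # Group sequences by the exact gap run they carry.
    carriers: Dict[Tuple[int, int], Set[str]] = {}
    for name, seq in alignment.items():
        for run in gap_runs(seq):
            carriers.setdefault(run, set()).add(name)

    csis = []
    for (start, end), members in sorted(carriers.items()):
        if len(members) == len(alignment):
            continue  # a gap in every sequence distinguishes no group
        others = [n for n in alignment if n not in members]
        if any("-" in alignment[n][start:end] for n in others):
            continue  # remaining sequences must have residues across the whole run
        if start - flank < 0 or end + flank > length:
            continue  # not enough flanking columns to judge conservation
        flanks = list(range(start - flank, start)) + list(range(end, end + flank))
        if all(is_conserved(alignment, c) for c in flanks):
            csis.append((start, end, sorted(members)))
    return csis


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    print(find_csis(TOY_ALIGNMENT))  # [(7, 9, ['virus_A', 'virus_B'])]
```

Slicing two aligned sequences to a column range before calling `percent_identity` gives a region-level comparison analogous to contrasting an N-terminal segment with the remainder of a protein, although the identity definition used here (gap columns ignored) is only one of several conventions and may differ from the one behind the figures reported above.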
conserved signature indels specific for SARS and SARS-CoV-2 viruses; DNA and Protein markers distinguishing different clades of Sarbecoviruses; evolutionary origin of SARS and SARS-CoV-2 viruses
LIFE SCIENCES, Virology
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.