Preprint. Article. Version 2. Preserved in Portico. This version is not peer-reviewed.

Conserved Sequence Features in the Spike Protein Provide Evidence Suggesting the Origin of SARS-CoV-2 (COVID-19)-Related Viruses by Recombination between SARS virus and Another Sarbecovirus

Version 1 : Received: 13 June 2020 / Approved: 14 June 2020 / Online: 14 June 2020 (04:09:39 CEST)
Version 2 : Received: 25 August 2020 / Approved: 26 August 2020 / Online: 26 August 2020 (10:17:16 CEST)

A peer-reviewed article of this Preprint also exists.

Khadka, B.; Gupta, R. S. Conserved Molecular Signatures in the Spike Protein Provide Evidence Indicating the Origin of SARS-CoV-2 and a Pangolin-CoV (MP789) by Recombination(s) between Specific Lineages of Sarbecoviruses. PeerJ, 2021, 9, e12434.


Both SARS-CoV-2 (COVID-19) and SARS coronaviruses (CoVs) are members of the subgenus Sarbecovirus. To understand the origin of SARS-CoV-2, protein sequences from sarbecoviruses were analyzed to identify highly specific molecular markers consisting of conserved inserts or deletions (termed CSIs) in the spike (S) and nucleocapsid (N) proteins that are either specific for particular clusters/lineages of these viruses or commonly shared by specific lineages. Three novel CSIs in the N-terminal domain of the spike protein S1-subunit (S1-NTD) are uniquely shared by SARS-CoV-2, BatCoV-RaTG13 and most pangolin CoVs, distinguishing this cluster of viruses (SARS-CoV-2r) from all others. At the same positions where these CSIs are found, related CSIs are also present in two other sarbecoviruses (viz. CoVZXC21 and CoVZC45, forming the CoVZC cluster), which form an outgroup of the SARS-CoV-2r cluster. These three CSIs are not found in the SARS-CoVs. However, both the SARS and SARS-CoV-2r CoVs contain two large CSIs in the C-terminal domain of S1 (S1-CTD), which binds the human ACE-2 receptor, that are absent in the CoVZC cluster of CoVs. These results indicate that while the S1-NTD of the SARS-CoV-2r viruses possesses the sequence characteristics of the CoVZC cluster of CoVs, their S1-CTD resembles that of the SARS viruses. Thus, the spike protein of SARS-CoV-2r viruses has likely originated from a recombination event between the S1-NTD of the CoVZC viruses and the S1-CTD of SARS viruses. This inference is also supported by the amino acid sequence similarity of the S1-NTD and S1-CTD from SARS-CoV-2 compared to those of the CoVZC and SARS CoVs. We also present evidence that one of the pangolin CoVs, pangolin-CoV_MP789, whose receptor-binding domain is most similar to that of SARS-CoV-2, is also derived from a recent recombination between the S1-NTD of the CoVZC CoVs and the S1-CTD of a SARS-CoV-2-related virus.
Several other identified CSIs are specific for other clusters of sarbecoviruses, including a clade consisting of bat SARS-CoVs (BM48-31/BGR/2008 and SARS_BtKY72). Structural mapping studies show that the identified CSIs are located within surface-exposed loops and form distinct patches on the surface of the spike protein. These surface loops/patches are predicted to interact with other host components and to play important roles in the biology/pathology of the SARS-CoV-2 virus. Lastly, the CSIs specific for the SARS-CoV-2r clade provide novel means for the development of new diagnostic and therapeutic targets for these viruses.
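
The idea of a lineage-specific indel described in the abstract can be illustrated with a minimal Python sketch. It scans an already-aligned set of sequences for gap patterns shared by every member of a chosen ingroup and by no other sequence. The function names, the sequence labels and the toy alignment are all hypothetical and are not taken from the study; the sketch also omits the requirement that a CSI be flanked by conserved regions, so it should not be read as the authors' procedure.

```python
from itertools import groupby
from typing import Dict, List, Optional, Set, Tuple


def column_kind(alignment: Dict[str, str], ingroup: Set[str],
                others: Set[str], col: int) -> Optional[str]:
    """Classify one alignment column relative to the ingroup."""
    in_gap = [alignment[name][col] == "-" for name in ingroup]
    out_gap = [alignment[name][col] == "-" for name in others]
    if all(in_gap) and not any(out_gap):
        return "deletion"   # gap in every ingroup member, residue elsewhere
    if not any(in_gap) and all(out_gap):
        return "insertion"  # residue in every ingroup member, gap elsewhere
    return None


def find_signature_indels(alignment: Dict[str, str],
                          ingroup: Set[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, kind) runs of columns (0-based, end exclusive)
    whose gap pattern is shared by all ingroup members and by no others."""
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    others = set(alignment) - ingroup
    if not ingroup or not others or not ingroup <= set(alignment):
        raise ValueError("ingroup must be a proper, non-empty subset")

    width = len(next(iter(alignment.values())))
    kinds = [column_kind(alignment, ingroup, others, c) for c in range(width)]

    runs = []
    col = 0
    for kind, group in groupby(kinds):  # merge adjacent columns of one kind
        length = len(list(group))
        if kind is not None:
            runs.append((col, col + length, kind))
        col += length
    return runs


# Hypothetical toy alignment, for illustration only (not real spike sequences).
toy_alignment = {
    "ingroup_A": "MKT--LVAGSQ",
    "ingroup_B": "MKT--LVSGSQ",
    "outgroup_C": "MKTNGLVAG-Q",
    "outgroup_D": "MKTDGLVAG-Q",
}

for start, end, kind in find_signature_indels(toy_alignment,
                                              {"ingroup_A", "ingroup_B"}):
    print(f"columns {start}-{end - 1}: ingroup-specific {kind}")
```

In this toy case the two ingroup sequences share a two-residue deletion and a one-residue insertion that the other sequences lack. A real analysis would start from a curated multiple sequence alignment and would additionally check that the flanking columns are conserved.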


Conserved signature indels (CSIs) specific for SARS and SARS-CoV-2-related viruses; Molecular markers distinguishing different clades of Sarbecovirus; Evolutionary relationships between SARS and SARS-CoV-2-related viruses; Origin of SARS-CoV-2 and Pangolin CoV_MP789 viruses; Novel sequence and structural features of spike and nucleocapsid proteins; Genetic recombination


Biology and Life Sciences, Virology

Comments (2)

Comment 1
Received: 26 August 2020
Commenter: Radhey Gupta
Commenter's Conflict of Interests: Author
Comment: I have changed the title of the manuscript so that it now more clearly states and emphasizes the novel findings presented here. Some changes have been made to the conceptual drawings (none to the data figures) so that the main inferences from this work can be more easily understood.
Comment 2
Received: 12 November 2021
Commenter's Conflict of Interests: This is an update regarding our own publication
Comment: This article has now been published in the journal PeerJ. The title of, and link to, the final publication are noted below.

Conserved molecular signatures in the spike protein provide evidence indicating the origin of SARS-CoV-2 and a Pangolin-CoV (MP789) by recombination(s) between specific lineages of Sarbecoviruses
In addition to other important information regarding the origin of SARS-CoV-2-related viruses, this work indicates that recombination between an unidentified virus that is most closely related to SARS-CoV-2 (>70% of the sequence of this virus is known) and the Bat-CoVZC/Prc31 virus has led to the formation of Pangolin-CoV (MP789).
