Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Origin, Phylogeny, Variability and Epitope Conservation of SARS-CoV-2 Worldwide

Version 1: Received: 27 May 2021 / Approved: 31 May 2021 / Online: 31 May 2021 (11:36:29 CEST)

How to cite: Vale, F.F.; Vítor, J.M.; Marques, A.T.; Azevedo-Pereira, J.M.; Anes, E.; Gonçalves, J. Origin, Phylogeny, Variability and Epitope Conservation of SARS-CoV-2 Worldwide. Preprints 2021, 2021050750 (doi: 10.20944/preprints202105.0750.v1).

Abstract

The challenges posed by the coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), include understanding what triggered the emergence of SARS-CoV-2, how this RNA virus is evolving, and how genomic variability may affect the primary structure of proteins that are vaccine targets. We analyzed 19,471 high-quality SARS-CoV-2 genomes and 199,984 spike glycoprotein sequences from all over the world, available at the GISAID database, together with 3,335 distinct genomes of other Coronaviridae family members available at GenBank. Here, we identify a SARS-CoV-2 emerging cluster containing 13 closely related genomes isolated from bats and pangolins that showed evidence of recombination, which may have contributed to the emergence of SARS-CoV-2. The analyzed SARS-CoV-2 genomes presented 9,632 single nucleotide polymorphisms (SNPs), corresponding to a variant density of 0.3 over the genome, and showed a clear geographic distribution. SNPs are unevenly distributed throughout the genome, and mutation hotspots were found in the spike gene and ORF1ab. We describe a set of predicted spike protein epitopes whose variability is negligible. All predicted epitopes for the structural E, M and N proteins are highly conserved. This result favors the continued efficacy of the available vaccines.
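The reported variant density can be sanity-checked under one plausible reading, namely the number of SNPs divided by the genome length. The abstract does not state the reference length or the exact definition used, so the sketch below assumes the 29,903-nucleotide Wuhan-Hu-1 reference (NC_045512.2); the function name `variant_density` is hypothetical and is not taken from the preprint.

```python
# Assumption: length of the Wuhan-Hu-1 reference genome (NC_045512.2).
# The abstract does not say which reference length was used.
ASSUMED_GENOME_LENGTH_NT = 29_903

# SNP count as reported in the abstract.
REPORTED_SNP_COUNT = 9_632


def variant_density(snp_count: int, genome_length: int) -> float:
    """Return SNPs per nucleotide position (hypothetical helper)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if snp_count < 0:
        raise ValueError("snp_count must be non-negative")
    return snp_count / genome_length


if __name__ == "__main__":
    density = variant_density(REPORTED_SNP_COUNT, ASSUMED_GENOME_LENGTH_NT)
    # Rounded to one decimal place for comparison with the abstract's value.
    print(f"variant density = {density:.1f} SNPs per position")
```

Under these assumptions the ratio rounds to the 0.3 quoted in the abstract, that is, roughly one variable site for every three positions across the analyzed genomes.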

Subject Areas

COVID-19; SARS-CoV-2 genomics; spike protein; epitope prediction; coronavirus comparative genomics
