Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Origin, Phylogeny, Variability and Epitope Conservation of SARS-CoV-2 Worldwide

Version 1: Received: 27 May 2021 / Approved: 31 May 2021 / Online: 31 May 2021 (11:36:29 CEST)

A peer-reviewed article of this Preprint also exists.

Vale, F.F.; Vítor, J.M.B.; Marques, A.T.; Azevedo-Pereira, J.M.; Anes, E.; Goncalves, J. Origin, Phylogeny, Variability and Epitope Conservation of SARS-CoV-2 Worldwide. Virus Research 2021, 304, 198526, doi:10.1016/j.virusres.2021.198526.

Abstract

The coronavirus disease 2019 (COVID-19) pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), poses several challenges: understanding what triggered SARS-CoV-2 emergence, how this RNA virus is evolving, and how its genomic variability may affect the primary structure of proteins that are vaccine targets. We analyzed 19,471 SARS-CoV-2 genomes and 199,984 spike glycoprotein sequences from all over the world available in the GISAID database, together with 3,335 genomes of other Coronaviridae family members available in GenBank, selecting high-quality SARS-CoV-2 genomes and distinct Coronaviridae family genomes. Here, we identify a SARS-CoV-2 emerging cluster containing 13 closely related genomes isolated from bats and pangolins that showed evidence of recombination, which may have contributed to the emergence of SARS-CoV-2. The analyzed SARS-CoV-2 genomes presented 9,632 single nucleotide polymorphisms (SNPs), corresponding to a variant density of 0.3 over the genome, with a clear geographic distribution. SNPs are unevenly distributed throughout the genome, and mutation hotspots were found in the spike gene and ORF1ab. We describe a set of predicted spike protein epitopes whose variability is negligible. All predicted epitopes for the structural E, M and N proteins are highly conserved. This result supports the continued efficacy of the available vaccines.
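
The variant density quoted in the abstract can be read as the number of distinct SNP positions divided by the genome length. The abstract does not state the denominator, so the sketch below assumes the 29,903-nucleotide Wuhan-Hu-1 reference genome (GenBank NC_045512.2); the function and constant names are hypothetical and only illustrate the arithmetic. Under that assumption the ratio rounds to the reported 0.3.

```python
# Minimal sketch of the variant-density arithmetic.
# ASSUMPTION: the denominator is the Wuhan-Hu-1 reference genome length
# (NC_045512.2, 29,903 nt); the abstract does not say which length was used.
REFERENCE_GENOME_LENGTH = 29_903

# SNP count reported in the abstract.
SNP_COUNT = 9_632


def variant_density(snp_count: int, genome_length: int) -> float:
    """Return SNPs per nucleotide position (hypothetical helper)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if snp_count < 0:
        raise ValueError("snp_count must be non-negative")
    return snp_count / genome_length


density = variant_density(SNP_COUNT, REFERENCE_GENOME_LENGTH)
print(f"variant density: {density:.2f} SNPs per nucleotide position")
```

A density near 0.3 would mean that roughly one genome position in three carries a SNP in at least one of the analyzed genomes. It does not mean that any single genome differs from the reference at that rate.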

Keywords

COVID-19; SARS-CoV-2 genomics; spike protein; epitope prediction; coronavirus comparative genomics

Subject

Biology and Life Sciences, Biochemistry and Molecular Biology
