Preprint; Article; Version 1; Preserved in Portico; This version is not peer-reviewed

A Comprehensive Comparative Phylogenomics and Demographic Evolutionary History of the SARS-CoV-2

Version 1 : Received: 19 May 2020 / Approved: 21 May 2020 / Online: 21 May 2020 (03:27:52 CEST)
Version 2 : Received: 17 June 2020 / Approved: 21 June 2020 / Online: 21 June 2020 (16:10:26 CEST)

How to cite: Doğan, Ö.; Korkmaz, E.M.; Budak, M.; Çıplak, B.; Başıbüyük, H.H. A Comprehensive Comparative Phylogenomics and Demographic Evolutionary History of the SARS-CoV-2. Preprints 2020, 2020050332 (doi: 10.20944/preprints202005.0332.v1).

Abstract

A new betacoronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which is causing the current pandemic, possesses a linear positive-sense single-stranded RNA genome with a length of 29,903 nt. Here, SARS-CoV-2 genomes from 821 samples were characterised to better understand the genomic and evolutionary patterns of the virus. The phylogeny of SARS-CoV-2 was reconstructed from a concatenated dataset consisting of all peptide-encoding sequences under Bayesian Inference (BI) and Maximum Likelihood (ML) approaches. Comparison of all peptide-encoding sequences reveals a high divergence of amino acid sequences that is proportional to the divergence of nucleotides, indicating that viral genomic evolution has not been strictly neutral. Most of the genome evolved neutrally; however, specific sites in the peptide-encoding sequences evolved under positive selection. As well as providing reliable evidence on the transmission routes of the SARS-CoV-2 outbreak, the phylogenetic and network analyses suggest that the sample reported from Guangdong province is the likely ancestor of the pandemic form of the virus. The overall substitution rate of the SARS-CoV-2 genome was estimated to be 1.65 × 10⁻³ substitutions per site per year, which falls within the range previously reported for RNA viruses. The median estimate of the tMRCA from Bayesian coalescent analyses corresponds to 10 September 2019. The exponential growth rate (r), doubling time (Td) and R0 were estimated to be 47.43 per year, 5.39 days and 2.72, respectively. These findings emphasise that the use of more comprehensive genome data improves robustness and enhances understanding of the demographic history of the outbreak.
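As a rough sanity check on the figures quoted in the abstract, the sketch below converts the reported exponential growth rate into a doubling time using the standard relation Td = ln(2) / r, and scales the per-site substitution rate to the whole genome. The function names and the 365-day year are assumptions made for illustration and are not taken from the paper. The point conversion gives roughly 5.33 days, slightly below the reported 5.39 days; such a gap would be expected if the reported values are separate posterior summaries rather than a direct conversion of one point estimate, but the abstract does not say which.

```python
import math

# Values quoted in the abstract.
GENOME_LENGTH_NT = 29_903          # genome length in nucleotides
SUBSTITUTION_RATE = 1.65e-3        # substitutions per site per year
GROWTH_RATE_PER_YEAR = 47.43       # exponential growth rate r

# Assumption for illustration: a 365-day year.
DAYS_PER_YEAR = 365.0


def doubling_time_days(r_per_year: float, days_per_year: float = DAYS_PER_YEAR) -> float:
    """Doubling time in days under pure exponential growth: Td = ln(2) / r."""
    if r_per_year <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / r_per_year * days_per_year


def substitutions_per_genome_per_year(rate_per_site: float, genome_length: int) -> float:
    """Expected substitutions per genome per year (rate per site times length)."""
    return rate_per_site * genome_length


if __name__ == "__main__":
    td = doubling_time_days(GROWTH_RATE_PER_YEAR)
    subs = substitutions_per_genome_per_year(SUBSTITUTION_RATE, GENOME_LENGTH_NT)
    print(f"Doubling time from r: {td:.2f} days")
    print(f"Substitutions per genome per year: {subs:.1f}")
```

Read this way, the reported substitution rate corresponds to roughly 49 substitutions per genome per year.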

Supplementary and Associated Material

https://www.gisaid.org/: The data used in this study were obtained from GISAID’s EpiCoV™ Database.

Subject Areas

coronavirus; origin; substitution rate; positive selection; demographic dynamics
