Working Paper Brief Report Version 2 This version is not peer-reviewed

Identification of Novel Missense Mutations in a Large Number of Recent SARS-CoV-2 Genome Sequences

Version 1 : Received: 26 April 2020 / Approved: 28 April 2020 / Online: 28 April 2020 (07:36:18 CEST)
Version 2 : Received: 20 May 2020 / Approved: 21 May 2020 / Online: 21 May 2020 (04:09:53 CEST)

How to cite: Cai, H.Y.; Cai, K.K.; Li, J. Identification of Novel Missense Mutations in a Large Number of Recent SARS-CoV-2 Genome Sequences. Preprints 2020, 2020040482

Abstract

Background: SARS-CoV-2 infection has spread to over 200 countries since it was first reported in December 2019. Significant country-specific variations in infection and mortality rates have been noted. Although country-specific differences in public health response have had a large impact on infection rate control, it is currently unclear whether evolution of the virus itself has also contributed to variations in infection and mortality rates. Previous studies on SARS-CoV-2 mutations were based on the analysis of ~160 SARS-CoV-2 sequences available until mid-February 2020 [2, 3, 4, 5]. By mid-April, >550 SARS-CoV-2 sequences had been deposited in GenBank, and over 8,200 in the GISAID database.

Methods: We performed a sequence analysis on 474 SARS-CoV-2 genomes submitted to GenBank up to April 11, 2020, by multiple alignment using Map to a Reference Assembly and Variants/SNP identification. The results were verified on a larger set of 8,126 hCoV-19 (SARS-CoV-2) sequences from the GISAID database.

Results: We identified 5 frequent new mutations that have emerged since late February 2020 and are present in many isolates (up to 40%). These mutations are: one missense (non-synonymous) mutation each in orf1ab (C1059T), orf3 (G25563T) and orf8 (C27964T), one in the 5'UTR (C241T), and one in a non-coding region (G29553A). The final mutation (G29553A) was found to be almost exclusive to the US isolates. The first 3 mutations are non-synonymous, leading to amino acid substitutions in the viral protein sequence. Except for C241T, all the novel mutations identified are absent from the isolates from Italy and Spain among the SARS-CoV-2 genomes deposited in GenBank and GISAID by April 13, 2020.

Conclusion: The results of the current study indicate that new mutations are emerging as the COVID-19 pandemic spreads to different countries and that geography-specific mutants may exist. The findings lay the foundation for further investigation into the impact of SARS-CoV-2 mutations on disease incidence, severity, and host immune response. In addition, they may also provide insights into vaccine development and serological response detection for the virus.
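The mutation names in the abstract (for example C241T) follow the pattern reference base, 1-based reference position, observed base. As an illustration only, the following minimal Python sketch shows how substitutions of that form could be read off a sequence already aligned to a reference, and how a per-mutation frequency across isolates could be tallied. It is not the authors' pipeline, which relied on the Map to a Reference Assembly and Variants/SNP identification tools; the function names `call_substitutions` and `mutation_frequencies` and the toy sequences are hypothetical and are not real SARS-CoV-2 data.

```python
from collections import Counter


def call_substitutions(reference, aligned):
    """List substitutions such as 'C241T' for one aligned sequence.

    Both strings must come from the same alignment (equal length, '-' for
    gaps). Positions are 1-based and counted on the ungapped reference.
    Gaps and ambiguous bases (anything other than A, C, G, T) are skipped.
    """
    if len(reference) != len(aligned):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    ref_pos = 0
    for ref_base, base in zip(reference.upper(), aligned.upper()):
        if ref_base == "-":
            # Insertion relative to the reference: no reference coordinate.
            continue
        ref_pos += 1
        if ref_base in "ACGT" and base in "ACGT" and base != ref_base:
            calls.append(f"{ref_base}{ref_pos}{base}")
    return calls


def mutation_frequencies(reference, isolates):
    """Return the fraction of isolates carrying each substitution."""
    counts = Counter()
    for sequence in isolates.values():
        counts.update(call_substitutions(reference, sequence))
    total = len(isolates)
    return {mutation: n / total for mutation, n in counts.items()}


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, for illustration only).
    toy_reference = "ACGTACGTAC"
    toy_isolates = {
        "iso1": "ACGTACGTAC",
        "iso2": "ATGTACGTAC",
        "iso3": "ATGTACNTAC",
        "iso4": "ACGTAC-TAA",
    }
    for mutation, freq in sorted(mutation_frequencies(toy_reference, toy_isolates).items()):
        print(f"{mutation}: {freq:.0%}")
```

A real analysis would also need to handle insertions, deletions, sequencing quality, and annotation of each site (coding versus non-coding, synonymous versus missense), none of which this sketch attempts.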

Subject Areas

COVID-19; SARS-CoV-2; virus; mutation; polymorphism; genome sequence

Comments (1)

Comment 1
Received: 21 May 2020
Commenter: Hugh Cai
Commenter's Conflict of Interests: Author
Comment: 1. Last paragraph of discussion: Update on May 19, 2020: Among the 29,633 complete SARS-CoV-2 genomes in the GISAID hCoV-19 database, 6,367 (21.5%) had the C25563T (group D) mutation, found mostly in the US (3,244) and in some other countries including Spain (9), but none was from Italy; 516 (1.7%) had the C27964T (group E) mutation, 451 from the US, 3 from Canada, and 62 from Australia; 294 (1%) had the G29553A mutation, 293 from the US and 1 from Iceland.
2. Changed the format of the abstract to include sections for Background, Methods, Results and Conclusion
3. Improved figure S1