Brief Report
Version 2
This version is not peer-reviewed
Identification of Novel Missense Mutations in a Large Number of Recent SARS-CoV-2 Genome Sequences
Version 1 : Received: 26 April 2020 / Approved: 28 April 2020 / Online: 28 April 2020 (07:36:18 CEST)
Version 2 : Received: 20 May 2020 / Approved: 21 May 2020 / Online: 21 May 2020 (04:09:53 CEST)
How to cite: Cai, H.Y.; Cai, K.K.; Li, J. Identification of Novel Missense Mutations in a Large Number of Recent SARS-CoV-2 Genome Sequences. Preprints 2020, 2020040482
Abstract
Background: SARS-CoV-2 infection has spread to over 200 countries since it was first reported in December 2019. Significant country-specific variations in infection and mortality rates have been noted. Although country-specific differences in public health response have had a large impact on control of the infection rate, it is currently unclear whether evolution of the virus itself has also contributed to these variations. Previous studies of SARS-CoV-2 mutations were based on the analysis of ~160 SARS-CoV-2 sequences available until mid-February 2020 [2, 3, 4, 5]. By mid-April, >550 SARS-CoV-2 sequences had been deposited in GenBank, and over 8,200 in the GISAID database.
Methods: We performed a sequence analysis on 474 SARS-CoV-2 genomes submitted to GenBank up to April 11, 2020 by multiple alignment, using Map to a Reference Assembly and Variants/SNP identification. The results were verified on a larger set of 8,126 hCoV-19 (SARS-CoV-2) sequences from the GISAID database.
Results: We identified 5 frequent new mutations that have emerged since late February 2020 and are present in many isolates (up to 40%). These mutations are: one missense (non-synonymous) mutation each in orf1ab (C1059T), orf3 (G25563T), and orf8 (C27964T); one in the 5’UTR (C241T); and one in a non-coding region (G29553A). The three missense mutations lead to amino acid substitutions in the viral protein sequence. The non-coding mutation G29553A was found to be almost exclusive to the US isolates. Except for C241T, all of the novel mutations identified are absent from the isolates from Italy and Spain among the SARS-CoV-2 genomes deposited in GenBank and GISAID by April 13, 2020.
Conclusion: The results of the current study indicate that new mutations are emerging as the COVID-19 pandemic spreads to different countries and that geography-specific mutants may exist.
The findings of the current study lay the foundation for further investigation into the impact of SARS-CoV-2 mutations on disease incidence, severity, and host immune response. In addition, they may provide insights into vaccine development and serological response detection for the virus.
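To make the variant-frequency step concrete, the following is a minimal sketch of how substitutions could be tallied across sequences already aligned to a reference. It is not the authors' pipeline, which relied on Map to a Reference Assembly and Variants/SNP identification tools. The function names `call_snps` and `snp_frequencies`, the isolate names, and the 10-base toy sequences are all hypothetical and chosen only for illustration. Mutations are named in the same style as the abstract (reference base, 1-based position, alternate base).

```python
from collections import Counter


def call_snps(reference: str, aligned: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, ref base, alt base) for each substitution.

    Both sequences must already be aligned to the same length. Positions
    where either base is ambiguous (e.g. N) or a gap are skipped.
    """
    if len(reference) != len(aligned):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return [
        (i + 1, r, a)
        for i, (r, a) in enumerate(zip(reference.upper(), aligned.upper()))
        if r != a and r in valid and a in valid
    ]


def snp_frequencies(reference: str, isolates: dict[str, str]) -> dict[str, float]:
    """Fraction of isolates carrying each substitution, keyed like 'C2T'."""
    counts: Counter[str] = Counter()
    for seq in isolates.values():
        counts.update(f"{r}{pos}{a}" for pos, r, a in call_snps(reference, seq))
    total = len(isolates)
    return {name: n / total for name, n in counts.items()}


# Toy data (hypothetical, NOT real SARS-CoV-2 sequence).
reference = "ACGTACGTAC"
isolates = {
    "isolate_1": "ACGTACGTAC",  # identical to the reference
    "isolate_2": "ATGTACGTAC",  # carries C2T
    "isolate_3": "ATGTACGTAA",  # carries C2T and C10A
    "isolate_4": "ACGTNCGTAC",  # ambiguous base at position 5 is ignored
    "isolate_5": "ACGTACGTAC",
}

freqs = snp_frequencies(reference, isolates)
for name, freq in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {freq:.0%} of isolates")
```

On real data, the same tally could be grouped by the country of each isolate to look for geography-specific mutations. Deciding whether a substitution is missense would additionally require the gene coordinates and a codon table, which this sketch does not cover.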
Keywords
COVID-19; SARS-CoV-2; virus; mutation; polymorphism; genome sequence
Subject
Biology and Life Sciences, Virology
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Comments (1)
Commenter: Hugh Cai
Commenter's Conflict of Interests: Author
2. Changed the format of the abstract to include Background, Methods, Results, and Conclusion sections
3. Improved figure S1