Non-Synonymous Mutation Analysis in SARS-CoV-2 Variants Isolated from Humans and Prediction of Conserved Linear Antibody Epitopes for Use in Country-wise Epitope-based Vaccine Development

We found around 25 epitope regions that have so far remained conserved across all 12,225 SARS-CoV-2 isolates from humans in various countries. We have also highlighted, country by country, all mutations that have occurred in epitope regions of the virus. Different countries show different portfolios of mutated epitopes. For example, almost all nsp1 epitopes are mutated in virus isolates from the United States but not in isolates from China, possibly because the COVID-19 outbreak has been much larger in the United States than in China. This also suggests that a larger outbreak may lead to a larger number of mutations in the virus. Countries such as the United States, with a high abundance of mutations in epitope regions, may face challenges in finding a successful vaccine against COVID-19. On a positive note, our country-wise analysis of mutations across all epitope regions of SARS-CoV-2 variants shows that some epitopes are still conserved in the countries with the highest burden of COVID-19. For example, the nsp2 epitope region 806-818 is conserved in SARS-CoV-2 isolates from the United States, although it is mutated in other countries such as Great Britain, Australia, Turkey and Sweden. Overall, a new approach is needed to develop a vaccine against COVID-19 that also considers mutagenesis in SARS-CoV-2. We suggest looking for conserved epitopes in SARS-CoV-2 country-wise and developing a customized multiepitope-based vaccine.


Abstract:
The COVID-19 pandemic has caused large-scale havoc in almost every country across the globe, posing major challenges for healthcare systems in many parts of the world. Many laboratories are racing to develop vaccines, drugs or other therapeutics to treat or prevent the infection. However, given the limited time window and the high rate of infection, finding a cure is an enormous task. With hundreds of SARS-CoV-2 genome sequences from human isolates being submitted almost every day, it is becoming clear that the virus is mutating, more slowly in countries with sporadic cases and faster in countries experiencing large outbreaks. Such mutations may complicate the development of a vaccine or therapeutic for use in every country, as each hotspot region with an ongoing outbreak may have its own pattern of mutations. In the current study, we retrieved non-synonymous mutation data for around 12,225 SARS-CoV-2 samples isolated from humans globally and identified, country by country, all mutations that collectively occur in antibody epitope regions of the virus. We found a small number of epitope regions in SARS-CoV-2 that are highly conserved across all variants and may be used for epitope-based vaccine development for the whole world. We also found epitope regions that are conserved across the SARS-CoV-2 variants within individual countries and can be used for customized epitope-based vaccine development in each country.

Introduction:
Since the first report of the novel coronavirus SARS-CoV-2 in late December 2019 from Wuhan, in the Hubei province of China, millions of cases have been reported worldwide, affecting most countries [1, 2]. China subsequently shared the first genetic sequence of this novel human coronavirus; based on this genome sequence, the virus belongs to lineage B of the beta-coronaviruses [1,3]. The virus is 96% identical to a bat coronavirus at the whole-genome level [4]. The SARS-CoV-2 genome encodes four major structural proteins, namely nucleocapsid protein (N), spike protein (S), membrane glycoprotein (M) and envelope protein (E). The genome also codes for a very long non-structural polyprotein, Orf1ab, which yields nsp1-nsp16, including several important enzymes such as helicase, 3'-5' exonuclease and endoRNase. Based on whole-genome sequencing data, several mutation hotspots with non-synonymous mutations have been identified in SARS-CoV-2 [5][6][7]. Mutations have been observed to become more frequent as the burden of infection increases in various geographical regions. However, no study has yet examined the relationship between viral mutations and the human immune response across geographical locations. The virus may have to evolve under the pressure of diverse human immune responses to gain thermodynamic fitness. One study found that the acute immune response to SARS-CoV-2 is very dynamic in humans and that this should be taken into consideration in studies of COVID-19 pathogenesis [8]. More studies are required to better understand host-virus interactions, the host immune response and pathogen immune evasion, and to discover whether mutations in SARS-CoV-2 play any role in them.
A phylogenetic analysis of the S gene from 144 SARS-CoV-2 sequences surmised that the virus evolves to evade the host immune system through non-synonymous mutations under positive selection, and that a more evolved virus may have greater fitness to cause further outbreaks [9]. Amino acid substitutions are believed to alter the immunogenic determinants of the virus and consequently reduce its immunogenicity by hampering recognition of SARS-CoV-2 by immune cells [9]. A strain from the USA was found to carry a non-synonymous mutation in the S protein, D614G, that elicits a weaker immunogenic response, possibly because of altered S-protein epitopes [9].
There have been several attempts to predict epitope-based subunit vaccine candidates through in silico approaches [10][11][12][13][14] and to develop monoclonal antibodies [15]. Highly conserved epitopes in the receptor-binding domain of the S protein have been reported to bind the neutralizing antibody CR3022, which was originally developed against the SARS virus [16]. However, if SARS-CoV-2 continues to mutate and the mutations fall in B- or T-cell epitopes, vaccine development will become more challenging. Mutations in SARS-CoV-2 also give rise to variants that may respond differently to inhibitory drugs [5].
There is a need to understand the mutations in SARS-CoV-2 that are occurring on a large scale globally, especially in the epitope regions of the virus. If we can tabulate the portfolio of mutations occurring in epitope regions of SARS-CoV-2 globally, it may be possible to discover epitopes that are conserved in a country or region and are good candidates for customized epitope-based vaccine development. However, the genomic data submitted almost every day will have to be monitored to track mutations in the epitope regions of the virus in each country when customizing an epitope-based vaccine. In the current study, we retrieved mutation data for around 12,225 SARS-CoV-2 isolates and identified, country by country, the mutations that have occurred collectively in predicted antibody epitope regions of SARS-CoV-2 proteins. We further listed the epitope regions that are most conserved across all virus variants globally. This knowledge can be very useful in designing customized epitope-based vaccines country-wise.

Prediction of antibody epitopes in SARS-CoV-2
ElliPro was used to predict antibody epitopes from the 3-dimensional structure of each protein (PMID 19055730). We searched only for linear antibody epitopes because of the very high number of mutations in SARS-CoV-2 proteins reported from countries across the globe. Only ElliPro predictions with a score above 0.5 were selected for the study.
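The score filter described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the `LinearEpitope` record, the `select_epitopes` helper and the example values (including the placeholder protein name "protX") are all hypothetical, and the only detail taken from the text is the 0.5 cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearEpitope:
    """Hypothetical record for one predicted linear epitope."""
    protein: str   # protein name (placeholder values below)
    start: int     # first residue, 1-based and inclusive
    end: int       # last residue, inclusive
    score: float   # prediction score reported by the tool


SCORE_CUTOFF = 0.5  # threshold stated in the text


def select_epitopes(predictions, cutoff=SCORE_CUTOFF):
    """Keep only predictions whose score is strictly above the cutoff."""
    return [p for p in predictions if p.score > cutoff]


# Invented records for illustration only; these are not real predictions.
predictions = [
    LinearEpitope("protX", 10, 22, 0.71),
    LinearEpitope("protX", 40, 47, 0.50),   # exactly 0.5, so it is dropped
    LinearEpitope("protY", 105, 118, 0.63),
]

selected = select_epitopes(predictions)
for epitope in selected:
    print(epitope.protein, epitope.start, epitope.end, epitope.score)
```

The comparison is strict because the text says "above 0.5"; if scores equal to the cutoff should be kept, `>` would become `>=`.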

Retrieval of global mutation data of SARS-CoV-2 from human isolates
Mutation data for 12,225 SARS-CoV-2 genome sequences from human isolates were retrieved on 24 April 2020 from the China National Centre for Bioinformation (CNCB) (https://bigd.big.ac.cn/ncov/release_genome). CNCB aggregates genomic data from NGDC, NMDC, GISAID, GenBank and Genome Warehouse. Every mutation in SARS-CoV-2 from human isolates was tabulated country-wise. After the predicted epitopes had been gathered from ElliPro, the analysis focused on retrieving country-wise data on mutated epitopes.
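The country-wise tabulation can be illustrated with a small Python sketch that maps each mutation onto the epitope range containing it and then lists the epitopes left untouched, either globally or in a single country. The data layout (tuples of country, protein and residue position), the function names and all example values are assumptions made for illustration; they do not reflect the actual CNCB export format.

```python
from collections import defaultdict

# Hypothetical inputs: epitopes as (protein, start, end), mutations as
# (country, protein, residue position). All values are invented.
epitopes = [("protX", 10, 22), ("protX", 40, 47), ("protY", 105, 118)]
mutations = [
    ("CountryA", "protX", 15),   # falls inside protX 10-22
    ("CountryB", "protY", 300),  # outside every epitope
    ("CountryC", "protY", 110),  # falls inside protY 105-118
]


def mutated_epitopes_by_country(epitopes, mutations):
    """Map each country to the set of epitopes hit by at least one mutation."""
    hit = defaultdict(set)
    for country, protein, pos in mutations:
        for epitope in epitopes:
            ep_protein, start, end = epitope
            if protein == ep_protein and start <= pos <= end:
                hit[country].add(epitope)
    return dict(hit)


def conserved_epitopes(epitopes, mutated, country=None):
    """Epitopes with no mutation, globally (country=None) or in one country."""
    if country is None:
        lost = set().union(*mutated.values())
    else:
        lost = mutated.get(country, set())
    return [ep for ep in epitopes if ep not in lost]


mutated = mutated_epitopes_by_country(epitopes, mutations)
print(conserved_epitopes(epitopes, mutated))              # conserved everywhere
print(conserved_epitopes(epitopes, mutated, "CountryA"))  # conserved in CountryA
```

An epitope counts as conserved in a country only in the sense that no submitted isolate from that country carries a mutation inside it, so the result depends on how many genomes each country has contributed.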

Highlight of mutant regions
In the 3-dimensional structures of the various SARS-CoV-2 proteins, mutant residues were highlighted in red using PyMOL (Schrödinger, LLC).
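One way to script this highlighting is to generate PyMOL command lines from a list of mutant residue numbers. The sketch below is an assumption about how this could be done, not the authors' procedure: the helper `highlight_commands`, the object name `nsp1_model` and the residue numbers are hypothetical. It relies only on the standard PyMOL `select` and `color` commands and the `resi a+b+c` selection syntax.

```python
def highlight_commands(object_name, mutant_residues, color="red"):
    """Return PyMOL command lines that colour the given residues.

    object_name is the name of a structure already loaded in PyMOL;
    mutant_residues is any iterable of residue numbers.
    """
    residues = sorted(set(mutant_residues))
    if not residues:
        return []
    resi = "+".join(str(r) for r in residues)
    return [
        f"select mutants, {object_name} and resi {resi}",
        f"color {color}, mutants",
    ]


# Hypothetical object name and residue numbers, for illustration only.
commands = highlight_commands("nsp1_model", [191, 188, 188])
print("\n".join(commands))
```

The printed lines can be pasted into the PyMOL command line or saved to a .pml script and run after the structure has been loaded.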

Results and Discussion:
All mutations found collectively in SARS-CoV-2 (from 12,225 genome sequences) in the non-structural polyprotein Orf1ab, which yields nsp1-nsp16, are highlighted in red in the three-dimensional structural models of the proteins (Figures 2-4). All predicted epitopes in the Orf1ab proteins (nsp1-nsp16) are tabulated (Supplementary Tables 1-16); residues highlighted in red within the predicted epitopes were found mutated in the countries listed in each table. The Orf1ab proteins also contain the most conserved antibody epitope regions, which were not mutated in any country (Table 1). We further looked for SARS-CoV-2 epitopes with a very high tendency to mutate, that is, epitopes mutated in more than 30 countries. We find that several amino acid residues in the N-protein epitope region 184-208, namely S188, R191, S193, S194, S197, S202, R203, G204 and T205, are mutated almost across the globe. Overall, our mutation analysis shows that the N-protein epitope region 184-208 carries the largest number of mutated amino acids in virus isolates from across the globe, which suggests that the N protein may be an important determinant of immunogenicity.
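The "mutated in more than 30 countries" criterion amounts to counting, for each residue, the number of distinct countries in which it is mutated. The following Python sketch shows that count under assumed inputs; the tuple layout, the function name and the toy data are hypothetical, and the toy example uses a threshold of 2 so that it stays small.

```python
from collections import defaultdict


def residues_mutated_in_many_countries(mutations, min_countries=30):
    """Return (protein, position) pairs mutated in more than min_countries countries.

    mutations is an iterable of (country, protein, residue position) tuples.
    """
    countries = defaultdict(set)
    for country, protein, pos in mutations:
        countries[(protein, pos)].add(country)
    return sorted(
        residue for residue, seen in countries.items()
        if len(seen) > min_countries
    )


# Invented data for illustration only.
mutations = [
    ("CountryA", "protX", 188),
    ("CountryB", "protX", 188),
    ("CountryC", "protX", 188),
    ("CountryA", "protX", 191),
    ("CountryB", "protX", 50),
]

hotspots = residues_mutated_in_many_countries(mutations, min_countries=2)
print(hotspots)
```

Countries are stored in a set, so repeated isolates from the same country are counted once per residue.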

Conclusion:
This is the first study of its kind in which all mutations that have occurred collectively in thousands of SARS-CoV-2 isolates from humans in various countries are highlighted in the predicted epitope regions of viral proteins. The goal of this study is to discover antibody epitope regions that are most conserved across all SARS-CoV-2 variants isolated from humans and can be used to design a common epitope-based (or multiepitope-based) vaccine for the whole world. We found around 25 epitope regions that have so far remained conserved across the 12,225 SARS-CoV-2 isolates analyzed. Overall, a new approach to vaccine development against COVID-19 is needed, one that also considers the mutagenesis occurring globally in SARS-CoV-2. We suggest looking for conserved epitopes in SARS-CoV-2 variants country-wise and developing a customized multiepitope-based vaccine.
Declaration of Competing interest: The authors declare no competing interests.