Comparative analysis based on the spike glycoproteins of SARS-CoV-2 isolated from COVID-19 patients of different countries

SARS-CoV-2, the virus that causes the disease popularly known as COVID-19, has received worldwide attention. The World Health Organisation has declared the outbreak a pandemic. Owing to its high transmissibility, the virus has caused many deaths and spread to all the major countries of the world. Scientists and researchers worldwide are working to develop a vaccine. In the present study, we carried out a comparative analysis of spike glycoprotein sequences from COVID-19 patients in different countries affected by the pandemic. Spike glycoproteins are the structural proteins that mediate binding of the SARS-CoV-2 viral particle to the ACE2 receptor of the host, following which infection occurs. Using these data, we show the different point mutations in the spike glycoproteins that arose over time in different countries as the disease progressed.


Introduction
Since time immemorial, disease has cost mankind health and wealth, and the quest for survival has been an everlasting battle. Epidemics and pandemics are not new to the history of mankind, and records of diseases such as plague and the Spanish flu devastating lives are an integral part of the epidemiological study of the human race. In December 2019, a series of pneumonia cases was reported in Wuhan, China. It was not long before the cases were classified as viral pneumonia and the virus was speculated to be a β-coronavirus. The virus was initially named 2019 novel coronavirus (2019-nCoV) by the World Health Organization (WHO), which later named the disease coronavirus disease 2019 (COVID-19). With the virus also identified by the name SARS-CoV-2, the COVID-19 epidemic progressed by leaps and bounds via human-to-human transmission, which made it practically impossible to contain within a single area and led to a pandemic that crossed borders and spread through the international community. SARS-CoV-2, being a β-coronavirus like SARS-CoV and MERS-CoV, is responsible for causing severe and potentially fatal respiratory tract infections (Yin et al., 2018). Because it shares 96.2% identity with the bat CoV RaTG13 and 79.5% identity with SARS-CoV, SARS-CoV-2 may be presumed to have been transmitted from bats to humans. However, recent studies comparing the receptors on host surfaces suggest the possibility of alternative intermediate hosts.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 9 April 2020 | doi:10.20944/preprints202004.0154.v1

Spike glycoprotein of SARS-CoV-2
Complete genome analysis of a SARS-CoV-2 strain obtained from Wuhan revealed that this enveloped virus has a positive-stranded RNA genome of about 29.9 kb. Studies of CoV genomes have revealed a variable number (6-11) of open reading frames (ORFs). The first ORF (ORF1a/b) is translated into two polyproteins, pp1a and pp1ab, which are processed into 16 non-structural proteins (NSPs); the remaining ORFs encode accessory and structural proteins.
Four essential structural proteins are encoded by the virus: the spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins (Du et al., 2016). Among them, the envelope-anchored spike protein specifically recognizes the host receptor and serves as a target for the development of antibodies, entry inhibitors and vaccines (Du et al., 2016; He et al., 2006). The S protein is trimeric; each monomer is about 180 kDa and contains two subunits, S1 and S2. The minimal receptor-binding domain (RBD) is the fragment covering residues 318-510 of the S1 subunit (Xiao et al., 2003; Wong et al., 2004). The receptor-binding motif (RBM) is a loop region within the RBD (residues 424-494) that makes complete contact with the receptor, angiotensin-converting enzyme 2 (ACE2). The RBM is tyrosine-rich: six out of 14 residues of the RBM in direct contact with ACE2 were observed to be tyrosines (Zhu et al., 2013). The S protein first binds to a host receptor through the RBD in the S1 subunit and then fuses the viral and host membranes through the S2 subunit (Liu et al., 2004). In the structure, the N-terminal domain (NTD) and C-terminal domain (C-domain) portions of S1 fold as two independent domains. Depending on the virus, either the NTD or the C-domain (occasionally both) binds to a host receptor and functions as the RBD (Breslin et al., 2003; Lin et al., 2008). Recently, it was reported that SARS-CoV-2 uses ACE2 as its receptor, similar to SARS-CoV, whose S1 C-domain is the RBD that recognizes host ACE2 (Babcock et al., 2004; Li et al., 2003). In one study, it was observed that among the ACE2-contacting residues in the RBD, 9 are fully conserved and 4 are partially conserved among 2019-nCoV and SARS-CoV from human, civet, and bat.
ACE2 is a zinc-dependent peptidase that functions in the renin-angiotensin pathway and regulates blood pressure (Donoghue et al., 2000; Yagil and Yagil, 2003). However, the physiological function of ACE2 is not related to its role as the SARS-CoV receptor (Li et al., 2005b). ACE2 contains an N-terminal peptidase domain and a C-terminal collectrin domain. The enzymatic active site of ACE2 is buried in a claw-like structure formed by the two lobes of the peptidase domain (Towler et al., 2004). The binding interactions between the SARS-CoV RBD and ACE2 largely determine the host range and cross-species infections of SARS-CoV (Lu et al., 2015). Using computer modeling, Xu et al. (2020) found that the spike proteins of SARS-CoV-2 and SARS-CoV share 76.5% amino acid sequence identity and have almost identical 3-D structures in the receptor-binding domain, which maintains van der Waals forces. It has been reported that residue 394 (glutamine) in the SARS-CoV-2 RBD, which corresponds to residue 479 in SARS-CoV, can be recognized by the critical lysine 31 on the human ACE2 receptor (Wu et al., 2012). Recent cryo-EM structural studies, which further deciphered the interaction between the S protein of SARS-CoV-2 and ACE2 at Angstrom-level resolution, revealed that the overall ACE2-binding mode of the SARS-CoV-2 S protein is almost identical to that of the SARS-CoV S protein (Lan et al., 2020). [...] increasing viral pathogenesis. SARS-CoV possesses residues in the RBD that allow interspecies infection: Y442, L472, N479, D480, and T487 (Lu et al., 2015). In SARS-CoV-2, however, slight modification of some residues could improve the interaction with the human cellular receptor: L455, F486, Q493, and N501. In SARS-CoV, two main residues (479 and 487) have been associated with recognition of the human ACE2 receptor (Lu et al., 2015). These residues underwent point mutations from civet to human, K479N and S487T (Li, 2013). In SARS-CoV-2, N479 corresponds to Q493 and T487 corresponds to N501.
Moreover, a model shows the presence of two capping loops in the binding domain, which stabilize the interaction with the cellular receptor (Ortega et al., 2020). Thus, the amino acid substitutions and the longer capping loops could explain the increased binding affinity of SARS-CoV-2 compared with SARS-CoV. Since mutations play a major role, the focus of the present study was to understand the mutations in the spike glycoproteins from different countries, as this could give an idea of the constant shift in the structure of the spike glycoprotein that probably enables the virus to be transmitted to different regions. Whether the mutations depend on the race or ethnicity of a person or on the gene pool, however, is an entirely different story.

The spike glycoprotein sequences from the different countries are listed in Table 1. The phylogenetic analysis of the SARS-CoV-2 spike proteins of the different countries was done using MEGAX software (MEGA-X Version 7.0) (Kumar et al., 2018). Multiple sequence alignment was performed with ClustalW, and neighbor-joining phylogenies were estimated in MEGA-X using the p-distance method.
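The p-distance used for the neighbor-joining trees is simply the proportion of aligned sites at which two sequences differ. The following Python sketch illustrates the calculation; it is not the MEGA-X implementation. The function names, the toy sequences, and the choice to skip gapped sites pair by pair (pairwise deletion) are assumptions made for illustration, since the gap-handling option used in the study is not stated.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of compared aligned sites at which two sequences differ.

    Assumption: sites where either sequence has a gap are skipped
    (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # gapped site: not counted
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """p-distance for every unordered pair of named, aligned sequences."""
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for x, y in combinations(aligned, 2)
    }


# Hypothetical toy alignment, not data from this study.
toy_alignment = {
    "seq_A": "MFVFLVLLPL",
    "seq_B": "MFVFLVLLPL",
    "seq_C": "MFIFLVL-PV",
}
distances = distance_matrix(toy_alignment)
```

In this toy case seq_A and seq_C differ at 2 of the 9 ungapped sites, so their p-distance is 2/9. A matrix of this kind is the input from which a neighbor-joining tree is built.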

Results and Discussion
A total of 22 different amino acid sequences of the spike glycoproteins from different countries were analysed by multiple sequence alignment (Table 1, Fig 2). The spike glycoprotein sequences from India showed mutations in the S1 and S2 domains. The mutation Ala930Val in the spike protein of one Indian sequence (Accession Number Q1A985839) lies in the S2 domain. It is understood that this point mutation enhances the surface area for interaction with the ACE2 receptor while conserving the physico-chemical property of the side chain, additionally increasing the chance of van der Waals interactions and contributing to the stability of the protein core. Another mutation, Arg408Ile, is noted in the RBD region of the spike protein of another Indian sequence (Accession number MT012098). It has been seen that the RBD regions are mostly tyrosine-rich to ensure proper contact with the ACE2 receptor (Zhu et al., 2013). However, in contradiction, reducing the binding affinity, the same [...]

The findings can fill in an important missing link and lead to the development of vaccines and therapeutics for the COVID-19 pandemic. The present work also gives better insight into the positions of amino acids that may be susceptible to mutation and could drastically aid the evolution of SARS-CoV-2 into another potentially pathogenic strain in the near future.
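Point mutations of the kind reported above (for example Ala930Val) can be read off a pairwise alignment by walking the columns and numbering positions along a reference sequence. The Python sketch below illustrates this under stated assumptions: the function name and toy sequences are hypothetical, the choice of reference is left to the user because it is not specified here, and insertions and deletions are ignored.

```python
# One-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def point_mutations(reference: str, query: str, gap: str = "-") -> list:
    """List substitutions between two aligned protein sequences.

    Positions are numbered along the reference, so gap columns in the
    reference do not advance the counter. Only residue-to-residue
    substitutions are reported; insertions and deletions are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    position = 0
    for ref_res, qry_res in zip(reference, query):
        if ref_res == gap:
            continue  # insertion relative to the reference
        position += 1
        if qry_res == gap or qry_res == ref_res:
            continue  # deletion or unchanged residue
        ref_name = THREE_LETTER.get(ref_res, ref_res)
        qry_name = THREE_LETTER.get(qry_res, qry_res)
        mutations.append(f"{ref_name}{position}{qry_name}")
    return mutations


# Hypothetical toy sequences, not data from this study.
toy_reference = "MKA-RLT"
toy_query = "MKVGILT"
toy_mutations = point_mutations(toy_reference, toy_query)
```

For the toy pair, the call yields Ala3Val and Arg4Ile; the inserted glycine in the query is skipped and does not shift the reference numbering.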