Genetic Uniformity of a Specific Region in SARS-CoV-2 Genome and In-Silico Target-Oriented Repurposing of N-Acetyl-D-Glucosamine

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 24 May 2020 doi:10.20944/preprints202005.0397.v1 © 2020 by the author(s). Distributed under a Creative Commons CC BY license.

The causative agent of the worldwide viral pneumonia outbreak, identified as SARS-CoV-2, leads to a severe respiratory illness similar to SARS and MERS. The spread of the pathogen has turned into a pandemic and increased the mortality rate. Useful information is therefore urgently needed for effective control of the disease. Our study shows the existence of an unvarying sequence with no mutations, including ORF1ab regions, in 134 high-quality filtered genome sequences of SARS-CoV-2 downloaded from the GISAID database. We detected this sequence stability using MAUVE analysis and reciprocal pairwise alignment of each pair of sequences with the global Needleman-Wunsch algorithm. All of these results were also confirmed by Clustal W analysis. The first 6500 bp, including the ORF1ab region, form an unvarying sequence. According to the highest TM-score in the predicted protein structure analysis, the protein encoded by this unvarying region is very similar to the spike protein of feline infectious peritonitis virus strain UU4 (PDB 6JX7), and N-acetyl-D-glucosamine is the ligand of this protein. These results suggest that N-acetyl-D-glucosamine could play an important role in controlling SARS-CoV-2. In addition, our molecular docking data support a strong protein-ligand interaction of N-acetyl-D-glucosamine with the spike receptor-binding domain bound to ACE2 (PDB 6M0J) and with the RNA-binding domain of the nucleocapsid phosphoprotein (PDB 6WKP) from SARS-CoV-2. Binding of N-acetyl-D-glucosamine to these proteins could therefore inhibit SARS-CoV-2 replication.
In the present work, we propose a repurposed compound for further in vitro and in vivo studies and offer new insights for ongoing clinical treatments as a new strategy to control SARS-CoV-2 infections.


Introduction
Coronaviruses (CoVs) are positive-strand RNA viruses belonging to the order Nidovirales, which includes three families: Arteriviridae, Coronaviridae, and Roniviridae [1]. Based on genetic studies, CoVs are classified into four genera: alpha, beta, gamma, and delta CoVs.
CoVs are spherical, with a diameter between 80 and 120 nm. Their fundamental structural proteins are envelope (E), membrane (M), nucleocapsid (N), and spike (S). The RNA genome comprises six to ten open reading frames (ORFs).
The SARS-CoV-2 outbreak started at the Huanan seafood market in Wuhan. Although initial reports suggested that human-to-human transmission of the virus was absent or limited, human-to-human transmission is now established [2]. The virus passes from person to person by respiratory droplets and also spreads through contact and fomites [3]. Because coronaviruses have error-prone RNA-dependent RNA polymerases, mutations and recombination events occur frequently; these can rapidly increase the capacity of the virus to cause disease and enhance its virulence [4]. The receptor protein ACE2 is present in humans in the epithelia of the lung and small intestine [5]; the coronavirus binds to this receptor to enter the cell and infects the upper respiratory and gastrointestinal tracts of mammals [6]. ORF1ab is a genomic region coding for the putative replicase polyprotein. In other coronaviruses, this ORF region has also been reported to encode proteases and to give rise to 10 proteins, including important enzymes that are essential for the survival of the virus.
ORF1a is the longest part of the RNA encoding the replicase, and ORF1a and ORF1b are expressed as two large polyproteins, pp1a and pp1ab. Expression of the pp1ab polyprotein depends on a programmed ribosomal frameshifting signal that bridges ORF1a and ORF1b [7]. This frameshifting signal leads to the expression of RNA-dependent RNA polymerase (RdRP), which is required for coronavirus replication [8]. Increasing epidemiological and clinical evidence indicates that SARS-CoV-2 has stronger transmissibility than SARS-CoV [9], but the exact mechanism remains unclear [10]. As a result of their unique mechanism of viral replication, coronaviruses have a high frequency of recombination [11,12,13,14]. In evolutionary studies, DNA sequence comparisons using single nucleotide polymorphisms (SNPs) are often used to recognize mutated coronavirus genomes, in which mutations occur at a high rate because of the error-prone RNA-dependent RNA polymerase involved in genome replication [15,16].
However, to the best of our knowledge, no detailed study has compared whole-genome sequences using alignment programs such as MAUVE to assess the stability of sequences among genome pairs. MAUVE constructs large-scale multiple genome alignments that reveal evolutionary changes, rearrangements, and inversions in genomes, so whole-genome comparison with MAUVE can be an efficient way of aligning multiple genome sequences. We believe that any information obtained from protein modelling can also be beneficial for drug design. As an alternative approach, in silico analysis can accelerate the discovery of novel therapeutics for prevention and treatment.
Mutations in the viral genome can be important for adaptation to host conditions, but the mechanism of these changes remains unclear [17]. Studies are therefore needed to fill the knowledge gaps on how the virus is evolving and adapting to new conditions and on which parts of the genome are more stable than others. Information on genetic stability can help in the treatment of SARS-CoV-2.
In this study, we investigated unvarying regions, with fewer mutations than other parts of the genome, in 134 genome sequences from the GISAID database originating from distinct parts of the world. Our study aims to identify stable regions in the viral genome and to use protein structure prediction and docking analysis to find an effective molecule that interacts with SARS-CoV-2 proteins to control viral replication.

Homology genome blast and genomes information
We retrieved a total of 134 complete genome sequences from the GISAID database [18] as of April 19, 2020. Only complete, high-coverage genomes were included in the dataset. The complete genomes shared by the countries and territories affected by SARS-CoV-2 are listed in the supplementary material (S1 File).
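The inclusion criteria can be illustrated with a short filter over FASTA records. This is only a sketch and not the procedure used in this study: the thresholds (at least 29,000 bases and at most 1% ambiguous N bases) are assumptions chosen for illustration, and the function names are hypothetical.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def is_high_quality(seq: str, min_length: int = 29000,
                    max_n_fraction: float = 0.01) -> bool:
    """Return True if the genome is long enough and has few ambiguous bases.

    Both thresholds are illustrative assumptions, not values from the study.
    """
    if len(seq) < min_length:
        return False
    return seq.count("N") / len(seq) <= max_n_fraction


def filter_genomes(records: Dict[str, str], min_length: int = 29000,
                   max_n_fraction: float = 0.01) -> Dict[str, str]:
    """Keep only the records that pass the quality check."""
    return {name: seq for name, seq in records.items()
            if is_high_quality(seq, min_length, max_n_fraction)}
```

In practice the lines would come from an open FASTA file, for example `filter_genomes(parse_fasta(open(path)))`, where `path` is whatever file was exported from the database.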

Phylogenetic analysis
To analyse the obtained SARS-CoV-2 genomes, sequence alignment was performed using MAUVE and Clustal W in MegAlign (DNASTAR software) [19]. The phylogenetic tree was constructed by the maximum likelihood method, with the tree topology estimated from 1,000 bootstrap replicates.

Nucleotide and amino acid sequence alignment and analysis
Nucleotide sequence editing and alignment were conducted using MAUVE and Clustal W in MegAlignPro (DNASTAR software) [19]. The evolutionary history was inferred using the Neighbor-Joining method in MegAlignPro. The sequences were analysed, and regions common to all genomes were detected using MAUVE from pairwise alignment results obtained with the global Needleman-Wunsch algorithm [20]. Each unvarying genomic region was excised from the whole sequences and submitted to a protein similarity search against the NCBI database using BlastX. The resulting FASTA sequence was converted to a protein sequence using the ExPASy proteomics server (https://web.expasy.org/translate/) [21] and then loaded into the I-TASSER (Iterative Threading ASSEmbly Refinement) server of the University of Michigan, US (https://zhanglab.ccmb.med.umich.edu/I-TASSER) for protein structure prediction [22].
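For readers unfamiliar with global pairwise alignment, the Needleman-Wunsch algorithm can be sketched as follows. This is a textbook dynamic-programming version with a linear gap penalty, not the MegAlignPro implementation; the scoring values (match +1, mismatch -1, gap -2) and the function names are assumptions made for illustration. Because it fills a full score matrix, it is only practical for short fragments, not for whole 30 kb genomes.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[str, str, int]:
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding identical, non-gap characters."""
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)
```

Two fragments with no mutations between them give an identity of 100%, which is the criterion for an unvarying region in a paired comparison.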

Homology modelling and protein prediction
The structures corresponding to the homology models predicted by the I-TASSER server for each target protein were downloaded from the Protein Data Bank (PDB) (www.rcsb.org). Alignment of the protein sequences and subsequent homology modeling were done using the ExPASy proteomics server [21] to study the protein sequence and further structural details.

Ligand retrieval
The structure of N-acetyl-D-glucosamine (NADG) was retrieved from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/) and used for docking calculations. The selected 3D structure of the ligand was retrieved from the PubChem Compound database in SDF format and then converted to PDB format. The ligand parameters were analyzed using the PRODRG online server (http://prodrg2.dyndns.org/cgi-bin/prodrg.cgi) [23]. The shape complementarity principle was then applied with a clustering RMSD of 4.0 for docking calculations.

Molecular docking studies
Homology modeling and protein prediction analysis directed us to test SARS-CoV-2 protein receptors with our ligand. The PatchDock server (http://bioinfo3d.cs.tau.ac.il/PatchDock/php.php), a geometry-based molecular docking algorithm, was used for docking analysis with the cluster RMSD at its default value of 4.0 and the protein-small ligand complex type as analysis parameters [24,25,26,27]. PatchDock yielded a geometric shape complementarity score (GSC score) and an approximate interface area (AI area). Additionally, CHIMERA, with its AutoDock Vina-based docking tool, was used. The flexible docking study was carried out using AutoDock v 4.0 [28]. The interactions in the protein-ligand complexes, including amino acid positions and bond distances, were calculated and visualized with PyMOL.
Molecular docking simulations between the protein receptors and the ligand were confirmed with the SWISSDOCK protein docking server (http://www.swissdock.ch/docking) [29]. PyMOL was then used to gain insight into the binding preferences within the active sites of these receptors.
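The role of the clustering RMSD parameter can be illustrated with a small sketch. The code below is an assumption about how such a cutoff is typically applied (greedy clustering of poses ranked by score) and does not reproduce PatchDock's internal procedure; poses are taken to be lists of ligand atom coordinates in the same atom order and the same reference frame, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pose_a: Sequence[Coord], pose_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two poses with matched atom order."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and have the same atom count")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b))
    return math.sqrt(total / len(pose_a))


def cluster_poses(poses: List[Sequence[Coord]], scores: List[float],
                  cutoff: float = 4.0) -> List[int]:
    """Greedy clustering of docking poses.

    Poses are visited from the highest to the lowest score. A pose becomes a
    cluster representative only if it lies farther than `cutoff` (in the same
    units as the coordinates) from every representative kept so far.
    Returns the indices of the representatives, best score first.
    """
    order = sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)
    representatives: List[int] = []
    for i in order:
        if all(rmsd(poses[i], poses[r]) > cutoff for r in representatives):
            representatives.append(i)
    return representatives
```

With a cutoff of 4.0, near-duplicate poses collapse onto the best-scoring member of each cluster, so the reported solutions correspond to distinct binding positions.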

Phylogenetic tree
The maximum likelihood phylogenetic tree shows main clades containing several clusters, and the viral genome sequences show genetic diversity according to both MAUVE and Clustal W analyses (S2a and S2b Files, respectively).

Nucleotide and amino acid sequence alignment and analysis
Our results showed a high level of mutational change across the whole genomes except for the first 6500 bp, which remained unvarying in all sequences. The MAUVE results and the Clustal W analyses confirmed each other (Fig 1).
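The idea of an unvarying region can be made concrete with a scan over the columns of a multiple sequence alignment. The sketch below is a simplified illustration rather than the MAUVE or Clustal W procedure: it assumes the sequences are already aligned to equal length, treats gaps and ambiguous bases as variation, and uses hypothetical function names.

```python
from typing import List, Tuple


def is_invariant_column(column: str) -> bool:
    """True if every sequence carries the same unambiguous base in this column."""
    first = column[0]
    return first in "ACGT" and all(base == first for base in column)


def invariant_runs(aligned: List[str], min_length: int = 1) -> List[Tuple[int, int]]:
    """Return invariant stretches as 0-based, half-open (start, end) column ranges."""
    if not aligned:
        return []
    rows = [seq.upper() for seq in aligned]
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")

    runs: List[Tuple[int, int]] = []
    start = None
    for col in range(length):
        column = "".join(row[col] for row in rows)
        if is_invariant_column(column):
            if start is None:
                start = col
        else:
            if start is not None and col - start >= min_length:
                runs.append((start, col))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and length - start >= min_length:
        runs.append((start, length))
    return runs
```

Applied to an alignment of all genomes with a large `min_length`, a result such as a single run starting at column 0 would correspond to an unvarying stretch at the start of the genome.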

Homology modeling and protein prediction
The excised uniform regions, subjected to alignment for protein similarity, indicated that the 6500 bp region including ORF1ab consists of unvarying sequences (Fig 1). These stable sequences were selected as a template for further protein structure prediction (Fig 2). The results were structurally very close to 6JX7 (cryo-EM structure of the spike protein of feline infectious peritonitis virus strain UU4) as a target protein according to I-

Protein Docking
For docking analysis of NADG with 6M0J and 6WKP, the ligand structure of NADG retrieved from the PubChem database was analysed using the PatchDock server (Fig 3a, 3b). The SWISSDOCK results confirmed the results obtained with the CHIMERA and Vina calculations. Furthermore, the possible binding sites of the ligand on the protein surface were confirmed with SWISSDOCK (Fig 4, 5; S6, S7 Files).

Discussion
We have shown the existence of a genetically specific, unvarying region in whole SARS-CoV-2 genomes. After alignment of all sequences by MAUVE, this region showed no mutational differences (Fig 1). In our analysis, MAUVE was the most effective method for genome comparison. Pairwise alignment with the global Needleman-Wunsch algorithm showed this uniform sequence, with no mutations in any paired sequences, up to 6500 bp; this was observed with MAUVE and confirmed with Clustal W. Only the first 6500 bp appear to be unvarying, in contrast to the remaining part of the viral genome. This unvarying part of the viral genome has characteristic properties that merit further immunological studies. We suggest global Needleman-Wunsch pairwise alignment as an effective method for assessing the uniformity of genome sequences.
The unvarying sequences we detected include ORF1ab, an important region encoding the putative replicase polyprotein and its proteases [30].
Correspondingly, the rest of the genome (all but the 6500 bp fragment) showed a high ratio of sequence variation (Fig 1) and therefore does not seem a convenient target for drug discovery. Hence, based on our findings, we hypothesise that the first 6500 bp, which remain genetically stable during transmission and evolution, could be an appropriate target for anti-SARS drugs. Our data also showed that N-acetyl-D-glucosamine interacts with proteins encoded by the ORF1ab region (S3 File, I-TASSER analysis data). Previous studies have reported the effectiveness of NADG against influenza [31]. In another study, glucosamine was reported to influence the replication of hepatitis B virus in in vitro and in vivo experiments [32]. NADG can therefore also be suggested as an antiviral drug candidate for SARS-CoV-2.
Glycosylation is known to be a major process affecting the binding of monoclonal antibodies to the coated virus during vaccine development; conversely, deglycosylation reduces antibody binding. The binding of neutralising monoclonal antibodies to virus proteins therefore depends on glycosylation of the virus [33]. N-linked glycans on the surface of an immune cell, through specific glycosylation patterns, help determine the migration pattern of the cell [34].
These patterns on the various immunoglobulins confer specific shapes and unique effector properties that determine affinities for immune receptors. Glycans could also be involved in "self" and "nonself" discrimination, which could apply to the response against viruses, as previously reported for various autoimmune diseases [34]. Glycans consist of different derivatives of NADG, which suggests that NADG has an important role in the immune system.
Moreover, Pant et al. reported that asparagine supply is a critical barrier and limiting factor for the replication of virus proteins, with implications for the development of antiviral drugs [35]. We assume that the virus prefers glutamine to glucose for efficient replication and that viral replication is reduced in glutamine-free medium. Asparagine supplementation compensates for glutamine depletion in viral replication. Asparagine-linked glycosylation is an enzyme-catalyzed, co-translational protein modification that influences either the protein folding process or the stability of the native glycoprotein conjugate [36]. In our study, we found that NADG interacts with proteins encoded by the ORF1ab region; we suggest that NADG binds to asparagine and thereby inhibits virus replication, consistent with the report of Pant et al. [35]. Our docking analysis indicated that binding of NADG to asparagine is also possible. In the unvarying sequences, we detected 34 asparagine residues that could be target points for binding of NADG as a ligand (Fig 2). Notably, the effect of NADG has also been tested against HIV-1 at different concentrations (0.25 mM, 1 mM, 4 mM, and 16 mM) [37]. We suggest the same mode of action as for HIV-1 [35,37]. Our predicted protein structure and docking analysis showed that N-acetyl-D-glucosamine has a high interaction potential with the tested SARS-CoV-2 proteins 6M0J and 6WKP. A previous study reported seven glycosylation sites on the S protein that are critical for DC/L-SIGN-mediated virus entry; these asparagine residues are at amino acid positions distinct from the residues of the ACE2-binding domain [38].
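Counting asparagine residues in a translated region can be sketched in a few lines. The example below is an illustration under stated assumptions, not the ExPASy or I-TASSER workflow: it translates a single forward reading frame with the standard genetic code, counts asparagine (N) residues, and lists candidate N-linked glycosylation sequons using the common N-X-S/T motif (X not proline). The function names and the example sequence are hypothetical.

```python
import re
from typing import List

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Lookahead so that overlapping motifs are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def translate(nucleotides: str) -> str:
    """Translate one forward reading frame; unknown codons become 'X'."""
    seq = nucleotides.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[pos:pos + 3], "X")
                   for pos in range(0, len(seq) - 2, 3))


def count_asparagines(protein: str) -> int:
    """Number of asparagine (N) residues in a protein sequence."""
    return protein.count("N")


def find_sequons(protein: str) -> List[int]:
    """1-based positions of asparagines in N-X-S/T motifs (X is not proline)."""
    return [match.start() + 1 for match in SEQUON.finditer(protein)]
```

For example, `translate("ATGAATGGATCCTAA")` gives `"MNGS*"`, which contains one asparagine, at position 2, in an N-G-S sequon. A motif match only marks a candidate site; whether a given asparagine is actually glycosylated or accessible to a ligand cannot be decided from sequence alone.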
Defects in the secretion and infectivity of several flaviviruses upon blocking of N-linked oligosaccharides have confirmed the role of glycosylation [39]. A previous study reported that removing the terminal glucose residues on N-linked glycans alters the ER chaperone-mediated control of protein folding needed for virus replication [40]. Another study provided evidence that some viruses (some members of the NCLDV, such as Chlorella viruses) use the host ER/Golgi system, the machinery required for glycosylation of their structural proteins, for glycoprotein production [41]. NAG is a ubiquitous sugar, and the virus-encoded uridine diphosphate-N-acetylglucosamine pathway associated with it represents a fundamental process for the virus. NADG could therefore substitute for NAG and redirect this process to support immune defence mechanisms [42]. Our molecular docking analysis suggests that NADG, by mimicking NAG, could protect the cell from integration of SARS-CoV-2 into the ER and Golgi system. Our results also show that NADG interacts with 6M0J and/or 6WKP (Fig 4, 5).
However, the role of asparagine availability in virus replication has remained unclear until now [36]. The influence of NAG on cell surface signalling proteins alters signal transduction depending on the degree of branching of N-linked glycans [43]. This signal transduction could therefore shift in favour of the immune system if NADG takes over the signalling role of NAG. We found that the possible binding of NADG to 6M0J and 6WKP could impair attachment of the virus to human cells and its replication mechanism.
Our study has indicated highly frequent mutations in all SARS-CoV-2 genomes except for the first 6500 bp region (Fig 1). The highly frequent SNP mutations, discovered by pairwise alignment using comparative computational analysis, are consistent with other studies reporting changes in the transmissibility and virulence of the virus [44]. Computational methods are useful to those who wish to understand essential information about SARS-CoV-2 for subsequent analyses. We stress that in-silico studies are important tools for identifying major effective compounds that interact with the virus. We propose that recent advances in drug discovery by in-silico screening [46,47] give scientists an opportunity for rapid, target-oriented detection of efficient molecules against SARS-CoV-2 [48].

Conclusion
The SARS-CoV-2 epidemic has given rise to a substantial health emergency and economic setbacks worldwide. Hence, understanding the nature of this virus and monitoring its spread are critical for disease control. The potential importance of targeting the ORF1ab region should draw the attention of researchers to future preventive strategies in pharmaceutical and vaccine development. In the search for new targets to treat SARS-CoV-2 effectively, understanding the molecular effects of repurposed compounds can help in prioritizing pharmacological strategies. Our suggested approach can be helpful given the clinical inefficacy of common antiviral drugs [49,50]. We strongly suggest testing different concentrations of NADG against SARS-CoV-2, considering its interaction with proteins of the ORF1ab region, which lies in the sequence that is unvarying across all paired genome sequences. Our results are likely to strengthen the data underpinning drug repurposing as a therapeutic option against SARS-CoV-2 in the future.