Drug and vaccine design against Novel Coronavirus (2019-nCoV) spike protein through Computational approach

The recent outbreak of the new virus in Wuhan city, China from the sea food market has led to the identification of a new strain called the corona virus and named as novel corona virus (2019nCoV) belonging to Coronaviridae family. This has created major havoc and concern due to the mortality of 250 persons and affecting more than 10,000 people. This virus causes sudden fever, pneumonia and also kidney failure. In this study a computational approach is proposed for drug and vaccine design. The spike protein sequences were collected from a protein database and analysed with various bioinformatics tools to identify suitable natural inhibitors for the N-terminal receptor binding domain of spike protein. Also, it is attempted to identify suitable vaccine candidates by identifying B-Cell and T-cell epitopes. In the drug design, the tanshinone Iia and methyl Tanshinonate were identified as natural inhibitors based on the docking score. In the vaccine design, B-cell epitope VLLPLVSSQCVNLTTRTQLPPAYTN was found to have the highest antigenicity. FVFLVLLPL of MHC class-I allele and FVFLVLLPL of MHC class-II allele were identified as best peptides based on a number of alleles and antigencity scores. The present study identifies natural inhibitors and putative antigenic epitopes which may be useful as effective drug and vaccine candidates for the eradication of novel corona virus.

Recent outbreaks of ongoing viral pneumonia with unknown etiology in Wuhan City, China. A new virus was described as a novel coronavirus (2019-nCov) and named on 7 January 2020 [5]. On 10 January 2020, the viral genome was released on NCBI Genbank (Wuhan-Hu-1, GenBank accession number MN908947), followed on 12 January by the Global Initiative for the Integration of All Influenza Data (GISAID) following the release of genomes. This novel virus is characterized by genome sequencing closely related to members of a viral species Sever Acute Respiratory Syndrome (SARS) and coronavirus Middle East Respiratory Syndrome (MERS-CoV). On 31 st december 2020, WHO declared the virus a global health emergency as the total death toll to 213 with almost 10,000 cases worldwide [6]. It is estimated this outbreak surpasses the 2003 severe acute respiratory syndrome (SARS) outbreak that began in china which infected 8,098 people killing 774.
Like all coronaviruses, 2019-nCoV consist of a minimum of three viral proteins namely spike protein (S), a type of glycoprotein, a membrane protein (M) that spans the membrane and an envelope protein (E), a highly hydrophobic protein that covers the entire structure of the coronavirus. The spike (S) glycoprotein in the coronavirus recognise the host cell receptors and causes an important role in viral infection [7].
There is an urgent need for the development of anti-viral drugs and vaccines against the 2019-nCov virus due to the high mortality rate of patients. During the epidemic and pandemic outbreak of new viral pathogens, the conventional method of development of drugs and vaccination is not possible to control as it is a time-consuming process [8,9]. Consequently, the rapid approach based on in-silico informatics has become very popular with recent advances in the sequencing of many pathogen genomes and protein sequence databases [10][11][12][13][14]. The continuous increase of patients and a high mortality rate of 2019-nCov infection highlight the urgent need for the development of a safe and effective vaccine. The aim of the study is to use to computational approach to design both anti-viral drug and vaccine candidates. The spike protein in the novel coronavirus sequence is used to design both anti-viral drug and vaccine candidates. For anti-viral drug design, receptor binding protein of novel coronavirus present in the N-terminal of spike protein is homology modelled and used as a protein receptor. 3C like proteinase (3CLpro) plays a role in viral pathogencity and replication by clevaging the polyprotein. The inhibitors of 3CLpro can block the clevage reaction thereby controling viral replication and pathogenecity. The natural inhibitors were used as ligands and docked with homology modelled coronavirs receptor binding protein. For vaccine design, the full sequence of the spike protein of novel coronavirus is used to predict B-cell and T-cell epitopes. The select best T-cell epitopes based on antigencity used as peptides and docked with human allele protein.

Materials & Methods
The overall workflow of computational drug and vaccine design for spike protein summarised in figure 1

Sequence retrieval
Wuhan, china isolate (BetaCoV / Wuhan-Hu-1/2019) primary sequence of Spike (S)glycoprotein were extracted from the NCBI protein database [15] using the accession number QHD43416.1. For further analysis the sequence was downloaded as the FASTA sequence.

Physio-chemical, domain and structural analysis
Using the Protparam tool [16] physiochemical characterization such as molecular weight, theoretical PI, instability index and Grand average of hydropathicity (GRAVY) is calculated.. The spike protein used for domain predictions using the Conserved Domain Database (CDD) [17] and Inteproscan [18]. Signalpeptide, transmembrane helix and coils are predicted using signaP [19], THMMM and signal-TM [20]. The secondary structure was analysed through the SOPMA server with the default parameter. Subcellular localization was predicted using the Virus-mPLoc server [21]. The Spike protein analysed for binding regions and it is 3D structure determined using the phyre2 server [22].

Non-homology analysis and sequence conservation analysis
The retrieved s-glycoprotein checked to avoid cross-reactivity with the human host, blastp similarity search performed against the human refseq database. Conservancy analysis of the selected s-glycoprotein was searched in virus database available KEGG [23] by using blastP with default parameters. This conservation analysis is essential for determining the broad spectrum of drug target and epitope among all viruses.

Homology modelling of s-protein N-terminal as a drug target
The spike protein was modelled by the Swiss-modeller server [24] with a selected template from the Protein Data Bank (PDB) using the structure of the N-terminal domain (NTD)of SARS-CoV spike protein (PDB ID: 5X4S). The template is a high-resolution structure of the trimeric MERS-CoV and SARS-CoV proteins in its pre-fusion conformation by single particle cryo-electron microscopy [25].

Ligand retrieval
The search was carried out 3C-like proteinase inhibitors related to SARS coronavirus [26], since this novel corona virus is genetically closely to the SARS virus and in causing infection. The 9 natural inhibitors ligands were downloaded from pubchem is SDF format. Using openbabel tool all ligand in SDF format are converted to PDB format and optimized for further docking purposes.

Molecular docking of N-terminal Spike protein with natural inhibitors
The molecular docking was carried using autodock [27] available in the MTIopenscreen server. The homology modelled N-terminal spike protein was docked with 2-Thiophenenecarboxaldehyde,Cryptotanshinone, Dihydrotanshinone, Furfural, Methyl Tanshinonate, Miltirone, Tanshinone I,Tanshinone Iia and Tanshinone Iib ligands. Its server automatically pre-process the N-terminal spike protein by removing all hetero atoms and hydrogen atoms are added to the structure.

Antigenic protein identification
To predict the antigenic property of the selected protein, the VaxiJen V2.0 server[28] was used with a threshold greater than equal to 5. Grand average of hydropathicity (GRAVY) -0.079

Prediction of B-cell epitopes
The B cell epitope prediction will enable to find the potential antigen that would interact and initiate an immunoreponse with B lymphocytes. IEDB (Immume-Epitope Database an Analysis Resource) server [29] was used for B-cell eptiope prediction. The B-Cell epitopes were identified based on antigenicity, prediction of the linear epitope, hydrophilicity, accessibility of surface and flexibility. BepiPred linear epitope prediction from the Immune Epitope Database was used to predict linear B cell epitopes with a default threshold. While the prediction of candidate epitopes antigenicity sites was achieved for the identification using Kolaskar and Tongaonker antigenicity method with default threshold 1.027.

Prediction of T-cell eptiopes
MHC class-I peptides predicted using the Propred-1 server [30] and MHC class-II peptides predicted using the Propred tool. Both servers use a large number of human-leukocyte-antigens (HLA) for predicting epitopes.

Non-allegenicity and non-toxicity prediction of selected T cells epitopes
The selected T-cell epitopes were predicted for allergenicity by using the AllerCatPro server [31] and toxicity prediction using a toxpred tool.

Structural modelling and molecular docking of protein-peptide docking
T-cell epitopes of MHC class-I peptides were modelled using PEPFOLD server available RPBS MOBYL protal. The three-dimensional structure of the HLA-B7 allele (PDB ID: 3VCL) was retrieved from the PDB database and used as receptor protein. The 3D modelled peptide of the T-cell epitope was docked with HLA-B7 receptor protein using HPEPDOCK webserver [32] with default parameters.

Sequence & Structural analysis of Spike protein
The physiochemical properties calculation by using protparam showed that spike protein contained 1273 amino acids (aa) with a molecular weight of 141178.47 kDa, which represents a strong characteristic for both as a drug target and vaccine candidate. The target protein's theoretical isoelectric point (PI) was 6.24, suggesting its negative existence. An isoelectric point below 7 reveals a protein with negative charges. Of the 1353 residues, 110 aa were found to be negatively charged while others were found to be positively charged. Protparam estimated the instability-index (II) as 33.01, this protein grouped as stable. The other characteristics are shown in Table 1.
The domain search using CDD and interproscan spike receptor domain from 330-583 amino acids, S2 domain form 686-1270 and heptad repeat domain is between 1165-1212 amino acids as shown in Figure 2. The transmembrane predicted between transmembrane helix between 1214-1236 amino acids and coil predicted between 1176-1203 amino acids (Figure 2). Prediction of secondary structure using SOPMA [33] showed that 28.59 per cent alpha helix, 23.5 per cent extended strand, 3.38 per cent beta turn, 44.78 per cent random coil is present in structure as shown in Figure 3C. Gene ontology [34] predicted biological process Membrane fusion (GO:0061025), Receptor-mediated virion attachment to host cell (GO:0046813). Cellular Component predicted as Viral envelope (GO:0019031), Integral component of membrane (GO:0016021) and molecular function none. The whole spike protein was homology modelled by using the phyre2 server ( Figure 3A). Based on CASTp [35] analysis for binding pocket, the protein has area 6868.604 and volume 24275.249 ( Figure 3B).
The spike protein showed no significant matches with human protein by blast search. conservation analysis of spike protein through blastp search against viral genome showed Severe acute respiratory syndrome-related coronavirus[1] with 76% and with Bat coronavirus BM48-31/BGR/2008 [36] with 72% sequence identity ( Table 2). The homology modelled of N-terminal spike protein containing receptor binding protein checked for structural quality by using the Ramachandran plot. The homology model of protein shown in Figure 4A. The binding pocket of the N-terminal spike protein predicted using the CASTp server showing pocket regions in the receptor binding domain ( Figure 4B). The natural inhibitor tanshinone Iia ( Figure 4C) and methyl tanshinonate ( Figure 4D) shown the highest binding affinity score based on molecular docking with N-terminal spike protein (Table 3).

Antigenic prediction
The spike protein of novel coronavirus evaluated for antigenicity by using the Vaxijen 2.0 server with a threshold setting ≥0.5 for virus organisms. The antigenicity prediction of spike protein is 0.4646 showing the characteristic of the expected antigen.

Prediction of B-cell epitopes
The spike protein was predicted for the B-cell epitope using the IEDB server. over 30 B-cell were predicted and 8 epitopes were selected based on the antigencity score by using Vaxijen 2.0. The predicted epitope VLLPLVSSQCVNLTTRTQLPPAYTN showed a high antigencity score of 1.0555 between the position 6-30 as shown in Table 4.  The continuous B-cell epitopes predicted based on parameters such as hydrophilicity, flexibility, accessibility, turns, exposed surface, polarity and antigenic propensity. B-cell epitopes predicted for antigenic determinants based on Kolaskar and Tongaonkar antigenicity measuring tool. This tool predicts antigenic determinants based on physico-chemical properties with an accuracy of 75%. B-cell epitope predicted for hydrophilicity based on Parker's hydrophilicity prediction scale. This method based on retention times during hing-performance liquid chromatography of a reverse-phase column. The mobility of protein predicted based on Karplus and Schulz flexibility scale on the basis of the known temperature B factors. The rationale for predicting turns to predict antibody epitopes is based on the Chou and Fasman beta turn prediction.

Prediction of T-cell epitopes
Propred-I server used for prediction of the T-cell epitope of MHC Class-I. The peptide 'FVFLVLLPL' as best probable based on antigenic score 0.8601 predicted by Vaxijen out of 4 predicted peptides (Table 5). Properd server used for prediction of MHC class-II peptide and predicted 3 peptides, based on antigencity score of 0.8601 and profiling the peptide 'FVFLVLLPL' predicted to be the best probable peptide ( Table 6).

Profiling features of selected T cells epitopes
The selected best T-cell epitopes were checked for allergenicity by using allercatpro, toxicity, hydropathicity, hydrophilicity, hydrophobicity, and charge prediction using toxpred as shown in Table 7. Based on the docking of protein-peptide docking ( Figure 5A-D), the best selected epitope is FVFLVLLPL (Table 8).

Discussion
The domain search by CDD and interproscan predicted Spike receptor binding domain from 330-583 amino acids. Spike is a glycoprotein envelope that assists viral entry into the host cell [37]. Coronavirus S2 glycoprotein predicted from 686-1270 amino acids. Transmembrane prediction using TMHMM2.0 predicted outside 1-1213 aa, TMhelix between 1214-1236 aa and inside 1237-1273 aa. SingalP-TM predicted 1-18. Coil predicted between 1176-1203. Coronovirus spike glycoprotein, heptad repeat 2 domain predicted between 1165-1212 amino acids. This superfamily represents one of the two heptad repeat domains found in the S protein which involves in helix-helix interfaces [38]. The secondary structure analysis reveled that a greater number of the random coil is presently followed by helix structure. Through the disorder prediction there are no disordered residues present in the spike protein, the spike protein blast similarity analyses against the viral genome in the KEGG genome database revealed close to spike protein of SARS followed by bat coronavirus BM48-31/BGR/2008.  As of now the screening process confined to drugs & vaccines is under development and it will take several months to years to come into the market The virus of 3C-like protease (3CLpro) responsible for pathogencity by replicating the viral genome upon the host cell infection. [39]. The clevage function of 3CLpro can be blocked by using inhibitors thereby inhibiting viral replication. so, it is considered as attractive targets for anti-viral drugs. Accordingly, inhibitors that block the cleavage function of 3CLpro can be expected to inhibit virus replication, making this enzyme one of the most attractive targets for anti-nCOV. To speed up the screening process in view of the ongoing nCov outbreak, the virtual screen process was carried out in search of natural 3CLpro natural inhibitors in NPASS (Natural Product Activity & Species Source Database). NPASS [40] is a comprehensive database that provides unique natural products isolated from source organisms with activity records on targets. By virtual screening of 9 compounds, Tanshinone Iia [41]  In this study using spike protein of noval coronavirus predicted B-cell and T-cell epitopes for vaccine development. The epitopes predicted by using IEDB webserver using parameters like solvent accesibility, flexibility, antigencity. The predicted B-cell epitope is VLLPLVSSQCVNLTTRTQLPPAYTN and T-cell epitope of MHC-I (FVFLVLLPL) and MHC-II (FVFLVLLPL). The predicted epitopes checked for allergencity and toxicity. Based on docking score peptide FVFLVLLPL considered as the best probable T-cell epitope considered for a vaccine candidate. The predicted epitopes from this study can be further tested for therapeutic potency for vaccine development for novel coronavirus.

Conclusion
In the present study, both drug and vaccine design was applied to identify drug and vaccine candidates. This approach will be cost effective and can save time in the design of drugs and epitopes. The suggested drug inhibitors and vaccine candidates from this study will help to develop anti-viral and vaccine which may help to control this global threat.

Funding:
No funding for this study