An Integrated In-Silico Approach to Develop Epitope-Based Peptide Vaccine against SARS-CoV-2

SARS-CoV-2 has dominated global attention since the beginning of 2020, and the pandemic has brought much of the world to a halt. Countries are taking every possible step to combat the disease, from shutting down their economies to repurposing drugs and developing vaccines. Rapid data analysis and widely available tools, software and databases have enabled bioinformatics to give researchers new insights for dealing with the current situation more efficiently. Vaccinomics, an emerging field of bioinformatics, combines concepts of immunogenetics and immunogenomics with in silico tools to produce promising candidates for wet-lab experiments. This approach is well validated for the design and development of potent vaccines. The present in silico study attempted to identify peptide fragments from the spike surface glycoprotein that can be used for the design and development of an epitope-based vaccine. Both B-cell and T-cell epitopes were predicted using integrated computational tools. The VaxiJen server was used to predict the protective antigenicity of the protein. NetCTL was used to identify the most potent T-cell epitopes, and their subsequent MHC-I interactions were analyzed with tools provided by IEDB. The 3D structures of the peptide and the MHC-I allele (HLA-C*03:03) were then predicted in order to carry out docking studies using AutoDock 4.0. Various IEDB tools were used to predict B-cell epitopes on the basis of essential parameters such as surface accessibility and beta turns, among others. Based on the results, the sequence WTAGAAAYY was obtained as a potential T-cell epitope and the sequence YDPLQPEL (amino acids 1138-1145) as a potential B-cell epitope. This in silico study helps to identify novel epitope-based peptide vaccine targets in the spike protein of SARS-CoV-2. Further in vitro and in vivo studies are needed to validate the findings.


Introduction
The sudden outbreak of COVID-19 has brought much of the world under lockdown. The disease has been a worldwide concern ever since it emerged [1]. The pandemic has affected hundreds of thousands of people across the globe, and thousands have lost their lives to the disease. SARS-CoV-2, the causative agent of COVID-19, is believed to have originated in Wuhan, China, and has spread to all corners of the world [2][3][4]. Repurposing drugs and developing vaccines against SARS-CoV-2 has become an urgent need and a challenge for researchers in all countries. Because conventional vaccine development is time-consuming and informatics-based approaches have advanced, the field of vaccinomics has gained considerable popularity. Vaccinomics is an emerging field of biotechnology that integrates bioinformatics with immunogenomics and immunogenetics [5]. Peptide vaccines are an alternative vaccine strategy that uses peptide fragments, instead of the whole organism, to elicit a strong immune response against a pathogen [6]. The use of peptide vaccines has grown with advances in biotechnology, aided by in silico tools and databases that give safe and promising results [7,8]. In a pandemic in which the world is fighting a coronavirus, a vaccinomics approach for predicting potent peptide vaccines against SARS-CoV-2 can help.
Human coronaviruses belong to the family Coronaviridae and are large, enveloped viruses with a single-stranded, positive-sense RNA genome of approximately 30 kb. Entry of the virus into the human cell is one of the most essential steps in the spread of COVID-19. SARS-CoV-2 binds to angiotensin-converting enzyme 2 (ACE2) on the human cell as its receptor in order to gain entry into the cell. The virus encodes a surface glycoprotein known as the spike protein [9]. The receptor binding domain (RBD) of the spike protein binds to the host cell receptor (ACE2) to mediate entry of the virus into the cell. Entry is further facilitated by the release of the spike fusion peptide, which is formed when the spike protein is cleaved by a host protease [10,11]. Potent peptide fragments of the spike glycoprotein of SARS-CoV-2 are therefore suggested as candidates for a peptide vaccine against COVID-19. The current in silico study attempted to identify peptide fragments from the spike surface glycoprotein that can be used for the design and development of an epitope-based vaccine using integrated computational tools.
Figure 1: Complete methodology adopted for the study

Data collection and identification
The National Center for Biotechnology Information (NCBI) hosts a series of databases that serve as an important resource in the fields of biotechnology and biomedicine.
On the basis of a literature survey, the spike glycoprotein present on the surface of SARS-CoV-2 was selected as the target protein. The spike protein is known to attach to the host receptor protein, ACE2. The sequence of the SARS-CoV-2 spike protein was retrieved from NCBI under Accession ID QIA20044. The protein is 1273 amino acids long.

Antigenic protein identification
The VaxiJen v2.0 server [12] is an in silico tool for the prediction of protective antigens and subunit vaccines. The server was run with its default parameters to assess the antigenicity of the protein sequence.

T-cell epitope identification and conservancy analysis
The NetCTL 1.2 server [13] is an in silico tool used for the identification of T-cell epitopes. Its prediction method combines peptide binding to major histocompatibility complex class I (MHC-I), proteasomal C-terminal cleavage and TAP transport efficiency. The prediction threshold was set to 0.75, which corresponds to a sensitivity of 0.80 and a specificity of 0.97. The IEDB combined predictor tool was used to analyze the interaction of MHC class I molecules with the T-cell epitopes. The IEDB analysis resource used the stabilized matrix method (SMM) for this prediction [14]. The half maximal inhibitory concentration (IC50) was used as the selection parameter: epitopes with IC50 < 200 nM for binding to class I molecules were chosen. Prior to the run, the peptide length was set to 9.
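The peptide enumeration and affinity cut-off described above can be sketched as follows. This is a minimal illustration and not the NetCTL or IEDB implementation: the function names `enumerate_kmers` and `filter_binders`, the toy sequence and the IC50 values are hypothetical. Only the peptide length (9) and the cut-off (IC50 < 200 nM) are taken from the text.

```python
from typing import Dict, List, Tuple


def enumerate_kmers(sequence: str, k: int = 9) -> List[Tuple[int, str]]:
    """Return (1-based start position, peptide) for every overlapping k-mer."""
    return [(i + 1, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


def filter_binders(ic50_nm: Dict[str, float], cutoff_nm: float = 200.0) -> Dict[str, float]:
    """Keep peptides whose predicted IC50 (nM) is strictly below the cut-off."""
    return {pep: ic50 for pep, ic50 in ic50_nm.items() if ic50 < cutoff_nm}


# Toy 12-residue sequence (hypothetical, not the spike protein).
toy_sequence = "ACDEFGHIKLMN"
peptides = enumerate_kmers(toy_sequence)  # 12 - 9 + 1 = 4 peptides

# Hypothetical IC50 predictions in nM, for illustration only.
toy_ic50 = {
    "ACDEFGHIK": 35.0,
    "CDEFGHIKL": 200.0,   # exactly at the cut-off, so excluded
    "DEFGHIKLM": 850.0,
    "EFGHIKLMN": 120.5,
}
binders = filter_binders(toy_ic50)
```

By the same arithmetic, a protein of 1273 residues yields 1273 - 9 + 1 = 1265 overlapping 9-mers.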

Allergenicity assessment
The web-based AllerCatPro tool [15] was used to determine the allergenicity of the predicted antigens. Since peptide antigens are foreign to the human body, predicting their allergenicity is a vital step to ensure that the epitopes do not cause inflammation or allergic reactions in an individual.

Designing three-dimensional (3D) structure of epitope and MHC-I alleles
PEP-FOLD3 [16] is a web-based server that was used to predict the 3D structure of the most potent epitope for subsequent docking studies. PEP-FOLD3 predicted the five most probable structures. The model with the lowest energy was selected for further analysis.
The structure of the MHC-I allele was unavailable in PDB; therefore, it was necessary to model the structure of the allele. UniProt was used to obtain the protein sequence of each MHC-I allele. The three-dimensional structure of the allele was then modeled using the in silico tool SWISS-MODEL [17]. The predicted structure was validated using PROCHECK and ERRAT.

Docking studies
The T-cell epitope that showed affinity for the largest number of MHC alleles was selected and docked with specific MHC class I alleles using AutoDock 4.0 [18]. The aim was to visualize the binding of the antigen to the MHC molecule, through which the antigen is presented to T cells in the human body to elicit an immune response. The docked model was visualized using ICM Browser [19].

B cell epitope identification
B cell epitopes are identified to predict potential antigens that can interact with B lymphocytes and generate an immune response. IEDB tools were used to predict B cell epitopes on the basis of parameters such as antigenicity, flexibility, hydrophilicity, linear epitope prediction and surface accessibility [20]. The tools provided by IEDB were BepiPred linear epitope prediction [21], the Kolaskar and Tongaonkar antigenicity scale, Emini surface accessibility prediction, Chou and Fasman beta-turn prediction and Karplus and Schulz flexibility prediction, which were used to predict linear epitopes, antigenicity, surface accessibility, beta turns and flexibility, respectively [22].
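Several of the scale-based methods listed above share a common idea: a per-residue propensity value is combined over a sliding window along the sequence. The sketch below illustrates that idea only. The function name `window_scores`, the window size and the three-letter toy scale are hypothetical placeholders and are not the published Kolaskar and Tongaonkar, Chou and Fasman or Karplus and Schulz values. Individual methods also differ in how they combine residue values, so the plain mean used here is a simplification.

```python
from typing import Dict, List


def window_scores(sequence: str, scale: Dict[str, float], window: int = 7) -> List[float]:
    """Average a per-residue propensity scale over each full sliding window.

    Element i is the score of the window starting at 0-based index i; by
    convention it is usually assigned to the centre residue of that window.
    """
    values = [scale[aa] for aa in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Placeholder scale for illustration only (NOT a published propensity scale).
toy_scale = {"A": 1.0, "G": 0.5, "P": 1.5}
toy_sequence = "AAGPPGA"

scores = window_scores(toy_sequence, toy_scale, window=3)
for start, score in enumerate(scores, start=1):
    print(f"window starting at residue {start}: {score:.3f}")
```

Windows whose score exceeds a method-specific threshold are then reported as candidate regions.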

Antigenic protein prediction
The VaxiJen server was used to predict the antigenic properties of the protein. The server predicted the spike protein of SARS-CoV-2 to be antigenic, with an overall prediction score of 0.4671.

T cell epitope identification
The NetCTL server identifies potent T cell epitopes in a given protein sequence under preselected settings. The five epitopes with the highest combined scores were selected for further study. Table 1 lists the selected epitopes along with their scores.

MHC class I interaction analysis
The interaction of MHC-I alleles with the five best epitopes was predicted using the stabilized matrix method (SMM). MHC-I alleles for which the epitopes showed high affinity (IC50 < 200 nM) were selected. Table 2 lists the five best epitopes along with their interacting MHC-I alleles, total processing scores and IC50 values.
The predicted epitopes showed the highest possible conservancy of 100.00% in the IEDB conservancy analysis tool. The results are tabulated in Table 3. The overall score predicted by IEDB reflects the intrinsic potential of each peptide to be a T cell epitope; a higher overall score denotes a higher processing capability. Among the five T cell epitopes, the 9-mer WTAGAAAYY was found to interact with the largest number of class I MHC alleles with high affinity.
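The selection rule used here (keep epitope-allele pairs with IC50 < 200 nM, then prefer the epitope that binds the most alleles) can be sketched as below. The function names, the epitope and allele labels and every IC50 value are hypothetical placeholders; they are not the values reported in Table 2.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def alleles_per_epitope(
    predictions: List[Tuple[str, str, float]], cutoff_nm: float = 200.0
) -> Dict[str, List[str]]:
    """Group, by epitope, the alleles predicted to bind with IC50 < cutoff."""
    hits: Dict[str, List[str]] = defaultdict(list)
    for epitope, allele, ic50 in predictions:
        if ic50 < cutoff_nm:
            hits[epitope].append(allele)
    return dict(hits)


def most_promiscuous(hits: Dict[str, List[str]]) -> str:
    """Return the epitope that binds the largest number of alleles."""
    return max(hits, key=lambda epitope: len(hits[epitope]))


# Hypothetical (epitope, allele, IC50 in nM) predictions for illustration only.
toy_predictions = [
    ("EPITOPE_A", "ALLELE_1", 45.0),
    ("EPITOPE_A", "ALLELE_2", 180.0),
    ("EPITOPE_A", "ALLELE_3", 199.9),
    ("EPITOPE_B", "ALLELE_1", 20.0),
    ("EPITOPE_B", "ALLELE_2", 640.0),
    ("EPITOPE_C", "ALLELE_3", 300.0),
]

hits = alleles_per_epitope(toy_predictions)
best = most_promiscuous(hits)
```

An epitope with no allele below the cut-off, such as `EPITOPE_C` in the toy data, drops out of the comparison entirely.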

The 3D Structure Prediction and Validation of HLA
The 3D structure of the HLA-C*03:03 protein was modeled using the online server SWISS-MODEL (Figure 2A). The PROCHECK and ERRAT servers were used to validate the modeled structure of HLA. The Ramachandran plot showed 91.3% of the residues in the favored region, which reflects good stability and reliability of the predicted structure (Figure 2B). The ERRAT overall quality factor of 89.5911 indicates good structural quality (Figure 2C).

Allergenicity prediction of epitopes
Since the T-cell epitopes are foreign to the host body, it is essential to predict whether the selected vaccine fragments are likely to cause allergic reactions in the host. Therefore, the allergenicity of all five T-cell epitopes was predicted using AllerCatPro, which found no evidence that these peptide sequences would provoke allergic reactions in the host body.

Prediction of 3D structure of Epitope
The 3D structure of the epitope WTAGAAAYY was modeled using the in silico tool PEP-FOLD3. Figure 3 shows the resulting 3D structure of the epitope.

B cell epitope identification
Different analysis methods were used for the prediction of continuous B cell epitopes on the basis of different parameters.
The Kolaskar and Tongaonkar method predicts the antigenicity of a protein on the basis of the physicochemical properties of amino acids and their abundance in already known epitopes. The average antigenic propensity of the protein was 1.041, with a maximum of 1.256 and a minimum of 0.881. The antigenic determination threshold for the protein was 1.00; therefore, all values greater than 1.00 were considered potential antigens. Out of 49 predicted antigens, 14 had a score above the threshold. Figure 5 depicts the graph of antigenic propensity vs sequence position, where the X-axis represents sequence position and the Y-axis represents the antigenic propensity score. Table 4 lists the B cell epitope peptides predicted using the Kolaskar and Tongaonkar antigenicity prediction method.
Surface accessibility is another important parameter for a potent B cell epitope. The surface accessibility of the protein was predicted using the Emini surface accessibility prediction method, with the threshold set to 1.000. The region from amino acid residues 807 to 817 was found to be particularly accessible. Out of 24 predicted peptides, 17 had a score above the threshold value. Figure 6 depicts the graph of surface probability (Y-axis) vs sequence position (X-axis). Table 5 lists the peptide fragments with the highest accessibility.
Chou and Fasman beta-turn prediction was performed to analyze the beta-turn regions of the protein, because these regions are often accessible and considerably hydrophilic in nature. Both properties are characteristic of antigenic regions of a protein. The region from 249 to 259 was predicted to be a beta-turn region. Figure 7 depicts the graph of score (Y-axis) vs position (X-axis). Table 6 lists the top five epitopes on the basis of beta-turn prediction.
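Each of the methods above reports peptides as stretches of residues whose score lies above a threshold. A minimal sketch of that thresholding step is given below. The function name `segments_above` and the score profile are hypothetical; only the threshold of 1.00 mirrors a value quoted in the text, and the IEDB tools may apply additional rules that are not modeled here.

```python
from typing import List, Tuple


def segments_above(
    scores: List[float], threshold: float, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where score > threshold."""
    runs: List[Tuple[int, int]] = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = pos          # a new run opens here
        elif start is not None:
            if pos - start >= min_len:
                runs.append((start, pos - 1))
            start = None             # the run has closed
    # Close a run that extends to the end of the profile.
    if start is not None and len(scores) - start + 1 >= min_len:
        runs.append((start, len(scores)))
    return runs


# Hypothetical per-residue scores, for illustration only.
toy_profile = [0.90, 1.02, 1.10, 1.05, 0.98, 1.00, 1.20, 1.30]

all_runs = segments_above(toy_profile, threshold=1.00)
long_runs = segments_above(toy_profile, threshold=1.00, min_len=3)
```

Because the comparison is strict, a residue scoring exactly 1.00 does not count as above the threshold in this sketch.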
Since the flexibility of a peptide is correlated with antigenicity, the Karplus and Schulz flexibility prediction method was used to determine the flexibility of the peptide. The region from 250 to 258 was found to be the most flexible, with the threshold set at 1.000. Figure 8 shows the graph obtained from the Karplus and Schulz prediction method, in which the Y-axis represents the score and the X-axis represents sequence position. Table 7 lists the top five epitopes predicted to be the most flexible.
Finally, the last parameter considered for B-cell epitope prediction was the BepiPred linear epitope prediction tool 2.0. BepiPred-2.0 uses a random forest model trained on epitopes derived from antibody-antigen structures (its predecessor, BepiPred-1.0, combined a hidden Markov model with a propensity scale), and it is one of the best methods for the prediction of linear B-cell epitopes. The method predicted a total of 33 peptides with a score above the threshold value. Table 8 lists the B-cell epitopes predicted by the BepiPred linear epitope prediction tool 2.0. Figure 10 depicts the graph obtained from the prediction, with score on the Y-axis and sequence position on the X-axis, the threshold being set at 0.500.
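Regions reported by different methods can be compared by intersecting their residue ranges. The sketch below does this for the beta-turn region (249 to 259) and the flexible region (250 to 258) reported above, and also checks that the range 1138-1145 spans eight residues, matching the length of YDPLQPEL. The helper names `overlap` and `region_length` are hypothetical, and intersecting ranges is offered as one simple way to compare results, not as the procedure used in this study.

```python
from typing import Optional, Tuple

Region = Tuple[int, int]  # 1-based, inclusive (start, end)


def overlap(a: Region, b: Region) -> Optional[Region]:
    """Return the residues shared by two regions, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def region_length(region: Region) -> int:
    """Number of residues in a 1-based inclusive region."""
    return region[1] - region[0] + 1


beta_turn_region = (249, 259)   # Chou and Fasman result quoted in the text
flexible_region = (250, 258)    # Karplus and Schulz result quoted in the text
shared = overlap(beta_turn_region, flexible_region)

accessible_region = (807, 817)  # Emini result quoted in the text
epitope_region = (1138, 1145)   # position of YDPLQPEL quoted in the text
disjoint = overlap(accessible_region, epitope_region)
```

Here the flexible region lies entirely within the beta-turn region, so the shared range is 250 to 258.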

Discussion
COVID-19 has emerged as a severe health threat to the world population. Currently, no effective cure is available, and no widely used vaccine exists for the virus despite many attempts by the scientific community around the world. Hence, more innovative approaches are needed for better diagnosis and treatment options to stop the development and progression of the disease. The current in silico study attempted to identify peptide fragments from the spike surface glycoprotein that can be used for the design and development of an epitope-based vaccine using integrated computational tools. The results obtained using different in silico tools and databases demonstrate that the epitope sequence WTAGAAAYY binds to the largest number of MHC alleles and is not predicted to be allergenic in humans. Thus, the WTAGAAAYY sequence can be used as a T cell epitope for the design of a peptide vaccine against SARS-CoV-2. Moreover, the results obtained from IEDB suggest that the protein sequence from 1138-1145, that is, YDPLQPEL, had antigenicity, surface accessibility and linear epitope prediction scores above the threshold values. In addition, the beta-turn prediction score of the peptide sequence was assessed to be 1.136, and the flexibility score of the peptide was 1.029. The peptide sequence YDPLQPEL therefore satisfies all the essential parameters of a B-cell epitope. No clear evidence was found that the sequence would cause allergenic effects in the host body. Therefore, the present study concludes that the epitope sequences WTAGAAAYY and YDPLQPEL are possible candidates for the design of an epitope-based peptide vaccine against COVID-19.