Immunoinformatics-Guided Designing of an Epitope-Based Vaccine Against Severe Acute Respiratory Syndrome-Coronavirus-2 Spike Glycoprotein

Currently, with a large number of fatality rates, coronavirus disease-2019 (COVID-19) has emerged as a potential threat to human health worldwide. It has been well-known that severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2) is responsible for COVID-19 and World Health Organization (WHO) proclaimed the contagious disease as a global pandemic. Researchers from different parts of the world amalgamate together inquest of remedies for this deadly virus. Recently, it has been demonstrated that the spike glycoprotein (SGP) of SARS-CoV-2 is the mediator behind the entrance into the host cells. Our group has comprehensibly analyzed the SGP of SARS-CoV-2 through multiple sequence analysis along with the phylogenetic analysis. Further, this research work predicted the most immunogenic epitopes for both B-cell and T-cell. Notably, we focused mainly on major histocompatibility complex (MHC) class I potential peptides and predicted two epitopes; WTAGAAAYY and GAAAYYVGY, that bind with the MHC class I alleles which are further validated by molecular docking analysis. Furthermore, this study also proposed that the selected epitopes were shown availability in a greater range of the population. Hence, our study comes up with a strong base for the implementation of designing novel vaccine candidates against SARS-CoV-2, however adequate laboratory works will need to be conducted for the appropriate application.


Introduction
Pandemics caused by many life-threatening human pathogens have played a significant role in shaping human history throughout the ages. The world is currently battling a global pandemic recombination. These mutations lied on the surface of the protein induce its sustainability and leave the immune system in the blind spot which makes the current COVID-19 infection more superior than other previous strains (6). Among the four genera (alpha , beta, gamma, delta) of coronaviruses, the beta-coronaviruses may cause severe disease and fatalities to the human (7).
Including SARS-CoV-2, seven subtypes of coronaviruses were identified in the last few decades that can infect humans.
SARS-CoV-2 differs from other beta coronaviruses in terms of its significantly higher infectivity and low mortality rate. It belongs to the B lineage of the beta-coronavirus in the Nidovirales order, Coronaviridae family, and Orthocoronavirinae subfamily. After the two previously reported coronavirus-SARS and MERS, this is the third zoonotic coronavirus breakout of the 21 st century and closely related to its predecessors (8,9). As it is a zoonotic virus, there will be an intermediate host by which it can be transmitted to a human. Preliminary investigations predict that it was caused due to zoonotic transfer from bats (9). Primarily, ome environmental specimens of the Huanan wet market in Wuhan were deemed positive for this infection but no specific association with an animal is confirmed yet based on the WHO report (3).
As this contagious disease is mainly a respiratory disease, it might affect only the lungs in most of the cases. The infection is transmitted between people during close contact, which occurs via spraying of droplets from the infected individual by coughing or sneezing (10,11). These droplets usually fall into the surfaces after breathing out and touching the contaminated surface followed by touching other parts of the face like eyes, nose, or mouth may result in spreading of the infection (12,13). The symptoms of this coronavirus may range from mild or moderate fever to severe pneumonia. The time from exposure to onset of symptoms may range from 2-14 days with an average of five days. The spreading of the infection may be possible before symptoms appear. The clinical pathology greatly resembled its two predecessors with less upper respiratory and gastrointestinal symptoms. The most common symptoms or combinations of symptoms including fever, fatigue, dry cough, dyspnea, alveolar edema, sore throat, new loss of taste or smell and shortness of breath (14). Older people with medical comorbidities or multi-organ failure such as hypertension, cardiovascular disease, and diabetes are more likely to get infected with worse outcomes (14). Severe cases can lead to cardiac injury, respiratory failure, acute respiratory syndrome, hepatic injury, neurological complications, and death (15).  (17). Among the structural proteins, the S protein is a multifunctional molecular machine that can attach to the specific human host receptor, ACE2 on the surface of human cells and mediates entry of viral particles into the host cells with the aid of its protease which cleaves the Spike protein into S1 and S2 subunits (18,19). The actual binding to the ACE2 receptor on the host cell surface occurs through the RBD found in the S1 subunit of the virus. After that, viral and host membranes are fused by the S2 subunit. The viral genome RNA is released into the cytoplasm after this membrane fusion. So for developing neutralizing antibodies the RBD of SARS-COV-2 SGP might be an ideal target.
The identification of B-cell and T-cell epitopes for spike glycoproteins are critical for developing an effective vaccine. Although humans may mount an antibody response against viruses normally, only neutralizing antibodies can block the entry of viruses into human cells completely 6 (20). Antibody binding sites location on a viral protein strongly affects the body's ability to produce neutralizing antibodies (21). It is crucial to find out whether SARS-CoV-2 has any potential antibody binding sites (B-cell epitopes) with its known human entry receptor, ACE2 near their interacting surface.
Apart from neutralizing antibodies, the human body also relies on cytotoxic CD8 + T-cells and helper CD4 + T-cells to eliminate viruses from the body. The presentation of a peptide will allow Effective promiscuous epitopes binding to a variety of HLA alleles for wider dissemination with no human cross-reactivity is crucial since COVID-19 infection has affected populations all over the world. Our present study has been embarked upon with the clear objective of designing an epitope-based peptide vaccine against COVID-19 infection using in silico methods and considering SGP of the SARS-CoV-2. Here, we targeted the epitopes in the SGP because of their lingering immune response reported against SARS coronavirus previously (26). For the 7 identified epitopes, we incorporated the information on the associated MHC alleles so that we can provide a list of epitopes that would maximize population coverage across the world.
Therefore, we designed an epitope-based peptide vaccine in the quest for finding potent targets against SARS-CoV-2 SGP using different computational tools with an expectation that the wet laboratory research will validate the outcome of our investigation.

Materials and Methods
The methodologies used for peptide vaccine development for SARS-CoV-2 SGP are shown in

Protein sequence retrieval and sequence analysis
The SARS-CoV-2 SGP sequence was extracted from UniProt database (UniProt entry: P0DTC2) in FASTA format (24). The understanding of the features, function, structure, and evaluation is mainly based on the process of sequence analysis which depicts the process of subjecting DNA, RNA, or peptide sequences to wide ranges of analytical methods. We implied BLASTp that screens homologous sequences from its database and selected those sequences that are more similar to our SARS-CoV-2 SGP; we also performed multiple sequence alignment (MSA) using the ClustalW web server with default settings and a phylogenetic tree was established using Clustal tree format using EMBL-EBI web server (25).

Protein antigenicity prediction
To determine the potent antigenic protein of the SARS-CoV-2 SGP, we used the online server VaxiJen v2.0, with a default threshold value (26). All the antigenic proteins of SARS-CoV-2 SGP with their respective scores were obtained then sorted in Notepad++.

Protein secondary and tertiary structure prediction
The secondary and tertiary structure of the SARS-CoV-2 SGP was predicted by using Phyre2 web server, which create a model protein from the given sequence and predict both 2D and 3D structure of the protein (27). The three-dimensional model was validated using PROCHECK and ERRAT web servers (28,29).

CD8 + T-cell epitope identification
NetCTL 1.2 server was used in this experiment for the identification of the T-cell epitope, using a 0.95 threshold to maintain sensitivity and specificity of 0.90 and 0.95, respectively (30). The tool expands the prediction for 12 MHC-I supertypes and unified the prediction of peptide MHC-I binding, proteasomal C-terminal cleavage with TAP transport efficiency. These predictions were performed by an artificial neural network, weighted TAP transport efficiency matrix and a combined algorithm for MHC-I binding and proteasomal cleavage efficiency was then used to determine the overall scores and translated into sensitivity/specificity. Based on this overall score, ten best peptides (epitopes) were selected for further evaluation.
For the prediction of peptides binding to MHC-I, we used a tool from the Immune Epitope Database (IEDB) and calculate IC50 values for peptides binding to specific MHC-I molecules.28 For the binding analysis, all the frequently used alleles were selected with a word length of nine residues and binding affinity < 200 nm for further analysis. Another tool (named as MHC-NP) provided by the IEDB server was used to assess the probability that a given peptide was naturally processed and bound to a given MHC molecule (31).

Epitope conservancy and immunogenicity prediction
The degree of similarity between the epitope and the target (i.e. given) sequence is elucidated by epitope conservancy. This property of epitope gives us the promise of its availability in a range of different strains. Hence for the analysis of the epitope conservancy, the web-based tool from IEDB analysis resources was used (32). Immunogenicity prediction can uncover the degree of influence (or efficiency) of the respective epitope to produce an immunogenic response. The Tcell class I pMHC immunogenicity predictor at IEDB, which uses amino acid properties as well as their position within the peptide to predict the immunogenicity of a class I peptide MHC (pMHC) complex (33).

Prediction of population coverage and allergenicity assessment
The population coverage tool from IEDB was applied to determine the population coverage for every single epitope by selecting HLA alleles of the corresponding epitope.
Allergenicity of the predicted epitope was calculated using AllerTop v2.0, which is an alignment-free server, used for in silico based allergenicity prediction of a protein-based on its physiochemical properties (34).

Epitope model generation
A web-based server, PEP-FOLD were used to predict the 3D structures of the selected epitopes (35). For each sequence, the server predicted five probable structures. The energy of each structure was determined by SWISS-PDB VIEWER and the structure with the lowest energy was chosen for further analysis (36).

Retrieval of HLA allele molecule
The three-dimensional structure of the HLA-B*15:25 was not available on Protein Data Bank (RCSB-PDB). We selected homology modeling using SWISS-MODEL web server to generate the three-dimensional structure of the HLA-B*15:25 (Accession id: HLA00188) (37). The validation of the predicted structure was done using PROCHECK, VERIFY 3D, ERRAT (28,29,38).

B-cell epitope identification
The prediction of B-cell epitopes was performed to find the potential antigen that assures humoral immunity. To detect B-cell epitope, various tools from IEDB were used to identify the B-cell antigenicity, together with the Emini surface accessibility prediction, Kolaskar and Tongaonkar antigenicity scale, Karplus and Schulz flexibility prediction, and Bepipred linear epitope prediction analysis (43)(44)(45)(46). Since antigenic parts of a protein belonging to the beta-turn regions, the Chou and Fasman beta-turn prediction tool was also used (47). Results

Sequence retrieval and analysis
In this study, the protein sequence of the SARS-CoV-2 SGP was retrieved from the UniProt database and then performed BLASTp. From plenty of homologues, we have selected only 17 homologues having more than 60% identical sequences. MSA were performed (Supplementary Data-1) and a phylogenetic tree was constructed ( Figure S1). The findings from the MSA and phylogenetic data documented that the protein sequences have a closer relationship.

Antigenic protein prediction
The most potent antigenic protein of SARS-CoV-2 SGP was predicted by VaxiJen v2.0, which is based on the auto-cross covariance transformation of protein sequences into uniform vectors of principal amino acid properties. The overall antigen prediction score was 0.4683 (probable antigen) at 0.4 threshold value.

Protein structure prediction and validation
In this study, we determined the secondary structure and tertiary of the SARS-CoV-2 SGP using Phyre2 web server, which depicts that 26% of the residues were from α-helix, whereas 37% of the residues were from β-strands. Previous research has already unveiled that the antigenic part of the protein mostly remains as β-sheet. Moreover, we carried out the homology modeling of SARS-CoV-2 SGP using Phyre2 web server. Ramachandran Plot analysis using PROCHECK web server showed that the modeled SARS-CoV-2 SGP has 74.7% residues in most favored region and 25% residues in the allowed region ( Figure S2) and ERRAT predicted the quality factor of 68.5484 (Data not shown).

CD8 + T-cell epitope identification
On the basis of highest combinatorial score and MHC class I binding, top twenty three epitopes were predicted by NetCTL 1.2 server from the given protein sequence. Further, the antigenicity of each selected peptides was predicted by VaxiJen v2.0 server and we found that total ten epitopes has shown being antigenic (Table S1) (Table 1). In addition, two other epitopes, namely GAAAYYVGY and STQDLFLPF interacted with MHC I alleles, the former interacted with five HLAs and the number was exhibited nine in case of later, but, like WTAGAAAYY, both the epitopes interacted with HLA-B*15:25, where the tendency of binding were mostly shown by epitope GAAAYYVGY, whereas the epitope WTAGAAAYY delineated greatest immunogenicity predicted by the I-pMHC immunogenicity prediction analysis (Table 1).
Furthermore, all the predicted epitopes had a maximum identity for conservancy hit and 100% maximum identity was found (Table 1). Therefore, we considered the aforementioned three epitopes for further analysis. In addition, MHC-NP finding unleashed that the epitopes were bound with the MHC class I alleles naturally (Table S2).

Population Coverage
The cumulative amount of the population coverage was obtained for the three predicted epitope,

Allergenicity assessment
The AllerTop server was used for the identification of the allergic reaction caused by a vaccine in an individual which might be harmful or life-threatening. The allergenicity of the selected epitope was calculated using the AllerTop tool and predicted as probable non-allergen.

Molecular docking analysis for HLA and epitope interaction
In this experiment, the validation of the interaction between the HLA molecules and our predicted potential epitope was done by molecular docking simulation using Autodock Vina in residues from allowed region (Figure 2). Further, the ERRAT server predicts the overall quality for the non-atomic bond interactions and the score found for the modeled HLA-B*15:25 allele was 95.4717, which was greater than the threshold value of 50 ( Figure 2). Finally, the validation of sequence-to-structure agreement was depicted by VERIFY 3D. The results from the VERIFY 3D server for the modeled HLA-B*15:25 alleles asserted pass, showing that 98.91% of the residues have averaged 3D-1D score more than 0.2 (Figure 2).
For docking analysis, we considered the modeled HLA molecule as receptor and the selected three epitopes as ligand molecules. The results of the docking experiments showed that the epitope GAAAYYVGY binds with HLA-B*15:25 allele with the greatest binding affinity which was calculated as -8.8 kcal/mol, and the binding affinity of the epitope WTAGAAAYY were almost equal to the epitope GAAAYYVGY, but the binding affinity of the epitope STQDLFLPF were found only -7.8 kcal/mol (Table 3). From the visualization of the docking results, it has been lucid that the epitope WTAGAAAYY formed conventional hydrogen bonds (H-bonds) with five amino acid residues, including, Asn94, Arg121, Trp171, Glu176, Gln179 and pi-pi stacking with Tyr123 and Tyr183 residues with an unfavorable bond with Ser148 respectively (Figure 3 and Figure S7). However, no unfavorable bond was visualized for the epitope GAAAYYVGY and more H-bonds were visualized for the epitope GAAAYYVGY as well as attractive charges with Lys170 and Arg121 residues (Figure 4 and Figure S8).

B-cell epitope prediction
In the current study, by utilizing amino acid scale-based method, the B-cell epitope was identified. Different analysis methods were used for the prediction of continuous B-cell epitope, which were shown in Table 4, Figure 5 and Figures S9-S10.
First of all, Bepipred linear epitope prediction was used, which is regarded as the best single method for predicting linear B-cell epitopes using a Hidden Markov model. Our analysis revealed that the peptide sequences from 805 to 816 amino acid residues were able to induce the desired immune response as B-cell epitopes.
The β-turns were predicted by Chaus and Fasman β-turn prediction method. The region 807-813 residues were predicted as a β-turn region with a score of 1.484, which was higher than the average score. Previously, it has been well recognized that the development of vaccines was rudimentarily dependent on B-cell immunity, but recently, it has been well established that T-cell epitopes provide more long-lasting immune response which is mainly mediated by CD8 + T-cells and as a result of antigenic drift (53). In the present study, concentrating on MHC class I potential peptide epitopes, we were mainly predicted not only T-cell but also B-cell epitopes, that are capable of showing immune responses in several ways. Many criteria are needed to consider while identifying a protein sequence-based epitope into a vaccine candidate of which allergenicity is regarded as one of them. We served the SARS-CoV-2 SGP sequence into the VaxiJen server and predicted the antigenicity of the protein sequence. Further, a total of ten potent epitopes have been predicted from the NetCTL 1.2 server and the epitopes were further taken for the progressive analysis which also showed MHC class I interaction respectively. Besides, all peptides except GAEHVNNSY were able to interact with the MHC class I alleles, and Importantly, the B-cell epitope elicits a stronger immune response but does not cause any side effects. As a corollary we also predicted the B-cell epitope utilizing the IEDB database and found that the protein sequences from 805-816 residues as B-cell epitope. The predicted region mayhap stimulate the potential immune response and presumably important for the development of a vaccine candidate.

Conclusions
The improvement in the field of immunoinformatics has become a potential field for predicting peptide-based vaccines. Viruses show not only T-cell but also humoral immunity. As a result, our predicted epitope presumably improves immunity against SARS-CoV-2. The presumption is based on the fundamental principles of immunity, that confers the attachment of foreign bodies with the host immune cell, provoking immune responses, and transfers the relevant information to T-cell and B-cell. In the present study, our investigated epitopes simulate the interaction to CD8 + cells antigen presentation utilizing in silico approaches. However, the present study is a preliminary approach for predicting epitope-based vaccine against SARS-CoV-2 and we hope that the predicted epitope will pave the way for further laboratory analysis for designing novel vaccine candidates against SARS-CoV-2.

Conflict of Interest
The authors report no conflicts of interests in this work.

Funding
This work is conducted with the individual funding of all authors.

Data availability statement
The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.   1 The potential CD8 + T-cell epitopes along with their interacting MHC class I alleles and total processing score, epitopes conservancy hits and pMHC-I immunogenicity score      The interacting residues were shown as ball and stick, conventional hydrogen bonds were shown as green line, pi-pi/pi-alkyl stacking were shown as pink lines, unfavorable bumps were shown as red lines. The interacting residues were shown as ball and stick, conventional hydrogen bonds were shown as green line, pi-pi/pi-alkyl stacking were shown as pink lines, carbon-hydrogen bonds were shown as white lines, attractive charges were shown as golden lines.

FIGURE 5
Combined B-cell linear epitope prediction showed the region from 803 to 816 amino acid residues had the highest antigenic propensity for B-cell linear epitopes.
Surrounded by six differently coloured lines, which cover the region 800-820 amino acid residues in SARS-CoV-2 SGP, each line indicating different analysis methods with the maximum scores.

Supplementary data
Supplementary data 1 Multiple sequence alignment of SARS-CoV-2 spike glycoprotein.

Supplementary tables
TABLE S1 Antigenicity prediction of selected epitopes using VaxiJen 2.0 server