Binding epitopes of 2019-nCoV proteins for perspective diagnostic and therapeutic applications: insights from computational approaches

The appearance of the novel betacoronavirus 2019-nCoV represents a major threat to human health, and its diffusion around the world is predicted to have dramatic economic consequences. Knowledge of the 3D structures of 2019-nCoV proteins can facilitate the development of diagnostic and therapeutic molecules. Herein, we apply our energy-based method for the prediction of potential epitopes on viral proteins to design peptide-based molecules that can subsequently be used in diagnostic and therapeutic applications. We discuss these aspects in the paper. The designs have not been tested. Our aim is to share information that can be useful in the development of novel biomolecules with potentially interesting activities against 2019-nCoV.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 12 March 2020 doi:10.20944/preprints202003.0221.v1 © 2020 by the author(s). Distributed under a Creative Commons CC BY license.


Introduction
The outbreak of the new betacoronavirus 2019-nCoV represents a world-wide epidemic threat that has been causing major disruptions in societies and economies [1][2]. For these reasons, the outbreak caused by the new virus has been declared an international public health emergency.
One of the factors that has favored the diffusion of the infection has been the lack of efficient diagnostic methods able to reveal 2019-nCoV infection. Initial symptoms are often mild and similar to those of seasonal influenza [3]. Further, it is becoming increasingly clear that there may be a large number of asymptomatic (and potentially undiagnosed) patients, whose engagement in everyday activities could have favored the diffusion of the virus, especially in densely populated areas.
In this context, serological assays can be profitably used for the large-scale screening of people who may have come into contact with the virus, as well as for epidemiological surveillance. These approaches can be aimed at revealing antibodies (Abs) elicited in the host in response to the infection and/or at using specifically raised/selected antibodies to detect relevant 2019-nCoV antigens. In general, serological approaches of this kind can be simple and quick to perform, and could represent a valid complement to swabs followed by molecular nucleic acid-based methods, which are currently used as the front-line response to detect the virus in the window period in which host B-cell responses have not yet been elicited. Progress in this field requires new molecular probes (immunoprobes) with facile synthetic accessibility and the potential to function both as antigen-mimicking baits to target virus-specific Abs in patients' fluids, and as immunogens to generate new Abs able to detect relevant antigens [4-5].
The first step in this direction is the prediction/identification of the substructures on antigenic proteins that have a high probability of being immunoreactive and of binding Abs. Such substructures are known as epitopes. The subsequent synthesis of mimics of the epitope sequences in the form of peptides represents a simple and versatile way to turn the prediction into usable molecules, and makes it possible to overcome the technical limitations associated with the production, use and management of whole protein antigens. The synthetic peptide mimics of the designed epitopes can then be tested for immunoreactivity against sera or other fluids from diverse groups of subjects, namely infected, recovered, and asymptomatic patients and healthy controls. In this context, they can be modified by attaching biotin or click-chemistry reactive groups for oriented display on ELISA plates, microarrays, or other devices [6]. Biotin and the click-chemistry reactive groups can also be separated from the peptide terminus by spacers (e.g. PEG). In a complementary way, the synthetic mimics can be used to generate antibodies that specifically target the cognate protein to which the predicted epitope belongs. As peptides alone are generally too small to elicit an immune response sufficient to generate antibodies, they can be conjugated to carriers (KLH, ovalbumin, BSA, or Multiple Antigen Peptides (MAPs)) that stimulate T-helper cells, ultimately inducing the B-cell response that generates the peptide(epitope)-specific antibodies. A further advantage of using peptides is that they can be conjugated to high-density delivery systems, such as liposomes, micelles or gold nanoparticles, for possible vaccinological applications [7][8][9].
On the one hand, this makes it possible to increase the concentration of administered peptides to stimulate the immunogenic response; on the other hand, it raises the possibility of delivering two or more peptides simultaneously attached to the same particle. This last possibility is particularly useful when more than one epitope sequence is needed to elicit a proper stimulation of the immune system, as in the case of conformational epitopes formed by segments that are spatially close but far apart in sequence.
Structural knowledge of the proteins of 2019-nCoV can set the stage for the design of epitope mimics that function both as diagnostic probes and as immunoreagents to elicit antibodies. From a therapeutic perspective, the prediction of potential interaction regions on 2019-nCoV proteins can lay the foundation for the design of inhibitors aimed at perturbing their functional interactions with host molecules. Raised Abs, on the other hand, could ideally be used in 2019-nCoV neutralization tests.

Results and Discussion
Here, we took advantage of the recent publication of the structures of the 2019-nCoV spike protein in its pre-fusion state (doi.org/10.1126/science.abb2507) and of the main protease of the virus [10] to apply our computational epitope prediction method MLCE to design sequences of epitopes that can be tested in a number of experimental settings. Importantly, one of the predicted reactive sequences on the spike RBD coincides with the region complexed by the ACE2 receptor, described in a very recent publication (doi.org/10.1101/2020.02.19.956235) that appeared at the time of writing of this report [11]. It is important to underline here that the sequences and constructs we propose have not been tested yet, and thus there is no guarantee that they will function. Based on our previous experience in applying MLCE to design effective probes [12-14], our aim is to propose the new sequences as experimentally verifiable hypotheses that can be taken up by interested investigators.
Epitope prediction is carried out here considering only the experimental structures of the proteins under examination. MLCE integrates the analysis of the structural and energetic properties of proteins to identify non-optimized, low-intensity energetic interaction networks on the surface of the isolated antigens. Specifically, the MLCE method is based on the eigenvalue decomposition of the matrix of residue-residue energy couplings, filtered by the contact matrix, to detect patches of physically proximal residues that have minimal interactions with the stability core of the protein [12,15]. Minimal energetic coupling with the rest of the protein allows these substructures to undergo conformational changes, to be recognized by a binding partner (the antibody), and possibly to tolerate mutations that would facilitate escape from immune recognition, all at minimal energetic expense. All these properties are hallmarks of epitopes. Based on these ideas, MLCE has been extensively tested and has proven successful in the design of synthetic epitopes for the capture of specific Abs against both bacterial and viral pathogens [13,16-24]. Furthermore, in previous work, antibodies raised against a designed peptide were successfully tested in bacterial killing experiments [13].
In the following we report on the proteins analyzed and the sequences designed.
The application of MLCE to the Receptor Binding Domain (RBD, aa 319-591) of the 2019-nCoV spike protein (pdb code 6vsb; doi: 10.2210/pdb6vsb/pdb) (Figure 1) revealed two regions that contain potential epitopes. The sequences we propose as potential immunoreactive/immunogenic probes are reported in Table 1. Such designs entail both linear and conformational or discontinuous epitopes. The latter consist of organized substructures formed by different segments that come together in the three-dimensional structure, but are distant in the primary sequence. In the case of discontinuous epitopes, to generate one single synthetic molecule, we measured the distance between the termini of the segments making up the patch and bridged them with a number of Gly residues sufficient to approximate the distance in the experimental structure.
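The Gly-bridging step for discontinuous epitopes can be illustrated with a minimal Python sketch. It is not the procedure actually used for the designs in the tables: it assumes that the gap is measured between the C-alpha atoms of the two flanking terminal residues and that each C-alpha to C-alpha step in a fully extended chain spans at most about 3.8 Å, so the result is a lower bound on the linker length. The function names, the example coordinates and the example segments are hypothetical.

```python
import math

# Assumption: maximum C-alpha to C-alpha distance between adjacent residues
# in an extended chain, in angstroms.
CA_CA_SPACING = 3.8


def n_gly_linker(distance, spacing=CA_CA_SPACING):
    """Smallest number of Gly residues whose extended chain can span `distance`.

    n inserted residues create n + 1 C-alpha to C-alpha steps between the
    two flanking terminal residues, hence the "- 1".
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return max(0, math.ceil(distance / spacing) - 1)


def bridge_segments(seg_a, seg_b, ca_end_a, ca_start_b):
    """Join two epitope segments with a Gly linker sized from 3D coordinates.

    ca_end_a   : (x, y, z) of the C-alpha of the last residue of seg_a
    ca_start_b : (x, y, z) of the C-alpha of the first residue of seg_b
    """
    gap = math.dist(ca_end_a, ca_start_b)  # Euclidean distance in angstroms
    return seg_a + "G" * n_gly_linker(gap) + seg_b


if __name__ == "__main__":
    # Hypothetical segments and coordinates, for illustration only.
    print(bridge_segments("AAA", "KKK", (0.0, 0.0, 0.0), (0.0, 0.0, 12.0)))
```

In practice a real linker is rarely fully extended, so one or two additional glycines beyond this lower bound may be a reasonable choice.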
Most interestingly, after we ran our predictions on the RBD, a crystal structure of this domain in complex with the cellular receptor protein ACE2 was reported by Lan et al. (aa 480-488) [11]. Importantly, one of the epitope regions we predict almost completely overlaps with the substructure of the spike protein RBD engaged in binding with ACE2 (478 TPCNGVEGF 486). We thus propose sequence TPSNGVEGFNSY, in which cysteine residues have been substituted by serine residues for synthetic reasons, or the original sequence TPSNGVEGFNCY as a higher priority candidate for the synthesis of mimics. It is tempting to speculate that this peptide or derived peptidomimetics could also be used to block RBD binding to the ACE2 cell receptor, providing the basis for the development of protein-protein interaction inhibitors aimed at outcompeting an interaction fundamental for viral entry. Furthermore, Tian et al. recently released a study aimed at probing the binding of SARS-CoV RBD-specific mAbs to the 2019-nCoV RBD, in which they identify the CR3022 mAb as a potent binder [25]. Interestingly, competition assays show that CR3022 binding to the RBD is not affected by ACE2, indicating that the mAb binds the domain at a position different from the Receptor Binding Motif, most likely targeting the C-terminal region. Of note, most of our predicted epitopes in the RBD reside in the C-terminal region.
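The cysteine-to-serine substitution mentioned above can be expressed as a short Python sketch. The helper name is hypothetical, and the input 12-mer is an assumption: it is the native counterpart implied by the 478-486 stretch quoted above, with cysteines at the two substituted positions.

```python
def replace_cysteines(sequence, substitute="S"):
    """Return the peptide with every Cys (C) replaced by `substitute`.

    Cysteines are commonly replaced in synthetic peptides to avoid
    unwanted disulfide formation; serine is the replacement used here.
    """
    return sequence.upper().replace("C", substitute)


if __name__ == "__main__":
    # Assumed native 12-mer (cysteines at positions 3 and 11 of the peptide).
    print(replace_cysteines("TPCNGVEGFNCY"))
```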
To further increase the probability of identifying successful epitope candidates, we applied MLCE to various isolated domains of the Spike protein other than the RBD: the highly solvent-exposed galectin-like N-terminal domain (NTD, aa 27-318) (Figure 2), the sub-domains 1 and 2 (SD1-SD2, aa 592-814), and the region encompassing the fusion peptide, the connecting region and part of the heptad repeat 1 (FP-CR-HR1, aa 815-965), previously predicted to be a potential target for eliciting neutralizing antibodies in SARS-CoV and MERS-CoV [26]. The sequences resulting from these rounds of prediction are presented in Table 2.
Finally, analysis of the main protease (pdb code 6lu7; doi: 10.2210/pdb6lu7/pdb) (Figure 3) returned the solutions shown in Table 3. As a caveat, we anticipate that these latter peptides may not be interesting as probes, since the protease is used intracellularly by the virus. Still, a recent paper presented a vaccine development study based on sequences of the protease from HIV [27]. For this reason, and in view of different possible applications in other fields, we decided to include them in the present work.
At the time of writing this report, two further structures were released: 6lvn.pdb, showing a tetramer of heptad repeat 2 (HR2), and 6lxt.pdb, reporting the trimeric arrangement of the HR1-HR2 dimer in its post-fusion state. We would like to point out that HR2 is by nature extremely flexible, and as such it is not resolved in any of the Spike structures available for the different coronaviruses. This opens up the possibility that peptides spanning the helices reported for 2019-nCoV are good candidate antigenic regions. We specify here that in the present work we preferred to focus our epitope predictions only on the pre-fusion conformation of the Spike protein, that is, the structure presented to the host before the virus has encountered the host cell membrane. We therefore did not apply MLCE to the post-fusion HR2 configurations reported in these structures.

Conclusion
Testing of the sequences described herein can unveil their potential as effective diagnostic probes, as well as molecules to be used for therapeutic perspectives, namely in the generation of Abs and in the design of protein-protein inhibitors that block interactions fundamental for 2019-nCoV.

Table 2. Epitope sequences for solvent exposed galectin-like N-terminal domain (NTD, aa 27-318), the sub-domains 1 and 2 (SD1-SD2, aa 592-814), and the region encompassing the fusion peptide, the connecting region and part of the heptad repeat 1 (FP-CR-HR1, aa 815-965) of the 2019-nCoV spike protein (pdb code 6vsb). Bold-face G's indicate glycines inserted to bridge two parts of a conformational epitope; Bold-face, underlined S's or T's indicate serines inserted to substitute for original cysteines or methionines, respectively.

Table 3. Epitope sequences for the main protease (pdb code 6lu7). Bold-face G's indicate glycines inserted to bridge two parts of a conformational epitope; Bold-face, underlined S's or T's indicate serines inserted to substitute for original cysteines or methionines, respectively.

Materials and Methods
The coordinates of the proteins analyzed throughout the text were downloaded from the pdb with the following codes:
2019-nCoV (COVID-19) spike glycoprotein: PDB ID 6vsb
2019-nCoV (COVID-19) main protease: PDB ID 6lu7
Missing residues belonging to highly flexible loops not solved in the structure of the Spike protein were modeled by homology using the SwissModel web server (https://swissmodel.expasy.org/), which retrieved the SARS-CoV structure 5x58 as the best template.
The structures were used as input for epitope prediction using the Matrix of Local Coupling Energies method (MLCE), which combines the analysis of structural determinants of a given protein with its energetic properties [12,15]. This approach makes it possible to identify non-optimized, low-intensity energetic interaction networks, corresponding to those substructures that are more prone to establish interactions with antibodies and to be suitably recognized by binding partners. Briefly, the contiguous regions on the protein surface that are deemed to have minimal coupling energies with the rest of the structure are selected on the basis of the eigenvalue decomposition of the matrix reporting the non-bonded interactions of all residue pairs. The eigenvector associated with the most negative eigenvalue is used to reconstruct a simplified matrix, which reports the maximal and minimal stabilizing residue pairs in the protein structure. Filtering the simplified matrix with the contact matrix identifies spatially contiguous residue pairs characterized by their essential degree of coupling to the rest of the protein. Selection of proximal pairs showing minimal coupling with the rest of the protein defines putative epitopes. Selection is carried out on the basis of a threshold value (called softness), which defines the percentage of contact-filtered pairs to retain: residue pairs are added in order of increasing coupling value until the number of pairs corresponding to that lowest-coupling percentage is reached. The starting structures of the proteins were refined and minimized by means of 200 steps of steepest descent using the Amber suite of programs [28]. The MM-PBSA method (Molecular Mechanics energies combined with the Poisson-Boltzmann and Surface Area continuum solvation) was then applied to obtain the energy profile that MLCE subsequently uses to perform the epitope prediction.
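The selection steps described above can be summarized in a minimal NumPy sketch. This is an illustration of the general idea and not the authors' implementation: the function name, the 6 Å contact cutoff, the default softness of 15% and the ranking of pairs by absolute coupling value are all assumptions, and the input energy matrix stands in for the residue-residue non-bonded energies that would come from the MM-PBSA calculation.

```python
import numpy as np


def select_low_coupling_pairs(energy, coords, cutoff=6.0, softness=0.15):
    """Return contact pairs with the weakest approximated energetic coupling.

    energy   : (N, N) symmetric matrix of residue-residue non-bonded energies
    coords   : (N, 3) one representative coordinate per residue, in angstroms
    cutoff   : contact distance threshold (assumed value)
    softness : fraction of contact-filtered pairs to keep (assumed value)
    """
    energy = np.asarray(energy, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = energy.shape[0]

    # eigh returns eigenvalues in ascending order, so index 0 is the
    # most negative eigenvalue and column 0 its eigenvector.
    evals, evecs = np.linalg.eigh(energy)
    lam, vec = evals[0], evecs[:, 0]

    # Simplified (rank-one) coupling matrix from the first eigenvector.
    approx = lam * np.outer(vec, vec)

    # Contact matrix from pairwise distances.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = dist < cutoff

    # Consider each residue pair once (upper triangle, no diagonal).
    i_idx, j_idx = np.triu_indices(n, k=1)
    in_contact = contact[i_idx, j_idx]
    pairs = np.column_stack((i_idx[in_contact], j_idx[in_contact]))
    strength = np.abs(approx[i_idx[in_contact], j_idx[in_contact]])

    # Keep the lowest-coupling fraction of contact pairs (the softness).
    n_keep = int(np.ceil(softness * len(pairs)))
    order = np.argsort(strength, kind="stable")[:n_keep]
    return [(int(i), int(j)) for i, j in pairs[order]]


if __name__ == "__main__":
    # Toy example: four residues on a line, 4 angstroms apart.
    v = np.array([3.0, 2.0, 1.0, 0.1])
    toy_energy = -np.outer(v, v)
    toy_coords = np.array([[0.0, 0, 0], [4.0, 0, 0], [8.0, 0, 0], [12.0, 0, 0]])
    print(select_low_coupling_pairs(toy_energy, toy_coords, cutoff=5.0, softness=0.5))
```

Residues that appear in the returned pairs and cluster on the protein surface would then be grouped into candidate epitope patches; that grouping step is not shown here.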