Accessible Surface Glycopeptide Motifs on Spike Glycoprotein of 2019-nCoV: Implications on Vaccination and Antibody Therapeutics

Coronaviruses hijack human enzymes to assemble the sugar coat on their Spike glycoproteins. The mechanisms by which human antibodies access antigenic viral peptide epitopes hidden by this sugar coat are unknown. In this study, we analyzed the high-resolution cryo-EM structure of Spike glycoproteins. The results showed that the electron densities of glycans cover most of the SARS-CoV Spike receptor binding domain except FSPDGKPCTPPALNCYWPLNDYGFYTTTGIGYQ. A homology-modeled structure of the glycosylated 2019-nCoV Spike protein showed a similar exposed sequence, YQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQ. Other surface-exposed domains included those located on the Central Helix, between amino acids 967 and 1016 of the SARS-CoV Spike protein and between amino acids 985 and 1034 of the 2019-nCoV Spike protein. As the majority of antibody paratopes bind to the peptide portion with or without sugar modification, we propose a snake-catcher model in which a minimal length of peptide is first clamped by a paratope, and the binding is then either strengthened by sugars close to the peptide or not interfered with by sugar modification.


Introduction
Spike proteins are located on the surface of coronaviruses and serve as entry proteins for infection (1). The Spike molecule forms trimers, which must be cleaved by cellular proteases so that the fusion peptide can mediate fusion of the viral membrane with the membrane of the infected cell. The proteases generate the S1 and S2 subunits from the Spike molecule, and the S1 subunit contains the critical receptor binding domain (RBD), which binds ACE2 on host cells. The receptor binding motif (RBM) of the RBD, rich in tyrosine, forms direct contacts with ACE2. Fusion of the virus with host cells involves several other critical structures of the Spike protein, including the Central Helix (CH) and the heptad repeat 1 and 2 (HR1 and HR2) domains.
In this study, we analyzed the cryo-EM structure of recombinant SARS-CoV Spike protein expressed in insect (Sf9) cells (19). We further used the homology-modeled structure of the glycosylated 2019-nCoV Spike protein to identify surface-exposed epitopes for antibody recognition and vaccine design.
... observed by cryo-EM for SARS-CoV S protein, and most predicted sites of 2019-nCoV are located similarly to those of SARS-CoV (Figure 2E). The RBDs are overall highly conserved in sequence (74.5% identity) and structure (RMSD ~1.14 Å), and share two identical glycosylation sites near the N terminus (Figure 2F), while the sequence specificity of the epitopes remains unique in some regions (Tables 1 and 2). A similar surface-exposed region, or "Achilles heel", YQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQ, was identified in the RBD of 2019-nCoV.
Interestingly, the "Achilles heel" of both SARS-CoV and 2019-nCoV is also free of glycosylation, while its neighboring fragments are covered by, or interact with, glycans. This glycosylation-free region is favorable for binding ACE2 and other proteins (Figure 2G).
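To make the comparison of the two exposed sequences concrete, the following minimal sketch globally aligns the SARS-CoV and 2019-nCoV "Achilles heel" peptides quoted above and reports the fraction of identical aligned columns. It is an illustration only: the Needleman-Wunsch scoring (match +1, mismatch -1, gap -1) and the function names are assumptions made here, not the alignment procedure used in this study, and the resulting value is not expected to reproduce the 74.5% identity reported for the whole RBD.

```python
# Exposed "Achilles heel" peptides as quoted in the text.
SARS_COV_AH = "FSPDGKPCTPPALNCYWPLNDYGFYTTTGIGYQ"
NCOV_AH = "YQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQ"


def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment (scoring values are assumptions).

    Returns the two input strings padded with '-' so they have equal length.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap columns as a percentage of all alignment columns."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    al_sars, al_ncov = global_align(SARS_COV_AH, NCOV_AH)
    print(al_sars)
    print(al_ncov)
    print(f"identity: {percent_identity(al_sars, al_ncov):.1f}%")
```

A substitution matrix such as BLOSUM62 with affine gap penalties would be the more usual choice for protein sequences; the unit scores above are kept only so that the sketch stays short and dependency-free.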

Accessible surface area (ASA) calculated according to electron density of glycans on Spike proteins of SARS-CoV and 2019-nCoV
ASA profiling was used for mAb epitope prediction (Supplemental Figure 3). Candidate epitopes are listed in Table 1 and Figure 3. In addition to the RBD, multiple candidate epitopes were found in the FP, HR1 and CH domains. Figure 4 shows the alignment of the epitopes of the Spike proteins of SARS-CoV and 2019-nCoV. Similar sites were found in the RBD and CH domains of both viruses. However, unique sites were also found for each virus (Table 2 and Supplemental Figure 4). A unique epitope present in 2019-nCoV, but not in SARS-CoV, is the RARR (682-685) site for furin recognition (Supplemental Figure 5).

Discussion
Neutralizing antibodies toward Spike proteins are critical for protective immunity. Traggiai et al. reported Spike-specific monoclonal antibodies isolated from a patient who recovered from SARS-CoV infection, with in vitro neutralizing activity ranging from 10^-8 M to 10^-11 M (2).
TLR ligands, delta inulin, and monophosphoryl lipid A were reported as effective adjuvants to be combined with subunit vaccines. To avoid the use of adjuvants, inactivated SARS-CoV viruses and recombinant adeno-associated virus encoding the RBD of the SARS-CoV Spike protein have been studied, and both induced potent protective antibody responses against infection (30-33). The safety and efficacy of antibody therapeutics and vaccines in human clinical trials remain to be studied, as do the mechanisms of specific vaccine components and formulations. For example, pulmonary pathology was reported when alum was used as an adjuvant for a Spike protein subunit vaccine (34). Antibody-induced lung injury was also reported in a macaque model of SARS-CoV infection (35), which highlights the importance of avoiding antibody-mediated inflammation.
The RBD has been a main focus of antibody and vaccine studies. Three antibodies have been co-crystallized in complex with the RBD of SARS-CoV: 80R, m396, and F26G19 (16-18). All three antibodies recognize non-continuous, conformational epitopes (Supplemental Table 1). Several mAb clones that recognize linear continuous peptide sequences have been reported (4D5, 17H9, F26G18, and 201), although co-crystal structures are not yet available.
In this study, we determined the ASA profile of the RBD of 2019-nCoV and found a vulnerable region, YQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQ. The structural counterpart of this region was previously termed "the Achilles heel" of SARS-CoV (9). It largely overlaps with the interface between ACE2 and the S protein ( Figure 1G). For SARS-CoV, multiple mAbs targeting "the Achilles heel" have been generated, including F26G18, 4D5, CR3006, m396, FM39, CR3014, F26G19 and 80R (Supplemental Table 1).
Ongoing studies focus on the epitopes at "the Achilles heel" of 2019-nCoV for antibody and vaccine development.
It is well known that predicted epitopes of protein antigens may be masked by glycosylation. Complex datasets and algorithms, such as SEPPA 3.0 (36), have been developed based on training parameters related to the interactions between glycans and surrounding amino acids. However, no experimental data are available on the effect of glycosylation sites on the epitope surface. With the recent breakthroughs in high-resolution cryo-EM, many glycoproteins can be solved and modeled with their glycosylation sites. Here we directly exploited experimental data for the SARS-CoV Spike protein from high-resolution cryo-EM, and screened epitopes of the 2019-nCoV Spike protein by ASA profiling based on a homology-modeled structure. By this approach, we identified the "Achilles heel" of 2019-nCoV, as well as multiple other surface-exposed epitopes within and outside of the RBD. For example, in the NBD domain of the SARS-CoV Spike protein, mAbs specific for linear epitopes have been reported (3, Supplemental Table 1). mAbs specific to other regions of the S1 and S2 subunits of the SARS-CoV Spike protein have also been reported (6). As summarized in Table 1, promising antibody binding sites within and outside of the RBD have been identified for 2019-nCoV. Future work will focus on vaccination studies to validate their function as neutralizing epitopes with preventive and therapeutic effects in virus challenge experiments.
Dense glycosylation of glycoproteins is a well-known strategy used by viruses to conceal surface peptide epitopes which elicit antibody responses, as exemplified by Env protein of HIV-1 virus.
However, after decades of effort, monoclonal antibodies that bind to conformational epitopes on the surface of the Env protein have been identified (36-38). Most of these antibodies bind to the N-glycan portion neighboring the peptide epitopes, while some antibodies, such as mAb 8ANC195, have evolved to recognize a peptide epitope with no dependence on glycan binding (36). For antibodies specific to Spike glycoproteins, no data are available on whether their recognition is interfered with by the glycosylation of Spike. We propose a "snake catcher" model in which a minimum length of the peptide portion, either linear continuous or conformational, must first be clamped by a paratope. This clamping may then either be strengthened by sugars close to the peptide epitope or not be interfered with by sugar modification. Clearly, the availability of surface-exposed glycopeptide motifs is critical for inducing antibody responses.
In summary, our study identified a list of linear surface-exposed epitopes in the Spike proteins of SARS-CoV and 2019-nCoV, and demonstrated the advantages of studying the glycosylation effect with real cryo-EM data. These epitopes are critical for the screening of monoclonal antibody therapeutics against 2019-nCoV, as well as for mechanistic studies in vaccine development.
The sequence identity of the Spike proteins of 2019-nCoV and SARS-CoV is as high as 84%, which is sufficient to build an accurate homology model. The sequence of MN908947 was submitted to SWISS-MODEL (https://swissmodel.expasy.org), and the structure model was built using all available homologous structures as templates. One stable conformation of the trimer structure models of 2019-nCoV is very close to the Spike protein structure of SARS-CoV (PDB: 5X58); the RMSD of a single protein chain is about 1.32 Å after the two structures are superimposed and compared in PyMOL ( Figure 2D&E).
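For readers unfamiliar with the RMSD figure quoted above, the sketch below shows how a root-mean-square deviation is computed over paired atoms once two structures have already been superimposed. It assumes matched coordinate lists (for example, equivalent C-alpha atoms) and does not perform the superposition itself, which was done in PyMOL in this study; the function name and the toy coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the input units (e.g. angstroms) between paired 3D points.

    Assumes the two structures are already superimposed and that
    coords_a[i] and coords_b[i] refer to equivalent atoms.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates, not taken from 5X58 or from the homology model.
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    template = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(model, template))
```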

Calculation according to electron density of glycans on SARS-CoV Spike protein
Glycosylation sites were determined from the high-resolution cryo-EM density map; because glycans are flexible and disordered, only N-acetyl-D-glucosamine (NAG, GlcNAc) was modeled to represent each whole glycan. The SARS-CoV Spike protein structure (PDB: 5X58), together with the NAG (GlcNAc) sites, was used for molecular interface calculation with PISA (http://www.ccp4.ac.uk/pisa/). All amino acids linked to or interacting with NAG (GlcNAc) were selected and excluded from epitope prediction. Besides the interaction between NAG (the GlcNAc at the reducing end) and amino acids, the effects of the larger glycan structure extending from each NAG (GlcNAc) may also need to be considered, as shown in Figure 2C, although the electron densities of these extensions are weak.
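The exclusion step described above can be sketched as a simple scan over per-residue ASA values. The sketch assumes that the per-residue ASA and the set of NAG-contacting residues have already been exported (for example, from the PISA output); the ASA cutoff, the minimum segment length, and all names are hypothetical choices for illustration and are not parameters reported in this study.

```python
def exposed_segments(asa_by_residue, glycan_contacts, min_asa=30.0, min_len=5):
    """Return (start, end) residue ranges that are exposed and glycan-free.

    asa_by_residue:  dict mapping residue number -> ASA (square angstroms)
    glycan_contacts: set of residue numbers linked to or touching NAG
    min_asa, min_len: hypothetical cutoffs, not values from the paper
    """
    segments = []
    run = []  # residue numbers of the segment currently being extended
    for resnum in sorted(asa_by_residue):
        exposed = (asa_by_residue[resnum] >= min_asa
                   and resnum not in glycan_contacts)
        contiguous = not run or resnum == run[-1] + 1
        if exposed and contiguous:
            run.append(resnum)
            continue
        # The current run ends here: keep it if it is long enough.
        if len(run) >= min_len:
            segments.append((run[0], run[-1]))
        run = [resnum] if exposed else []
    if len(run) >= min_len:
        segments.append((run[0], run[-1]))
    return segments


if __name__ == "__main__":
    # Toy profile: residues 1-20 exposed, residue 8 buried, residue 15 on a glycan.
    asa = {i: 50.0 for i in range(1, 21)}
    asa[8] = 5.0
    print(exposed_segments(asa, glycan_contacts={15}))
```

In the toy profile, the buried residue and the glycan-contacting residue each break an otherwise continuous exposed stretch, so three shorter segments are reported instead of one.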

Calculation according to homology-modeled structure of 2019-nCoV protein
The same molecular interface calculation procedure described above was applied to calculate the ASA and screen the corresponding antigen epitopes, except that the glycosylation effect could not be measured directly because no experimental structure is available so far. As most glycosylation sites are conserved owing to the high similarity of the two Spike proteins, we could predict the effects of the glycosylation sites in the 2019-nCoV Spike structure as well. When predicted epitopes overlapped with amino acid residues interacting with NAG (GlcNAc), they were removed from the candidates by cross-referencing the SARS-CoV data.
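The cross-referencing step can be illustrated as follows: glycan-contacting residues observed in SARS-CoV are mapped to 2019-nCoV numbering, and any candidate epitope that contains a mapped residue is dropped. This is a sketch under stated assumptions, not the authors' implementation. The residue map below covers only the Central Helix, where the ranges given in this paper (967-1016 in SARS-CoV, 985-1034 in 2019-nCoV) imply an offset of 18; the contact residue and the candidate ranges are invented for the example, and in practice the map would come from a full sequence alignment.

```python
def map_contacts(sars_contacts, sars_to_ncov):
    """Translate SARS-CoV residue numbers to 2019-nCoV numbering.

    Residues without an aligned counterpart are skipped.
    """
    return {sars_to_ncov[r] for r in sars_contacts if r in sars_to_ncov}


def filter_epitopes(candidates, mapped_contacts):
    """Keep only (start, end) candidates containing no glycan-contact residue."""
    kept = []
    for start, end in candidates:
        if not any(start <= r <= end for r in mapped_contacts):
            kept.append((start, end))
    return kept


if __name__ == "__main__":
    # Offset of 18 in the Central Helix, derived from the ranges in the text.
    sars_to_ncov = {r: r + 18 for r in range(967, 1017)}
    sars_contacts = {970}                       # hypothetical NAG-contact residue
    candidates = [(985, 995), (1000, 1010)]     # hypothetical 2019-nCoV epitopes
    mapped = map_contacts(sars_contacts, sars_to_ncov)
    print(filter_epitopes(candidates, mapped))
```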

Conflict of interest disclosures
The authors declare no conflict of interest.

Author contributions
Dapeng Zhou and Wen Zhang designed this study. Dapeng Zhou, Ruibing Qi and Wen Zhang contributed to the collection, analysis and interpretation of data. Dapeng Zhou and Wen Zhang wrote the manuscript. All authors read and approved the final manuscript.

B. Four epitope pairs S1/n1, S2/n2, S3/n3, and S4/n4 compared between SARS-CoV (epitopes in red) and 2019-nCoV S protein (epitopes in grey, or light blue for site n3), with the 2019-nCoV S protein cartoon shown individually on the right panel; the conserved fragments at FP (red), HR1 (yellow) and CH (orange) are shown on a small cartoon of the SARS-CoV trimer (grey) in the middle. The epitope pairs are listed in Table 2.
C. Bottom solvent view of the RBD located at one side of the bottom of the trimer structure.
D. Comparison of epitopes in the RBDs of SARS-CoV (epitopes in red) and the 2019-nCoV trimer (epitopes in light blue, RBD cartoon in cyan), shown together with AH (dark blue for whole AH, partially overlapping with AH/ah for epitopes predicted), glycosylation sites (pink) and their interacting amino acids (yellow).