Profiling the immune vulnerability landscape of the 2019 Novel Coronavirus

The outbreak of the 2019 Novel Coronavirus (2019-nCoV) has rapidly spread from Wuhan, China to multiple countries, causing a staggering number of infections and deaths. A systematic profiling of the immune vulnerability landscape of 2019-nCoV is lacking; such a profile could bring critical insights into the immune clearance mechanism, peptide vaccine development, and antiviral antibody development. In this study, we predicted the potential of all the 2019-nCoV viral proteins to induce class I and II MHC presentation and form linear antibody epitopes. We showed that T cell and B cell epitopes are not uniformly distributed across the viral genome, with several focused regions that generate abundant epitopes and may be more targetable. We showed that genetic variations in 2019-nCoV, though few for the moment, already follow the pattern of mutations in related coronaviruses, and could alter the immune vulnerability landscape of this virus, which should be considered in the development of therapies. We created an online database to broadly share our research outcome. Overall, we present an immunological resource for 2019-nCoV that could significantly promote both therapeutic development and mechanistic research.


INTRODUCTION
In December 2019, an outbreak of a novel coronavirus (2019-nCoV) was reported in Wuhan, China (1). 2019-nCoV rapidly spread to other regions of China and to multiple other countries, at a speed much higher than that of the Severe Acute Respiratory Syndrome (SARS) coronavirus and the Middle East Respiratory Syndrome (MERS) coronavirus (2). Despite the lower mortality rate of 2019-nCoV compared with SARS and MERS, the scale of the 2019-nCoV contagion has already caused more casualties than either of them, as of this writing. Early research into 2019-nCoV has mostly described its epidemiological features (1, 3), reported the possible curative effect of remdesivir (4), and characterized its basic genomic features (5, 6).
Few studies have reported on the immunological features of 2019-nCoV, which could have significant bearing on mechanistic studies of the viral life cycle. Such analyses could also inform anti-viral immunotherapeutic development, which can be either T cell-based or B cell-based. Antibodies can neutralize viral infectivity in a number of ways, such as interfering with binding to receptors or blocking uptake into cells. For SARS-CoV, the human ACE-2 protein is the functional receptor, and anti-ACE2 antibody can block viral replication (7). On the other hand, previous studies have indicated a crucial role of both CD8+ and CD4+ T cells in SARS-CoV clearance (8, 9), while Janice Oh et al. also observed that development of SARS-CoV specific neutralizing antibodies requires CD4+ T helper cells (8). In fact, there are examples of vaccines for influenza that contain both antibody-inducing and T cell-inducing components (10, 11).
In this work, we performed a bioinformatics profiling of the class I and class II MHC binding potentials of the 2019-nCoV proteins, and also a profiling of the potentials of the linear epitopes of the viral proteins to induce antibodies. We correlated this immune vulnerability map of the 2019-nCoV proteins with their possible mutational hotspots. We made the analyses publicly available as a resource to the research community, in the form of the 2019-nCoV Immune Viewer: https://qbrc.swmed.edu/projects/2019ncov_immuneviewer/.

T cell- and B cell-mediated immune vulnerability landscape of 2019-nCoV
We used the NetMHCpan suite of software (12, 13) to predict the MHC class I and class II binding peptides of all 2019-nCoV proteins, which could elicit CD8+ and CD4+ T cell responses for viral clearance (Fig. 1a). We found that the number of MHC class I and class II binders, weighted by the HLA allele frequency in the Chinese population, is not spatially uniform across the viral genome. A small number of genomic regions showed high peaks of immunogenicity, corresponding to a large number of MHC binders in a small neighborhood, and these regions could be better potential vaccine targets (Sup. Table 1). The MHC binding peptide profiles of a different population (European ancestry) are shown in Sup. Fig. 1. Interestingly, the T cell epitope intensities (number of binders weighted by allele frequency) are higher overall in the European population than in the Chinese population, suggesting that the Chinese population may be more vulnerable to 2019-nCoV infection. Individual HLA alleles are examined in Sup. Fig. 2. The above analyses were conducted on the 2019-nCoV reference genome; the viral strains that have been sampled and sequenced so far are highly similar to each other (a segment of the multiple alignment is shown in Fig. 1b).
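The epitope intensity described above (number of binders weighted by allele frequency, summed within a genomic window) can be sketched as follows. This is a minimal illustration and not the authors' code: the function name, the `(start_nt, allele)` input format, and the example allele frequencies are hypothetical, and the 60-nucleotide window is taken from the Statistical analyses section.

```python
def weighted_binder_intensity(binders, allele_freq, genome_length, window=60):
    """Sum allele-frequency-weighted binder counts per genomic window.

    binders: iterable of (start_nt, allele) pairs; start_nt is the 0-based
        genomic coordinate of the predicted binder (hypothetical format).
    allele_freq: dict mapping HLA allele name -> population frequency.
    """
    n_windows = (genome_length + window - 1) // window  # ceiling division
    intensity = [0.0] * n_windows
    for start_nt, allele in binders:
        # Alleles missing from the population table contribute nothing.
        intensity[start_nt // window] += allele_freq.get(allele, 0.0)
    return intensity


# Hypothetical frequencies and binder positions, for illustration only.
freq = {"HLA-A*02:01": 0.25, "HLA-A*11:01": 0.5}
binders = [(10, "HLA-A*02:01"), (30, "HLA-A*11:01"), (70, "HLA-A*02:01")]
profile = weighted_binder_intensity(binders, freq, genome_length=120)
```

Running the same function with a different population's frequency table gives the population-specific profiles compared in the text.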
We also examined the potential of the viral proteins to encode linear epitopes that can elicit antibody responses, using the BepiPred 1.0 software (14). For the current analyses, we focused on linear epitopes, rather than conformational epitopes, because linear epitopes are more suitable for vaccine design (15, 16). Similarly scanning through all 2019-nCoV proteins of the reference genome (Fig. 1c), we found that the viral genome is also not uniformly enriched for B cell epitopes. In particular, one small segment of the Orf1 protein and the N protein are enriched for predicted B cell epitopes (Sup. Table 2). Lastly, we focused on the receptor-binding motif of the 2019-nCoV S protein, which attaches to the ACE-2 protein for entry into the human cell (17). We aligned the receptor-binding motif sequence of the 2019-nCoV S protein with that of SARS, and found poor conservation between the two S proteins (Fig. 1d), which suggests that antibodies previously developed for SARS may not work for 2019-nCoV.
For comparison, we also computed the immune vulnerability maps of SARS (Fig. 1e) and MERS (Fig. 1f), which, together with 2019-nCoV, are the most aggressive coronaviruses. We found that the B cell epitope profiles seem to be more consistent among the three viruses, while the T cell epitope profiles are more distinct. This suggests that the T cell biology of 2019-nCoV could be different from that of SARS and MERS.

bioRxiv preprint doi: https://doi.org/10.1101/2020.02.08.939553. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC 4.0 International license.

Potential mutations in 2019-nCoV could affect immune vulnerability and vaccine design
Coronaviruses are all RNA viruses (18), and RNA viruses generally have very high mutation rates (19). We showed the mutational rates in the viral genomes for 2019-nCoV, SARS and MERS (Fig. 2a-c). 2019-nCoV emerged only a very short time ago, which likely explains its lack of substantial genetic variation (Fig. 2a). In comparison, SARS (Fig. 2b) and MERS (Fig. 2c) have both accumulated significant variation between strains, probably due to their much longer contagion history in humans. Interestingly, in 2019-nCoV, the genomic regions with higher and lower mutational rates can already be discerned (Fig. 2a), and they seem to coincide largely with those of SARS and MERS (Fig. 2b, c). This indicates that we should be cautious of a similar level of genetic variation arising in the future, which might yield an aggressive viral strain, despite the current lack of mutations in 2019-nCoV.
Inspired by this observation, we reasoned that the evolution of the immune vulnerability maps of SARS and MERS due to genetic variation may offer insight into 2019-nCoV. In Fig. 2d, we showed the relative mutational rates of the viral genomic regions that are enriched for CD4 T cell epitopes, CD8 T cell epitopes, and linear antibody epitopes, in each of the three viruses. Fig. 2d shows that the mutational rates in 2019-nCoV are still much lower than those of SARS and MERS in these epitope-enriched regions, as expected. Interestingly, however, the mutational rates of the CD8 epitope-enriched regions are higher than those of the CD4 epitope-enriched regions in both SARS and MERS. This may be due to selection pressure exerted by CD8+ T cells, which are the major cytotoxic T cell population. The same may happen in 2019-nCoV as well, but this remains to be demonstrated.

A continuously updated database of the immune vulnerability of 2019-nCoV strains
We created the 2019-nCoV Immune Viewer (Fig. 3) to openly share the virus immunogenicity data we generated for 2019-nCoV, as well as for SARS and MERS. In the Viewer, we provided user-friendly visualization functionality for researchers to examine the immunogenicity strength of different genomic regions of each of the three viruses (Fig. 3a), with the ability to zoom in and out. To facilitate the examination of how genetic variations could impact the immune vulnerability landscape of the viruses, we also showed the mutational rates of the viral genome along with the immunogenicity maps. Furthermore, the Viewer displays a phylogenetic tree with annotations of the strains overlaid. The tree allows users to highlight viral strains according to the annotations (Fig. 3b). Overall, we believe that the 2019-nCoV Immune Viewer will be a valuable resource for the research community, and will facilitate immunological research into this virus.

DISCUSSION
In this report, we characterized the immune vulnerability landscape of 2019-nCoV, and compared it with that of SARS and MERS. Our work should be broadly useful for researchers who study the interaction between this virus and the host immune system. In particular, we discovered focused regions of this virus that encode a high density of T cell epitopes and B cell epitopes, which could be more suitable for peptide vaccine and anti-viral antibody development. We also found that the S protein receptor-binding motifs are poorly conserved between SARS and 2019-nCoV. To facilitate wide adoption of our research outcome, we created a publicly accessible database in which researchers can easily explore and download our results. The database is under continuous development, and will be updated promptly as new strains of 2019-nCoV are made available.
Genetic variations can modify the immunogenicity landscape of the virus, and impact its survival fitness. The selection of good vaccination epitopes should focus on parts of viral proteins with a high potential to generate immunogenic epitopes and a low chance of mutation. The low level of genetic variation in 2019-nCoV could merely be a sampling issue due to the short contagion history. However, the regions of the genome that are highly mutated in SARS and MERS are already more highly mutated in 2019-nCoV. We should remain cautious about the genetic variations that may arise in 2019-nCoV, and immunological studies should consider their possible impact, knowing where the mutations are likely to occur.
Overall, our work provides a window into the immunological features of 2019-nCoV, and we hope it could aid therapeutic development against this virus to stop this pandemic earlier, and aid vaccine development to prevent future outbreaks.

Prediction of T cell and B cell epitopes
NetMHCpan (v4.0) (20) and NetMHCIIpan (v3.2) (13) with default threshold options were used to predict peptides from the viral proteins that bind to human MHC class I and II proteins, for all the available HLA alleles. Only strong binders (percentile rank <0.5%) were retained. The HLA allele population frequency for the Chinese population was acquired from Kwok et al. (21), and the population frequency for the European population was from Mack et al. (22). The B cell epitope predictions were made by the BepiPred 1.0 software (14) with default parameters. Amino acids with B cell epitope prediction scores >0.6 were regarded as having a good likelihood of generating linear antibodies.
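The two cutoffs above can be applied to already-parsed prediction output along the following lines. This is a sketch under stated assumptions: the record layout (a dict with a `rank` field) and the function names are hypothetical and do not reflect the actual NetMHCpan or BepiPred output formats, which would need to be parsed separately.

```python
def strong_binders(predictions, max_rank=0.5):
    """Keep predicted peptides whose percentile rank (in %) is below max_rank.

    predictions: list of dicts with at least a "rank" key (hypothetical layout).
    """
    return [p for p in predictions if p["rank"] < max_rank]


def epitope_positions(scores, threshold=0.6):
    """Return 0-based residue indices whose B cell epitope score exceeds threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


# Made-up values, for illustration only.
mhc_predictions = [
    {"peptide": "PEPTIDEAA", "allele": "HLA-A*02:01", "rank": 0.1},
    {"peptide": "PEPTIDEBB", "allele": "HLA-A*02:01", "rank": 0.5},
    {"peptide": "PEPTIDECC", "allele": "HLA-A*11:01", "rank": 2.0},
]
kept = strong_binders(mhc_predictions)
b_cell_hits = epitope_positions([0.2, 0.7, 0.6, 0.9])
```

Both comparisons are strict, matching the "<0.5%" and ">0.6" thresholds as written, so a peptide with rank exactly 0.5 or a residue scoring exactly 0.6 is excluded.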

DNA and protein sequence alignment
The command-line version of MUSCLE (v3.8.31) (23, 24) was used to perform multiple genome sequence alignment with diagonal optimization (-diags). The default number of iterations and the default maximum number of new trees were applied during the alignment. The protein sequence alignment between the S proteins, YP_009724390.1 (2019-nCoV) and NP_828851.1 (SARS), was performed using EMBOSS needle (25) with the BLOSUM62 scoring matrix.
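Once a pairwise alignment such as the one above has been produced, conservation between the two aligned sequences can be summarized as percent identity. The sketch below is illustrative only: the function name is hypothetical, it takes the two gapped alignment rows as plain strings, and it assumes identity is defined as identical columns divided by alignment columns (columns where both rows are gaps are skipped), which is one common convention and not necessarily the one used in the paper.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which the two rows carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column carries no information for either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / compared


# Toy gapped rows, not real S protein sequence.
identity = percent_identity("AC-D", "ACGD")
```

Restricting the two input strings to the columns spanning the receptor-binding motif would give a motif-level conservation figure.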

Website development
The Immune Viewer is a dynamic website developed using HTML (HyperText Markup Language), JavaScript and CSS (Cascading Style Sheets). Specifically, we used the D3.js library to allow users to interactively explore the mutation rates or immunogenic scores across the viral genomic regions. We also used D3.phylogram.js to visualize the phylogenetic tree and the Select2 library to facilitate users' queries for different 2019-nCoV strains across multiple geographic regions.

Statistical analyses
All computations and statistical analyses were carried out in the R computing environment. For all boxplots appearing in this study, box boundaries represent interquartile ranges, whiskers extend to the most extreme data point that is no more than 1.5 times the interquartile range from the box, and the line in the middle of the box represents the median. For the line plots, the viral genomes were binned into windows of 60 nucleotides, and the number of T cell and B cell epitopes falling into each window was calculated. For T cell epitopes, a sum of the number of epitopes weighted by the corresponding population's HLA allele (A, B, C, and DRB1) frequency was calculated to form the T cell immunogenicity strength for that population. The genetic variation rate at each nucleotide was calculated by examining all viral strains and counting the proportion of strains with a different nucleotide or with an insertion/deletion, with respect to the reference genome. The genetic variation rates were also binned into windows of the same length and averaged.
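The genetic variation rate and its binning can be illustrated with a short sketch. The analyses in this study were carried out in R; Python is used here purely for illustration, and the function names are hypothetical. The sketch assumes that the reference and all strains are rows of one multiple alignment of equal length, so that a gap character in a strain (deletion) or a base opposite a reference gap (insertion) both count as a difference at that alignment column.

```python
def variation_rates(reference, strains):
    """Per-column proportion of strains that differ from the reference row."""
    if not strains:
        raise ValueError("at least one strain is required")
    if any(len(s) != len(reference) for s in strains):
        raise ValueError("all rows must have the alignment length")
    n = len(strains)
    return [
        sum(s[i] != ref_char for s in strains) / n
        for i, ref_char in enumerate(reference)
    ]


def bin_mean(values, window=60):
    """Average values over consecutive windows; the last window may be shorter."""
    binned = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        binned.append(sum(chunk) / len(chunk))
    return binned


# Toy alignment, for illustration only.
rates = variation_rates("ACGT", ["ACGT", "ACGA", "A-GA", "ACGT"])
binned = bin_mean(rates, window=2)
```

With the default `window=60`, `bin_mean` produces one averaged rate per 60-nucleotide window, so the result can be plotted on the same axis as the binned epitope counts.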

Data availability
The 2019-nCoV Immune Viewer is available at: https://qbrc.swmed.edu/projects/2019ncov_immuneviewer/.

Sup. Fig. 2 The variation of T cell epitope profiles for 2019-nCoV, SARS and MERS across populations. The heatmap represents the number of immunogenic binding epitopes across the binned genomes (500 bp) of (a) 2019-nCoV, (b) SARS and (c) MERS for the major HLA-A alleles shown as examples (allele frequency larger than 1%) in the European American (EA) and Hong Kong Chinese (HK) populations. These major alleles are colored in black, blue or red if they are common to both EA and HK populations, unique to the EA population, or unique to the HK population, respectively. On the right, the band of strength represents the cumulative number of immunogenic peptides, and the bands of EA and HK represent the HLA allele frequency of the EA and HK populations, respectively.

Sup. Table 1 Genomic regions of 2019-nCoV that are T cell epitope-enriched
Sup. Table 2 Genomic regions of 2019-nCoV that are B cell epitope-enriched