Structural similarity analysis of the spike protein of SARS-CoV-2 and other SARS-related coronaviruses

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is highly infectious in humans, which has been attributed to the strong affinity of its spike (S) protein for angiotensin-converting enzyme 2 (ACE2). Here, we analysed the structural similarity of the S protein between SARS-CoV-2 and other SARS-related coronaviruses. The S1 domain of the unclassified coronavirus RaTG13 was structurally very similar to that of SARS-CoV-2, implying that RaTG13 could be the origin of SARS-CoV-2.

In December 2019, a highly infectious novel coronavirus, subsequently named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), emerged in Wuhan, China [1]. This virus has infected more than 200,000 people worldwide in the three months since the outbreak in Wuhan.
Based on phylogeny, SARS-CoV-2 has been included in the species Severe acute respiratory syndrome-related coronavirus (SARSr-CoV) and the genus Betacoronavirus [1].
As with SARS-CoV, the entry of SARS-CoV-2 into a host cell is facilitated by the binding of the spike (S) protein to angiotensin-converting enzyme 2 (ACE2) [2]. The S proteins of SARS-CoV and SARS-CoV-2 share about 76% amino acid sequence similarity, indicating a high degree of homology [3]. Importantly, SARS-CoV-2 is reported to transmit from human to human through ACE2 binding much more efficiently than SARS-CoV [4], suggesting that the S protein of SARS-CoV-2 binds ACE2 with stronger affinity.
A recent study suggested that an unclassified coronavirus from Chinese bats, named RaTG13, has the highest sequence similarity with SARS-CoV-2, with 96% identity at the whole-genome level [5]. However, it has not yet been proven that RaTG13 is the origin of SARS-CoV-2. Here, we hypothesized that if RaTG13 is the origin of the highly contagious SARS-CoV-2, its S protein will share certain structural characteristics with that of SARS-CoV-2.
We downloaded the S protein sequence in FASTA format from the NCBI database and analysed it together with S protein sequences isolated from nine hosts infected with SARS-CoV-2, SARS-CoV, SARS-like CoV, or RaTG13 (Table 1). Phylogenetic analysis using the neighbour-joining method with 100 bootstrap iterations showed that the amino acid sequence of the S protein of RaTG13 was the most similar to that of SARS-CoV-2 (Figure 1A). Several insertions were identified in SARS-CoV, SARS-CoV-2, and RaTG13 (Figure 1B). Importantly, the amino acid sequence between positions 331 and 583, which represents the receptor-binding domain (RBD) of the S protein of SARS-CoV-2, was more similar to that of RaTG13 than to that of SARS-CoV (Figure 1B). This indicates that the RBD of the RaTG13 S protein might be structurally similar to that of SARS-CoV-2.
Therefore, we compared the structure of the S protein of SARS-CoV-2 with those of NP_828851.1 (SARS-CoV), AVP78042.1 (SARS-like CoV, bat-SL-CoVZXC21), and QHR63300.2 (unclassified CoV, RaTG13). For this, the protein structures of the four sequences were predicted with SWISS-MODEL, which builds a model from an alignment of a target sequence with a template structure [7][8][9][10][11]. As the template, we used an S protein structure (PDB ID: 6acd.1.A) [12] that had been determined by electron microscopy.
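The region-restricted sequence comparison described above (for example, over positions 331 to 583 of the SARS-CoV-2 S protein) can be illustrated with a minimal sketch. It assumes that two sequences have already been aligned, with `-` marking gaps, and that identity is counted relative to the residues of the first (reference) sequence. The function name `percent_identity` and the toy sequences are hypothetical and are not taken from the study, whose alignment software and scoring are not specified here.

```python
def percent_identity(aligned_ref, aligned_other, start=None, end=None):
    """Percent identity between two rows of an existing alignment.

    Positions are 1-based, inclusive, and counted on the ungapped
    reference sequence. Columns where the reference has a gap are
    ignored (an assumption of this sketch); a gap in the other
    sequence counts as a mismatch.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    compared = 0
    ref_pos = 0  # position in the ungapped reference sequence
    for ref_res, other_res in zip(aligned_ref, aligned_other):
        if ref_res == "-":
            continue  # insertion relative to the reference: skipped
        ref_pos += 1
        if start is not None and ref_pos < start:
            continue
        if end is not None and ref_pos > end:
            break
        compared += 1
        if ref_res == other_res:
            matches += 1

    if compared == 0:
        raise ValueError("no reference residues in the requested window")
    return 100.0 * matches / compared


# Toy aligned sequences (hypothetical, not real S protein data).
ref = "MKT-AYIAK"
other = "MKTQAYLA-"

whole = percent_identity(ref, other)          # all reference residues
window = percent_identity(ref, other, 6, 8)   # reference positions 6 to 8
print(f"whole: {whole:.1f}%  window: {window:.1f}%")
```

With real data, the same function would be called with `start=331` and `end=583` on aligned S protein sequences to compare the RBD window against the whole-protein value.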
In the RBD of the S protein, an insertion event appears to have occurred in SARS-CoV and SARS-CoV-2 (Figure 2A). However, the inserted sequence of SARS-CoV was very different from that of SARS-CoV-2 or RaTG13, while the inserted sequences of SARS-CoV-2 and RaTG13 were very similar (Figure 2A). The three-dimensional (3D) structure of the RBD was similar among SARS-CoV, RaTG13, and SARS-CoV-2 (Figure 2B).
To quantitatively evaluate the structural similarity, we calculated the template modelling (TM) score between SARS-CoV-2 and each of the three other proteins using the web-based software TM-Score [6]. The TM score of RaTG13 against SARS-CoV-2 was 0.8401, while the TM scores of SARS-CoV and SARS-like CoV against SARS-CoV-2 were <0.5. A TM score >0.5 generally indicates that two protein structures share the same fold. In addition, most aligned residue pairs in the S1 domain (but not in the S2 domain) of the S proteins of SARS-CoV-2 and RaTG13 were separated by a short distance (<5.0 Å) (Figure 3).
In conclusion, the S protein comprises two major domains, S1 and S2. The S1 domain contains the RBD, which is reported to bind to ACE2 directly. Our results reveal that the structure of the S1 domain in RaTG13 is very similar to that of SARS-CoV-2. Because the S1 domain is associated with the high infectivity of SARS-CoV-2, this close similarity suggests that RaTG13 is likely to be the origin of SARS-CoV-2.
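As a rough illustration of the TM score and the 5.0 Å distance criterion used above, the following sketch evaluates the standard TM-score formula for one fixed superposition, given the distances between aligned residue pairs. It is not the TM-Score program itself, which additionally searches for the superposition that maximises the score. The function names, the floor on `d0`, and the example distances are assumptions made for illustration only.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Å) of the TM-score."""
    if target_length <= 15:
        raise ValueError("target_length must exceed 15 for the d0 formula")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    # Guard for very short chains, where the formula gives tiny or
    # negative values (an assumption of this sketch).
    return max(d0, 0.5)


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances in Å between aligned residue pairs.
    target_length: number of residues in the structure used for
    normalisation; unaligned residues contribute zero.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def fraction_within(distances, cutoff=5.0):
    """Fraction of aligned residue pairs closer than the cutoff (Å)."""
    if not distances:
        raise ValueError("no distances given")
    return sum(1 for d in distances if d < cutoff) / len(distances)


# Hypothetical distances for a 100-residue target; not study data.
example = [0.8, 1.2, 2.5, 4.9, 6.3, 9.0] + [1.0] * 94
print(f"TM-score: {tm_score(example, 100):.4f}")
print(f"pairs < 5.0 Å: {fraction_within(example):.2%}")
```

Because each term is at most 1 and the sum is divided by the target length, the score lies between 0 and 1, and the length-dependent `d0` is what makes the 0.5 threshold comparable across proteins of different sizes.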