Diversity of coronavirus receptors

Several recent surges in COVID-19 cases due to newly emerging variant strains of SARS-CoV-2 with greater transmissibility have highlighted the virus’s capability to directly modulate spike-ACE2 interactions and promote immune evasion by sterically masking immunogenic epitopes. Recently, there have also been reports of the bidirectional transfer of coronavirus between different animal species and humans. The ability of coronavirus to infect and adapt to a wide range of hosts can be attributed to new variants that modify the molecular recognition profile of the spike protein (S protein). The receptor-binding domain of the spike protein specifically interacts with key receptor molecules present on host cell membranes to gain entry into the host and begin the infection cycle. In this review, we discuss the molecular, structural, and functional diversity of coronavirus receptors across their different phylogenetic lineages and its relevance to the varied symptomatology of COVID-19 patients, rapid human-to-human infection, tropism, and zoonosis. Despite this seeming diversity of host receptors, there may be some common underlying mechanisms that influence host range, virus transmissibility, and pathogenicity. Understanding these mechanisms may be crucial not only in controlling the ongoing pandemic but also in stopping the resurgence of such virus threats in the future.

The relatively large and diverse Coronaviridae family is divided into two subfamilies: Letovirinae and Orthocoronavirinae [6,9]. The Orthocoronavirinae subfamily is classified into four major genera: Alphacoronavirus (including the human coronaviruses HCoV-NL63 and HCoV-229E), Betacoronavirus (including the human coronaviruses HCoV-OC43, HCoV-HKU1, severe acute respiratory syndrome coronavirus (SARS-CoV), and Middle East respiratory syndrome coronavirus (MERS-CoV)), Gammacoronavirus, and Deltacoronavirus. Of these, the alpha and the beta genera infect mammals (mainly bat and rodent reservoirs), whereas the gamma and the delta genera predominantly infect birds [3-5, 10, 11]. In terms of pathogenicity, SARS-CoV and MERS-CoV are highly infectious zoonotic pathogens responsible for two significant epidemics in the recent past (Figure 1A). Compared to these, HCoV-OC43, HCoV-HKU1, HCoV-NL63, and HCoV-229E cause only mild illnesses in infected humans, with symptoms similar to the common cold.
COVID-19 is the most recent pandemic caused by a novel coronavirus named SARS-CoV-2 [12], which started in early December 2019 in China's Wuhan City [13][14][15] and was declared a "pandemic" by WHO in March 2020. As of 15 January 2021, the COVID-19 pandemic is still progressing at an alarming rate, resulting in over 90 million infections and almost two million deaths worldwide (Coronavirus disease (COVID-19) Pandemic: WHO). The occurrence of the SARS-CoV, MERS-CoV, and SARS-CoV-2 outbreaks within a short span of two decades, together with the ability of coronaviruses to transmit from animals to humans and from humans to humans, poses a great threat to human health. For example, SARS-CoV is known to have been transmitted from rhinolophid bats to carnivorous wild game (such as civet cats) and then to humans [16][17][18]. Many SARS-related coronaviruses (SARSr-CoVs) in bats can also infect humans [18][19][20][21]. Three novel swine coronaviruses have also emerged in the 21st century. The novel swine acute diarrhea syndrome coronavirus (SADS-CoV), which causes 90% mortality in pigs, was shown to replicate efficiently in primary human lung and intestinal cells, indicating a high risk of zoonotic transmission [22]. Recent work has shown the phylogenetic relationship between members of the Orthocoronavirinae subfamily [6]. Specifically, a maximum likelihood (ML) phylogenetic tree based on amino acid sequences of open reading frame 1ab (ORF1ab) has highlighted the diversity of coronaviruses and shown that human coronaviruses have likely evolved from those of domesticated animals and rodents. That study suggests that coronavirus transmission from animals to humans has occurred multiple times in evolutionary history.
The spike protein is a transmembrane homotrimeric protein that protrudes from the viral surface, giving coronaviruses their signature crown-like appearance (hence the name; corona means crown in Latin). Spike-receptor interaction is a crucial step in the viral infection cycle. Viral receptors on the surface of host cells determine the host range of the virus and its ability to cross species barriers. Since the 2003 epidemic, various biophysical studies have been performed to identify potential coronavirus receptors, and this body of work allows the receptors to be identified in an unbiased way through literature signals (Figure 1B). In this review, we will discuss the diversity of these receptors across coronavirus host species and how the virus might have evolved mechanisms to gain entry into a broad range of hosts. We will explore the expression, localization, evolutionary conservation, and binding affinity of these receptors to understand a common theme adopted by the virus to identify cell surface molecules as its receptors. We will also delve into the known structures of spike proteins in complex with their receptors to understand the spike-receptor interaction mode.
Figure 1. (B) Literature signals for various biomolecules identified as receptors using nferX signals [23].

Coronavirus genome organization and infection cycle
The coronavirus genome organization is similar to that of the other members of the Nidovirales order, with the non-structural proteins (involved in genome replication, proteolytic processing, and sub-genomic transcription) being encoded by the 5′-proximal end covering two-thirds of the genome, and the major structural proteins (in 5′-3′ order: spike (S), envelope (E), membrane (M), and nucleocapsid (N)) being encoded by the rest of the genome at the 3′-proximal end [6,24]. The non-structural proteins nsp1 to nsp11 (11 proteins) and nsp12 to nsp16 (5 proteins) are encoded by ORF1a (translated directly from genomic RNA) and ORF1b (translated after a -1 ribosomal frameshift near the end of ORF1a), respectively [6]. The structural proteins encoded downstream of the ORF1ab polypeptide are involved in receptor binding and membrane fusion (S), viral assembly and budding (E and M), and nucleocapsid formation (N). The coronavirus genomic RNA is packaged inside a helical capsid and further surrounded by an envelope. The structural proteins associated with the viral envelope aid in virus assembly and in the entry of the virus into host cells [3]. Some coronavirus species also carry an envelope-associated hemagglutinin-esterase protein (HE) that aids in cellular entry [25].
The first step of the viral life cycle is the binding of the viral spike protein to the host cell receptor [26][27][28][29]. This interaction triggers a conformational change in the spike protein that enables the fusion of the viral envelope with the host cell membrane for viral entry. Upon entry into the host cell, uncoating of the virion releases its genomic RNA into the cytoplasm. This is immediately followed by the translation of two large open reading frames, ORF1a and ORF1b, to produce two large viral precursor polyproteins, pp1a and pp1ab, respectively. These polyproteins are then co-translationally and post-translationally processed by viral proteinases into smaller non-structural proteins (nsps) involved in viral replication and transcription. Discontinuous transcription of the viral RNA leads to the production of both genomic and subgenomic mRNAs, which encode the structural proteins necessary for generating the virion. These newly translated structural proteins translocate through the endoplasmic reticulum (ER) and transit through the ER-to-Golgi intermediate compartment (ERGIC), where they interact with the newly produced N-encapsidated genomic RNA, resulting in budding into the lumen of secretory vesicular compartments. The freshly synthesized virions are then exocytosed from the infected host cell and are ready to infect a fresh set of host cells.

Cross-species transmission contributes to the emergence of novel coronavirus strains
Cross-species transmission of coronavirus poses a significant challenge to the control of disease spread. The majority of the virus outbreaks are caused by microevolution of the virus in response to the wide range of physiological and immunological pressure posed by different host reservoirs [30,31]. In coronaviruses, this spillover mechanism is evidenced by the similarity observed in genomic sequences across either the coronaviruses or their corresponding host receptors [32,33]. The beta coronaviruses consist of four distinct lineages annotated as lineage A, B, C, and D based on their phylogenetic similarity. Lineage A includes human coronaviruses HKU1 and OC43. The phylogenetic analysis of SARS-CoV-2 with other beta coronaviruses shows that it clusters closely with certain bat coronaviruses within lineage B, especially Bat-SARSr-CoV RaTG13. A phylogenetic tree based on nucleotide sequences of complete genomes of coronaviruses also revealed that SARS-CoV-2 is more similar to Bat-SARSr-CoV RaTG13, sharing 96% sequence identity in comparison to SARS-CoV with just over 79% sequence identity, hinting at the possible bat origin of COVID-19 [34]. The highly pathogenic MERS coronaviruses fall within lineage C and coronaviruses infecting bats in lineage D. A multiple sequence alignment between all the SARS coronaviruses spike protein sequences was done to assess the degree of sequence identity between them ( Figure 2A).
The lineage trend of SARS-CoV-2 traces its original emergence to a bat species (Rhinolophus affinis), followed by transmission to humans through Malayan pangolins (Manis javanica) [35][36][37]. This deadly zoonotic virus can also be transmitted to other animal species in close contact with humans [38][39][40][41]. In April 2020, SARS-CoV-2 spillover from humans was identified in Danish mink farms and was found to have spread across multiple farms within a short period [39]. The most alarming aspect of this incident is the concurrent bidirectional transmission that introduced mink-adapted SARS-CoV-2 genomes into the human population, potentially contributing to the emergence of more pathogenic SARS-CoV-2 strains. Even though such a large-scale outbreak has not been reported in other animal species, researchers have used animal models to evaluate SARS-CoV-2 susceptibility in a few other species that are likely to be in contact with humans. These studies revealed that domestic animals such as dogs [42], cats [42], ferrets [42,43], hamsters [44,45], and rabbits [46] are also susceptible to SARS-CoV-2 infection.
Figure 2. (A) Spike protein sequences of the SARS coronaviruses [34] were aligned using MAFFT [47]. A maximum-likelihood tree was generated based on the alignment, and the tree was rooted using the alphacoronavirus S protein sequence HCoV-229E. The number at the nodes indicates the bootstrap support (100 replicates). (B) The protein domain architecture of SARS-CoV-2 S protein. The multiple sequence alignment snapshots corresponding to regions comprising the receptor binding motif, S1/S2 cleavage site, and fusion peptide are depicted.
The host-receptor recognition is one of the initial species barriers overcome by the virus during cross-species transmission [48,49]. The Pangolin-CoV identified as a distant relative to SARS-CoV-2 exhibits a high sequence similarity (~97%) in the spike protein and similarly binds to human ACE2 [37,50]. Such observations support the fact that human coronaviruses emerge through a successive accumulation of mutations that alter host-specificity acquired through various events such as genetic recombination or low fidelity of RNA-dependent RNA polymerase (RdRp) or due to replication slippage. Moreover, scanning for similarity of cell-surface molecules across multiple known host species can reveal their susceptibility [49,51]. Overall, comprehensive sequence surveillance of closely related viruses with different host species is recommended to predict outbreaks and implement appropriate public precautionary measures.
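The sequence identity figures quoted in this section (for example, 96% between SARS-CoV-2 and Bat-SARSr-CoV RaTG13) are computed from aligned sequences. The following is a minimal sketch of one common way to compute percent identity from two already-aligned sequences. The function name, the gap-handling convention (gap columns are skipped), and the toy sequences are assumptions made for illustration; they are not taken from the cited studies, which may have used a different definition.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gap columns ('-') are excluded from the denominator.
    Other conventions (e.g. dividing by alignment length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real viral sequences).
toy_a = "ACGT-ACGTA"
toy_b = "ACGTTACCTA"
print(round(percent_identity(toy_a, toy_b), 1))  # 8 matches over 9 ungapped columns -> 88.9
```

In practice the aligned input would come from a tool such as MAFFT, and the choice of denominator should be reported alongside the identity value.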

Domain organization of spike protein
The spike protein consists of two major subunits, S1 (receptor-binding subunit) and S2 (membrane-fusion subunit), followed by a transmembrane anchor region and a short intracellular tail [3,52,53] (Figure 3B). A novel feature that sets SARS-CoV-2 apart from SARS-CoV and SARSr-CoVs is the presence of a 'furin' cleavage site (FCS) at the S1/S2 boundary [54], a feature conserved among almost all the SARS-CoV-2 isolates sequenced to date (Table 1). However, such a site is common in other coronaviruses, including gammacoronaviruses, Embecovirus, and a few strains of betacoronaviruses (MERS-CoV, bat coronavirus HKU), feline coronavirus (UCD, UCD8), and canine coronavirus [55,56]. The presence of this additional cleavage site aids pathogenicity and leads to increased transmissibility [57][58][59]. Interestingly, this cleavage site (682-RRAR|SVAS-689) was observed to be identical to the furin cleavage site present in the alpha subunit of the human epithelial sodium ion channel (ENaC) protein [60]. Furin plays a vital role in the activation of ENaC alpha through proteolytic processing at this site. The severe respiratory symptoms observed in COVID-19 patients may be due to the virus hijacking furin at this shared site, which would leave ENaC inactivated and thus disrupt the homeostasis of airway surface liquid. This hijacking could also potentially explain the lower rate of SARS-CoV-2 infection in cystic fibrosis patients, who are observed to overexpress ENaC alpha, which can sequester the furin protease and prevent the furin-mediated activation of the SARS-CoV-2 virus [61,62]. Following the cleavage, in all coronaviruses, the S1 subunit remains non-covalently bound to S2, contributing to the stability of the prefusion complex. The membrane-anchored S2 subunit, in contrast, is further cleaved at the S2' site by host proteases, triggering membrane fusion and viral entry mediated by extensive conformational changes [63][64][65][66][67][68].
Importantly, spike protein trimers in highly pathogenic coronaviruses appear to exist in partially open states or spontaneously alternate between closed and open conformations unlike the largely closed conformations seen in common cold-causing coronaviruses [54].
The fusion peptide region of spike-protein is highly conserved across all the coronaviruses. This implies that the mechanism driving the process of membrane fusion resulting in the release of the virus into the host cytoplasm is well conserved. In contrast, the receptor-binding motif region was observed to be highly variable due to various insertions and deletions ( Figure 2B). A four amino acid insertion (681-PRRA-684) at the S1/S2 boundary that is a unique feature among lineage B was found in SARS-CoV-2 but is not shared by either SARS-CoV or the Bat-SARSr-CoV RaTG13 ( Figure 2B). This insertion is predicted to form a protruding loop structure that makes the SARS-CoV-2 S1/S2 site more easily accessible to cleavage by host proteases, suggesting this region could be a key driver of viral evolution. Depending on the viral species, different coronaviruses use distinct domains within the S1 subunit to recognize a variety of host receptors. The S1 subunit can be divided into two domains, the N-terminal domain (NTD or S1A) and the C-terminal domain (CTD or S1B), one or both of which can function as receptor-binding domains (RBDs), that dictate the range of hosts for these coronaviruses [69][70][71]. To date, the S1-NTD has been observed to recognize either protein or sugar receptors, whereas the S1-CTD recognizes only protein receptors [71]. It is interesting to note that S1-CTDs of coronaviruses from different genera can recognize the same protein receptor, but in contrast, those from the same genus can recognize different receptors. The S gene encoding the spike protein in SARS-CoV-2 is divergent (less than 75% sequence identity) from those of SARS-CoV and other Bat-SARSr-CoVs (from the betacoronavirus genus), except for a high sequence identity (over 93%) to that of the Bat-SARSr-CoV RaTG13 [72,73]. The S gene in SARS-CoV-2 is longer than those in other SARSr-CoVs, with three short insertions in the region encoding the S1-NTD [72]. 
The spike proteins from SARS-CoV (isolated from human, civet, or bat) and SARS-CoV-2 share an overall global protein sequence similarity ranging from 76-78%, with just 50-53% similarity observed within the receptor-binding motif (RBM) present within the receptor-binding domain [73]. It should be noted that four out of the five key residues that interact with the Angiotensin-converting enzyme 2 (ACE2) receptor (see also the Molecular Diversity of Coronavirus Receptors section) in the RBM of SARS-CoV-2 differ in SARS-CoV [72].
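The S1/S2 residues quoted in this section (the 681-PRRA-684 insertion and the 682-RRAR|SVAS-689 cleavage site) can be combined into the fragment 681-PRRARSVAS-689. The sketch below scans such a fragment for the minimal furin recognition pattern R-X-X-R. This pattern is an assumption used here as a simplified consensus; real furin specificity depends on additional sequence and structural context. The function name and the hypothetical "insertion removed" fragment are illustrative only.

```python
import re

# Minimal furin-like motif R-X-X-R (assumption: simplified consensus).
# The lookahead lets overlapping motifs be reported as well.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def find_furin_like_sites(fragment: str, start_residue: int):
    """Return (first_residue_number, motif, cleavage_after_residue) tuples."""
    sites = []
    for m in FURIN_MINIMAL.finditer(fragment.upper()):
        first = start_residue + m.start()
        # Cleavage is taken to occur after the last arginine of the motif.
        sites.append((first, m.group(1), first + 3))
    return sites


# Fragment assembled from residues quoted in the text: 681-PRRARSVAS-689.
s1_s2 = "PRRARSVAS"
print(find_furin_like_sites(s1_s2, 681))  # [(682, 'RRAR', 685)]

# Hypothetical illustration: deleting the PRRA insertion removes the motif.
without_insertion = s1_s2.replace("PRRA", "")
print(find_furin_like_sites(without_insertion, 681))  # []
```

The reported site (motif starting at residue 682, cleavage after residue 685) matches the 682-RRAR|SVAS-689 notation used above.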

Molecular recognition of host receptors through coronavirus spike proteins
The crystal structures of S1 domains of five coronaviruses complexed with their receptors have been determined since the SARS pandemic. These include complexes of SARS-CoV S1-CTD [74] and HCoV-NL63 S1-CTD with human ACE2 [75], MERS-CoV S1-CTD with human DPP4 [76], porcine coronavirus PRCV S1-CTD with porcine APN [77], and mouse hepatitis coronavirus MHV S1-NTD with murine CEACAM1 [78]. Additionally, the crystal structure of the bovine coronavirus BCoV S1-NTD, which uses sialic acid as a receptor, has been independently determined, and its sugar-binding site was identified through mutagenesis [79] (Figure 4A). These six structures have revealed the complex interplay between coronaviruses and their receptors and the complicated evolutionary strategies the viruses use to recognize them (reviewed in [71]). Beyond these six representative structures, further variant forms have also been determined, including the S1-CTD of a MERS-CoV-related bat coronavirus, HKU4, complexed with human DPP4 and S1-CTDs of other SARS-CoV strains complexed with ACE2 from animals [79,81,82]. The specific molecular interaction between ACE2 and SARS-CoV-2 has also been well characterized through experimental crystallography and is described in the latter part of this review. The S1-CTDs of MERS-CoV and HKU4 recognize DPP4 in a similar manner, indicating a close evolutionary relationship between the two viruses [81,83]. Although the core structures of the SARS-CoV and MERS-CoV S1-CTDs are very similar, their RBMs are markedly different, suggesting that they followed different evolutionary pathways and came to recognize different receptors. Conversely, the S1-CTDs of the alphacoronavirus HCoV-NL63 and the betacoronavirus SARS-CoV have very different tertiary structures, yet both recognize ACE2, hinting at convergent evolution [71]. A virus-binding hotspot, Lys353 on ACE2, was identified to play a critical role in binding these two different coronaviruses [84].

Glycosylation of spike protein
Many cell-surface proteins and lipids are conjugated with sugar moieties, which play a significant role in cell-cell communication and immune processes [85][86][87]. Coronaviruses specifically utilize glycan shielding to limit recognition by the immune system [54,88,89]. The spike trimers are heavily covered with both N-linked and O-linked glycans [88][89][90], which modulate surface accessibility to host proteases [90] and neutralizing antibodies [54,89,91] and are also crucial for proper folding [92]. Notably, except for the ACE2 binding domain, glycosylation shields approximately 40% of the surface of the spike protein trimers [93]. In the cryo-EM map, 16 of the 22 predicted N-glycosylation sites (N-X-S/T, where X is any amino acid except P) were resolved in the SARS-CoV-2 spike protein; of the 22 predicted sites, 20 are conserved in the SARS-CoV spike protein, and at least 19 of them have been experimentally confirmed to be glycosylated in the SARS-CoV spike protein [54] (Figure 5A, 5B). Although a low O-linked occupancy was observed in the spike protein in its native state [94], 26 O-glycosylation sites and 33 O-linked glycans were unambiguously identified in the SARS-CoV-2 spike protein using Signature Ions-Triggered Electron-Transfer/Higher-Energy Collisional Dissociation (EThcD) Mass Spectrometry [95]. Interestingly, the spike protein is O-glycosylated on a threonine (T678) located near the furin cleavage site (682-685) [96]. This may be biologically important because O-glycans in the vicinity of a cleavage site may regulate the proteolytic cleavage of the spike protein and even affect the activation of the virus. Differences in glycosylation patterns can also affect the conformation of the spike protein: biochemical experiments have confirmed that removing glycosylation at N165 and N234 (Figure 6B) by converting asparagine to alanine significantly reduced binding to ACE2, shifting the RBD conformation to a "down" state [97].
Taken together, viral evolution has impacted spike glycosylation, directly modulating spike-ACE2 interactions and promoting immune evasion and greater pathogenicity by sterically masking the polypeptide epitopes [98].
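The N-glycosylation sequon described in this section (N-X-S/T, where X is any amino acid except P) is a simple pattern that can be located in a protein sequence by a regular-expression scan. The sketch below shows one way to predict candidate sites. The function name and the toy peptide are hypothetical, and a sequon match only marks a potential site; whether it is actually glycosylated must be confirmed experimentally.

```python
import re

# Sequon N-X-S/T with X != P; the lookahead reports overlapping sequons too.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_n_glyc_sequons(seq: str, start_residue: int = 1):
    """Return (residue_number_of_N, sequon) for each candidate N-glycosylation site."""
    return [
        (start_residue + m.start(), m.group(1))
        for m in SEQUON.finditer(seq.upper())
    ]


# Toy peptide (hypothetical, not a spike fragment): N-P-T is skipped because X is proline.
toy_peptide = "MNGSANPTQNAT"
print(find_n_glyc_sequons(toy_peptide))  # [(2, 'NGS'), (10, 'NAT')]
```

Applied to a full spike sequence, the number of hits would correspond to the "predicted" sites mentioned above, as distinct from sites resolved in a structure or confirmed by mass spectrometry.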
The cross-genera recognition of the same receptor and the intra-genus recognition of different receptors make the coronavirus receptor recognition code a hard nut to crack. For example, both HCoV-NL63 (from Alphacoronavirus) and SARS-CoV (from Betacoronavirus) recognize ACE2 as their host cell receptor through their S1-CTDs [102]. In contrast, the betacoronaviruses SARS-CoV and MERS-CoV use their S1-CTDs to recognize different receptors, ACE2 and DPP4, respectively [74,105,118-120]. The cellular entry of SARS-CoV-2 is reported to be mediated by one or more host molecules such as ACE2 [118,121-125], DPP4 [118,121,126], the serine protease that primes the spike protein (TMPRSS2) [122,124], and heparin [127]. Expression analyses of these host receptors across normal tissues from the GTEx study reveal nearly ubiquitous expression at the RNA level in all tissues except the brain [128]. Each of these molecules has one or more endogenous roles (described in subsequent sections) and is "hijacked" by the virus to act as its receptor.

Angiotensin-converting enzyme 2 (ACE2) is the primary receptor for SARS-CoV and SARS-CoV-2
One of the most well-studied and well-characterized of all the known coronavirus receptors is ACE2, a homolog of the angiotensin-converting enzyme (ACE). Endogenously, it plays a role in the negative regulation of the Renin-Angiotensin-Aldosterone System (RAAS), which controls fluid homeostasis and blood pressure [129]. The enzyme ACE converts angiotensin I (Ang I) to angiotensin II (Ang II), which is the main bioactive angiotensin peptide in RAAS. ACE2 is a zinc-dependent carboxypeptidase that offsets this effect by cleaving Ang II to produce the vasodilator angiotensin(1-7) (Ang 1-7) [130,131]. ACE2 downregulation has been implicated in lung failure and injury, and thus a coronavirus-induced reduction in ACE2 expression may expedite lung damage [132][133][134]. When ACE2 is occupied by SARS-CoV-2, ACE2-mediated degradation of Ang II is reduced, and the resulting increase in serum levels of free Ang II promotes activation of the NF-kappa B pathway via the Ang II type 1 receptor (AT1R), followed by interleukin-6 (IL-6) production. ACE2 functions as the receptor for the coronaviruses NL63 [102], SARS-CoV [101], and SARS-CoV-2 [125] (Figure 6A). There are 32 protein-modifying polymorphisms in the ACE2 gene across different human populations, of which seven are hotspot variants [136,137]. The use of ACE2 as a receptor may have enabled coronaviruses to increase their host range, as ACE2 exhibits a high degree of conservation across various mammalian species. Viruses also require a protease for their activation, and they typically hijack the proteases that process apical membrane proteins. SARS-CoV-2 utilizes TMPRSS2 as its co-receptor, allowing efficient fusion with the membrane and viral entry. In comparison to SARS-CoV, it was found that only six substitutions in the SARS-CoV-2 RBD (Asn439/Arg426, Leu452/Lys439, Thr470/Asn457, Glu484/Pro470, Gln498/Tyr484, and Asn501/Thr487) were sufficient to enhance binding to human ACE2 [101].
Such variations can explain the higher infectivity of SARS-CoV-2 and the prognosis of COVID-19.
Analysis of experimentally solved structures of complexes between the RBM in the virus and its partner virus-binding motif (VBM) in the host cell receptor has shed some light on this crucial interaction [69,74,75,138-140]. A cryo-EM structure of the SARS-CoV-2 spike protein RBD bound to the ACE2 receptor suggests that an ACE2 dimer binds to two spike proteins [125], whereas the crystal structure of the SARS-CoV spike protein bound to ACE2 reveals binding of monomers [74]. However, the superimposition of these structures shows a root mean square deviation (RMSD) of 0.68 Å over 139 pairs of Cα atoms [125], suggesting very similar binding modes of these viruses to the ACE2 receptor. According to the literature, the SARS-CoV-2 spike protein has a 10-20 times higher affinity for the ACE2 receptor than that of SARS-CoV [141]. A more in-depth analysis of the interface residues between human ACE2 and the receptor-binding domains from NL63, SARS-CoV, and SARS-CoV-2 reveals the differential contribution of ACE2 residues to the interface (Figure 6B). The interface residues in ACE2 that interact with the NL63 RBD are entirely different from the ACE2 residues interacting with the RBDs of SARS-CoV and SARS-CoV-2, suggesting convergent evolution. On the other hand, while the SARS-CoV and SARS-CoV-2 receptor-binding domains interact with the same set of ACE2 residues (Figure 6C), key differences were observed in the solvation energy contribution of a few residues at the interface. The ACE2 residues His34, Asp30, Glu35, and Ile21 contributed more to the formation of a structural complex with the RBD in SARS-CoV-2, whereas higher contributions were observed from the ACE2 residues Asn330, Phe28, Glu23, Ser19, and Glu329 in the SARS-CoV receptor-binding complex structure.
Recent data suggests the emergence of new mutations in spike protein with altered binding affinity to ACE2 which will be discussed in the next section.
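The RMSD value quoted above (0.68 Å over 139 pairs of Cα atoms) summarizes how far apart equivalent atoms are after two structures have been superimposed. For $N$ atom pairs with positions $\mathbf{a}_i$ and $\mathbf{b}_i$, it is $\sqrt{\tfrac{1}{N}\sum_i \lVert \mathbf{a}_i - \mathbf{b}_i \rVert^2}$. The sketch below computes this quantity for coordinates that are assumed to be already superimposed; the optimal superposition step itself (for example, the Kabsch algorithm) is not shown, and the function name and toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired 3D coordinates.

    Assumption: the two coordinate sets are already optimally superimposed
    and are listed in the same atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates in angstroms (hypothetical): every atom is displaced by 1 Å along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(rmsd(set_a, set_b))  # 1.0
```

A sub-ångström RMSD over more than a hundred Cα pairs, as reported for the two ACE2 complexes, therefore indicates nearly identical backbone placement.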

Novel mutations in spike protein of SARS-CoV-2 emerging variants increase viral transmission and decrease vaccine efficacy
Multiple SARS-CoV-2 variants have been circulating globally, with several new variants emerging in the fall of 2020. Of these, three prominent variants are the UK variant (known as 20I/501Y.V1, VOC 202012/01, or B.1.1.7), the South African variant (known as 20H/501Y.V2 or B.1.351), and the Brazilian variant (known as 20J/501Y.V3 or P.1) (Figure 7). In total, 18 different mutations and deletions were identified in the UK strain, with the spike protein alone harboring nine of them. The critical spike changes include N501Y in the RBD, the 69/70 deletion in the N-terminal domain (leading to a conformational change in the viral spike protein), and P681H in the vicinity of the S1/S2 furin cleavage site. Evidence suggests that the UK variant may have a 35-45% higher transmission rate and may become the dominant variant in the US by March 2021 [142]. It was speculated that the UK variant may have evolved during prolonged SARS-CoV-2 infection in a single immunocompromised patient [143]. Such within-host evolution was demonstrated in a peculiar COVID-19 case from Boston, in which the virus accumulated more than 20 mutations over a persistent five-month infection [144]. Interestingly, the virus from the Boston patient shares the key mutations N501Y and E484K that are also reported in the UK and SA variants, indicating that these mutations may confer a fitness advantage. The South African (SA) variant emerged independently, although it shares the N501Y mutation with the UK strain alongside two other novel key mutations in the spike protein, K417N and E484K. The Brazilian variant has 17 unique mutations, including three in the RBD of the spike protein. This troubling new variant was first detected in the Amazon city of Manaus, Brazil, when a startling resurgence of COVID-19 cases was observed. An earlier study had estimated that nearly 76% of the population had already been infected prior to the resurgence, which should have conferred a high level of immunity.
This variant appears to possess a unique constellation of lineage-defining mutations, which can potentially be associated with increased transmissibility and propensity for reinfection. Another lineage, B.1.258, carrying both the N439K and D614G mutations, emerged in Scotland in March 2020 and has been independently spreading in Europe ever since, gaining ground in more than 37 countries. These variants have now been detected in other countries, with the SA variant already reported in the UK [145] and the UK variant gaining a strong foothold in the US [146]. With the spread of the virus, there are now rising concerns that the new variants may have altered sensitivity to antibody neutralization and may possibly influence vaccine efficacy. The E484K mutation in the SA variant [147] and the N439K mutation in the Scotland strain [148] show reduced binding to clinical antibodies and to convalescent or post-vaccination serum samples. Similar studies have shown that the N501Y mutation, which is shared by the UK and SA variants, has 3- to 6-fold reduced neutralization by most monoclonal antibodies, convalescent plasma (~3-fold), or vaccine sera (~2-fold), while the E484K mutation was considerably more refractory to convalescent plasma (~11-33 fold) and vaccine sera ( ~6/5-8/6) [149,150].
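The substitution labels used in this section (N501Y, E484K, K417N, N439K, D614G, P681H) follow a common convention: reference residue, position, variant residue. The sketch below parses such a label and applies it to a numbered sequence fragment, checking that the reference residue matches before substituting. The function names and the toy fragment are hypothetical and chosen only for illustration; deletions such as 69/70 are not handled.

```python
import re

# One-letter reference residue, position, one-letter variant residue (e.g. N501Y).
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str):
    """Split a label such as 'N501Y' into ('N', 501, 'Y')."""
    m = SUBSTITUTION.match(label.strip().upper())
    if m is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def apply_substitution(fragment: str, start_residue: int, label: str) -> str:
    """Apply a substitution to a fragment whose first residue is start_residue."""
    ref, pos, alt = parse_substitution(label)
    idx = pos - start_residue
    if not 0 <= idx < len(fragment):
        raise ValueError(f"position {pos} lies outside the fragment")
    if fragment[idx] != ref:
        raise ValueError(f"expected {ref} at {pos}, found {fragment[idx]}")
    return fragment[:idx] + alt + fragment[idx + 1:]


print(parse_substitution("E484K"))  # ('E', 484, 'K')

# Toy fragment (hypothetical residues) numbered 499-503 so that N sits at 501.
toy_fragment = "AANAA"
print(apply_substitution(toy_fragment, 499, "N501Y"))  # AAYAA
```

Checking the reference residue before substituting guards against applying a label to a sequence with a different numbering scheme.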
A study evaluating Moderna's SARS-CoV-2 mRNA-1273 vaccine against the emerging variants indicated a six-fold reduction in neutralizing titers for the SA variant, in comparison to the UK variant, which showed a low but significant impact on neutralization [151,152]. Similarly, preliminary data suggest that the Pfizer COVID-19 vaccine, BNT162b2, may be effective against the UK strain. However, effective neutralization of the E484K mutation of the SA variant required higher doses of the vaccine [153,154] (Pfizer Press Release). Notably, in an in vitro study, E484K emerged as one of the escape mutations following serial passages of SARS-CoV-2 through convalescent sera, along with a deletion and an insertion in NTD loops, and together these completely abrogated virus-neutralizing activity [155]. The propagation of these emerging variants across populations and over time warrants constant molecular surveillance. It is also possible that the currently available vaccines will need to be modified in order to generate equally strong antibody responses against these emerging variants.

Exploring the role of immunogenic T and B cell epitopes for peptide vaccine development
Conventional vaccines such as recombinantly expressed viral proteins, inactivated viruses, or mRNA vaccines can cause autoimmune and allergic reactions, are easily degraded, and require freezer-temperature storage and transport facilities. Peptide vaccine strategies, on the other hand, can overcome several of these disadvantages [156]. Ideally, peptide vaccines should contain epitope regions optimized for simultaneous activation of B cells, CD4+ T cells, and cytotoxic CD8+ T cells to drive both the humoral and cellular arms of the adaptive immune system with high specificity. With this in mind, researchers are looking at immunogenic regions in SARS-CoV-2 and other coronaviruses that are distributed across structural proteins such as the spike, nucleocapsid, envelope, and membrane proteins [156].
The Immune Epitope Database (IEDB; https://www.iedb.org/; 17 Feb 2020) reports close to 2228 linear T and B cell epitopes of SARS-CoV-2 in human hosts with positive experimental assays. Immunodominant T cell epitopes that recognize multiple structural regions of SARS-CoV-2, including the spike and nucleocapsid proteins, have been detected in patients recovering from COVID-19 and in unexposed individuals [157][158][159]. These studies indicate the presence of robust, long-lasting, and pre-existing T cell immunity to SARS-CoV-2 in the general population, possibly contributed by multiple "public" T cell receptor (TCR) sequences recognizing peptides from other common-cold coronaviruses, Human cytomegalovirus (HCMV), human herpes virus-5 (HHV-5) and influenza A virus [157,160,161]. This suggests that exposure to other viruses may confer protection against COVID-19 through cross-reactive T cells. Several immunoinformatics and experimental studies have also predicted putative T and B cell epitopes that could serve as promising targets for peptide-based vaccine development [162][163][164][165][166]. Presently, only a handful of these peptide vaccines are in various stages of preclinical trials, such as the pan-coronavirus vaccine from Valo Therapeutics that uses a novel Peptide-coated Conditionally Replicating Adenovirus (PeptiCRAd) technology, FlowVax-COVID19 from Flow Pharma, DPX-COVID19 from IVM Inc., and Ii-key from Generex [165]. Still, none of them has been approved for clinical use.

Molecular diversity of other host cell receptors and coreceptors
The primary SARS-CoV and SARS-CoV-2 receptor, ACE2, is expressed in various human tissues, including the lung, liver, stomach, ileum, colon, and kidney [167]. The expression levels are relatively low in the alveolar type II (AT2) cells of the lung, which are the target cells for SARS-CoV-2 [167]. Based on co-expression analyses, it has been hypothesized that SARS-CoV-2 infection may be facilitated by the use of co-receptors (auxiliary membrane proteins) along with the primary ACE2 receptor [168]. Membrane proteins whose expression across various human tissues is most similar to that of ACE2 can act as its putative co-receptors. Many of these ACE2 co-expressing proteins are peptidases, including APN, DPP4, and ENPEP [168], of which APN [121,169] and DPP4 [104,105] have been previously shown to be human coronavirus receptors. The TMPRSS2 protease has also been implicated in priming SARS-CoV-2 for infection [170]. In this section, we will elaborate on other known receptors and co-receptors for coronavirus infection.

Dipeptidyl peptidase 4 (DPP4, CD26): receptor for MERS-CoV and coreceptor for SARS-CoV-2
DPP4 is a transmembrane exoprotease cleaving at the amino-terminal of substrates with a preference for proline, hydroxyproline, or alanine in the penultimate position. DPP4 plays a prominent role in maintaining glucose homeostasis by the inactivation of incretin hormones glucose-dependent insulinotropic polypeptide (GIP) and glucagon-like peptide-1 (GLP-1), due to which DPP4 inhibition is a therapeutic strategy for the management of type 2 diabetes [171]. DPP4 plays various roles in immune processes like activating or inactivating several chemokines including IP-10, I-TAC, and RANTES, activating lymphocyte proliferation, and suppressing cytokine production and T cell activation through its interaction with adenosine deaminase (ADA) [77,78].
DPP4 was identified as a receptor for MERS-CoV, but recent reports suggest that SARS-CoV-2 also uses DPP4 as a coreceptor along with ACE2 [42]. Fifteen residues in DPP4 interact with the MERS coronavirus spike protein. Analysis of naturally occurring polymorphisms in DPP4 revealed that four polymorphisms (K267E, K267N, A291P, and Δ346-348) reduced viral infection. It was found that K267E, K267N, and A291P affected binding of DPP4 to the spike protein, while Δ346-348 caused a defect in receptor transport. These polymorphisms, however, occur at very low frequencies in the population [172].

Aminopeptidase N (APN, ANPEP, CD13): receptor for HCoV-229E
APN metabolizes a wide range of physiological peptides, including neuropeptides that affect pain and mood regulation, as well as vasoconstrictive peptides that regulate blood pressure. Apart from its function as an aminopeptidase, APN also plays a role in cell adhesion and motility, tumor migration, and angiogenesis [173,174]. Alphacoronaviruses, including HCoV-229E, porcine transmissible gastroenteritis virus (TGEV), and feline infectious peritonitis virus (FIPV), are known to bind to the APN receptor via the RBD located in the CTD of the spike protein. Notably, residues 132-295 were found to be essential for receptor function, and a stretch of only 8 amino acids in a hypervariable region was sufficient to convert porcine APN into a receptor enabling HCoV-229E entry, underscoring the importance of changes that extended the host range of these viruses [175]. Patients with Parkinson's disease (PD) appear to be susceptible to worse outcomes from COVID-19. It is speculated that overexpression of ANP in these patients could be one of the causes [176,177].

Carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1): a receptor for MHV
CEACAM1, the prototype receptor of the betacoronavirus murine hepatitis virus (MHV), is usually involved in cell-cell adhesion and immune modulation. Unlike other coronavirus receptors, CEACAM1 binds to the NTD of the MHV spike protein, where it functions as a lectin [78]. The lectin function at the N terminus of the spike is important from an evolutionary standpoint in that it enabled the ancestral coronavirus to bind to carbohydrate moieties, expanding its tropism and infectivity to humans [78,178]. The fusogenic activity and conformational changes of the spike protein are triggered by the N-terminal domain of CEACAM1 [108]. One of the unique features of the JHM strain of MHV is its ability to spread independently of the CEACAM1 receptor, as demonstrated in CEACAM1-deficient mouse models [179,180].

DC-SIGN (CD209) and L-SIGN (CD209L): receptors for HCoV-229E, SARS-CoV and SARS-CoV-2
The C-type lectins dendritic cell-specific intercellular adhesion molecule-3-grabbing non-integrin (DC-SIGN) and the closely related liver/lymph node-specific intercellular adhesion molecule-3-grabbing non-integrin (L-SIGN) are cell membrane glycoproteins. The DC-SIGN receptor has already been reported to be involved in the dendritic cell-mediated transmission of other viruses such as HIV, dengue, and CMV. The heterologous expression of human L-SIGN through retroviral vectors in Chinese hamster ovary cells rendered them susceptible to both SARS-CoV and HCoV-229E infection [109,181]. Recently, both these proteins have also been reported to be important for the entry of the SARS-CoV-2 virus into the host cell [182].

Neuropilin-1 (NRP1): a secondary receptor for SARS-CoV-2
NRP1 is a transmembrane glycoprotein cell surface receptor that plays a major role in angiogenesis, lymphangiogenesis, and neural and cardiovascular development [183]. NRPs (Neuropilin-1 and Neuropilin-2) are known to interact with various extracellular signaling proteins such as vascular endothelial growth factor (VEGF) isoforms, class 3 semaphorins, and transforming growth factor beta (TGFβ) [184,185], which trigger various downstream signalling cascades. Recently, NRP1 and its homolog Neuropilin-2 (NRP2) were also found to interact with the SARS-CoV-2 spike protein [112]. Moreover, autopsies of COVID-19 positive patients have shown that SARS-CoV-2 infects NRP1-positive olfactory epithelial cells [110]. The presence of the distinctive furin cleavage site (679-NSPRRAR-685) in the SARS-CoV-2 spike protein has been shown to enhance its association with the NRP proteins. This feature is a structural overlap between the host binding partners of NRPs and the SARS-CoV-2 spike protein, as the NRPs specifically bind to proteins conforming to the C-end Rule (CendR) peptide motif. Notably, to date, NRP1 has not been reported as a receptor for any other coronavirus. The SARS-CoV-2 virus uses the spike protein receptor binding domain to exploit the receptor-ligand interaction network of the host, and this may have enhanced its infectivity and diversified its tissue tropism.
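The CendR argument above can be illustrated with a short check. This is a minimal sketch, assuming the commonly cited CendR consensus of a basic residue, two arbitrary residues, and a basic residue ([R/K]-X-X-[R/K]) that must sit at the exposed C-terminus; the function name `exposes_cendr` and the uncleaved example string are hypothetical and used only for illustration.

```python
import re

# Assumed CendR consensus: [R/K]-X-X-[R/K], anchored at the C-terminal end.
CENDR_PATTERN = re.compile(r"[RK]..[RK]$")


def exposes_cendr(peptide: str) -> bool:
    """Return True if the peptide ends in a CendR-like motif."""
    return CENDR_PATTERN.search(peptide.upper()) is not None


# Residues 679-685 of the SARS-CoV-2 spike, as quoted in the text.
# After furin cleavage this stretch would form the new C-terminus.
cleaved_terminus = "NSPRRAR"

# Hypothetical uncleaved context: two extra residues keep the motif internal.
uncleaved_context = "NSPRRARSV"

print(exposes_cendr(cleaved_terminus))   # motif RRAR is C-terminal
print(exposes_cendr(uncleaved_context))  # motif is buried inside the chain
```

Under these assumptions, only the cleaved form presents the motif at its C-terminus, which is consistent with the idea that furin processing is what makes the spike resemble a host NRP ligand.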

BSG (CD147, Basigin, extracellular matrix metalloproteinase inducer (EMMPRIN)): a secondary receptor for SARS-CoV and SARS-CoV-2
CD147, earlier identified as one of the host cell receptors for SARS-CoV, is now reported to perform a similar function in assisting SARS-CoV-2 cell invasion [113,186]. Some of the compelling observations that relate CD147 to coronavirus infections are the expansion of its expression pattern from the basolateral to the apical membrane in kidney samples of COVID-19 positive patients [187] and the susceptibility of BHK (Baby Hamster Kidney) cells expressing human CD147 to SARS-CoV-2 infection [113]. CD147 is present in lung cells [116] and RBCs [188], and it is highly expressed in activated T lymphocytes [127,128] and macrophages [191]. One of the manifestations of severe COVID-19 is lymphopenia, i.e., low lymphocyte counts [192]. As ACE2 is expressed in very low amounts in T lymphocytes [193,194], CD147 may provide an alternate route for viral infection in T cells. Hence, CD147 is being explored as one of the potential targets for COVID-19 treatment, for example with azithromycin, which has some known antiviral activity mediated by CD147 [114].

What unifies the different coronavirus receptors? Distinct pattern of coexpression observed across different coronavirus receptors
Detailed knowledge of the expression and regulation patterns of the coronavirus receptors required for viral entry will enable us to devise effective therapies. To find the common thread that unifies all the coronavirus receptors, we investigated their co-expression and interaction patterns. Remarkably, ACE2 is highly coexpressed with CEACAM1, and DPP4 with other peptidase receptors, such as ANPEP and ENPEP. Further, cellular proteases like TMPRSS2, furin, and cathepsin L, which act as critical factors for spike activation, fusion, and entry [195], are coexpressed with the coronavirus receptors DPP4, CEACAM1, and ANPEP. Single-cell expression analysis of several SARS-CoV-2 and coronavirus-associated receptors and proteases in multiple healthy human tissues revealed that co-expression was highest in intestinal and airway goblet cells, enterocytes, and kidneys [196]. A positive correlation was also seen between promoter methylation patterns in TMPRSS2 and ACE2 using bisulfite conversion and qMS (quantitative methylation-specific) PCR on the saliva of COVID-19 patients following infection, indicating that their expression is also strongly linked and influenced by epigenetic modifications [197]. Interestingly, DPP4 and ANPEP are co-localized in cortical neurons and involved in ischemia-related inflammation [198]. Additionally, both the receptors and proteases are strongly co-expressed in the same tissues, facilitating their coordinated action, such as in the upper and lower airways (bronchi), respiratory tract, and lungs [199]. NRP1, the recently identified SARS-CoV-2 receptor, is coexpressed with ANPEP and is also highly expressed in the respiratory and olfactory epithelium, with the highest expression in endothelial and epithelial cells [112].
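The idea of ranking candidate co-receptors by how closely their tissue expression tracks that of ACE2 can be sketched as follows. This is an illustrative example only, not the analysis pipeline used in the studies cited above: the expression values are made up, `GENE_X` is a hypothetical negative control, and Pearson correlation is assumed as the similarity measure.

```python
from math import sqrt

# Hypothetical expression values (arbitrary units) across five tissues.
# These numbers are invented for illustration and come from no dataset.
expression = {
    "ACE2":   [1.0, 8.0, 6.0, 0.5, 7.0],
    "ANPEP":  [1.5, 9.0, 5.5, 0.4, 6.5],
    "DPP4":   [2.0, 7.0, 6.5, 1.0, 6.0],
    "GENE_X": [9.0, 1.0, 2.0, 8.0, 1.5],  # hypothetical anti-correlated control
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_coexpressed(target, table):
    """Rank every other gene by its correlation with the target, highest first."""
    scores = {
        gene: pearson(table[target], profile)
        for gene, profile in table.items()
        if gene != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_coexpressed("ACE2", expression)
for gene, r in ranking:
    print(f"{gene}\t{r:.3f}")
```

With real data, the profiles would come from bulk or single-cell expression atlases, and a rank-based measure such as Spearman correlation might be preferable when expression values span several orders of magnitude.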

Coronavirus receptors show evolutionary conservation
Another unifying factor among the coronavirus receptors is the high degree of conservation observed in membrane ectopeptidases (ACE2, DPP4, and APN) across many animal species [103]. The broad host range of SARS-CoV-2 is ascribed mainly to ACE2 being highly conserved across several vertebrate and mammalian species, as revealed by comparative genomics and structural analysis [200]. Such conserved receptors, especially those conserved at the virus-receptor binding interface, may have been evolutionarily selected by the virus to enable their usage in different host species and permit zoonotic transmissions.
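Conservation of a receptor between two host species is often summarized as percent identity over aligned positions. The sketch below shows that calculation under stated assumptions: the two sequences are hypothetical, already aligned, and of equal length, and positions containing a gap character (`-`) in either sequence are skipped. It is not the comparative method used in the cited work.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == "-" or res_b == "-":
            continue  # skip alignment gaps
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments from two species (not real ACE2 sequences).
species_1 = "MKT-LLVA"
species_2 = "MKTALLIA"

print(f"{percent_identity(species_1, species_2):.1f}% identical")
```

Restricting the same calculation to the residues that contact the spike RBD, rather than the whole protein, would give a measure closer to the interface conservation discussed above.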

Binding affinity dictates infectivity of the virus
Studies have shown that the binding affinity of the viral spike protein RBD for the host ACE2 regulates its pathogenicity, infectivity, and host range. Both SARS-CoV and NL63 bind ACE2, but SARS-CoV causes severe respiratory disease while NL63 frequently causes only a mild respiratory infection. This difference in pathological consequences has been partly ascribed to the lower binding affinity of the NL63 spike for ACE2 compared to that of SARS-CoV. In turn, compared to SARS-CoV, Bat-SARSr-CoV, and RaTG13, changes in the SARS-CoV-2 RBD resulted in tighter ACE2-RBD binding. This may have precipitated the bat-to-human transmission and the emergence of the highly transmissible SARS-CoV-2, with a devastating outcome. Earlier in the year, the D614G spike variant, identified in Italy, was reported to be ten times more contagious than the original Wuhan strain and responsible for the rapid spread of SARS-CoV-2 in Europe [201]. Although this residue is outside the RBD, the substitution alters a hydrogen-bond interaction with T859 of a neighboring spike protomer in the trimer, which shifts the RBD to an 'up' conformation, enhancing binding to ACE2 and leading to higher viral infectivity [202]. Preliminary genomic characterization of the emerging B.1.1.7 SARS-CoV-2 strain in the UK, which is currently found to be spreading 70% faster than the older strain, shows that this variant carries a mutation in one of the six key residues within the RBD, namely N501Y, which increases its binding affinity for human and murine ACE2. Apart from this, it has a deletion of spike residues 69-70 that has implications for immune evasion, and a P681H mutation that is adjacent to the furin cleavage site (COVID-19 Genomics Consortium UK, CoG-UK; https://www.cogconsortium.uk).
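To make the notion of "tighter binding" concrete, binding affinities are commonly compared through the standard relation between the dissociation constant and the binding free energy. The worked example below is a generic thermodynamic illustration, not a result from the studies cited above; the ten-fold change in K_d is an assumed value chosen for the arithmetic, and T = 298 K is assumed.

```latex
% Binding free energy from the dissociation constant (K_d relative to a 1 M standard state)
\Delta G^{\circ}_{\mathrm{bind}} = RT \ln K_d

% Change in binding free energy when a mutation changes K_d from K_{d,1} to K_{d,2}
\Delta\Delta G = RT \ln\!\left(\frac{K_{d,2}}{K_{d,1}}\right)

% Assumed example: a ten-fold tighter K_d at T = 298\,\mathrm{K}, with RT \approx 0.593\ \mathrm{kcal\,mol^{-1}}
\Delta\Delta G = 0.593 \times \ln(0.1) \approx -1.36\ \mathrm{kcal\,mol^{-1}}
```

In other words, an affinity gain of one order of magnitude corresponds to a stabilization of only about 1.4 kcal/mol, roughly the strength of a single additional hydrogen bond, which is why individual RBD substitutions can shift binding appreciably.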

Common pathways used by coronaviruses for host cell entry are poorly understood
The host cellular signaling pathways that underlie the mechanisms of coronavirus infection have been poorly elucidated. Identifying the common pathways impacted by coronavirus infection may help to shed more light on coronavirus pathophysiology and modes of transmission. It has been shown that coronaviruses can enter host cells through a pH-dependent endocytic pathway involving either cellular receptors, sugars, or lipid rafts via a clathrin- and caveolae-independent mechanism, which is consistent with their broad cellular tropism [196]. Additionally, neuropilins, like NRP1, are known to mediate an endocytic pathway resembling macropinocytosis through the internalization of CendR ligands [203]. Both SARS and MERS involve acute respiratory distress, gastrointestinal disorders, and cardiovascular complications. In silico experiments suggest that DPP4, previously established as a receptor for MERS-CoV, may also bind SARS-CoV-2 [204]. Both ACE2, a component of the renin-angiotensin-aldosterone system (RAAS), and DPP4 are overexpressed in diabetes and cardiovascular disease, which are considered high risk factors for severe COVID-19 [118]. Due to their synergistic effect, a combined inhibition of ACE2 and DPP4 has been suggested as a potential therapeutic strategy to combat this virus [205]. NRP1, along with its co-receptor VEGFR2, is also involved in regulating cardiovascular function and angiogenesis. Although the mechanistic details are still being evaluated, severe COVID-19-associated vascular pathologies may be mediated through common pathways involving these receptors.

Conclusions and future directions
One of the critical steps in the viral infection cycle is recognizing receptors on the host cell membrane. How a virus selects a particular molecule to be its receptor amidst a host of other molecules on the cell surface remains one of the crucial unanswered questions in virology. These receptors can aid in the attachment of the virus to the host cell surface along with its uncoating and internalization into the host cell. Based on this, one can infer common characteristics of the receptors: possession of a VBM, the ability to promote conformational change, and the potential to interact with molecules in the signaling cascade. Moreover, to broaden their host range and to effectively carry out cross-species transmission, viruses might 'opt' for receptor molecules that are more conserved in evolution, at least in their VBMs. However, this choice of highly conserved peptidases also exposes a vulnerability that can be exploited for combined coronavirus therapeutic intervention strategies.
Notably, the viruses may also select receptors that are co-expressed and colocalized with the proteases that are needed to cleave and activate viral fusion and entry into the host cells. Hijacking the host's internal protease machinery to its advantage also exposes an Achilles heel, wherein the use of specific protease inhibitors such as camostat mesylate, which is active against TMPRSS2, can be used as the first line of defense against the virus.
Compared to influenza viruses or polioviruses, the coronaviruses have evolved to recognize a broader range of host cell receptors. The influenza virus recognizes glycans (terminating in α2,3-linked sialic acid for strains affecting birds and in α2,6-linked sialic acid for those affecting humans) [168]. In contrast, polioviruses gain entry into their host cell via the Poliovirus Receptor (PVR) [207]. The diverse nature of the molecules selected by the coronaviruses to infect the host cell, along with their ability to recombine and shift hosts, makes them a long-term threat to human health. Moreover, their ability to jump host species poses a constant threat of generating more contagious strains. Hence, developing vaccines [208] or small-molecule drugs [209] against the conserved structural elements of the SARS-CoV-2 genome (at the level of genomic RNA, as well as translated structural proteins) remains our best bet to curb the ongoing pandemic. Like many other viruses, coronaviruses can mutate (although at a lower rate in comparison to other RNA viruses such as flu and HIV) and give rise to variants. Since the virus is contagious and can easily spread, variants with higher fitness can become fixed in the viral population, giving rise to emergent strains that pose a recurring threat to human health. Their ability to cause rapid human-to-human transmission has led to periodic pandemics in recent decades. As the binding affinity of the viral RBD for the coronavirus receptors has increased, the virus has also evolved a higher capacity for pathogenicity and rapid spread. The use of antagonists to coronavirus receptors, such as recombinant human ACE2 (rhACE2) in human trials (APN01-COVID-19, NCT04335136), is being explored. Such a rhACE2 would not only prevent SARS-CoV-2 entry but also possibly protect from lung injury and vascular dysfunction [118].
Taken together, although coronavirus receptors encompass a rather diverse group with broad specificity and distinct pathogenicity, the SARS-CoV-2 pandemic marks a historic turning point that has led to our accelerated understanding of these viruses and improved and novel strategies to effectively target them.

Consent for publication
All the authors give their consent for the publication of the manuscript.

Availability of data and material
Not applicable.

Competing interests
All the authors are affiliated with nference and have a financial interest in the company.

Funding
No external funding was received for this work.

Authors' contributions
P.G. wrote the first draft; P.A. generated the figures; all other authors have contributed to the writing of subsequent drafts of the manuscript. All authors have read and approved the final version of this manuscript.