Variable Structural Networks at the Active Site of the SARS-CoV and SARS-CoV2 Main Proteases

The novel coronavirus SARS-CoV2 (CoV2) emerged in December 2019. This virus has 88% genomic similarity with SARS-CoV (CoV), and both viruses largely depend on their main protease (M pro) to regulate infection. M pro thus represents an attractive target for anti-SARS drug design. The CoV and CoV2 M pro are 97% identical at the sequence level, with 12 variable residues, and their X-ray structures appear similar. We thus structurally analysed how these variable residues affect the intra-molecular interactions between key residues in the CoV2 M pro active site. Compared to CoV M pro, the 12 divergent residues in CoV2 M pro exhibit modified intra-molecular interaction networks that ultimately restructure the molecular micro-environment. These altered networks also indirectly affect the networks of other active-site residues at the entrance (T26, M49 and Q192) and near the catalytic region (F140, H163, H164, M165 and H172) of the M pro. This suggests that CoV2 indirectly (via neighbours) reshapes key molecular networks around the M pro active site. It seems that the CoV2 M pro deceives us with its apparent structural identity to the CoV M pro while this viral system accumulates multiple mutations (12 variable residues) at key positions. Some of the CoV2 M pro networks identified at the active site might guide the design of efficient CoV2 M pro inhibitors.


Introduction
In March 2020, the WHO declared that the outbreak of a novel coronavirus, SARS-CoV2 (CoV2), constituted a pandemic. This virus causes the transmissible disease severe acute respiratory syndrome (SARS) [1,2]. Although the source of this virus is still unknown, CoV2 shares 88% genomic similarity with SARS-CoV (CoV), which was identified in 2003 [3,4]. CoV is highly dependent on the main protease (M pro, or 3C-like protease) for replicase polyprotein processing. By proteolytic cleavage, the M pro generates functional pp1a and pp1b replicases in the host system that help to initiate and regulate infection [5]. M pro is highly conserved among the coronaviruses, including CoV2; owing to its essential role in the viral life cycle, it is considered a major target for drug discovery [6][7][8][9]. Indeed, several studies have suggested that inhibitors of the CoV M pro active site might be repurposed to inhibit CoV2 M pro [10][11][12][13].
The first X-ray structure of the CoV M pro was released with modified N and C termini soon after the 2003 CoV outbreak [7]. Years later, the authentic wild type structure of CoV M pro and H163 (from protomer A) that serve to open and close the active site for ligand binding [7,14,16-18]. Understanding the interactions between these functional residues in the new CoV2 M pro is essential. In February 2020, the first structure of the CoV2 M pro (PDB ID: 6LU7, unpublished) was released. At high resolution, the CoV M pro and CoV2 M pro X-ray structures look very similar, with only a 0.5 Å structural deviation (Fig. 1C). Sequence alignment of the two M pro shows 97% identity (Fig. 1D), with only 12 variable residues between them (Table 1).
Furthermore, the two M pro structures accommodate the same ligand (N3) differently (Fig. 1C).
As molecular networks shape protein function, we analysed the impact of these 12 variable residues on their intra-molecular networks and the subsequent functional relevance. Because functional studies are time-consuming during this period of international emergency, we used a structural systems biology approach to begin dissecting these networks.

Materials and Methods
The high-resolution dimeric (protomers A and B) X-ray 3D structures of the CoV M pro and the CoV2 M pro were obtained from the Protein Data Bank (PDB IDs: 2HOB and 6LU7, respectively) to compare the equivalent structures. The structure of CoV2 M pro was released (Xue, PDB, February 2020, unpublished) by the same team that released the highly active authentic wild type CoV M pro structure [14]. PyMOL was used for structural analyses and to represent the molecular structures (www.pymol.org). Sequence alignment was carried out with Clustal Omega [19]. Dimplot in Ligplot, with default parameters for hydrogen bonds and non-bonded interactions, was used to analyse intra-molecular interactions [20].
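As an illustration of what a structural deviation between two structures measures, the following minimal sketch computes the root-mean-square deviation (RMSD) between two sets of equivalent atoms. It assumes that the coordinates have already been superposed (in PyMOL, the superposition itself is handled by commands such as `align` or `super`); the function name `rmsd` and the toy coordinates are hypothetical and are not taken from 2HOB or 6LU7.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the two coordinate sets are already superposed and that
    atom i in coords_a is equivalent to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates in Å (hypothetical): every atom is shifted by 0.5 Å along y.
structure_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
structure_b = [(0.0, 0.5, 0.0), (1.5, 0.5, 0.0), (3.0, 0.5, 0.0)]

print(f"RMSD = {rmsd(structure_a, structure_b):.2f} Å")
```

A low RMSD only indicates that equivalent atoms occupy similar positions on average; it does not show whether individual residue-residue interactions are conserved, which is why the interaction networks are examined separately.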

The M pro of CoV2 and CoV differ by 12 residues
A pairwise sequence alignment of the CoV2 M pro and the CoV M pro confirmed 12 variable residues at positions 35, 46, 65, 86, 88, 94, 134, 180, 202, 267, 285 and 286 (Fig. 1D and Table 1). The X-ray structure of the CoV2 M pro homodimer seemed to structurally mimic the CoV M pro structure in terms of possessing similar domains and a comparable active site (Fig. 1A-B). Most (8/12) of the variable residues were found in the β-sheet-rich domains I and II of the M pro, where the inhibitor/catalytic site is located; the remaining four residues were found in domain III. These data suggest that, despite overall structural similarity with the CoV M pro, the 12 divergent residues in the novel CoV2 M pro might affect the activity of the catalytic domain.
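The way variable positions and percent identity are read off a pairwise alignment can be sketched as follows. This is a minimal illustration that assumes two gap-free aligned sequences of equal length; the function names and the ten-residue toy sequences are hypothetical and do not represent the M pro sequences.

```python
def variable_positions(aligned_a, aligned_b, start=1):
    """List (position, residue_a, residue_b) for every mismatching column.

    Assumes two aligned sequences of equal length without gaps;
    positions are numbered from `start`.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(aligned_a, aligned_b), start=start)
        if a != b
    ]


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# Toy sequences (hypothetical), differing at positions 4 and 10.
seq_a = "ACDEFGHIKL"
seq_b = "ACDQFGHIKV"

print(variable_positions(seq_a, seq_b))
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

With real alignments that contain gaps, the residue numbering would need to be tracked per sequence rather than per column; that case is left out of this sketch.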

The impact of the 12 variable CoV2 M pro residues on neighbouring residue interactions
We next investigated the divergent interacting partners of the 12 variable residues. The interacting partners and/or interactions of the variable residues differed between the two proteases (Fig. 2 and Table 1). In CoV2 M pro, variable position 46 is located near the entrance of the binding site and shares the same loop with H41; however, it does not interact with M49. Taken together, the divergent interactions mediated by these 12 variable residues imply that they form different networks in the CoV2 M pro compared to the CoV M pro. The consequent changes in the nature of the amino acids (Table 1) at these variable positions might underlie these alterations to the interaction networks.
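For readers unfamiliar with how a list of interacting partners can be derived from coordinates, the following rough sketch flags two residues as being in contact when any pair of their atoms lies within a distance cutoff. It is not the Dimplot procedure used here: the single 3.9 Å cutoff, the function names, the residue labels (R1 to R4) and the coordinates are all illustrative assumptions, and hydrogen bonds are not distinguished from other contacts.

```python
import math


def residue_contacts(atoms, cutoff=3.9):
    """Return residue pairs with at least one atom-atom distance <= cutoff (Å).

    `atoms` is a list of (residue_label, (x, y, z)) tuples. The cutoff is
    an illustrative assumption, not a setting taken from the paper.
    """
    contacts = set()
    for i, (res_i, xyz_i) in enumerate(atoms):
        for res_j, xyz_j in atoms[i + 1:]:
            if res_i != res_j and math.dist(xyz_i, xyz_j) <= cutoff:
                contacts.add(frozenset((res_i, res_j)))
    return contacts


def partners(contacts, residue):
    """Sorted list of residues in contact with `residue`."""
    return sorted(
        other
        for pair in contacts
        if residue in pair
        for other in pair
        if other != residue
    )


# Toy atoms (hypothetical labels and coordinates, one atom per residue).
toy_atoms = [
    ("R1", (0.0, 0.0, 0.0)),
    ("R2", (3.5, 0.0, 0.0)),
    ("R3", (0.0, 3.0, 0.0)),
    ("R4", (10.0, 0.0, 0.0)),
]

toy_contacts = residue_contacts(toy_atoms)
print(partners(toy_contacts, "R1"))
print(partners(toy_contacts, "R4"))
```

Running the same procedure on two structures yields one partner list per residue and per structure, which is the form of data compared between the two proteases.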

The variable residues indirectly alter the interaction networks of the M pro active site
To understand the consequences of the modified networks on the protease active sites, we compared the networks established by residues comprising the active site (including the entrance to the binding site region) between the CoV2 M pro and CoV M pro (Table 2, Fig. 2 and Fig. S1). At the entrance, T26 changes its interaction with its partner T21, from a hydrogen bond in CoV M pro to a hydrophobic interaction in CoV2 M pro. In CoV M pro, M49 interacts with P52 and A46 (a variable residue); these interactions are absent in CoV2 M pro. Residue Q192 at the entrance region forms two new hydrogen bonds with R188 and V186 as a result of the modified networks in CoV2 M pro.
In the oxyanion loop, F140 is considered a functional regulator (Xue et al., 2007); it forms hydrogen bonds with S1 (B) in CoV M pro, and in CoV2 M pro it makes a new link with S147. H163 interacts with C145 in CoV M pro, but has lost a hydrogen bond with G146 in CoV2 M pro. The neighbouring residue H164 in CoV2 M pro has lost its interaction with L86 but gained an interaction with G174. M165 lies adjacent to the key residue E166 that is necessary to open the substrate binding site in CoV M pro [14]. M165 shows two changes in interacting partners in CoV2 compared to CoV, losing D187 and R188 and gaining F181 and F185. E166 still makes its typical hydrogen bonds with S1 (B) and H172 in CoV2 M pro, as described in CoV M pro [14]. In addition, we identified a hydrogen bond between H172 (one of the essential regulators in the active site of CoV M pro) and S1 (B) in CoV2 M pro, which is not found in the CoV M pro. Altogether, a few of the key active site networks are indirectly modified between CoV M pro and CoV2 M pro as a result of direct changes to the neighbouring networks of the 12 variable residues.
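The comparison described above amounts to taking, for each active-site residue, the difference between its partner sets in the two structures. The sketch below illustrates this with set operations. The dictionaries contain only the partners that the text names as lost or gained (unchanged partners are omitted, so these are not the complete lists of Table 2), and the function name `diff_networks` is hypothetical.

```python
def diff_networks(old, new):
    """For each residue, report partners lost and gained from `old` to `new`.

    `old` and `new` map a residue label to the set of its interacting partners.
    """
    changes = {}
    for residue in sorted(set(old) | set(new)):
        before = old.get(residue, set())
        after = new.get(residue, set())
        changes[residue] = {
            "lost": sorted(before - after),
            "gained": sorted(after - before),
        }
    return changes


# Partial partner sets: only the changed partners named in the text.
cov_partners = {
    "M49": {"P52", "A46"},
    "Q192": set(),
    "H164": {"L86"},
    "M165": {"D187", "R188"},
    "H172": set(),
}
cov2_partners = {
    "M49": set(),
    "Q192": {"R188", "V186"},
    "H164": {"G174"},
    "M165": {"F181", "F185"},
    "H172": {"S1(B)"},
}

for residue, change in diff_networks(cov_partners, cov2_partners).items():
    print(residue, "lost:", change["lost"], "gained:", change["gained"])
```

Keeping the partners as sets makes the comparison independent of the order in which interactions are listed, and the same function could be applied to the full partner lists once they are available.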

Discussion
Our structural analysis of CoV2 M pro highlights that this new viral system does not directly alter any of the key residues E166, F140, H163, H172 and S1 (B) in the protease active site, but rather changes the neighbouring residues to modify their micro-environment and their interaction networks (Table 2, Fig. 2 and Fig. S1). Why this indirect approach has been favoured to alter these key networks at the active site is unclear; it might be to preserve the original functional role of these residues (as observed in CoV M pro) while simultaneously modifying the way they function via their new networks. This concept now warrants detailed experimental analysis. The 97% identity between the CoV M pro and CoV2 M pro suggests that the two proteases are alike; however, specific variations conferred by just 12 variable residues that modify interactions, especially in the protease's active site region, are clear at the 3D structural level.
The interactions made by residues in the oxyanion loop (140-145) stabilize the S1 pocket to control the conformational changes in their micro-environment that differ between the active and inactive forms of the CoV M pro [7,14,17]. In CoV2 M pro, the residue at position 134 changes from being positive (H134) in CoV M pro to hydrophobic (F134); this change modifies its network, and because the residue is located on the loop that leads to the oxyanion loop, it might ultimately serve to regulate the active site, as required. At the active site entrance, residues M49 and Q189 are essential gatekeepers for substrate binding [13,14].  [18]. We now need to understand how the resulting altered networks in CoV2 M pro, especially around 286 (seven interactions in CoV M pro vs. ten in CoV2 M pro), impact the function of the protease. Viral systems evolve rapidly at the molecular level through mutations that meet their functional requirements during the process of natural selection [21,22]. Here, in both M pro, the catalytic residue H41 is sandwiched between variable positions 35 and 46 in the same super-secondary structure; these variable positions have both changed in nature from CoV M pro to CoV2 M pro (polar to hydrophobic and hydrophobic to polar, respectively). H41 is conserved across all coronaviruses [9]; thus, there could be a functional reason or requirement behind a structural selection or modification near (but not at) this critical site.
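The changes in residue nature discussed above can be expressed as a lookup of side-chain classes. The sketch below uses a deliberately simplified four-class scheme that follows the labels used in the text (histidine as positive, phenylalanine as hydrophobic); the class table and the function name `class_change` are illustrative assumptions, other classification schemes exist, and the classification of histidine in particular depends on its protonation state.

```python
# Simplified side-chain classes keyed by one-letter amino acid code.
# This table is an illustrative assumption, not a scheme taken from the paper.
RESIDUE_CLASS = {
    **dict.fromkeys("AVLIMFWP", "hydrophobic"),
    **dict.fromkeys("STNQCYG", "polar"),
    **dict.fromkeys("KRH", "positive"),
    **dict.fromkeys("DE", "negative"),
}


def class_change(old_residue, new_residue):
    """Return the (old, new) side-chain classes for a substitution."""
    return RESIDUE_CLASS[old_residue], RESIDUE_CLASS[new_residue]


# Position 134 as described in the text: H in CoV M pro, F in CoV2 M pro.
print(class_change("H", "F"))
```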
The X-ray structure of CoV M pro highlighted the role of residue S1 (from protomer B) in stabilizing the active site by interacting with E166 (A) and F140 (A) and mediating inhibitor binding [14,16]. In CoV2 M pro, S1 (B) also seems to stabilize the active site; thus, it is essential to consider a dimeric structure for ligand design. Interestingly, S1 (B) in CoV2 M pro forms a unique hydrogen bond with H172: this new interaction might also contribute to its restructuring process. Going forward, the functional consequences of the 12 variable residues should be assessed at the conformational level and in terms of the regulation of protease activity.
Functional assays, X-ray analyses like those performed previously [14] and molecular modelling approaches [17] are all warranted.

[Figure legend, partial] ... except S1 (protomer B) (removed for clarity but indicated with the symbol *S1(B)). Colour codes: yellow sticks, unique residues in CoV M pro; cyan sticks, unique residues in CoV2 M pro; orange, domain I; violet, domain II; wheat, domain III; sky blue, connecting loop; green sticks, catalytic residues; magenta sticks, active site residues.