In silico Repositioning for Dual Inhibitor Discovery of SARS-CoV-2 (COVID-19) 3C-like Protease and Papain-like Peptidase

Aims: In late December 2019, early reports predicted the onset of a potential Coronavirus outbreak in china, given the estimate of a reproduction number for the 2019 Novel Coronavirus (COVID-19). Because of high ability of transmission and widespread prevalence, the mortality of COVID-19 infection is growing fast worldwide. Absent of an anti-COVID-19 has put scientists on the urge to repurpose already approved therapeutics or to find new active compounds against coronavirus. Here in this study, a set of computational approaches were performed in order to repurpose antivirals for dual inhibition of the frontier proteases involved in virus replication, papain-like protease (PLpro; corresponding to nsp3) and a main protease (Mpro), 3C‑like protease (3CLpro; corresponding to nsp5). Materials and Methods: In this regard, a rational virtual screening procedure including exhaustive docking techniques was performed for a database of 160 antiviral agents over 3CLpro and PLpro active sites of SARS-CoV-2. The compounds binding energies and interaction modes over 3CLpro and PLpro active sites were analyzed and ranked with the aid of free Gibbs binding energy. The most potent compounds, based on our filtering process, are then proposed as dual inhibitors of SARSC-CoV-2 proteases. Key findings: Accordingly, seven antiviral agents including two FDA approved (Nelfinavir, Valaganciclovir) and five investigational compounds (Merimepodib, Inarigivir, Remdesivir, Taribavirine and TAS106-106) are proposed as potential dual inhibitors of the enzymes necessary for RNA replication in which Remdesivir as well as Inagrivir have the highest binding affinity for both of the active sites. Significance: The mentioned drug proposed to inhibit both PLpro and 3CLpro enzymes with the aim of finding dual inhibitors of SARSC-CoV-2 proteases. on


Introduction
Coronavirus virus is an enveloped positive-sense single-stranded RNA that eventually cause systemic changes and damage in many vital organs (8). Similar to all viruses, two-thirds of the viral RNA is translated into two large polyproteins which are cleaved by two proteases, Papain-like protease (PLpro) and 3C-like protease (3CLpro) (8,9). 3CLpro cleave the viral polyproteins to generate the key proteins required for viral maturation (10,11), replication and the helicase, such as the RNA-dependent RNA polymerase. SARS-CoV-2 3CLpro first crystal structure is recently released (7) and it shows that 3CLpro active pocket is identical to those of other CoVs which means antivirals bind this site can affect wide range of other similar types. PLpro cleaves the two polyproteins to generate the necessary proteins for the virus to replicate. Besides, PLpro has an additional role of stripping ubiquitin to help CoVs evade the host innate immune responses (12,13). 3CLpro and PLpro has no closely related homologues in humans which makes it an attractive molecular target to design anti-SARS drugs (14).
Daily increasing number of individuals infected by SARS-CoV-2 worldwide and death rate has raised the urge for rapid, concise, and precise strategies to investigate large database of compounds for a certain treatment.
In this study, a set of computational approaches were performed in order to discover repurpose antiviral for dual inhibition of the virus replication procedure. In this regard, a rational virtual screening procedure including exhaustive docking techniques performed over a database of 160 antiviral agents according to Drugbank (15). Compound binding energy and interaction modes over 3CLpro and PLpro active sites were analyzed and ranked. The most potent compounds, based on our filtering process, were then proposed as dual inhibitor of SARSC-CoV-2 protease.

Papain-Like Protease Homology Modeling
Since no crystal structure has been reported for SARS-CoV-2 PLpro yet, SWISS-MODEl workspace (https://swissmodel.expasy.org/interactive) was employed for SARS-CoV-2 PLpro homology modeling (16). PL protease protein sequence was obtained from NCBI GenBank (NCBI Reference Sequence: NC_045512.2), and the homology modeling was performed based on SARS-CoV PLpro crystal structure (PDB:3E9S) as the template with 82.80% sequence similar to the correspondence SARS-CoV-2 PLpro. The quality and validity of the created model was assessed based on Global Model Quality Estimation (GMQE) and QMean.

Target preparation
In order to monitor the interaction modes of the target molecules over 3CL protease and PLpro enzymes, the Maestro Molecular Modeling platform (version 11.5) by Schrödinger, LLC was used (17). First, the recently released crystal structures of 3CL protease enzyme retrieved from the Protein Data Bank (PDB: 5R82) (http://www.rcsb.org) and the PLpro enzyme structure was obtained by homology modeling.
Except the one structural water molecule in 3CLpro active site, which bridge the receptor important residues by way of H-bonds (His41, Asp187 and His164), the rest of the water molecules were removed from the enzyme crystallographic structure. The Protein Preparation Wizard (18) module was used to prepare the structure of both enzyme properly. Protein Preparation Wizard consists of a set of procedure including; adding the hydrogen atoms, updating and filling the missing side chains residues, respectively, optimizing hydrogen bonds, removing atomic clashes, adding formal charges to the hetero groups and then optimizing at neutral pH. Finally, the structures were minimized using optimized potential for liquid simulations (OPLS3) force field.
The Glide receptor grids constructed based on the co-crystalized ligand-binding sites in the Glide application (Glide, version) of Maestro. The center of each grid was arranged at the centroid of the crystalized ligand-binding site which is set with inner (acceptable space for the ligand center) and outer (search space surrounded all the ligand atoms) box sizes of 10 and 20 Å, respectively. In the case of 3CLpro the hydroxyl group of active site residues of Thr25, Thr45, Ser46, Thr54 and Ser144 are treated as flexible and for PLpro Ser915, Ser990, Ser1009, Tyr1013, Tyr1018 and Tyr1046 were set to rotatable during the docking experiment.

Database preparation
Total of 160 antiviral agents including; FDA, experimental and investigational agents were collected from Drugbank (15). In order to enrich the database by previously discovered agents known for inhibition of coronavirus 3CLpro, four screened non-peptidic small molecule inhibitors of SARS-CoV 3CLpro were also added to the database. These four compounds were discovered against human SARS-CoV 3CLpro by combination of virtual screening (VS) and high-throughput screening (HTS) of ZINC database, NIH Molecular Libraries and Asinex Platinum collection (19)(20)(21). The LigPrep (22) module were used to prepare ligand structure properly.

virtual screening workflow
The Virtual Screening Workflow in Maestro was used to dock and score the leadlike compounds to identify potential ligands (23). In order to increase the computational complexity, molecular docking was applied at three different levels of Glide docking precision. During these stages the protocol changes to more rigorous incorporating exhaustive conformational sampling which is a critical requirement for the final stages of the screening. The applied docking level consist of; High Throughput Virtual Screening (HTVS), Standard Precision (SP) and Extra Precision (XP) followed by post processing using Prime molecular mechanics /generalized born surface area (MM-GBSA). During all stages of molecular docking full flexibility was assumed for the ligands. In the HTVS step, up to 10 poses were generated for each compound state and 80% of the best states were preserved and entered the next docking step. During SP step, up to 5 poses were generated and all good scoring states were retained. Screening by the XP docking protocol was performed over 40% of the good states of the SP step. In this step, up to 3 poses were generated for each compound state, and only the best scoring states were kept.
Finally, 20% of the best scoring states from the XP docking were generated followed by post-processing using Prime (MM-GBSA) mode to calculate protein-ligand free Gibbs binding energies.

Prime MM-GBSA
The binding energies (ΔG Bind ) were calculated for each compound using Molecular mechanics/generalized born surface area (MM-GBSA) modules (Schrödinger LLC 2018) (24) based on the following equation; where ΔG Bind is the calculated relative free energy which includes both ligand and receptor strain energy. E Complex is the MM-GBSA energy of the minimized complex, and E Ligand is the MM-GBSA energy of the ligand after removing it from the complex and allowing it to relax. E Receptor is the MM-GBSA energy of relaxed protein after separating it from the ligand. The MM-GBSA calculation was performed based on the best pose structure obtained from XP docking complexes.

Active site of 3CLpro
All coronavirus 3CLpro structures are consist of two chymotrypsin-like β-domains; domain I (residues 1-99) and domain II (100-185), and one α-helical dimerization domain III (residues 201-303) and a long loop (residues 185-200) connects domains II and III. The active site is located in the center of the cleft between domains I and II, including a catalytic dyad consisting of His41 and Cys145 (11). The enzyme active site is composed of six subsites (S1-S6). Figure 4a represents the active site cleft along with the residues and the related subsites. S1-subsite is the most important subsite, because is conserved in all coronaviruses. It located at domain I (colored in light green) consists of residues 140-145 and 163-166, which formed the "outer wall" of the S1 site. The S2-subsite, which reside at the interface of domain I and the long loop connecting domain II and III (colored in orange) is formed of Val186, Asp187, Arg188, and Gln189 as well as side-chain atoms of His41, Met49, and Met165. Residues186-188 line the S2-subsite with some of their main-chain atoms.
The S3 side chain is oriented toward bulk solvent. Residues Met165, Leu167, Gln189, Thr190, and Gln192 surround the S4-subsite, which is surrounded at the interface of domain I and II (colored in cyan). The S5-subsite is composed of mainchain atoms of Thr190, Ala191, and Gln192 and S6 sub-site is almost at the outer area of the enzyme (11,27).

Active site of PLpro
The PLpro monomer is consists of 4 domains including the extended ubiquitin-like domain (UBL), the thumb domain, the palm domain and the fingers domain (12).
The active site is located at the interface of the thumb and palm sub-domains and contains a classic catalytic triad, composed of Cys856-His1017-Asp1031, adjacent to the flexible BL2 loop containing Trp851 (according to the homology modeled structure residue numbering). The enzyme has four substrate recognition subsites (S1-S4) (26). Figure 4b represents the active site, the residues and the related subsites. S1 and S2-subsites are mostly buried. S1-subsite includes residues Gly1016, Trp851, Cys856, and Tyr857. Residues Leu907, Asp909, Gly1016, and Tyr1018 are involved in shaping the conserved S2-subsite (residues colored in orange). S3subsite is partially solvent-exposed and include Gly2016. Like S1 and S2, S4 is also buried and is structured by Asp1047, Pro993, Tyr1009, Tyr1013, Tyr1018, and Thr1046 (residues colored in purple) (28). S2 and S4-subsites are the binding sites for SARS-CoV PLpro competitive inhibitors (28, 29).

Reliability of docking protocol
The applied docking procedure reliability was validated by re-docking of RZs and TTT over 3CL protease (5R82) and the homology model of PLpro SARS-CoV-2, respectively. The docked conformations corresponding to the lowest glide score were selected as the most possible binding modes. The root mean square deviation (RMSD) was calculated for ligands to measure the docking prediction accuracy. The pose was counted optimal if its RMSD found to be less than 2 Å. The RMSD of the re-docked conformations of RZs and TTT over 3CLpro and PLpro were 0.40 Å and 0.45, respectively, which is in acceptable value and considered as successfully docked (30,31). As a result, the validity of Glide dock parameters are reasonable in order to predict the related co-crystalized structures.
Docking of the co-crystalized ligand (RZs) over 3CL protease active site showed that the ring center of the pyridyl moiety interacted the binding pocket by establishing T-shape - interaction with Tyr41. In addition, the 4-amide group of

SARS-CoV-2
A structure-based virtual screening was performed following the computational complexity (three different levels of Glide docking precision) protocol of virtual screening workflow module in maestro (Schrödinger) over the recently released 3CLpro SARS-CoV-2 crystallographic structure and the homology model of PLpro, to screen potential small-molecule compounds from the constructed antiviral database (160 compounds). Reliability of the Glide docking procedure was validated by re-docking the co-crystallized ligand over the correspondence crystal structure.
Free Gibbs energy (Kcal/mol) based on MM-GBSA method used to score Glide

Investigating screening compounds over PLpro active site
The virtual screening workflow was also performed over PLpro active site and a set of 22 compounds was elucidated with considerable affinity and the structure of the ones solely inhibit PLpro are shown in Figures 7 and 8.

in silico study of proposed dual inhibitor over SARS-CoV-2 3CLpro and
PLpro active site  Nelfinavir was well located into the S4-subsite active pocket of PLpro, in which the benzamide and phenylsulphanyl ring stabilized through T-shape π-π stacking interaction with the side chain of Tyr1013, while the octahydro-1H-isoquinoline moiety oriented to the solvent accessible area of the active site (Figure 10c).
Similarly, Merimepodib is stabilized over S4 and S2-subsite of the active pocket by H-bond formation with key residues Asp909 and Gln1014 and T-shape π-π stacking interaction with Tyr1013 (Figure 10d).
In the case of Taribavirin, the carboximidamide and 1,2,4-triazole groups formed πcation and T-shape π-π stacking interactions with Tyr1013 and Tyr1009, respectively ( Figure  5734) known as a nucleoside analog is used for as a potential treatment for Ebola (35). Additionally, in 2017, its activity against the coronavirus family (SARS-CoV and MERS-CoV) of viruses was also demonstrated (36,37). Recently, it being researched as a potential treatment to SARS-CoV2 infection (38).

Conclusion
In order to discover repurpose antiviral for dual inhibition of the COVID-19 replication a set exhaustive docking technique were performed through a database of 160 antiviral agents over both 3CLpro and PLpro active sites of SARS-CoV-2.
Compounds binding mode and energy were analyzed and ranked. Accordingly, seven antiviral agents including Nelfinavir, Valaganciclovir Merimepodib, Inarigivir, Remdesivir, Taribavirine and TAS106-106 are proposed as potential dual inhibitors of the proteases enzymes necessary for RNA replication. The interatomic result shows the proposed compound located in the 3CLpro active site in a manner that their ester or its bioisoerter faced toward the Cys145, which known as key catalytic residue for hydrolysis through nucleophilic attack (39). In addition, the results reveals the ability of the mentioned compounds in blocking the entrance of the PLpro active site and inhibiting PLpro enzyme activity. More investigation regarding the capability of the extracted compounds is highly recommended.
COVID-19 pandemic. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors

Declaration of Competing Interest
The authors declare that there are no conflicts of interest. :   Table 1. Binding energy calculation and the type of proposed compound interactions over 3CLpro active site            The active compounds are showed as sticks and colored in green. The hydrogen bonding, π-π stacking, and electrostatic interactions are represented as yellow, blue, and purple dashed lines, respectively. Domain I, domain II and long loop connecting domain II and III colored in light green, cyan and orange, respectively.