In silico Identification of Potent COVID-19 Main Protease Inhibitors from FDA Approved Antiviral Compounds and Active Phytochemicals through Molecular Docking: A Drug Repurposing Approach

Vaishali Chandel, Sibi Raj, Brijesh Rathi, Dhruv Kumar

Amity Institute of Molecular Medicine & Stem Cell Research (AIMMSCR), Amity University Uttar Pradesh, Sec-125, Noida-201313, India
Department of Chemistry, Hansraj College, University of Delhi, Delhi-110007, India

*Corresponding Author: Dr. Dhruv Kumar, J3-112, Amity Institute of Molecular Medicine & Stem Cell Research (AIMMSCR), Amity University Uttar Pradesh, Sec-125, Noida-201313, India. Tel: 7082436598. Email: dhruvbhu@gmail.com, dkumar13@amity.edu

Abstract
The COVID-19 main protease plays a critical role in the propagation of the novel coronavirus (COVID-19). It is therefore important to identify potential inhibitors of this protease. We applied a bioinformatics-based drug repurposing approach to identify possible potent inhibitors of the COVID-19 main protease from FDA-approved antiviral compounds and from a library of active phytochemicals. The compounds were screened using the PyRx virtual screening tool. A total of 19 compounds were identified after screening, based on their higher binding affinity relative to the other screened compounds. Of these 19, the 6 best compounds were selected based on their binding affinity and ADME properties. Nelfinavir exhibited the best binding energy (-8.4 kcal/mol) and interacted stably with the amino acid residues present in the active site of the COVID-19 main protease. In addition to Nelfinavir (-8.4), Rhein (-8.1), Withanolide D (-7.8), Withaferin A (-7.7), Enoxacin (-7.4) and Aloe-emodin (-7.4) also showed good binding affinity and favourable ADME properties.
Our findings suggest that these compounds are potential inhibitors of the COVID-19 main protease and could help inhibit the propagation of the novel coronavirus (COVID-19). Further investigation and validation of these inhibitors against the coronavirus would help bring these molecules into clinical settings.

The current outbreak of the novel coronavirus (CoV), which began in December 2019, has spread from the Hubei province of China to many other countries. In January 2020, the WHO emergency committee declared a global health emergency based on the increasing rate of spread of the viral infection, and estimated a fatality rate of about 4% [1]. Collaborative efforts from scientists worldwide are underway to understand the rapid spread of the novel coronavirus and to develop effective interventions to control and prevent it. Coronaviruses are enveloped, positive-sense, single-stranded RNA viruses with large genomes that infect humans as well as a wide range of animals [2]. Tyrrell and Bynoe first reported coronaviruses in 1966, when they cultivated the viruses from patients with the common cold [3]. The viruses were named for their morphology: a spherical virion with a core shell and surface projections resembling a solar corona. Corona is Latin for crown. Coronaviruses comprise four genera: alpha-, beta-, gamma- and deltacoronaviruses [4]. Alpha- and betacoronaviruses are reported to originate from mammals such as bats, while gamma- and deltacoronaviruses originate mostly from pigs and birds [5]. CoVs are a large family of viruses that are common in various other species, including camels, cattle, cats and bats. Virions range in size from 60 to 80 nm, with genome sizes between 26 and 32 kb. Recombination rates of CoVs are very high because of constant transcription errors and RNA-dependent RNA polymerase (RdRP) jumps [6]. Most of the genome (ORF1a-ORF1b) encodes the viral polymerase, RNA synthesis materials, and two large nonstructural polyproteins that are not involved in host response modulation.
The remaining one third of the genome codes for four structural proteins, spike (S), envelope (E), membrane (M) and nucleocapsid (N), and for other helper proteins [7]. CoVs have high mutation rates and can cause infections of the respiratory, gastrointestinal, hepatic and neurologic systems. The viruses are highly pathogenic, as they are also associated with severe acute respiratory syndrome (SARS). The novel coronavirus (nCoV) that emerged in 2019 has been the focus of global attention because of a pneumonia epidemic of unknown cause. The first case of pneumonia was reported on December 12, 2019, where possible pneumonia and coronavirus infection were ruled out by clinicians.
The first step in viral infection is the interaction of the spike protein with susceptible human cells [8]. After entering the cells, CoVs adapt to their human hosts by expressing genes that encode the necessary accessory proteins. CoVs alter their genomes through recombination, gene exchange, and gene insertion or deletion [9]. The CoV subfamily is rapidly expanding as next-generation sequencing applications improve the detection and definition of novel CoV species, and CoV classification is continually changing. According to the most recent classification of the International Committee on Taxonomy of Viruses (ICTV), there are four genera comprising thirty-eight unique species. SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV) attach to the host cell by binding to the cellular receptors angiotensin-converting enzyme 2 (SARS-CoV) and dipeptidyl peptidase 4 (MERS-CoV), respectively [10]. After entry into the host cell, the viral RNA is released into the cytoplasm. Genomic RNA is modified through encapsulation and polyadenylation and encodes several structural and non-structural genes. Proteases exhibiting chymotrypsin-like activity cleave the viral polyproteins, which drives the production of (-) RNA through replication as well as transcription [11]. During replication, full-length (-) RNA copies of the genome are used as templates for full-length (+) RNA genomes. During transcription, a subset of 7-9 sub-genomic RNAs, including those encoding all structural proteins, is produced by discontinuous transcription. In the cytoplasm, viral nucleocapsids are assembled from genomic RNA and N protein and then bud into the lumen of the endoplasmic reticulum. Virions are then released from the infected cell through exocytosis.
The released viruses can then infect kidney cells, liver cells, intestinal cells and T lymphocytes, as well as the lower respiratory tract, where they cause the main signs and symptoms. In this regard, three patients with SARS-CoV infection were found to have CD4+ T lymphocyte counts below 200 cells/mm³ [12]. MERS-CoV can infect human dendritic cells and macrophages in vitro. T lymphocytes are also a target for the pathogen because of their characteristic CD26 rosettes. The virus can deregulate the antiviral T-cell response by stimulating T-cell apoptosis, thus causing a collapse of the immune system.
The main protease domain (Mpro) has been reported to be a conserved target for the design of new inhibitors across the entire Coronavirinae subfamily. The 5' two thirds of the coronavirus genome consists of open reading frame 1 (ORF1), which encodes two large polypeptides of the replicase machinery: pp1a and, through a ribosomal frameshift, pp1ab. Two proteases encoded in the 5' region of ORF1, the papain-like protease (PLP) and the 3C-like protease (3CL or Nsp5), cotranslationally cleave the two polypeptides into mature nonstructural proteins (NSPs) [13]. The 3CL protease is more commonly known as Mpro because of its dominant role in the posttranslational processing of the replicase protein. Significant homology among Mpros, in primary amino acid sequence as well as in 3D architecture, has been reported across human and animal CoVs. They also share a similar substrate-binding pocket, with a requirement for glutamine at the P1 position and a preference for leucine/methionine at the P2 position. This strong structural conservation provides an opening for the design of broad-spectrum anti-CoV inhibitors. In general, there are few or no treatment options for viral diseases that emerge suddenly and spread rapidly. Accordingly, no vaccine or effective treatment is available to prevent the novel COVID-19 infection. Compounds are being tested in vitro against COVID-19, and in humans in SARS-CoV and MERS-CoV trials. Studies analyzing the antiviral activity of type I and type II interferons have identified interferon-beta (IFNb) as the most potent interferon, reducing in vitro replication of MERS-CoV. A report from South Korea found that a combination of Lopinavir/Ritonavir (LPV/RTV, anti-HIV drugs), pegylated interferon and ribavirin achieved successful viral clearance in human patients [14].
Remdesivir, another antiviral drug, was reported to be successful in the United States of America. In vitro studies with this drug showed termination of RNA transcription at an early stage. Published studies demonstrate that remdesivir has strong antiviral activity in epithelial cell cultures against SARS-CoV, MERS-CoV and related zoonotic bat CoVs. Several measures have to be taken to keep the epidemic from spreading further, such as early diagnosis, reporting, isolation, supportive treatment and avoiding unnecessary panic. Basic preventive measures, such as regular handwashing, using disinfectant solutions and avoiding contact with patients, are also needed to prevent the spread of the virus by droplets. Healthcare staff should be informed about taking personal protective measures, such as the use of gloves, eye masks and N95 masks, during the examination of patients with suspected COVID-19. Public services and facilities should provide decontaminating reagents for routine hand cleaning. Physical contact with wet and contaminated objects should be taken into account when dealing with the virus, especially agents such as fecal and urine samples, which can potentially serve as an alternative route of transmission. A huge variety of biologically active agents have been identified for their diverse therapeutic functions. A deep understanding of the antiviral activities of phytochemicals has assumed greater importance in the last few decades [15]. A wide variety of active phytochemicals, including flavonoids, terpenoids, organosulfur compounds, limonoids, lignans, sulphides, polyphenolics, coumarins, saponins, chlorophyllins, furyl compounds, alkaloids, polyines, thiophenes, proteins and peptides, have been found to have therapeutic applications against genetically and functionally diverse viruses.
The antiviral mechanisms of these agents may be explained by their antioxidant activity, scavenging capacity, inhibition of DNA and RNA synthesis, inhibition of viral entry, or inhibition of viral reproduction. A large number of candidate substances, such as phytochemicals and their synthetic derivatives, have been identified through a combination of in vitro and in vivo studies in different biological assays.
The compounds examined in preliminary studies to date are not approved for therapeutic use in COVID-19 patients. Liu et al. (2020) successfully crystallised the main protease (Mpro) of COVID-19 (PDB ID 6LU7), and the structure is now publicly accessible. 6LU7 represents a potential target for the inhibition of CoV replication. In this study, we therefore identified 19 potential inhibitors of the COVID-19 main protease, of which 6 (Nelfinavir, Rhein, Withaferin A, Withanolide D, Enoxacin and Aloe-emodin) were the best compounds. These inhibitors could be repurposed against the COVID-19 main protease to control the spread of the coronavirus.

Data sources
In this study, a dataset of 100 FDA-approved antiviral compounds and 1000 active phytochemicals was obtained from the FDA and from the Indian Medicinal Plants, Phytochemistry, and Therapeutics database [16,17].

Preparation of receptor/membrane transporter
The atomic coordinates of the COVID-19 main protease (PDB ID 6LU7) were downloaded from the RCSB Protein Data Bank (PDB). Before docking, charges, solvation parameters and fragmental volumes were assigned to the protein using AutoDock Tools 4 (ADT) [18]. The protein molecule was further optimized for molecular docking using AutoDock Tools.
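As a rough illustration of receptor clean-up before docking, the sketch below separates protein ATOM records from HETATM records (waters, co-crystallized ligands) in PDB-format text. Whether heteroatoms were stripped from 6LU7 in this study is not stated, so this step is an assumption; the helper names and the toy coordinates are hypothetical, and the charge and solvation assignment done in AutoDock Tools is not reproduced here.

```python
def pdb_line(record, serial, name, resname, chain, resseq, x, y, z):
    """Format one fixed-width PDB coordinate record (toy helper for the demo)."""
    return (f"{record:<6}{serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


def split_records(pdb_text):
    """Return (protein_lines, hetero_lines) from PDB-format text."""
    protein, hetero = [], []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ATOM":
            protein.append(line)
        elif record == "HETATM":
            hetero.append(line)
    return protein, hetero


# Toy input: coordinates are made up and are NOT taken from 6LU7.
toy_pdb = "\n".join([
    pdb_line("ATOM", 1, "N", "SER", "A", 1, 10.0, 11.0, 12.0),
    pdb_line("ATOM", 2, "CA", "SER", "A", 1, 11.2, 11.5, 12.3),
    pdb_line("HETATM", 3, "O", "HOH", "A", 401, 20.0, 21.0, 22.0),
    "END",
])

protein_lines, hetero_lines = split_records(toy_pdb)
# Protein-only text that would be handed on to docking preparation.
receptor_only = "\n".join(protein_lines)
print(len(protein_lines), "protein atoms kept;",
      len(hetero_lines), "heteroatoms set aside")
```

With a real structure file, `toy_pdb` would be replaced by the contents of the downloaded PDB entry.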

Preparation of ligands
The 3D SDF structures of all compounds were downloaded from the PubChem database [19], and 2D ligand structures were drawn using the ChemSketch program (Figure-3). The ligands were optimized using Avogadro and converted to PDB format with Open Babel. To simplify the analysis further, the ligands were optimized and converted to PDBQT format using the graphical user interface version of the PyRx virtual screening tool (Python Prescription 0.8).

Compound screening using PyRx program
Molecular screening of all compound libraries was performed using PyRx, with the AutoDock wizard as the docking engine [20,21]. During docking, the ligands were treated as flexible and the protein as rigid. The configuration file for the grid parameters was generated using the AutoGrid engine in PyRx. The application was also used to predict the amino acids in the active site of the protein that interact with the ligands. Poses with a positional root-mean-square deviation (RMSD) of less than 1.0 Å were considered ideal and were clustered together to find the most favourable binding mode. The ligand with the lowest (most negative) binding energy was considered to have the maximum binding affinity. Visual analysis of the docking site was performed using PyMOL version 2.3.4, and the results were validated using AutoDock Vina [22].
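The pose-selection criteria described above can be sketched in a few lines: pick the pose with the most negative binding energy and group with it every pose whose positional RMSD from it is below 1.0 Å. This is a minimal illustration, not the clustering code used by PyRx or AutoDock; the function names and the two-atom toy poses are hypothetical, and the RMSD assumes both poses list the same atoms in the same order.

```python
import math


def rmsd(pose_a, pose_b):
    """Positional RMSD between two poses given as equal-length lists of (x, y, z)."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose_a))


def best_pose_cluster(poses, cutoff=1.0):
    """Return the lowest-energy pose and all poses within `cutoff` RMSD of it.

    `poses` is a list of (binding_energy_kcal_per_mol, coordinates) tuples.
    """
    best = min(poses, key=lambda pose: pose[0])
    cluster = [pose for pose in poses if rmsd(best[1], pose[1]) < cutoff]
    return best, cluster


# Toy poses with made-up coordinates (two atoms each).
toy_poses = [
    (-8.1, [(0.3, 0.0, 0.0), (1.3, 0.0, 0.0)]),
    (-8.4, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    (-7.0, [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]),
]

best, cluster = best_pose_cluster(toy_poses)
print("best energy:", best[0], "| poses in its cluster:", len(cluster))
```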

Active site identification
The active-site amino acid residues of 6LU7 were identified using the online Protein-Ligand Interaction Profiler (PLIP), BIOTEC, TU Dresden [23].

ADME analysis
ADME properties of the studied compounds were calculated with the online SwissADME program from the canonical SMILES of the selected ligands, obtained from PubChem [24]. The major ADME-associated parameters considered were Lipinski's rule of five, drug solubility, pharmacokinetic properties and drug likeness. The values of the observed properties are presented in Table-3.

Results and Discussion
Coronaviruses belong to a group of viruses that can infect humans and other vertebrate animals. COVID-19 has killed thousands of people around the globe, and the death toll is increasing every day. The infection affects the liver, respiratory, central nervous and digestive systems of humans and animals. Our study focused on drug repurposing against the coronavirus main protease (3CLpro/Mpro; PDB ID 6LU7) as a potential therapeutic target for the treatment of coronavirus infection. 6LU7 is the main protease (Mpro) of COVID-19, whose structure was recently deposited in the PDB and is publicly accessible (Figure-1). Mpro is essential for the proteolytic maturation of the virus. It has been examined as a potential target to prevent the spread of infection by inhibiting viral polyprotein cleavage through blocking of the active sites of the protein (Figure-2). The determination of the Mpro structure of COVID-19 has provided an immense opportunity to identify potential drug candidates for the treatment of coronavirus infection [25]. In our study, we applied a computational drug repurposing approach to identify specific therapeutic agents against COVID-19. We created a database of 100 FDA-approved antiviral compounds and 1000 active phytochemicals from plants. The compounds were screened using the PyRx virtual screening tool, and 19 compounds were selected based on their binding affinity for the COVID-19 main protease (6LU7). Molecular docking was then performed with these 19 compounds against the COVID-19 Mpro structure. Molecular docking is a computational method that aims to identify non-covalent binding between a protein (receptor) and a small molecule (ligand/inhibitor). Docking predicts the mode of interaction between a target protein and a small ligand at an established binding site.
Binding energy indicates the affinity of a specific ligand and the strength with which a compound interacts with and binds to the pocket of a target protein. A compound with a lower binding energy is preferred as a possible drug candidate. To understand the effect of the active antiviral compounds and phytochemicals on COVID-19, molecular docking of the 19 active phytochemicals and FDA-approved antiviral compounds selected after PyRx screening was performed against the COVID-19 main protease (Table-2). Of the 19 selected compounds, 6 (Nelfinavir, Rhein, Withanolide D, Withaferin A, Enoxacin and Aloe-emodin) showed the best docking scores with the COVID-19 main protease (6LU7) and were found to be the best molecules at the target site of the protein. Of these 6 compounds, Nelfinavir exhibited the best docking score (-8.4 kcal/mol). TRP207, ILE281, LEU282, PHE3, PHE291, GLN127, ARG4, GLY283, GLU288, LYS5, LYS137, TYR126, GLY138, SER139 and VAL135 are the amino acid residues participating in the interaction at the binding pocket of the COVID-19 main protease (Figure-4). Nelfinavir is an antiretroviral drug used against HIV. It is a protease inhibitor that limits replication of the virus and improves immune function in individuals infected with HIV, and it has been evaluated as first-line therapy in HIV patients [26] (Figure-5). Withanolides are a group of naturally occurring oxygenated steroids present in medicinal plants of the Solanaceae family. Formulations of withanolides have been exploited for various pharmacological activities, including immunomodulatory, antioxidant, antibacterial, antiviral, antitumor, anti-angiogenic, hypnosedative and antiarthritic activities [27]. Withanolide D exhibited a binding affinity of -7.8 kcal/mol with 6LU7.
LYS102, PHE103, VAL104, ARG105, ILE106, GLN107, GLN110, PHE294, PHE8, ASN151, TYR154 and ASP153 are the amino acid residues participating in the interaction at the binding pocket of 6LU7 (Figure-6). Rhein showed a binding affinity of -8.1 kcal/mol with 6LU7. LYS102, VAL104, ILE106, GLN110, THR29, THR111, PHE294, ASP295, GLN127, PHE8, ASN151, ILE152, ASP153 and SER158 are the amino acid residues participating in the interaction at the binding pocket of 6LU7. Rhein (4,5-dihydroxyanthraquinone-2-carboxylic acid) is found in several medicinal herbs, such as Cassia tora L., Rheum palmatum L., Aloe barbadensis Miller and Polygonum multiflorum Thunb., which have been used medicinally in China for over a decade. It is a lipophilic anthraquinone with many pharmacological effects, including anti-inflammatory, anticancer, hepatoprotective, antioxidant, nephroprotective and antimicrobial activities [28] (Figure-7). Withaferin A exhibited a binding affinity of -7.7 kcal/mol with 6LU7. PHE294, THR292, ASP295, ASP153, SER158, LYS102, PHE103, GLU178, ARG105, ILE106, GLN110, THR111, GLN178 and VAL108 are the amino acid residues participating in the interaction at the binding pocket of 6LU7. Withaferin A is an active phytoconstituent of Withania somnifera. Its therapeutic potential has been exploited and validated for various pharmacological activities, including neurological, cardioprotective, immunomodulatory, anti-cancer, anti-stress and neuroprotective activities (Figure-8) (pharmacological and analytical aspects of withaferin A). Enoxacin exhibited a binding affinity of -7.4 kcal/mol with 6LU7. ASP295, PHE294, THR292, GLY109, THR111, ILE106, VAL104, ASN151, ASP153, GLN110, PHE112, ILE152 and PHE8 are the amino acid residues participating in the interaction at the binding pocket of 6LU7. Enoxacin belongs to the 4-quinolone class. It has recently been identified as an anticancer, anti-inflammatory and antibacterial agent [29] (Figure-9).
Aloe-emodin exhibited a binding affinity of -7.4 kcal/mol with 6LU7. Aloe-emodin belongs to the anthraquinone class and has multiple anti-carcinogenic, anti-proliferative and antiviral actions in humans [30] (Figure-10). The molecular docking analysis in the present study showed the inhibition potential of the 6 compounds, ranked by affinity (ΔG): Nelfinavir > Rhein > Withanolide D > Withaferin A > Enoxacin > Aloe-emodin.
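The ranking above can be reproduced directly from the reported docking scores. The sketch below sorts the six compounds by binding energy and, purely as an illustration that is not part of the original analysis, converts a score into an approximate dissociation constant using ΔG = RT ln(Kd) at an assumed temperature of 298.15 K. Docking scores are scoring-function estimates, so any Kd derived this way is an order-of-magnitude guide at best; the function names are hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872  # kcal/(mol*K)

# Docking scores (kcal/mol) as reported in the text.
docking_scores = {
    "Nelfinavir": -8.4,
    "Rhein": -8.1,
    "Withanolide D": -7.8,
    "Withaferin A": -7.7,
    "Enoxacin": -7.4,
    "Aloe-emodin": -7.4,
}


def rank_by_affinity(scores):
    """Sort compounds from most negative (strongest predicted binding) upward."""
    return sorted(scores.items(), key=lambda item: item[1])


def estimated_kd(delta_g, temperature=298.15):
    """Approximate dissociation constant (mol/L) from delta_g = RT ln(Kd)."""
    return math.exp(delta_g / (GAS_CONSTANT_KCAL * temperature))


ranked = rank_by_affinity(docking_scores)
for name, energy in ranked:
    print(f"{name:<14} {energy:5.1f} kcal/mol  Kd ~ {estimated_kd(energy):.1e} M")
```

Enoxacin and Aloe-emodin share the same score (-7.4 kcal/mol); the sort is stable, so they keep the order in which they are listed.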
Lipinski's rule of five is a major criterion for evaluating drug likeness, that is, whether a chemical compound with a certain biological and pharmacological activity has physical and chemical properties that would make it a likely orally active drug in humans. Lipinski's rule covers the molecular properties that are important for a drug's pharmacokinetics in the human body, namely absorption, distribution, metabolism and excretion (ADME) [32]. The rule states that an orally active drug should have (i) a molecular mass of less than 500 daltons, (ii) no more than 5 hydrogen bond donors, (iii) no more than 10 hydrogen bond acceptors, and (iv) an octanol-water partition coefficient (log P) not greater than 5. A compound with three or more violations does not meet the criteria for drug likeness and is not taken forward in drug discovery. ADME analysis of the 19 selected compounds showed that 15 virtual hits passed these ADME filters (Table-3).
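The four criteria translate into a simple violation count. The sketch below is a minimal illustration of that filter, using the threshold stated in the text (three or more violations fail); it does not reproduce SwissADME, and the class name, function names and descriptor values are hypothetical examples rather than data for any compound in this study.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    molecular_weight: float  # daltons
    h_bond_donors: int
    h_bond_acceptors: int
    log_p: float             # octanol-water partition coefficient


def lipinski_violations(d):
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        d.molecular_weight >= 500,  # (i) mass must be below 500 Da
        d.h_bond_donors > 5,        # (ii) at most 5 donors
        d.h_bond_acceptors > 10,    # (iii) at most 10 acceptors
        d.log_p > 5,                # (iv) log P not greater than 5
    ]
    return sum(checks)


def is_drug_like(d, max_violations=2):
    """Pass if fewer than three criteria are violated (threshold from the text)."""
    return lipinski_violations(d) <= max_violations


# Hypothetical descriptor sets, for illustration only.
small_molecule = Descriptors(350.0, 2, 5, 3.1)
large_molecule = Descriptors(650.0, 6, 12, 5.6)

for label, d in [("small", small_molecule), ("large", large_molecule)]:
    print(label, lipinski_violations(d), is_drug_like(d))
```

A stricter filter, such as allowing at most one violation, only requires changing `max_violations`.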

Conclusion
The drug repurposing approach is a fast and appropriate option for finding therapeutic solutions for the novel coronavirus (COVID-19). Bioinformatics can be a very useful tool for identifying potent inhibitors of the novel coronavirus. In this study, we used PyRx and AutoDock Vina to identify potent inhibitors, among FDA-approved antiviral compounds and active phytochemicals, of the COVID-19 main protease, which plays a crucial role in coronavirus propagation. We identified 19 potent inhibitors out of 1100 compounds and found Nelfinavir, Rhein, Withanolide D, Withaferin A, Enoxacin and Aloe-emodin to be the most appropriate inhibitors of the COVID-19 main protease. Our findings suggest potential inhibitors of the COVID-19 main protease that can be explored further and tested against the coronavirus in pre-clinical and clinical settings.