COVID-19: A drug repurposing and biomarker identification by using comprehensive gene-disease associations through protein-protein interaction network analysis

COVID-19 (2019-nCoV) is a pandemic disease with an estimated mortality rate of 3.4% (estimated by the WHO as of March 3, 2020). To date, there is no antiviral drug or vaccine for COVID-19. The current overwhelming load of COVID-19 patients in hospitals is likely to increase in the next few months. About 15 percent of COVID-19 patients have serious disease and require immediate health services. Rather than waiting for new antiviral drugs or vaccines, which take from a few months to years to develop and test, several researchers and public health agencies are attempting to repurpose medicines that are already approved for other, similar diseases and have proved to be fairly effective. This study aims to identify FDA-approved drugs that can be used for drug repurposing and to identify biomarkers for the high-risk and asymptomatic groups. Gene-disease associations related to the reported mild and severe symptoms and clinical outcomes of COVID-19 were determined. The high-risk group was studied in relation to SARS-CoV-2 viral entry and life cycle using DisGeNET, and the results were compared with curated COVID-19 gene data sets from the CTD database. The overlapping gene sets were enriched, and the selected genes were used to construct protein-protein interaction networks. Through the interactome, key genes were identified for COVID-19 and for the high-risk and asymptomatic groups. The key hub genes involved in COVID-19 were VEGFA, TNF, IL-6, CXCL8, IL10, CCL2, IL1B, TLR4, ICAM1, and MMP9. The identified key genes were used in drug-gene interaction analysis for drug repurposing. Chloroquine, lenalidomide, pentoxifylline, thalidomide, sorafenib, paclitaxel, rapamycin, cortisol, and statins were proposed as probable drug repurposing candidates for the treatment of COVID-19. However, these predicted drug candidates need to be validated through randomized clinical trials. Key genes involved in the high-risk and asymptomatic groups were also identified, which can be used as probable biomarkers for early identification.

coronavirus (2019-nCoV), also called Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), causes Coronavirus Disease 2019 (COVID-19) (4). Recent findings suggest that the clinical and pathological symptoms caused by COVID-19 resemble those of SARS, which is caused by the SARS coronavirus (SARS-CoV) (5,6). SARS-CoV and SARS-CoV-2 were both found to be of bat origin (7)(8)(9)(10)(11). Transmission to humans is thought to occur through an intermediate host (12); for SARS-CoV this host was the civet cat. The SARS-CoV-2 intermediate host is still unknown (13,14). Though some studies predicted the intermediate host to be the pangolin, this is still not proven (15). SARS-CoV-2 and SARS-CoV are also known to infect humans using the same angiotensin-converting enzyme 2 (ACE2) receptor (16). At the level of the whole genome, SARS-CoV-2 and SARS-CoV were found to be only distantly related (79.6% sequence identity), but the spike proteins of the two viruses were found to be very similar in structure (17).
The spike protein present in both SARS-CoV and SARS-CoV-2 binds to the host cell through the receptor angiotensin-converting enzyme 2 (ACE2), which is located on the surface of the host cell membrane. While both SARS-CoV and SARS-CoV-2 bind to the same host receptor, ACE2, the binding affinity of SARS-CoV-2 for ACE2 is significantly higher than that of SARS-CoV. The viral protein responsible for host entry and replication of SARS-CoV-2 is identical in structure to 19). To date, there are no antiviral agents or vaccines available for SARS-CoV-2, although possible antiviral drugs such as remdesivir, chloroquine, hydroxychloroquine, and ritonavir/lopinavir with interferon beta are being used for the prevention and treatment of COVID-19 (20-22). Many computational studies are underway to identify potential antiviral drugs and vaccines (23,24). According to the WHO and CDC, the common symptoms of COVID-19 are runny nose, sore throat, cough, and fever, with difficulty in breathing in severe cases. In a recent report from a Wuhan hospital based on the clinical course and outcome of 107 patients, the clinical progression of COVID-19 is shown as a tri-phasic pattern that involves mild and severe cases of . According to the CDC, severe cases occur mostly in patients who have high-risk factors such as hypertension, diabetes, heart disease, cancer, and lung disease (26).
SARS-CoV-2 is commonly detected in respiratory specimens by next-generation sequencing or real-time RT-PCR methods. The throat-swab or nasopharyngeal swab specimen collected from patients is re-examined by PCR every other day. Regular laboratory review also includes blood count, serum biochemical examination, coagulation profiling, myocardial enzymes, interleukin-6 (IL-6), serum ferritin, and procalcitonin. In addition, CT scans or chest radiographs are used for routine checks of patients. A patient is considered to have recovered from COVID-19 if fever is absent for at least 3 days, improvement is noted in lung and chest CT, respiratory symptoms have improved, and the throat-swab specimen collected from the patient is negative for SARS-CoV-2 RNA for at least 24 hours (27-29).
Due to the overwhelming rush of patients to hospitals, many countries have begun to admit only COVID-19 patients with severe conditions, while those with mild conditions such as fever and cough have been asked to self-quarantine for 14 days to avoid infecting others. Treatment is desperately needed for the roughly 15 percent of COVID-19 patients with serious illness. Rather than developing substances from scratch, which may take years to develop and test, scientists are attempting to repurpose drugs that have already been approved for other, similar diseases and have proved to be fairly effective (30)(31)(32). In this study, a gene-disease association study was performed for COVID-19 by comparing genes involved in causing symptoms, high-risk factors, and clinical outcomes, in order to identify key genes involved in individual high-risk factors and in asymptomatic cases of COVID-19.

Data source & retrieval:
DisGeNET (33) is one of the largest and most comprehensive databases of human gene-disease associations. It contains a collection of genes associated with human diseases, integrating data from GWAS catalogs and animal models. All gene-disease associations were retrieved from DisGeNET using the common Human Genome Organisation (HUGO) gene symbols. The associations were retrieved based on COVID-19 symptoms, clinical outcomes, risk factors, and SARS-CoV infection. Gene Ontology (GO) is a representation of genes with their biological properties. All gene ontology annotations related to viral entry and the viral life cycle were downloaded from the AmiGO gene ontology database. The human gene-disease associations related to COVID-19 were retrieved as follows: the curated COVID-19 gene sets were downloaded from CTD (34). These gene sets were collected under the MeSH term (C000657245) in the categories respiratory tract disease and viral disease.

Data pre-processing
The 22 gene-disease association lists related to COVID-19 (GD1) were compared using the multiple list comparator tool available at molbiotools. The tool performs pairwise intersections and reports a full symmetrical matrix based on the Jaccard index. After the comparison, the common gene sets were obtained. Genes from DisGeNET were selected based on a Jaccard index of more than 0.3. The gene sets constructed from DisGeNET (GD1) were compared with a curated COVID-19 dataset (GD2) of 473 genes released by the Comparative Toxicogenomics Database. The overlapping gene sets (GD3) were selected for enrichment analysis.
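The pairwise comparison described above can be illustrated with a minimal Python sketch. It is not the molbiotools implementation; the gene lists are small hypothetical examples rather than DisGeNET data, and the function names are chosen here for illustration only. It computes the Jaccard index |A ∩ B| / |A ∪ B| for every pair of lists and keeps the pairs scoring above 0.3.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two gene collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(gene_lists):
    """Return a full symmetric matrix (dict of dicts) of Jaccard indices."""
    names = list(gene_lists)
    matrix = {name: {} for name in names}
    for name in names:
        matrix[name][name] = jaccard(gene_lists[name], gene_lists[name])
    for first, second in combinations(names, 2):
        score = jaccard(gene_lists[first], gene_lists[second])
        matrix[first][second] = matrix[second][first] = score
    return matrix


# Hypothetical toy lists, NOT real DisGeNET associations.
example_lists = {
    "fever": {"IL6", "TNF", "IL1B", "CXCL8"},
    "cough": {"IL6", "MUC5AC", "CCL2"},
    "pneumonia": {"IL6", "TNF", "IL1B", "TLR4", "CXCL8"},
}

matrix = pairwise_jaccard(example_lists)

# Keep only list pairs whose Jaccard index exceeds the 0.3 cut-off.
similar_pairs = [
    (first, second, matrix[first][second])
    for first, second in combinations(example_lists, 2)
    if matrix[first][second] > 0.3
]
```

In this toy example only the fever and pneumonia lists share enough genes (4 of 5) to pass the cut-off.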

Gene enrichment analysis
The overlapping genes in gene set GD3 were enriched for gene ontology mapping using the PANTHER tool (35), with the Benjamini and Hochberg correction and a P-value of less than 0.05.
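As an illustration of the Benjamini and Hochberg correction mentioned above, the following sketch adjusts a list of raw P-values and keeps the terms below 0.05. It is a generic implementation of the procedure, not PANTHER's own code, and the term names and P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical GO terms and raw p-values, for illustration only.
raw = {
    "term_a": 0.001,
    "term_b": 0.008,
    "term_c": 0.039,
    "term_d": 0.041,
    "term_e": 0.20,
}

adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
significant = [term for term, p in adjusted.items() if p < 0.05]
```

Here `term_c` and `term_d` have raw P-values below 0.05 but do not survive the correction, which is the situation the adjustment is meant to catch.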

Construction of comprehensive Protein-Protein Interaction (PPI) network
The protein-protein interaction network was constructed from the selected enriched genes using the STRING database (36). STRING is a database containing information on both known and predicted protein-protein interactions. The PPI network was constructed with the interaction score set to 0.4 and above.
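The score cut-off can be sketched as a simple filter over a scored edge list. The snippet below assumes the interactions have already been exported as (protein, protein, score) triples with scores on a 0 to 1 scale; the edges and scores shown are hypothetical, and `build_ppi_network` is a name introduced here, not a STRING function.

```python
from collections import defaultdict


def build_ppi_network(scored_edges, min_score=0.4):
    """Build an undirected adjacency map from edges scoring at least min_score."""
    adjacency = defaultdict(set)
    for protein_a, protein_b, score in scored_edges:
        # Skip self-interactions and edges below the confidence cut-off.
        if protein_a != protein_b and score >= min_score:
            adjacency[protein_a].add(protein_b)
            adjacency[protein_b].add(protein_a)
    return dict(adjacency)


# Hypothetical edges and scores, for illustration only.
scored_edges = [
    ("TNF", "IL6", 0.99),
    ("TNF", "IL1B", 0.95),
    ("IL6", "IL1B", 0.97),
    ("TNF", "MMP9", 0.62),
    ("MMP9", "VEGFA", 0.45),
    ("VEGFA", "ICAM1", 0.31),  # below 0.4, so dropped
    ("IL6", "IL6", 0.90),      # self-interaction, so dropped
]

network = build_ppi_network(scored_edges)
```

Proteins whose only interactions fall below the cut-off (ICAM1 in this toy input) do not appear in the resulting network.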

Protein-Protein network analysis and identification of key genes
The PPI network was visualized and analyzed with Cytoscape (37). The key genes were identified using the cytoHubba (38) app available in Cytoscape, which predicts important nodes or hubs in an interactome network using several topological algorithms. In this study, Maximum Clique Centrality (MCC) was used to identify key/hub genes from the whole network.
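To make the MCC ranking concrete, here is a small sketch based on our reading of the published cytoHubba definition, in which the score of a node v is the sum of (|C| - 1)! over all maximal cliques C that contain v. This is not the cytoHubba code itself; the network is a hypothetical toy example, and the clique enumeration uses a standard Bron-Kerbosch search with pivoting.

```python
from math import factorial


def maximal_cliques(adjacency):
    """Yield every maximal clique (Bron-Kerbosch with pivoting)."""
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield frozenset(clique)
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(candidates | excluded, key=lambda v: len(adjacency[v] & candidates))
        for vertex in list(candidates - adjacency[pivot]):
            yield from expand(
                clique | {vertex},
                candidates & adjacency[vertex],
                excluded & adjacency[vertex],
            )
            candidates = candidates - {vertex}
            excluded = excluded | {vertex}

    yield from expand(set(), set(adjacency), set())


def mcc_scores(adjacency):
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C containing v."""
    scores = {node: 0 for node in adjacency}
    for clique in maximal_cliques(adjacency):
        if len(clique) < 2:
            continue  # an isolated node contributes nothing
        weight = factorial(len(clique) - 1)
        for node in clique:
            scores[node] += weight
    return scores


# Hypothetical toy network (every node must appear as a key), for illustration only.
toy_network = {
    "TNF": {"IL6", "IL1B", "MMP9"},
    "IL6": {"TNF", "IL1B"},
    "IL1B": {"TNF", "IL6"},
    "MMP9": {"TNF", "VEGFA"},
    "VEGFA": {"MMP9"},
}

scores = mcc_scores(toy_network)
# Rank by descending score, breaking ties alphabetically.
ranked = sorted(scores, key=lambda node: (-scores[node], node))
```

In the toy network TNF ranks first because it belongs both to the three-node clique and to a separate edge. When none of a node's neighbours are connected to each other, every maximal clique containing it is a single edge and the score reduces to the node's degree.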

PPI network construction for high-risk factor group
Apart from the comprehensive network, PPI networks were constructed separately for the high-risk factor groups to understand the mechanism of disease. For this purpose, separate networks were constructed for hypertension, diabetes, heart disease, lung disease, kidney disease, and cancer using the SARS-CoV disease-gene associations together with the viral entry and viral life cycle gene sets from gene ontology.

PPI network construction for the asymptomatic group (without fever)
The PPI network was constructed for very mild symptoms such as cough, runny nose, and diarrhea to understand the mechanism in the asymptomatic group.

Drug-gene interaction analysis
The identified hub genes were screened for therapeutic targets or drugs using the drug-gene interaction database (39) (DGIdb 2.0; http://www.dgidb.org/). The search was limited to FDA-approved drugs.

STITCH drug-gene network construction
The FDA-approved drugs predicted for the hub genes through the drug-gene interaction database were used for drug-protein network construction through the STITCH database (40). Drugs were prioritized based on a network score of more than 0.9.
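The prioritization step amounts to a threshold filter followed by a ranking. The sketch below uses the drug scores reported in the results of this paper; the entry `hypothetical_drug` is an invented value added only to show a candidate being filtered out, and `prioritize` is a helper name introduced for this example.

```python
def prioritize(scores, threshold=0.9):
    """Keep drugs scoring above the threshold, ranked from highest to lowest score."""
    kept = {drug: score for drug, score in scores.items() if score > threshold}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


stitch_scores = {
    "lenalidomide": 0.940,
    "penicillin": 0.933,
    "pentoxifylline": 0.990,
    "thalidomide": 0.980,
    "sorafenib": 0.909,
    "paclitaxel": 0.947,
    "rapamycin": 0.985,
    "cortisol": 0.958,
    "hypothetical_drug": 0.750,  # invented entry, not from the paper
}

ranked_drugs = prioritize(stitch_scores)
```

With these values pentoxifylline, rapamycin, and thalidomide rank highest, and the invented low-scoring entry is excluded.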

Identification of common genes for COVID-19
Disease-associated genes were selected for the study based on the symptoms and clinical outcomes of mild, moderate, and severe cases of COVID-19 and on the high-risk factors involved in severe cases. The overall workflow is shown in Figure 1. As human disease-gene associations are lacking for SARS-CoV-2 infection, gene sets related to SARS-CoV were used to relate the various symptoms. Additional gene sets for viral entry and the viral life cycle were included from the AmiGO gene ontology database (41). This comprehensive gene set was compared by the pairwise intersection method using the Jaccard index. Genes were selected based on the Jaccard similarity score, the disease-gene association score, and the disease-disease association score from the DisGeNET database.
These gene-disease associations cannot exclude false positives, and some diseases are better studied than others, which can affect the gene set. For this reason, the datasets are probably noisy and incomplete due to the nature of the curation process. Therefore, gene sets were selected only from human data, and any gene related only to mouse or rat models was discarded. A total of 1930 common genes were selected based on the Jaccard similarity score (Supplementary Table 1). These genes were compared with the curated list of COVID-19 genes from the CTD database, which contains 473 genes (Supplementary Table 2). The non-redundant overlapping genes were selected for gene enrichment (Figure 2).
The common genes were mapped through gene ontology and selected based on statistical significance (P-value less than 0.05). The selected genes were also compared with the STRING protein-protein interaction database, and only genes with an interaction score greater than 0.4 were retained for network construction. Based on these criteria, 279 genes were selected as statistically significant enriched genes (Table 1).

Protein-Protein interaction network analysis for COVID-19 related genes
The process by which two or more proteins form a complex through non-covalent bonds is called protein-protein interaction (PPI). The molecular mechanisms of disease or new drug targets can be identified using PPI network analysis. The selected genes were used to construct the protein-protein interaction network, and genes were retained based on an interaction score of more than 0.4 (Figure 3). The PPI network was constructed using the STRING database and analyzed with Cytoscape. The hub genes were identified with cytoHubba using the MCC method (Table 2). cytoHubba provides 11 topological analysis methods for identifying hub genes in a network, of which MCC was used here. Function prediction for the identified top genes through the GeneMANIA web server revealed that most of the genes were involved in inflammatory response, cell chemotaxis, cytokine activity, cytokine receptor binding, regulation of inflammatory response, and adaptive immune response (Figure 4). The identified top 10 hub genes are as follows:

VEGFA
VEGFA is important for viral infection and its associated pathology (42). Vascular Endothelial Growth Factor promotes SARS-CoV viral entry.

TNF
Inflammation is a biological reaction in response to a possible threat. This response may be natural but, under some circumstances, the immune system may attack the normal cells or tissues of the body, causing abnormal inflammation due to viral entry. TNF-α has been identified as a key regulator of the inflammatory response. TNF signaling in the lung promotes viral entry and persistence, and the pro-inflammatory cytokine tumor necrosis factor-alpha can be readily detected after infection (43,44).

IL-6
Interleukin 6 (IL-6) is produced in response to infection and tissue damage. It has been reported that the up-regulation of IL-6 can promote either viral survival or alleviation of the disease during viral infections (45).

CXCL8
The ELR-containing CXC chemokine CXCL8 promotes neutrophil infiltration. Neutrophil (PMN) infiltration plays a central role in inflammation and is a major cause of tissue damage. The infiltrating neutrophils may perform phagocytosis and cause adverse inflammatory effects due to virus-associated damage (46).

IL10
Interleukin-10 (IL-10) is an immunoregulator that prevents tissue damage; however, viruses evolve to exploit immunoregulatory mechanisms for their survival in the infected host (47).

CCL2
The CCL2 gene significantly enhances the pathogenesis and replication of viruses (48-50).

IL1B
The IL1B gene is reported to mediate acute pulmonary inflammation through inflammation of lung cells during viral infection (51)(52).

TLR4
Activation of Toll-like receptor 4 (TLR4) helps to create a defensive immune response, but an excessive inflammatory response can damage the host during viral infection (53,54).

ICAM1
The ICAM-1 (Intercellular Adhesion Molecule 1) gene is reported to play a major role in infectious disease, both in modulating viral replication and as a site for the cellular entry of certain viruses. ICAM-1 is induced by interleukin-1 and tumor necrosis factor (TNF) and is expressed by lymphocytes and the vascular endothelium (55,56).

MMP9
MMP9 is produced by a variety of cells in the respiratory tract and has been reported to play a key role during pulmonary viral infection through modulation of the immune response. MMP9 has been reported to have anti-Respiratory Syncytial Virus properties, with effects on viral clearance and neutrophil recruitment observed upon loss of MMP9 expression (57). It will be interesting to study the role of MMP9 in innate responses to SARS-CoV-2 infection further.

Inflammatory cytokines are signaling molecules secreted from helper T cells and include interleukin-1. Tumor necrosis factor-alpha plays an important role in mediating the innate immune response. The excessive production of inflammatory cytokines in COVID-19 contributes to inflammatory disease. Such cytokines include interferons, interleukins, chemokines, colony-stimulating factors, and tumor necrosis factors, and they lead to coronavirus infection symptoms such as redness, swelling/edema, fever, and pain. The overproduction of pro-inflammatory cytokines can lead to a "cytokine storm," during which inflammation spreads throughout the body through the circulation (58,59). These pro-inflammatory cytokines have adverse effects such as inflammation of the kidney, lungs, and heart, which may explain why such patients fall into the high-risk group for COVID-19 (60).

Protein-Protein Interaction network analysis for the asymptomatic group
Screening is usually practiced at all entry points to assess body temperature for fever, and individuals with fever are isolated for laboratory testing. However, people who have no symptoms or only very mild, cold-like symptoms such as runny nose, cough, and sore throat are overlooked. In general, asymptomatic infections cannot be identified until they are confirmed by RT-PCR, so such individuals act as silent carriers. Finding genes related to the asymptomatic group, which shows just sore throat, cough, runny nose, or headache without fever, will improve understanding of COVID-19 transmission and the spectrum of disease it causes, and will provide insight into the cause of the pandemic. The protein-protein network was constructed from gene sets for symptoms such as cough, runny nose, and sore throat along with the SARS-CoV, viral entry, and viral life cycle gene sets, and compared with the CTD curated COVID-19 gene data set. The key genes predicted for the asymptomatic group of COVID-19 are IL6, TNF, CXCL8, IL1B, IL10, CCL2, ICAM1, IL2, STAT3, and CCL5. Of these, IL1B and STAT3 were found only in the asymptomatic group when compared to the other groups. Upregulation of STAT5 dimers gene expression has been observed for inflammation-related genes. Signal transducer and activator of transcription 3 (STAT3) is a central regulator of many physiological functions, including the immune response. Interleukin 1 beta (IL-1β), also known as leukocytic pyrogen, is a cytokine protein encoded by the IL1B gene in humans. This cytokine is an essential mediator of inflammatory reactions and is involved in several cellular activities, including cell proliferation, differentiation, and apoptosis. These genes could serve as probable biomarkers to identify COVID-19 in the asymptomatic group (Table 3).

Drug-gene interaction analysis of COVID-19
Based on the drug-gene interaction database (DGIdb 2.0), the FDA-approved drugs identified for each gene were used for STITCH prediction of each drug-gene association (Figures 5-14). The drugs were selected based on a network interaction score above 0.9, as follows (Table 2):
Chloroquine is a medication used to prevent and treat malaria (61) and has been suggested for COVID-19 treatment. Chloroquine has antiviral effects that work by increasing endosomal pH, which impairs virus/cell fusion because fusion requires a low pH. The nitrogens present in chloroquine and in a number of related isoquinoline and quinoline drugs prevent the endosome from acidifying and thereby disrupt viral replication. Adding more nitrogens, either by making extra branches of ionizable nitrogens or by lengthening one of the chains with extra carbons and other nitrogens, can have an even greater effect.

Lenalidomide (0.940)
Over the past ten years, lenalidomide has been used widely to treat both inflammatory conditions and cancers.

Penicillin (0.933)
Penicillins are a group of antibiotics. The combination of antibiotics with an antiviral drug has been reported to be effective in controlling viral replication.

Pentoxifylline (0.990)
Pentoxifylline is used to treat muscle pain in people with peripheral artery disease. Studies have demonstrated a reduction in the risk of hepatorenal syndrome. Pentoxifylline, a phosphodiesterase inhibitor, potently suppresses cytokine production and has been described as a neonatal anti-inflammatory agent. In people infected with HIV, it is reported to be more effective at improving blood vessel function and reducing inflammation than antiretroviral medications alone (62,63).

Thalidomide (0.980)
Thalidomide, which is used in cancer treatment, is also used for treating a variety of HIV-related conditions (64).

Sorafenib (0.909)
Sorafenib is used for treating cancer of the kidneys, liver, and lung (65). Sorafenib is reported to inhibit the replication of New World alphaviruses and of two Old World alphaviruses, Sindbis virus and chikungunya virus, leading to a reduction in viral protein production and overall viral replication (66).

IL8
Paclitaxel (0.947) is used to treat several types of cancer and reported to have anti-viral activity (67).

IL10
Rapamycin (0.985), a powerful mTOR inhibitor, has proven effective in the treatment of some diseases. The immunomodulatory drug rapamycin (RAPA) possesses anti-HIV properties and may be a valuable medication for the prevention and treatment of viral infection (68).

IL1B
Cortisol (0.958) is used to treat conditions arising from overactivation of the B-cell-mediated antibody response and prevents inflammation by limiting the release of inflammatory substances. Corticosteroids are used in the treatment of severe acute respiratory syndrome (SARS-CoV) and may suppress the "cytokine storm" (69).

Conclusion and limitation of the study
In this study, genes related to COVID-19 symptoms, clinical outcomes, and risk factors were studied through gene-disease associations using a network-based methodology to identify drug repurposing candidates, and network analysis of the high-risk and asymptomatic groups was used to identify biomarkers. Based on this analysis, drugs were prioritized for repurposing and genes were identified as biomarkers. These results were supported by literature data, but this study has several limitations. All predicted drugs must be validated either through randomized clinical trials or through experimental assays before being used in patients. The network was constructed from gene-disease associations and from curated data sets in the DisGeNET and CTD databases, which are based on literature mining. However, it was noted during the writing of this manuscript that chloroquine, reported by the network analysis of this study, is already used in the treatment of COVID-19.

Table 3: Top 10 key genes of the high-risk groups, with predicted FDA-approved drugs, and of the asymptomatic group, identified from the protein-protein interaction network using cytoHubba