Multiplex-heterogeneous network embedding for drug repositioning

Drug repositioning (also called drug repurposing) is a strategy for identifying new therapeutic targets for existing drugs. This approach is of great importance in pharmacology as it is a faster and cheaper way to develop new medical treatments. In this paper, we present, to our knowledge, the first application of multiplex-heterogeneous network embedding to drug repositioning. Network embedding learns vector representations of nodes, opening the whole machine learning toolbox to a wide variety of applications including link prediction, node labelling and clustering. So far, applications of network embedding to drug repositioning have focused on heterogeneous networks. Our approach is based on multiplex-heterogeneous network embedding. Such a method allows the richness and complexity of multiplex and heterogeneous networks to be projected into the same vector space. In other words, multiplex-heterogeneous networks aggregate different multi-omics data in the same network representation. We validate the approach on a link prediction task and on a case study of SARS-CoV2 drug repositioning. Experimental results show that our approach is highly robust and effective for finding new drug-target associations.
Keywords: network embedding, multiplex-heterogeneous network, multi-layer network, drug repositioning, graph representation learning


Introduction
Drug repositioning (also called drug repurposing) is a strategy for identifying new therapeutic targets for existing drugs [32]. This approach is of great importance in pharmacology as it is a faster and cheaper way to develop new medical treatments.
[...] their specific type of nodes and edges and connected by bipartite interactions. In the case of drug repositioning, a heterogeneous network can be composed of drug-drug and target-target networks linked by drug-target bipartite interactions. This kind of biomedical network is usually sparse, noisy and incomplete. It is therefore essential to aggregate different sources of data to reduce the noise and add useful data to the network.
Our approach for drug repositioning is based on multiplex-heterogeneous network embedding.
Such a method projects the embeddings of both drugs and targets into the same vector space while combining the richness and complexity of multiplex and heterogeneous networks.
Multiplex networks are multi-layer networks in which every layer shares the same type of nodes, but the edges of each layer belong to a different type. For drug repositioning, this makes it possible to aggregate different -omics data in the same network representation. Recent network embedding methods for drug repositioning train a binary classifier on the embeddings to perform link prediction and find new drug-target or drug-disease associations. In this work, we focus instead on biological modules by clustering the embeddings. The analysis of the clusters can also lead to new drug combinations.
We validate the method on a link prediction task and on a case study of SARS-CoV2 drug repositioning. Experimental results show that our approach is highly robust and effective at finding new drug-target associations.

Related work
The application of network embedding to drug repositioning is a recent approach. So far, it has focused on heterogeneous network embedding [3,25,37,38]. Zhou and colleagues focused on a drug-disease heterogeneous network [39]. Their method, called NEDD, applies meta paths of different lengths to explicitly capture the similarities within drugs and diseases, and uses them to optimize the embeddings of drugs and diseases. NEDD then uses a random forest classifier to predict novel associations between drugs and diseases.
Other approaches also focused on drug-disease heterogeneous networks, as in [38], where Yang et al. proposed HED to predict potential associations between drugs and diseases. As in [39], a binary classifier, here an SVM, is trained on the embeddings to predict new associations. Chen and colleagues introduced cross-network embedding to embed drug, target and disease nodes using two heterogeneous networks, a drug-target and a drug-disease network [3].
To our knowledge, multiplex networks have never been combined with heterogeneous networks for drug repositioning. Multiplex networks can integrate multiple types of data and can mitigate network noise, which degrades prediction accuracy when only heterogeneous networks are used. In addition, our approach differs from the literature by focusing on biological modules to find new drug-target associations. The methods cited above use binary classification to predict new associations, whereas our approach relies on a cluster analysis of specific targets to find new drug-target associations.

Method
We used MultiVERSE for multiplex-heterogeneous network embedding [30]. This method computes the similarities between nodes using random walks with restart on multiplex-heterogeneous networks (RWR-MH) and optimizes the embeddings using Kullback-Leibler minimization. We present these two key components of the method in the next subsections.

Random walk with restart on multiplex-heterogeneous network (RWR-MH)
In a classical random walk (RW), an imaginary particle starts from a seed node and explores the network, moving from node to node by randomly selecting a neighbour with a probability defined by its degree. In the RWR-MH algorithm, the particle has to travel in the two multiplex networks $G^1_M$ and $G^2_M$. More formally, the first multiplex network can be defined as $G^1_M = (V_M, E^1_M)$, where $V_M$ is its set of nodes and $E^1_M$ its set of edges; it is an $L_1$-layer multiplex graph with $n \times L_1$ nodes, $n$ being the number of nodes per layer. The second multiplex network $G^2_M$, with node set $U_M$, is defined analogously.
If there is a bipartite edge between two nodes, the particle may jump from a node in one multiplex network to the other multiplex network. In a multiplex-heterogeneous network, the restart can also happen on different types of nodes (see Figure 1). The bipartite graph is defined as $G_B = (V_M \cup U_M, E_B)$, where $E_B$ is the set of bipartite edges. The different sets of nodes $V_M$ and $U_M$ are only connected by the edges of the bipartite graph.
Note that the bipartite edges should link nodes with every layer of the multiplex graphs.
We can now define the evolution of the probability distribution of the random walk with restart [24], $p_t = (p_t(v))_{v \in V_{MH}}$, where $V_{MH}$ is the set of nodes of the multiplex-heterogeneous network. This distribution evolves as follows:
$p_{t+1} = (1 - r)\, M\, p_t + r\, p_0 \qquad (1)$
where $M$ denotes the multiplex-heterogeneous transition matrix, obtained by column normalization of $A_{MH}$, $r$ is the restart probability, and the vector $p_0$ is the initial probability distribution.
At each step, the particle can jump back to the initial node(s), known as seed(s), with a probability $r \in (0, 1)$. The stationary distribution of Equation (1) represents the probability for the particle to be located at a specific node after an infinite number of steps [24].
Finally, this distribution can be interpreted as a similarity between the seed(s) and the other nodes. We use it to optimize the embeddings.
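As an illustration, the restart iteration can be sketched in a few lines of Python. This is a minimal sketch on a single toy adjacency matrix, not the MultiVERSE implementation: it assumes the standard update $p_{t+1} = (1 - r) M p_t + r p_0$, takes the (here hypothetical) matrix as already assembled, and uses an arbitrary restart probability r = 0.7. The function name `rwr` and its parameters are ours.

```python
import numpy as np

def rwr(adjacency, seeds, r=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on one adjacency matrix (illustrative only)."""
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0        # avoid division by zero for isolated nodes
    M = A / col_sums                     # column-normalized transition matrix
    p0 = np.zeros(A.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)   # uniform restart distribution over the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - r) * (M @ p) + r * p0
        if np.abs(p_next - p).sum() < tol:   # L1 change below tolerance: converged
            return p_next
        p = p_next
    return p

# Toy path graph 0 - 1 - 2 - 3, with node 0 as the seed (assumed example data).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
scores = rwr(A, seeds=[0])
```

On this toy graph the scores decrease with the distance from the seed, which is the sense in which the stationary distribution acts as a similarity to the seed.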

Learning objective
In the context of MultiVERSE, we need to compute a similarity in the multiplex-heterogeneous network; as described above, it is given by the stationary distribution of RWR-MH. Therefore, $sim_G(v, \cdot)$, the similarity for any node $v$ of the multiplex-heterogeneous network, is expressed as a probability distribution. Accordingly, MultiVERSE applies a softmax function to obtain a normalized similarity distribution in the embedding (vector) space. Formally, let $w_i$ denote the embedding of node $i$. The similarity between two node embeddings $w_u$ and $w_v$ is defined as the dot product $w_u \cdot w_v^T$, and:
$sim_{Emb}(v, u) = \frac{\exp(w_u \cdot w_v^T)}{\sum_{i \in V_{MH}} \exp(w_i \cdot w_v^T)}$
The aim of MultiVERSE is for the similarity distribution in the embedding space, $sim_{Emb}(v, \cdot)$, to approximate $sim_G(v, \cdot)$. This learning phase is performed using Kullback-Leibler (KL) minimization between the two similarities:
$\sum_{v \in V_{MH}} KL\left(sim_G(v, \cdot) \,\|\, sim_{Emb}(v, \cdot)\right)$
By keeping only the terms related to $sim_{Emb}$, as $sim_G$ is constant, we obtain the following objective function:
$\mathcal{L} = -\sum_{v \in V_{MH}} sim_G(v, \cdot) \log\left(sim_{Emb}(v, \cdot)\right)$
At each iteration, as $sim_{Emb}$ is defined as a softmax function, it is necessary to normalize it over all the nodes of the network, which is computationally heavy. As in the original MultiVERSE algorithm, we used Noise Contrastive Estimation (NCE) to approximate the computations [12].
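To make the objective concrete, the following sketch evaluates the softmax similarity in the embedding space and the resulting cross-entropy objective with the full normalization, on random toy data. It is only meant to show what is being minimized: it does not implement NCE, and the names `sim_emb` and `objective`, the sizes, and the random inputs are assumptions made for the example.

```python
import numpy as np

def sim_emb(W):
    """Row v holds sim_Emb(v, .): a softmax over the dot products w_i . w_v."""
    logits = W @ W.T
    logits = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    expd = np.exp(logits)
    return expd / expd.sum(axis=1, keepdims=True)

def objective(sim_g, W):
    """Cross-entropy between sim_G and sim_Emb, summed over all nodes."""
    return -np.sum(sim_g * np.log(sim_emb(W)))

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))                 # toy case: 5 nodes, 3-dimensional embeddings
sim_g = rng.random((5, 5))
sim_g /= sim_g.sum(axis=1, keepdims=True)   # each row is a probability distribution
loss = objective(sim_g, W)
```

The denominator of the softmax runs over every node, so the cost of one evaluation grows with the square of the number of nodes; this is the cost that NCE is used to avoid.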
To sum up, in this framework, the similarity in the multiplex-heterogeneous network is computed using Random Walks with Restart on Multiplex-Heterogeneous (RWR-MH) networks [36].
[...] and from [11] in the PPI layer.
3) The last layer (8537 nodes, 63561 edges) is a molecular complexes layer constructed from the fusion of Hu.map [8] and Corum [10].

Link prediction
Similarly to [30], we used link prediction to evaluate the quality of the embeddings and validate our approach for drug repositioning. The link prediction pipeline is the following: first, we randomly remove 30% of the bipartite edges to obtain a training network; we then train a Random Forest classifier on this training network and test it on the removed 30% of edges. In order to train the binary classifier, we apply operators (Hadamard, Weighted-L1, Weighted-L2, Average and Cosine) to the embeddings of each node pair to obtain edge features.
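The operators turn a pair of node embeddings into a feature vector for the candidate edge. The sketch below uses the definitions commonly given for these operators in the network embedding literature; the exact definitions used in our pipeline may differ in detail, and returning the cosine similarity as a one-element vector is an assumption of this example. The dictionary name `OPERATORS` and the toy vectors are ours.

```python
import numpy as np

# Each operator maps two node embeddings u, v to an edge feature vector.
OPERATORS = {
    "hadamard": lambda u, v: u * v,               # element-wise product
    "weighted_l1": lambda u, v: np.abs(u - v),    # element-wise absolute difference
    "weighted_l2": lambda u, v: (u - v) ** 2,     # element-wise squared difference
    "average": lambda u, v: (u + v) / 2.0,        # element-wise mean
    "cosine": lambda u, v: np.array(              # scalar similarity as a 1-d vector
        [u @ v / (np.linalg.norm(u) * np.linalg.norm(v))]
    ),
}

# Toy drug and target embeddings (assumed values, for illustration only).
u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
features = {name: op(u, v) for name, op in OPERATORS.items()}
```

The resulting vectors, computed for existing and non-existing drug-target pairs, are what a binary classifier such as a Random Forest takes as input.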
The aim of this validation task is to assess the quality of the embeddings in order to find drug-target associations. Direct comparisons with other methods are not possible as, to our knowledge, there is no other multiplex-heterogeneous network embedding method projecting both types of nodes in the literature.

Case study on SARS-CoV2 drug repositioning
The second validation approach is to test the method on a case study, here SARS-CoV2 drug repositioning. Indeed, in these times of pandemic, there is abundant literature on the efficacy of different drugs, both in vitro and in vivo. In addition, with the appearance of variants resistant to vaccines, drug repositioning is still particularly relevant.
In order to find new drug-target associations, we focused on biological modules. Once MultiVERSE has been applied to the drug-target multiplex-heterogeneous network, we used a clustering method on the embeddings and analysed the clusters. This approach allows us to find new drug-target associations but also possible drug combinations, as they are included in the same drug-target modules. The clustering method we apply to the embeddings is spherical k-means [2] with k = 500. We analyse the 27 clusters corresponding to the SARS-CoV2 proteins [11]. We then compare our results with the clinical and biological literature to evaluate the usefulness and quality of our predictions.
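For readers unfamiliar with spherical k-means, the sketch below shows a generic Lloyd-style variant: embeddings are projected onto the unit sphere, points are assigned to the centroid with the highest cosine similarity, and centroids are re-normalized after each update. It is not necessarily the implementation of [2]; the function name, the random initialization, the fixed iteration count and the toy data with k = 2 are all assumptions made for the example.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Cluster the rows of X by cosine similarity (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)           # project onto the unit sphere
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # k distinct points as initial centroids
    for _ in range(n_iter):
        labels = np.argmax(X @ centroids.T, axis=1)            # most similar centroid per point
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                c = members.sum(axis=0)
                norm = np.linalg.norm(c)
                if norm > 0:
                    centroids[j] = c / norm                    # keep centroids on the sphere
    labels = np.argmax(X @ centroids.T, axis=1)
    return labels, centroids

# Toy embeddings: three points near the x-axis, three near the y-axis.
X = np.array([[1.0, 0.1], [1.0, 0.05], [1.0, -0.05],
              [0.1, 1.0], [0.05, 1.0], [-0.05, 1.0]])
labels, centroids = spherical_kmeans(X, k=2)
```

In the case study, the drugs that share a cluster label with a SARS-CoV2 protein are the ones examined as repositioning candidates.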

Results on link prediction
MultiVERSE has a ROC-AUC score above 0.9 with the Hadamard, Average and Cosine operators (see Table 1), meaning that the method can predict with high precision the removed 30% of drug-target links from the corresponding multiplex-heterogeneous networks.
The variance is very small for all operators. The network embedding method is highly robust and steady across each run of the link prediction evaluation test.
Table 1: ROC-AUC scores for link prediction using MultiVERSE. Link predictions are computed for the bipartite interactions of the multiplex-heterogeneous networks. The scores higher than 0.9 are highlighted in bold.

Cluster analysis of SARS-CoV2 proteins
In the clusters of the different SARS-CoV2 proteins, we found 88 molecules, out of which 33 are already FDA-approved. Given length constraints, we will present here the most interesting of them.
We found two drugs of interest in the nsp1 protein cluster: cladribine and gallium maltolate.
Cladribine is used for multiple sclerosis and has been associated with mild or no symptoms after COVID-19 infection [6]. Gallium maltolate also has in vitro activity against the virus [1].
In the cluster of the nsp6 protein, we have the anti-malarial drug mefloquine, which inhibits SARS-CoV2 in vitro [9]. Other anti-malarial drugs are present in this cluster, such as halofantrine, which has been proposed for COVID-19 repositioning [33], or voacamine. Amodiaquine is also a drug that could be repositioned for this disease [13]. We also found dronedarone, which has been identified as an active inhibitor of the virus [20].
We also found glutathione in the nsp5 C145A protein cluster. This molecule could address the cytokine storm syndrome [16].
In the N protein cluster, we have a repositioning target with S-oxy-L-cysteine, a member of the family of L-alpha-amino acids. L-cysteine in combination with vitamin D has been shown to reduce mortality associated with COVID-19 in African Americans [19]. We also found in this cluster a preclinical molecule, sanglifehrin A, which acts as a glue linking IMPDH with cyclophilin A, a protein itself involved in viral capsid packaging.
We obtained one interesting drug in the cluster including the orf3b protein of COVID-19.
We found the FDA-approved drug ezetimibe as a potential target for repositioning. It has been reported that patients taking this molecule have significantly reduced odds of SARS-CoV-2 hospitalization [18].
In the orf3a protein cluster, we identified lumichrome, a constituent of honey reported to be an inhibitor of the virus [15]. We also have riboflavin, which has been shown to inactivate the virus [22].
Several potential targets for repositioning are in the nsp14 protein cluster of COVID-19. We have AICA ribonucleotide, an AMP-activated protein kinase (AMPK) activator that has been shown to be effective as a treatment against influenza [27]. In addition, another AMPK activator, metformin, has been associated with decreased mortality in COVID-19 [26]. In this cluster, we also have mycophenolate mofetil, which is transformed into its active form, mycophenolic acid, as a target for repositioning. This acid is known to inhibit dengue virus; it has been shown to be active against COVID-19 [21] and has been proposed as a treatment in combination with interferon [7]. We also have the anti-viral drug ribavirin, which has been used clinically for COVID-19 treatment in combination with other drugs [17].
Valproic acid is in the cluster of the nsp5 protein of COVID-19. This compound has been proposed as a potential treatment as it reduces ACE2 expression in endothelial cells [34]. We also identified panobinostat, a histone deacetylase inhibitor that could suppress ACE2 and ABO in COVID-19 [35].

Conclusion and perspectives
We presented the first application of multiplex-heterogeneous network embedding to drug repositioning. The method combines for the first time the richness of multiplex networks with the complexity of heterogeneous networks in order to find new drug targets. We tested the quality of the embeddings on link prediction and showed that the method is highly robust in finding drug-target associations for COVID-19. We also found several targets that have anti-viral properties against SARS-CoV2 in vitro or in vivo.
We also identified several other targets for repositioning with this approach, for example anti-cancer drugs such as vorinostat or pracinostat in the nsp5 cluster, which need further investigation. The molecules that are not FDA-approved and that we included in the multiplex-heterogeneous network also remain to be explored.
There are several perspectives to this work. In this article, we projected drugs and targets in the same vector space. An interesting extension would be to project drugs, targets and diseases in the same vector space, either by using cross-network embedding [3] for multiplex-heterogeneous networks or by extending RWR to 3 multiplex networks (molecular, drug and disease) and 2 bipartite networks (drug-target and drug-disease, for example).