Submitted:
13 July 2023
Posted:
14 July 2023
Abstract
Keywords:
1. Introduction
- Topological Analysis: This analysis is based on the topological properties of the network and provides information used in the further analyses described in the next sections.
- Clustering: This is the process of discovering dense regions of a biological network which may indicate important activity for the survival of the organism or sometimes disease states.
- Network Motifs: These are frequently repeating subgraph patterns in biological networks which may indicate some specific function performed by them.
- Network Alignment: The alignment of two networks shows the similarity between them, which may be used to deduce hereditary relationships. This affinity may help to discover conserved regions in organisms and so aid understanding of the evolutionary process.
2. Biological Networks
2.1. Networks In the Cell
- Protein Networks: Proteins are the workhorses of the cell, performing the vital functions of organisms. A protein is basically a sequence of amino acids constructed from the code in a gene, which is part of the DNA. The 3-D structure of a protein plays an important role in its function, and various drug treatment methods exploit this property to disable a disease-causing virus such as HIV. A protein interacts with various other proteins through biochemical reactions, forming a protein-protein interaction (PPI) network. Nodes with high degrees in a PPI network have fundamental functions in the cell [23]. The PPI network of T. pallidum is depicted in Figure 1, where proteins involved in DNA metabolism are shown as enlarged red circles.
- Gene Regulation Networks: The main function of a gene in DNA is to provide the code used, through the transcription and translation processes, to produce a protein. This process is called gene expression, and the expression of a specific gene is controlled and affected by proteins coded by other genes through what are called regulatory interactions. For example, gene X regulates gene Y if a change in the expression of gene X results in a change in the expression of gene Y. A gene regulation network (GRN) is made of genes, proteins and various other molecules, and may be modeled by a directed graph with nodes representing these entities and edges showing the biochemical interactions that lead to regulation, as shown in Figure 2. Typically, a GRN is a sparse graph with small-world and power-law properties: only a few nodes have very high out-degrees and regulate the expression of many other genes, and the distance between any two nodes is small compared to the size of the network.
- Metabolic Pathways: The main ingredients of the cell, such as sugars, amino acids and lipids, are produced by the basic chemical system called metabolism, which works on substances called metabolites. The biochemical reactions of metabolism can be modeled by directed or undirected graphs with nodes representing metabolites and edges showing the biochemical reactions that transform one metabolite into another [22,60,64]. An edge in such a graph may also represent an enzyme that catalyzes a biochemical reaction. An undirected edge in the graph model denotes a reversible reaction, whereas a directed edge denotes an irreversible one. A metabolic pathway is a sequence of biochemical reactions that performs a specific metabolic function. An example is glycolysis, in which a glucose molecule is divided into two sugars, generating adenosine triphosphate (ATP) molecules to supply energy. Graphs representing metabolic pathways have the small-world and scale-free properties. The study of metabolic pathways may provide insight into pathogens causing infections, in search of cures for diseases [26].
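The directed-graph view of a GRN described in this list can be sketched in a few lines. This is only an illustrative sketch: the gene names and the edge list are hypothetical placeholders, and the helper names `build_digraph` and `out_degrees` are not taken from the source.

```python
from collections import defaultdict

# Hypothetical regulatory interactions (regulator, target); names are placeholders.
edges = [("X", "Y"), ("X", "Z"), ("X", "W"), ("Y", "Z")]

def build_digraph(edge_list):
    """Return an adjacency map: regulator -> set of regulated targets."""
    adj = defaultdict(set)
    for src, dst in edge_list:
        adj[src].add(dst)
        adj[dst]  # touching the key registers pure targets as nodes too
    return dict(adj)

def out_degrees(adj):
    """Out-degree of a node = number of genes it regulates directly."""
    return {node: len(targets) for node, targets in adj.items()}

grn = build_digraph(edges)
deg = out_degrees(grn)
# The node with the highest out-degree plays the role of a hub regulator.
top_regulator = max(deg, key=deg.get)
```

In this toy network, node X regulates three other nodes and is the only high out-degree node, which mirrors the few-hubs pattern described for GRNs.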
2.2. Networks Outside the Cell
- Brain Networks: We can analyse brain networks at the cell (neuron) level or at a coarser functional level. A neuron in the brain fires when the sum of its input signal strengths exceeds a threshold. A neural network made of neurons performs various cognitive tasks such as problem solving, reasoning and image processing. Artificial neural networks function similarly to biological neural networks and are widely used in deep learning, a branch of machine learning used for artificial intelligence tasks. At a coarser level, we can investigate the functions performed by the brain using brain structural networks (BSNs) or brain functional networks (BFNs). A BSN basically reflects the structure of neural connections, whereas a BFN models the connectedness of the functional regions of the brain. Studies of BFNs have shown that these networks are also small-world and scale-free, like most biological networks [62].
- Phylogenetic Networks: A phylogenetic tree shows evolutionary relationships among organisms with leaves representing living organisms and the intermediate nodes their common ancestors. A phylogenetic network is the general form of a phylogenetic tree where a node may have more than one parent.
- The Food Chain: Living organisms rely on food for survival. The food chain is modeled by a directed graph that shows the relationships between predators and prey, where the direction of an edge is from the predator to the prey.
3. Large Graph Analysis
3.1. Degree Distribution
3.2. Density
3.3. Clustering Coefficient
3.4. Matching Index
3.5. Centrality
3.5.1. Closeness Centrality
3.5.2. Vertex Betweenness Centrality
3.5.3. Edge Betweenness Centrality
4. Large Network Models
- Random networks: This type of network, proposed by Erdős and Rényi, assumes that an edge between any two vertices u and v is formed with probability $p$. The degree distribution of a random network is binomial and approaches a Poisson distribution for large networks. A random network has a short average path length and a clustering coefficient inversely proportional to the size of the network [15].
- Small-world networks: These networks are characterized by low average path lengths and short diameters. Biological networks such as PPI networks, GRNs and metabolic pathways, and other complex networks such as social networks and the Internet, exhibit this property. The diameter of a small-world network is proportional to $\log n$, where n is the number of nodes in the network.
- Scale-free networks: Most biological networks have a few high-degree nodes and many low-degree ones. The PPI network of T. pallidum in Figure 1 exhibits both small-world and scale-free properties. These networks, along with various other complex networks, obey a power-law degree distribution, $P(k) \sim k^{-\gamma}$, where $P(k)$ is the fraction of nodes with degree k and $\gamma$ is known as the power-law exponent. Such networks are called scale-free networks. The PPI networks of E. coli, D. melanogaster, C. elegans and H. pylori were shown to be scale-free. Barabasi and Albert provided a method to form a scale-free network with the following steps [4]:
  1. Growth: A new node is added to the network at each discrete time t.
  2. Preferential Attachment: A new node u is attached to a node v in the network with a probability proportional to the degree of v, which means higher-degree nodes tend to gain more neighbors at each attachment.
- Hierarchical Networks: Studies of biological networks show that the clustering coefficients of nodes are inversely proportional to their degrees. This unexpected result means that lower-degree nodes in these networks have higher clustering coefficients than the hubs. A hierarchical network model of a biological network captures all the observed properties, such as small-world and scale-free, with an additional property: dense clusters of low-degree nodes connected by high-degree hubs.
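The growth and preferential attachment steps of the scale-free model above can be sketched as follows. This is a minimal sketch under stated assumptions, not the reference formulation: the function name `barabasi_albert`, the parameter m (edges added per new node), and the choice of a small clique as the starting network are all assumptions that the text does not specify.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow an undirected graph to n nodes; each new node attaches to m
    distinct existing nodes chosen with probability proportional to degree."""
    if m < 1 or n < m + 1:
        raise ValueError("need m >= 1 and n >= m + 1")
    rng = random.Random(seed)
    # Assumed starting network: a clique on m + 1 nodes, so no node has degree 0.
    adj = {v: set() for v in range(m + 1)}
    stubs = []  # every node appears once per incident edge
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            stubs += [u, v]
    # Growth: one new node per time step.
    for new in range(m + 1, n):
        targets = set()
        # Preferential attachment: drawing uniformly from `stubs` picks a
        # node with probability proportional to its current degree.
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs += [new, t]
    return adj

graph = barabasi_albert(200, 2, seed=1)
degrees = sorted((len(nbrs) for nbrs in graph.values()), reverse=True)
```

Inspecting `degrees` for a run like this typically shows a handful of large values followed by a long tail of nodes with degree close to m, which is the hub-dominated shape the scale-free description refers to.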
5. Cluster Discovery in Biological Networks
5.1. Hierarchical Clustering
5.2. Density-based Clustering
5.3. Flow-Based Clustering
5.4. Spectral Clustering
6. Network Motifs
6.1. Motif Discovery
1. Detection of candidate motifs in G may be performed by exact counting, which involves enumeration of all subgraphs of order k. This method evidently has a high time complexity. Alternatively, sampling-based methods, which work on a representative sample of the graph, may provide approximate solutions.
2. Isomorphic classes of the discovered motifs should be determined, since various motifs may be isomorphic to each other.
3. Statistical significance of the discovered motifs in G should be determined. Commonly, a set H of random graphs with a structure similar to G is generated and motifs are searched for in these graphs. If the motifs found in G are significantly higher in number than the ones found in the graphs of set H, we can conclude that they do represent some biological function in the network represented by G.
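The first two steps above, exact enumeration and grouping by isomorphism class, can be illustrated for the smallest non-trivial case, k = 3 in an undirected graph. The sketch below is a brute-force illustration only: the function name and the example graph are hypothetical, and the cubic-time scan over all node triples is exactly the expensive behaviour the text warns about.

```python
from itertools import combinations

def count_three_node_subgraphs(adj):
    """Count connected induced 3-node subgraphs of an undirected graph,
    grouped into their two isomorphism classes (path and triangle).
    `adj` maps each node to the set of its neighbors (symmetric)."""
    counts = {"path": 0, "triangle": 0}
    for a, b, c in combinations(sorted(adj), 3):
        # Number of edges present among the three chosen nodes.
        k = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if k == 3:
            counts["triangle"] += 1
        elif k == 2:
            # Two edges on three nodes always share a node, so this is a path.
            counts["path"] += 1
    return counts

# Hypothetical example: a triangle 0-1-2 with a pendant node 3 attached to 2.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
result = count_three_node_subgraphs(example)
```

For three nodes the isomorphism class is determined by the edge count alone; for larger k a canonical labeling or an explicit isomorphism test would be needed, which is where much of the cost of step 2 comes from.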
6.2. Background
- P-value: This parameter is the fraction of networks in the randomly generated set in which motif m occurs more frequently than in the target graph G. A motif m is considered significant if its P-value, given below, is less than 0.01: $P(m) = \frac{1}{N}\sum_{i=1}^{N} \delta_i(m)$, where N is the number of random networks and $\delta_i(m)$ is 1 if the occurrence of motif m in the i-th random network is higher, and 0 if lower, than in the target graph G.
- Z-score: The Z-score of a motif m, $Z(m)$, in a graph G is evaluated by the following formula: $Z(m) = \frac{F_G(m) - \mu_R(m)}{\sqrt{\sigma_R^2(m)}}$, where $F_G(m)$ is the number of occurrences of motif m in G, and $\mu_R(m)$ and $\sigma_R^2(m)$ are the mean and variance of the frequency of m in a set of random networks. A motif m is significant if $Z(m)$ exceeds a chosen threshold [27].
- Motif significance profile: The motif significance profile vector SP has the Z-scores of the motifs as its elements, normalized to unit length: $SP(m_i) = \frac{Z(m_i)}{\sqrt{\sum_{j} Z(m_j)^2}}$. Various graphs may then be compared for any common motifs contained in them.
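The three significance measures above reduce to a few lines of arithmetic once the motif frequencies are known. The sketch below assumes the frequencies have already been counted; the function names and the sample numbers are hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption.

```python
import math
from statistics import mean, pstdev

def p_value(freq_in_g, random_freqs):
    """Fraction of random networks in which the motif is more frequent than in G."""
    return sum(1 for f in random_freqs if f > freq_in_g) / len(random_freqs)

def z_score(freq_in_g, random_freqs):
    """(frequency in G - mean random frequency) / standard deviation."""
    sd = pstdev(random_freqs)  # population standard deviation (assumed)
    if sd == 0:
        raise ValueError("Z-score undefined: random frequencies have zero variance")
    return (freq_in_g - mean(random_freqs)) / sd

def significance_profile(z_scores):
    """Normalize a vector of Z-scores to unit Euclidean length."""
    norm = math.sqrt(sum(z * z for z in z_scores))
    return [z / norm for z in z_scores]

# Hypothetical frequencies of one motif in eight random networks.
random_freqs = [2, 4, 4, 4, 5, 5, 7, 9]
p = p_value(11, random_freqs)
z = z_score(11, random_freqs)
sp = significance_profile([3.0, 4.0])
```

With these made-up numbers the random frequencies have mean 5 and standard deviation 2, so a motif seen 11 times in G sits three standard deviations above the random expectation and no random network exceeds it.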
6.3. Review of Motif Searching Algorithms
6.3.1. Network Centric Search Algorithms
6.3.2. Motif Centric Search Algorithms
6.3.3. Parallel Motif Search Algorithms
7. Network Alignment
7.1. Background
1. Form the similarity matrix R with entry $r_{ij}$ showing the similarity score of the nodes $v_i$ and $u_j$ in the input networks $G_1$ and $G_2$ respectively.
2. Implement a weighted matching algorithm to assess the similarity of the networks $G_1$ and $G_2$.
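The two alignment steps above can be sketched end to end on toy inputs. Both ingredients here are stand-ins chosen for brevity and are assumptions, not the methods reviewed in the text: the similarity score compares node degrees only (practical aligners typically combine sequence and topological similarity), and the matching is a simple greedy heuristic rather than an optimal weighted matching algorithm. All function names are hypothetical.

```python
def degree_similarity(adj1, adj2):
    """Step 1: similarity matrix R, with R[i][j] scoring node i of the first
    network against node j of the second (placeholder: degree closeness)."""
    nodes1, nodes2 = sorted(adj1), sorted(adj2)
    R = [[1.0 / (1 + abs(len(adj1[u]) - len(adj2[v]))) for v in nodes2]
         for u in nodes1]
    return nodes1, nodes2, R

def greedy_matching(nodes1, nodes2, R):
    """Step 2: repeatedly take the highest-scoring pair whose endpoints are
    both still unmatched (a heuristic, not guaranteed optimal)."""
    pairs = sorted(
        ((R[i][j], i, j) for i in range(len(nodes1)) for j in range(len(nodes2))),
        key=lambda t: (-t[0], t[1], t[2]),  # best score first, stable tie-break
    )
    used1, used2, matching = set(), set(), []
    for score, i, j in pairs:
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            matching.append((nodes1[i], nodes2[j], score))
    return matching

# Hypothetical inputs: a path a-b-c and a star centered at x.
g1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
g2 = {"x": {"y", "z"}, "y": {"x"}, "z": {"x"}}
n1, n2, R = degree_similarity(g1, g2)
alignment = greedy_matching(n1, n2, R)
total_score = sum(score for _, _, score in alignment)
```

The sum of the matched scores gives a single number for how well the two networks align under the chosen similarity measure; swapping in a different score or an exact matching algorithm leaves the two-step structure unchanged.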
7.2. Alignment Quality
7.3. Review of Network Alignment Algorithms
8. Discussion
References
- Aladag, A.E.; Erten, C. SPINAL: scalable protein interaction network alignment. Bioinformatics 2013, 29, 917–924.
- Altschul, S.; Gish, W.; Miller, W.; Myers, E.; Lipman, D. Basic local alignment search tool. J Mol Biol 1990, 215, 403–410.
- Bader, G.D.; Hogue, C.W.V. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform 2003, 4, 1–27.
- Albert, R.; Barabasi, A. The statistical mechanics of complex networks. Rev Mod Phys 2002, 74, 47–97.
- Batagelj, V.; Zaversnik, M. An O(m) algorithm for cores decomposition of networks. CoRR (Computing Research Repository) arXiv:0310049.
- Brohee, S.; van Helden, J. Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinform 2006, 7, 1–19.
- Bron, C.; Kerbosch, J. Algorithm 457: finding all cliques of an undirected graph. Commun ACM 1973, 16, 575–577.
- Bustamam, A.; Sehgal, M.S.; Hamilton, N.; Wong, S.; Ragan, M.A.; Burrage, K. An efficient parallel implementation of Markov clustering algorithm for large-scale protein-protein interaction networks that uses MPI. In Proceedings of the fifth IMT-GT international conference mathematics, statistics, and their applications (ICMSA), Sumatra Barat, Indonesia, 09 06 2009, pp. 94–101.
- Bustamam, A.; Burrage, K.; Hamilton, N.A. Fast parallel Markov clustering in bioinformatics using massively parallel computing on GPU with CUDA and ELLPACK-R sparse format. IEEE/ACM Trans Comput Biol Bioinform 2012, 9, 679–691.
- Carbonell, P.; Planson, A-G.; Fichera, D.; Faulon, J-L. A retrosynthetic biology approach to metabolic pathway design for therapeutic production. BMC Syst Biol 2011, 5, 1–18.
- Chen, W-Y.; Song, Y.; Bai, H.; Lin, C-J.; Chang, E.Y. Parallel spectral clustering in distributed systems. IEEE Trans Pattern Anal Mach Intell 2010, 33, 568–586.
- Costanzo, M.C.; Crawford, M.E.; Hirschman, J.E.; Kranz, J.E.; Olsen, P.; Robertson, L.S.; Skrzypek, M.S.; Braun, B.R.; Hopkins, K.L.; Kondu, P.; Lengieza, C.; Lew-Smith, J.E.; Tillberg, M.; Garrels, J.I. Ypd(tm), pombepd(tm), and wormpd(tm): model organism volumes of the bioknowledge(tm) library, an integrated resource for protein information. Nucleic Acids Res 2001, 29, 75–79.
- Dongen, S.V. Graph clustering by flow simulation. PhD Thesis, University of Utrecht, The Netherlands, 2000.
- El-Kebir, M.; Heringa, J.; Klau, G.W. Lagrangian relaxation applied to sparse global network alignment. In Proceedings of 6th IAPR international conference on pattern recognition in bioinformatics (PRIB’11), Delft, The Netherlands, 02 11 2011, pp. 225–236.
- Erciyes, K. Distributed and Sequential Algorithms for Bioinformatics, Springer Computational Biology Series, Switzerland, chapters 10, 11, 12, 13, 2013.
- Erciyes, K. Algebraic Graph Algorithms, A Practical Approach Using Python. Springer Undergraduate Topics in Computer Science Series, Switzerland, 2021.
- Fiedler, M. Laplacian of graphs and algebraic connectivity. Comb Graph Theory 1989, 25, 57–70.
- Gehweiler, J.; Meyerhenke, H. A distributed diffusive heuristic for clustering a virtual P2P supercomputer. In Proceedings of the 7th high-performance grid computing workshop (HGCW10) in conjunction with 24th international parallel and distributed processing symposium (IPDPS10), Atlanta, USA, 19 04 2010.
- Girvan, M.; Newman, M.E.J. Community structure in social and biological networks. Proc Natl Acad Sci USA 2002, 99, 7821–7826.
- Grochow, J.; Kellis, M. Network motif discovery using subgraph enumeration and symmetry-breaking. In Proceedings of 11th annual international conference research in computational molecular biology (RECOMB’07), Oakland, USA, 21 04 2007, pp. 92–106.
- Han, J-D.J.; Bertin, N.; Hao, T.; Goldberg, D.S.; Berriz, G.F.; Zhang, L.V.; Dupuy, D.; Walhout, A.J.M.; Cusick, M.E.; Roth, F.P.; Vidal, M. Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 2004, 430, 88–93.
- He, Y.; Chen, Z.; Evans, A. Structural insights into aberrant topological patterns of large-scale cortical networks in Alzheimer's disease. J Neurosci 2008, 28, 4756–4766.
- Jeong, H.; Mason, S.P.; Barabási, A-L.; Oltvai, Z.N. Lethality and centrality in protein networks. Nature 2001, 411, 41–42.
- Hoepman, J.H. Simple distributed weighted matchings. 2004, arXiv:cs/0410047v1.
- Jaber, K.; Rashid, N.A.; Abdullah, R. The parallel maximal cliques algorithm for protein sequence clustering. Am J Appl Sci 2009, 6, 1368–1372.
- Junker, B. Analysis of biological networks. Wiley, Chapter 9, 2008.
- Kashtan, N.; Itzkovitz, S.; Milo, R.; Alon, U. Mfinder tool guide. Technical report, Department of Molecular Cell Biology and Computer Science and Applied Mathematics, Weizmann Institute of Science, 2002.
- Kashtan, N.; Itzkovitz, S.; Milo, R.; Alon, U. Efficient sampling algorithm for estimating sub-graph concentrations and detecting network motifs. Bioinformatics 2004, 20, 1746–1758.
- Kashani, Z.R.; Ahrabian, H.; Elahi, E.; Nowzari-Dalini, A.; Ansari, E.S.; Asadi, S.; Mohammadi, S.; Schreiber, F.; Masoudi-Nejad, A. Kavosh: a new algorithm for finding network motifs. BMC Bioinform 2009, 10, 1–12.
- Kelley, B.P.; Sharan, R.; Karp, R.M.; Sittler, T.; Root, D.E.; Stockwell, B.R.; Ideker, T. Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc Natl Acad Sci USA 2003, 100, 11394–11399.
- Klau, G.W. A new graph-based method for pairwise global network alignment. BMC Bioinform 2009, 10, 1–9.
- Koyuturk, M.; Kim, Y.; Topkara, U.; Subramaniam, S.; Szpankowski, W.; Grama, A. Pairwise alignment of protein interaction networks. J Comput Biol 2006, 13, 182–199.
- Kuchaiev, O.; Milenkovic, T.; Memisevic, V.; Hayes, W.; Przulj, N. Topological network alignment uncovers biological function and phylogeny. J Royal Soc Interface 2010, 7, 1341–1354.
- Manne, F.; Bisseling, R.H. A parallel approximation algorithm for the weighted maximum matching problem. In Wyrzykowski R, Karczewski K, Dongarra J, Wasniewski J (eds) Proceedings of seventh international conference on parallel processing and applied mathematics (PPAM 2007), Lecture notes in computer science, Gdansk, Poland, 09 09 2007, pp. 708–717.
- Maskey, S.; Cho, Y-R. Survey of biological network alignment: cross-species analysis of conserved systems. In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), San Diego, USA, 18 11 2019, pp. 2090–2096.
- Mason, O.; Verwoerd, M. Graph theory and networks in biology. IET Syst Biol 2007, 1(2), 89–119.
- Mfinder. Available online: http://www.weizmann.ac.il/mcb/UriAlon/index.
- Milo, R.; Shen-Orr, S.; Itzkovitz, S.; Kashtan, N.; Chklovskii, D.; Alon, U. Network motifs: simple building blocks of complex networks. Science 2002, 298, 824–827.
- Mohseni-Zadeh, S.; Brezelec, P.; Risler, J.L. Cluster-C, an algorithm for the large-scale clustering of protein sequences based on the extraction of maximal cliques. Comput Biol Chem 2004, 28, 211–218.
- Montresor, A.; Pellegrini, F.D.; Miorandi, D. Distributed k-Core decomposition. IEEE Trans Parallel Distrib Syst 2013, 24(2), 288–300.
- Murtagh, F. Clustering in massive data sets. In Handbook of massive data sets; 2002; pp. 501–543.
- Newman, M.E.J. Fast algorithm for detecting community structure in networks. Phys Rev E 2004, 69, 066133.
- Newman, M.E.J.; Girvan, M. Finding and evaluating community structure in networks. Phys Rev E 2004, 69, 026113.
- Olman, V.; Mao, F.; Wu, H.; Xu, Y. Parallel clustering algorithm for large data sets with applications in bioinformatics. IEEE/ACM Trans Comput Biol Bioinform 2009, 6, 344–352.
- Omidi, S.; Schreiber, F.; Masoudi-Nejad, A. MODA: an efficient algorithm for network motif discovery in biological networks. Genes Genet Syst 2009, 84, 385–395.
- Patra, S.; Mohapatra, A. Review of tools and algorithms for network motif discovery in biological networks. IET Syst Biol 2020, 14, 171–189.
- Patro, R.; Kingsford, C. Global network alignment using multiscale spectral signatures. Bioinformatics 2012, 28(23), 3105–3114.
- Preis, R. Linear time 2-approximation algorithm for maximum weighted matching in general graphs. In Meinel C, Tison S (eds) STACS99 Proceedings 16th annual conference theoretical aspects of computer science, Lecture notes in computer science, Trier, Germany, 04 04 1999, pp. 259–269.
- Przulj, N. Graph theory analysis of protein-protein interactions. In Igor J, Dennis W (eds) Knowledge discovery in proteomics; CRC Press, 2005.
- Ribeiro, P. Efficient and scalable algorithms for network motifs discovery. Ph.D. Thesis, Doctoral Programme in Computer Science. Faculty of Science of the University of Porto, 2009.
- Ribeiro, P.; Silva, F.; Lopes, L. A parallel algorithm for counting subgraphs in complex networks. In 3rd international conference on biomedical engineering systems and technologies, Valencia, Spain, 20 01 2010, pp. 380–393.
- Ribeiro, P.; Silva, F.; Lopes, L. Parallel discovery of network motifs. J Parallel Distrib Comput 2012, 72, 144–154.
- Riedy, J.; Bader, D.A.; Meyerhenke, H. Scalable multi-threaded community detection in social networks. In Proceedings of IEEE 26th international parallel and distributed processing symposium workshops and PhD forum (IPDPSW), IEEE, Shanghai, China, 21 05 2012, pp. 1619–1628.
- Ruzgar, E.; Erciyes, K.; Dalkilic, M.E. Parallelization of network motif discovery using star contraction. Parallel Comput 2021, 101, 102734.
- Saribatir, M.B.; Erciyes, K. A Parallel Network Alignment Algorithm for Biological Networks. In IEEE 3rd International Informatics and Software Engineering Conference (IISEC), Ankara, Turkey, 15 12 2022.
- Sathe, M.; Schenk, O.; Burkhart, H. An auction-based weighted matching implementation on massively parallel architectures. Parallel Comput 2012, 38, 595–614.
- Shen-Orr, S.S.; Milo, R.; Mangan, S.; Alon, U. Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 2002, 31, 64–68.
- Schatz, M.; Cooper-Balis, E.; Bazinet, A. Parallel network motif finding. Technical report, University of Maryland Institute for Advanced Computer Studies, 2008.
- Schmidt, M.C.; Samatova, N.F.; Thomas, K.; Park, B-H. A scalable, parallel algorithm for maximal clique enumeration. J Parallel Distrib Comput 2009, 69, 417–428.
- Schuster, S.; Fell, D.A.; Dandekar, T. A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nat Biotechnol 2000, 18, 326–332.
- Singh, R.; Xu, J.; Berger, B. Pairwise global alignment of protein interaction networks by matching neighborhood topology. In Research in computational molecular biology, Springer, pp. 16–31.
- Sporns, O. Networks of the Brain; MIT Press: USA, 2016.
- Titz, B.; Rajagopala, S.V.; Goll, J.; Hauser, R.; McKevitt, M.T.; Palzkill, T.; Uetz, P. The binary protein interactome of Treponema pallidum, the syphilis spirochete. PLoS One 2008, 3(5), e2292.
- Vidal, M.; Cusick, M.E.; Barabasi, A.L. Interactome networks and human disease. Cell 2011, 144, 986–998.
- Vlasblom, J.; Wodak, S.J. Markov clustering versus affinity propagation for the partitioning of protein interaction graphs. BMC Bioinform 2009, 10, 1–14.
- Vogelstein, B.; Lane, D.; Levine, A. Surfing the p53 network. Nature 2000, 408, 307–310.
- Wang, T.; Touchman, J.W.; Zhang, W.; Suh, E.B.; Xue, G. A parallel algorithm for extracting transcription regulatory network motifs. In Proceedings of the IEEE international symposium on bioinformatics and bioengineering, IEEE Computer Society Press, Minneapolis, USA, 09 10 2005, pp. 193–200.
- Wernicke, S. Efficient detection of network motifs. IEEE/ACM Trans Comput Biol Bioinform 2006, 3, 347–359.
- Wernicke, S.; Rasche, F. FANMOD: a tool for fast network motif detection. Bioinformatics 2006, 22, 1152–1153.
- Williams, R.J.; Martinez, N.D. Simple rules yield complex food webs. Nature 2000, 404, 180–183.
- Yang, Q.; Lonardi, S. A parallel edge-betweenness clustering tool for protein-protein interaction networks. Int J Data Min Bioinform (IJDMB) 2007, 1, 241–247.






Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).