Submitted:
19 April 2024
Posted:
19 April 2024
Abstract
Keywords:
1. Introduction
2. Related Work
3. Proposed Graph Compression Scheme
3.1. Overall Processing Procedure
3.2. Graph Manager and Reference Pattern Generator
Algorithm 1. Incremental frequent pattern mining

Input: graph G, batch B, pattern dictionary P
Output: updated pattern dictionary P

    for each pattern p in P:
        for each match m of pattern p embedded in batch B:
            vertex_map = {g(v): p(v) for p(v), g(v) in m}
            new_edges = edges in batch B incident on m but not in pattern p
            for each edge e in new_edges:
                g_src(v) = e.src
                g_tgt(v) = e.tgt
                p_src(v) = vertex_map[g_src(v)]
                p_tgt(v) = vertex_map[g_tgt(v)]
                extended_pattern = extend pattern p with edge e using p_src(v) and p_tgt(v)
                if extended_pattern not in P:
                    add extended_pattern to P
    for each untagged edge e in B:
        if e not in P:
            add e as a single-edge pattern to P
    return P
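The steps of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes an undirected, unlabeled graph without self-loops, stores each pattern as a canonical tuple of edges, and uses brute-force enumeration for both matching and isomorphism checks, which is only practical for very small patterns. The helper names (`edge`, `canonical`, `find_matches`, `mine_batch`) are hypothetical, and the rule that an endpoint outside the match becomes a fresh pattern vertex is an assumption not stated in the pseudocode.

```python
from itertools import permutations


def edge(u, v):
    """Undirected edge as a hashable, order-independent key."""
    return frozenset((u, v))


def canonical(pattern):
    """Brute-force canonical form of a small pattern given as (a, b) pairs.

    Tries every relabeling of the vertices to 0..k-1 and keeps the smallest
    sorted edge tuple, so isomorphic patterns compare equal.
    """
    vertices = sorted({v for e in pattern for v in e})
    best = None
    for perm in permutations(range(len(vertices))):
        relabel = dict(zip(vertices, perm))
        form = tuple(sorted(tuple(sorted(relabel[v] for v in e)) for e in pattern))
        if best is None or form < best:
            best = form
    return best


def find_matches(p, k, batch_edges):
    """Yield mappings p(v) -> g(v) under which every edge of p exists in the batch."""
    g_vertices = sorted({v for e in batch_edges for v in e})
    for image in permutations(g_vertices, k):
        if all(edge(image[a], image[b]) in batch_edges for a, b in p):
            yield dict(enumerate(image))


def mine_batch(batch, patterns):
    """One pass of the pattern-extension loop over a batch of (src, tgt) edges."""
    batch_edges = {edge(u, v) for u, v in batch if u != v}
    updated = set(patterns)
    tagged = set()  # batch edges used by a match or by an extension
    for p in patterns:
        k = 1 + max(v for e in p for v in e)  # pattern vertices are 0..k-1
        for m in find_matches(p, k, batch_edges):
            vertex_map = {g: pv for pv, g in m.items()}  # g(v) -> p(v)
            matched_edges = {edge(m[a], m[b]) for a, b in p}
            tagged |= matched_edges
            for e in batch_edges - matched_edges:
                if not any(v in vertex_map for v in e):
                    continue  # edge is not incident on this match
                g_src, g_tgt = tuple(e)
                # Assumption: an endpoint outside the match becomes fresh vertex k.
                p_src = vertex_map.get(g_src, k)
                p_tgt = vertex_map.get(g_tgt, k)
                updated.add(canonical(set(p) | {(p_src, p_tgt)}))
                tagged.add(e)
    if batch_edges - tagged:
        # Without labels, every single edge is the same one-edge pattern.
        updated.add(canonical({(0, 1)}))
    return updated


# Usage: the first batch seeds the dictionary, the second pass extends it.
P = mine_batch([(1, 2), (2, 3)], set())
P = mine_batch([(1, 2), (2, 3)], P)
```

In a realistic setting the brute-force matcher would likely be replaced by a VF2-style subgraph matcher such as the one NetworkX provides, and the canonical form by a scalable graph canonicalization.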
3.3. Managing Pattern Dictionary
3.4. Graph Compression Process
Algorithm 2. Graph compression

Input: graph G, pattern dictionary P
Output: compressed graph

    compressed_graph = new Graph()
    marked_matched_patterns = set()

    for pattern p in P:
        matched_patterns = findMatchedPatternsOfPatternInGraph(p, G)
        for matched_pattern in matched_patterns:
            if matched_pattern not in marked_matched_patterns:
                new_vertex = createVertexForPattern(p)
                compressed_graph.addVertex(new_vertex)
                marked_matched_patterns.add(matched_pattern)
                connecting_vertices = getVerticesConnectedToMatchedPattern(matched_pattern, G)
                for vertex in connecting_vertices:
                    compressed_graph.addEdge(new_vertex, vertex)

    for vertex in G.vertices:
        if vertex not in marked_matched_patterns:
            compressed_graph.addVertex(vertex)

    for edge in G.edges:
        if edge not connected to marked_matched_patterns:
            compressed_graph.addEdge(edge.source, edge.target)

    return compressed_graph
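The compression step of Algorithm 2 can be sketched in Python as follows. This is a simplified variant under stated assumptions rather than the authors' code: matches are assumed to be precomputed and passed in as vertex sets, a match that overlaps an already compressed match is skipped, and edges touching a match are redirected to its supernode in a single final pass (which plays the role of the connecting-vertices loop). The function name `compress` and the supernode naming scheme are hypothetical.

```python
def compress(vertices, edges, matches):
    """Replace each matched pattern instance with a single supernode.

    vertices: iterable of vertex ids
    edges:    iterable of (source, target) pairs, treated as undirected
    matches:  dict pattern_id -> list of vertex sets, one per match
              (assumed to come from a separate matching step)
    """
    covered = {}  # original vertex -> supernode that replaced it
    c_vertices, c_edges = set(), set()

    for pattern_id, vertex_sets in matches.items():
        for i, vertex_set in enumerate(vertex_sets):
            if any(v in covered for v in vertex_set):
                continue  # assumption: overlapping matches are skipped
            supernode = ("pattern", pattern_id, i)
            c_vertices.add(supernode)
            for v in vertex_set:
                covered[v] = supernode

    # Vertices outside every match are copied unchanged.
    for v in vertices:
        if v not in covered:
            c_vertices.add(v)

    # Edges are redirected to supernodes; edges inside one match disappear.
    for u, v in edges:
        cu, cv = covered.get(u, u), covered.get(v, v)
        if cu != cv:
            c_edges.add(frozenset((cu, cv)))

    return c_vertices, c_edges


# Usage: a triangle {1, 2, 3} with a tail 3-4-5 collapses to supernode-4-5.
cv, ce = compress(
    {1, 2, 3, 4, 5},
    [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)],
    {"triangle": [{1, 2, 3}]},
)
```

Keeping a vertex-to-supernode map makes the edge pass a single lookup per endpoint, at the cost of storing one entry per compressed vertex.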
4. Performance Evaluation

5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
Appendix A. Datasets and Libraries
- DBLP (http://snap.stanford.edu/data/com-DBLP.html): A computer science bibliography dataset provided by SNAP (Stanford Network Analysis Project).
- Youtube (http://snap.stanford.edu/data/com-Youtube.html): A social network of YouTube users, also provided by SNAP.
- Skitter (http://snap.stanford.edu/data/as-Skitter.html): An Internet topology graph collected from traceroutes run daily in 2005 by Skitter, provided by CAIDA (Center for Applied Internet Data Analysis).
- NBER (https://www.nber.org/research/data/us-patents): U.S. patent data provided by the National Bureau of Economic Research.
- LiveJournal (http://snap.stanford.edu/data/com-LiveJournal.html): A social network of LiveJournal users, provided by SNAP.
- igraph (https://igraph.org/): A library for creating and manipulating graphs, as well as analyzing networks.
- NetworkX (https://networkx.org/): A library for studying complex networks, providing tools for graph creation, manipulation, and analysis.
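As an illustration of how the SNAP edge lists above could be fed to a batch-based scheme, the following sketch reads an edge list in fixed-size batches. It assumes the usual SNAP text layout (comment lines starting with `#`, then one whitespace-separated pair of vertex ids per line); the function name `read_batches` and the batch size used are hypothetical, and other sources such as the NBER data would need their own parser.

```python
import io


def read_batches(lines, batch_size):
    """Yield lists of (src, tgt) integer pairs, batch_size edges at a time."""
    batch = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and SNAP header comments
        src, tgt = line.split()[:2]
        batch.append((int(src), int(tgt)))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller batch


# Usage with an in-memory sample; a real run would pass an open file instead.
sample = io.StringIO("# FromNodeId\tToNodeId\n1\t2\n2\t3\n\n3\t4\n")
batches = list(read_batches(sample, batch_size=2))
```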
References
- Song, J.; Yi, Q.; Gao, H.; Wang, B.; Kong, X. Exploring Prior Knowledge from Human Mobility Patterns for POI Recommendation. Applied Sciences 2023, 13.
- Kouahla, Z.; Benrazek, A.-E.; Ferrag, M.A.; Farou, B.; Seridi, H.; Kurulay, M.; Anjum, A.; Asheralieva, A. A Survey on Big IoT Data Indexing: Potential Solutions, Recent Advancements, and Open Issues. Future Internet 2022, 14.
- Cook, D.J.; Holder, L.B. Substructure Discovery Using Minimum Description Length and Background Knowledge. 1994.
- Wang, G.; Ai, J.; Mo, L.; Yi, X.; Wu, P.; Wu, X.; Kong, L. Anomaly Detection for Data from Unmanned Systems via Improved Graph Neural Networks with Attention Mechanism. Drones 2023, 7.
- Henecka, W.; Roughan, M. Lossy Compression of Dynamic, Weighted Graphs. In Proceedings of the 2015 3rd International Conference on Future Internet of Things and Cloud; 2015; pp. 427–434.
- Shah, N.; Koutra, D.; Zou, T.; Gallagher, B.; Faloutsos, C. TimeCrunch: Interpretable Dynamic Graph Summarization. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2015; pp. 1055–1064.
- Li, Y.; Ge, M.; Li, M.; Li, T.; Xiang, S. CLIP-Based Adaptive Graph Attention Network for Large-Scale Unsupervised Multi-Modal Hashing Retrieval. Sensors 2023, 23.
- Zhao, H.; Zhang, W.; Huang, M.; Feng, S.; Wu, Y. A Multi-Granularity Heterogeneous Graph for Extractive Text Summarization. Electronics (Basel) 2023, 12.
- Park, Y.-J.; Lee, M.; Yang, G.-J.; Park, S.J.; Sohn, C.-B. Web Interface of NER and RE with BERT for Biomedical Text Mining. Applied Sciences 2023, 13.
- Ray, A.; Holder, L.; Choudhury, S. Frequent Subgraph Discovery in Large Attributed Streaming Graphs. In Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications; Fan, W., Bifet, A., Yang, Q., Yu, P.S., Eds.; PMLR: New York, NY, USA, April 2014; Vol. 36, pp. 166–181.
- Packer, C.A.; Holder, L.B. GraphZip: Dictionary-Based Compression for Mining Graph Streams. 2017.
- Leung, C.K.; Khan, Q.I. DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams. In Proceedings of the Sixth International Conference on Data Mining (ICDM’06); 2006; pp. 928–932.
- Dolgorsuren, B.; Khan, K.U.; Rasel, M.K.; Lee, Y.-K. StarZIP: Streaming Graph Compression Technique for Data Archiving. IEEE Access 2019, 7, 38020–38034.
- Giannella, C.; Han, J.; Pei, J.; Yan, X.; Yu, P. Mining Frequent Patterns in Data Streams at Multiple Time Granularities. 2003.
- Guo, J.; Zhang, P.; Tan, J.; Guo, L. Mining Frequent Patterns across Multiple Data Streams. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2011; pp. 2325–2328.
- Zarrouk, M.; Gouider, M. Frequent Patterns Mining in Time-Sensitive Data Stream. International Journal of Computer Science Issues 2012, 9.
- Zhong, H.; Wang, M.; Zhang, X. Unsupervised Embedding Learning for Large-Scale Heterogeneous Networks Based on Metapath Graph Sampling. Entropy 2023, 25.
- Maneth, S.; Peternek, F. Grammar-Based Graph Compression. Inf Syst 2018, 76, 19–45.
- Dhulipala, L.; Kabiljo, I.; Karrer, B.; Ottaviano, G.; Pupyrev, S.; Shalita, A. Compressing Graphs and Indexes with Recursive Graph Bisection. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016; pp. 1535–1544.
- Lim, Y.; Kang, U.; Faloutsos, C. SlashBurn: Graph Compression and Mining beyond Caveman Communities. IEEE Trans Knowl Data Eng 2014, 26, 3077–3089.
- Jalil, Z.; Nasir, M.; Alazab, M.; Nasir, J.; Amjad, T.; Alqammaz, A. Grapharizer: A Graph-Based Technique for Extractive Multi-Document Summarization. Electronics (Basel) 2023, 12.
- Rossi, R.; Zhou, R. GraphZIP: A Clique-Based Sparse Graph Compression Method. J Big Data 2018, 5.
- Cordella, L.P.; Foggia, P.; Sansone, C.; Vento, M. A (Sub)Graph Isomorphism Algorithm for Matching Large Graphs. IEEE Trans Pattern Anal Mach Intell 2004, 26, 1367–1372.
- Fournier-Viger, P.; Gan, W.; Wu, Y.; Nouioua, M.; Song, W.; Truong, T.; Duong, H. Pattern Mining: Current Challenges and Opportunities. In Proceedings of the Database Systems for Advanced Applications. DASFAA 2022 International Workshops: BDMS, BDQM, GDMA, IWBT, MAQTDS, and PMBD, Virtual Event, April 11–14, 2022, Proceedings; Springer, 2022; pp. 34–49.
- Shabani, N.; Beheshti, A.; Farhood, H.; Bower, M.; Garrett, M.; Alinejad-Rokny, H. A Rule-Based Approach for Mining Creative Thinking Patterns from Big Educational Data. AppliedMath 2023, 3, 243–267.
- Jamshidi, K.; Mahadasa, R.; Vora, K. Peregrine: A Pattern-Aware Graph Mining System. In Proceedings of the Fifteenth European Conference on Computer Systems; Association for Computing Machinery: New York, NY, USA, 2020.
- Ketkar, N.S.; Holder, L.B.; Cook, D.J. Subdue: Compression-Based Frequent Pattern Discovery in Graph Data. In Proceedings of the 1st International Workshop on Open Source Data Mining: Frequent Pattern Mining Implementations; Association for Computing Machinery: New York, NY, USA, 2005; pp. 71–76.
- Elseidy, M.; Abdelhamid, E.; Skiadopoulos, S.; Kalnis, P. GraMi: Frequent Subgraph and Pattern Mining in a Single Large Graph. Proc. VLDB Endow. 2014, 7, 517–528.
- Bok, K.; Han, J.; Lim, J.; Yoo, J. Provenance Compression Scheme Based on Graph Patterns for Large RDF Documents. J Supercomput 2019, 76, 6376–6398.
- Bok, K.; Jeong, J.; Choi, D.; Yoo, J. Detecting Incremental Frequent Subgraph Patterns in IoT Environments. Sensors 2018, 18.
- Bok, K.; Kim, G.; Lim, J.; Yoo, J. Historical Graph Management in Dynamic Environments. Electronics (Basel) 2020, 9.
- Han, J.; Pei, J.; Yin, Y. Mining Frequent Patterns without Candidate Generation. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data; Association for Computing Machinery: New York, NY, USA, 2000; pp. 1–12.
- Borgelt, C. An Implementation of the FP-Growth Algorithm. Proceedings of the 1st International Workshop on Open Source Data Mining: Frequent Pattern Mining Implementations 2010.










| HW/SW | Description |
|---|---|
| CPU | Intel(R) Core(TM) i7-12700K |
| RAM | 32 GB |
| Python | 3.8.10 |
| igraph | 0.9.1 |
| NetworkX | 3.1 |

| Dataset | Cov. (%) | Runtime (s): Ours | Runtime (s): GraphZip [11] | Runtime (s): SUBDUE [27] | Accuracy (%): Ours | Accuracy (%): GraphZip [11] | Accuracy (%): SUBDUE [27] |
|---|---|---|---|---|---|---|---|
| 3-CLIQ | 20 | 58 | 52 | 68.3 | 100 | 100 | 89 |
| 3-CLIQ | 50 | 4.2 | 3.8 | 23.7 | 100 | 100 | 90 |
| 3-CLIQ | 80 | 4.1 | 3.7 | 12.1 | 100 | 100 | 87 |
| 4-PATH | 20 | 49 | 45.4 | 58.9 | 100 | 100 | 100 |
| 4-PATH | 50 | 4.1 | 3 | 19.4 | 100 | 100 | 100 |
| 4-PATH | 80 | 3.3 | 2.9 | 11.7 | 100 | 100 | 100 |
| 4-STAR | 20 | 54 | 50.1 | Time Over | 100 | 100 | - |
| 4-STAR | 50 | 4.5 | 4.1 | Time Over | 100 | 99.8 | - |
| 4-STAR | 80 | 4.9 | 4.4 | Time Over | 100 | 100 | - |
| 4-CLIQ | 20 | 81 | 68 | Time Over | 100 | 100 | - |
| 4-CLIQ | 50 | 32 | 29.1 | Time Over | 100 | 99.5 | - |
| 4-CLIQ | 80 | 15.1 | 13.9 | 45.1 | 100 | 100 | 90 |
| 5-PATH | 20 | 58.4 | 48.5 | Time Over | 100 | 99.2 | - |
| 5-PATH | 50 | 6.1 | 4.5 | 21.4 | 100 | 99.6 | 99.8 |
| 5-PATH | 80 | 5.5 | 4.3 | 24.8 | 100 | 100 | 99.4 |
| 8-TREE | 20 | 81.1 | 72.7 | Time Over | 100 | 99.2 | - |
| 8-TREE | 50 | 12.4 | 10.4 | Time Over | 100 | 99.7 | - |
| 8-TREE | 80 | 13.1 | 11 | Time Over | 100 | 100 | - |

| Dataset | Mark | # Vertices | # Edges | Description | Size (KB, disk) |
|---|---|---|---|---|---|
| DBLP | DB | 317,080 | 1,049,866 | Collaboration | 13,605 |
| Youtube | YT | 1,134,890 | 2,987,624 | Social | 37,814 |
| Skitter | SK | 1,696,415 | 11,095,298 | Internet | 145,612 |
| NBER | NB | 3,774,218 | 16,512,783 | Patent Citation | 257,887 |
| LiveJournal | LJ | 3,997,962 | 34,681,189 | Social | 478,799 |

| Batch size | Average number of patterns | Average processing time | Average compression ratio |
|---|---|---|---|
| 50 | 24.1 | 10.7 | 24.4 |
| 100 | 45.9 | 29.4 | 41.2 |
| 200 | 84.2 | 51.3 | 55.7 |
| 300 | 125.8 | 67.3 | 69.4 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).