Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

A Scalable Data Structure for Efficient Graph Analytics and In-Place Mutations

Version 1: Received: 22 September 2023 / Approved: 27 September 2023 / Online: 27 September 2023 (15:10:25 CEST)

A peer-reviewed article of this Preprint also exists.

Firmli, S.; Chiadmi, D. A Scalable Data Structure for Efficient Graph Analytics and In-Place Mutations. Data 2023, 8, 166.

Abstract

The graph model supports a broad range of analyses, which makes graph processing an invaluable tool in data analytics. At the heart of every graph processing system lies a concurrent graph data structure that stores the graph. Such a data structure needs to be highly efficient for both graph algorithms and queries. Because real-world graphs evolve continuously, are sparse, and are scale-free, graph processing systems face the challenge of providing a graph data structure that supports both fast analytical workloads and fast, memory-efficient graph mutations. Existing graph structures force a hard tradeoff between read-only performance, update friendliness, and memory consumption upon updates. In this paper, we introduce CSR++, a new graph data structure that removes these tradeoffs and enables both fast read-only analytics and quick, memory-friendly mutations. CSR++ combines ideas from CSR, the fastest read-only data structure, and adjacency lists to achieve the best of both worlds. We compare CSR++ to CSR, adjacency lists from the Boost Graph Library, and state-of-the-art update-friendly graph structures: LLAMA, STINGER, GraphOne, and Teseo. In our evaluation, which is based on popular graph processing algorithms executed over real-world graphs, we show that CSR++ remains close to CSR in read-only concurrent performance (within 10% on average), while significantly outperforming CSR (by an order of magnitude) and LLAMA (by almost 2×) under frequent updates. We also show that both the update throughput and the analytics performance of CSR++ exceed those of several state-of-the-art graph structures, while maintaining low memory consumption when the workload includes updates.
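
To make the tradeoff named in the abstract concrete, the following minimal Python sketch contrasts a textbook CSR layout with a plain adjacency list. It does not reproduce the CSR++ design, which the abstract does not detail. The class names `CSRGraph` and `AdjacencyListGraph` and their methods are hypothetical and chosen only for illustration. The point is that CSR keeps all edges in one flat array, which is compact and cache-friendly for scans, but inserting a single edge shifts the edge array and rewrites the offsets, whereas an adjacency list appends in place at the cost of a scattered layout.

```python
class CSRGraph:
    """Textbook CSR: one offsets array and one flat targets array (illustrative)."""

    def __init__(self, num_vertices, edges):
        # Count the out-degree of each vertex, then prefix-sum into offsets.
        self.offsets = [0] * (num_vertices + 1)
        for src, _ in edges:
            self.offsets[src + 1] += 1
        for v in range(num_vertices):
            self.offsets[v + 1] += self.offsets[v]
        # Place each edge target into the slice owned by its source vertex.
        self.targets = [0] * len(edges)
        cursor = list(self.offsets[:-1])
        for src, dst in edges:
            self.targets[cursor[src]] = dst
            cursor[src] += 1

    def neighbors(self, v):
        # Neighbors of v are one contiguous slice of the flat array.
        return self.targets[self.offsets[v]:self.offsets[v + 1]]

    def add_edge(self, src, dst):
        # Inserting shifts every later edge and bumps every later offset,
        # so the cost grows with the size of the whole graph.
        self.targets.insert(self.offsets[src + 1], dst)
        for v in range(src + 1, len(self.offsets)):
            self.offsets[v] += 1


class AdjacencyListGraph:
    """Plain adjacency list: one growable edge list per vertex (illustrative)."""

    def __init__(self, num_vertices, edges):
        self.adj = [[] for _ in range(num_vertices)]
        for src, dst in edges:
            self.adj[src].append(dst)

    def neighbors(self, v):
        return self.adj[v]

    def add_edge(self, src, dst):
        # Appending touches only the list of the source vertex.
        self.adj[src].append(dst)


edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
csr = CSRGraph(3, edges)
adj = AdjacencyListGraph(3, edges)
csr.add_edge(1, 0)
adj.add_edge(1, 0)
print(csr.neighbors(1), adj.neighbors(1))
```

Both structures report the same neighbors after the update; they differ in how much memory the update had to touch, which is the tension a hybrid structure aims to resolve.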

Keywords

Data Structures; Concurrency; Graph Processing; Graph Mutations

Subject

Computer Science and Mathematics, Data Structures, Algorithms and Complexity
