Article
Version 1
This version is not peer-reviewed
Massive RDF Query Processing Efficiently in Spark Environment Based on Semantic Connection Set
Received: 23 June 2019 / Approved: 24 June 2019 / Online: 24 June 2019 (09:15:24 CEST)
How to cite: Xu, J.; Zhang, C. Massive RDF Query Processing Efficiently in Spark Environment Based on Semantic Connection Set. Preprints 2019, 2019060238
Abstract
Resource Description Framework (RDF) is the data representation format of the Semantic Web, and the volume of RDF data is growing rapidly. Cloud-based systems provide a rich platform for managing RDF data. However, distributed environments face performance challenges, such as network shuffling and memory overhead, when processing RDF queries that contain multiple join operations. To overcome these challenges, this paper proposes a Spark-based RDF query architecture built on the Semantic Connection Set (SCS). First, the architecture re-partitions class data on top of vertical partitioning, which reduces memory overhead and allows data to be indexed quickly. Second, a method for generating query plans based on semantic connection sets is proposed. In addition, statistics and broadcast-variable optimization strategies are used to reduce shuffling and data communication costs. The experiments are based on SPARQLGX, a recent RDF system on the Spark platform, and two synthetic benchmarks are used to evaluate query performance. The experimental results show that the proposed approach is more efficient in data search than SPARQLGX.
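To make two of the ideas named in the abstract concrete, the following is a minimal single-machine sketch in plain Python, not the authors' implementation and not Spark code. It assumes that vertical partitioning means one (subject, object) table per predicate, and it imitates a broadcast join by building an in-memory lookup from the smaller table while the larger table is only scanned. The function names `vertical_partition` and `join_on_subject`, and the sample triples, are hypothetical and chosen for illustration. The SCS-based query planning and the class re-partitioning step are not shown, since the abstract does not specify them.

```python
from collections import defaultdict


def vertical_partition(triples):
    """Group (subject, predicate, object) triples into one table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def join_on_subject(left, right):
    """Hash join of two (subject, object) tables on the shared subject.

    The smaller table becomes an in-memory lookup, which is the role a
    broadcast variable plays in Spark: the larger side is scanned in place
    instead of being shuffled across the network.
    """
    swap = len(left) > len(right)
    small, large = (right, left) if swap else (left, right)

    lookup = defaultdict(list)
    for s, o in small:
        lookup[s].append(o)

    results = []
    for s, o_large in large:
        for o_small in lookup.get(s, []):
            # Restore the caller's left/right column order.
            o_left, o_right = (o_large, o_small) if swap else (o_small, o_large)
            results.append((s, o_left, o_right))
    return results


# Hypothetical sample data, for illustration only.
triples = [
    ("alice", "rdf:type", "Student"),
    ("bob", "rdf:type", "Professor"),
    ("alice", "memberOf", "dept1"),
    ("bob", "memberOf", "dept1"),
    ("carol", "memberOf", "dept2"),
]

tables = vertical_partition(triples)

# Basic graph pattern: ?x rdf:type ?t . ?x memberOf ?d
rows = join_on_subject(tables["rdf:type"], tables["memberOf"])
print(rows)
```

In this sketch each triple pattern with a bound predicate touches only its own table, so the join reads two small tables rather than the full triple set. In a real Spark job the lookup table would be shipped to executors as a broadcast variable, and table-size statistics would decide which side to broadcast.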
Keywords
RDF; Semantic Web; basic graph pattern; distributed SPARQL query processing; Spark
Subject
MATHEMATICS & COMPUTER SCIENCE, Information Technology & Data Management
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.