Version 1
Received: 28 November 2023 / Approved: 29 November 2023 / Online: 4 December 2023 (16:20:05 CET)
How to cite:
S, S. Natural Language Processing-Based Querying Heterogeneous Data Sources Using Integrated Ontology. Preprints 2023, 2023120228. https://doi.org/10.20944/preprints202312.0228.v1
APA Style
S, S. (2023). Natural Language Processing-Based Querying Heterogeneous Data Sources Using Integrated Ontology. Preprints. https://doi.org/10.20944/preprints202312.0228.v1
Chicago/Turabian Style
S, S. 2023. "Natural Language Processing-Based Querying Heterogeneous Data Sources Using Integrated Ontology." Preprints. https://doi.org/10.20944/preprints202312.0228.v1
Abstract
Data are often scattered among heterogeneous information sources, which can be structured, semi-structured, or unstructured, and these sources must be reconciled to give consumers a cohesive view of the data. Information gathering is challenging as a result, and one major cause is that data sources are developed to support specific applications. One way to simplify this process is to use an ontology representation as an intermediate step. An ontology represents a knowledge structure that reasonably reflects the real world's complexity, and ontologies are used in many industries today. This research aims to enhance the functionality of a global ontology system by implementing and integrating a natural language query system. Using NLP-based approaches, we constructed SPARQL queries for our ontologies so that natural language can be translated into a well-structured query format. A dedicated query component was developed to transform natural language questions into SPARQL queries, selecting the optimal query graph to generate the final queries. In the subsequent phase, we explored integrating distinct sentence encoders to improve latent sentence representations during query construction. The generated SPARQL queries were executed on the global ontology output and then translated into source-specific questions. This approach enables unified access to heterogeneous data sources through a natural language querying interface.
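As a rough illustration of the query-construction step described in the abstract, the following minimal sketch picks the best-matching candidate query graph for a question and fills in a SPARQL template. It is not the authors' implementation: the candidate list, the `ex:` namespace and property names, and all function names are hypothetical, and a simple token-overlap (Jaccard) score stands in for the sentence encoders mentioned above.

```python
import re

# Hypothetical candidate query graphs. Each pairs a short description of
# the question pattern with a SPARQL template. The namespace and property
# names are illustrative assumptions, not taken from the paper.
CANDIDATES = [
    {
        "description": "which employees work in department",
        "sparql": (
            "PREFIX ex: <http://example.org/onto#>\n"
            "SELECT ?employee WHERE {{\n"
            "  ?employee ex:worksIn ?dept .\n"
            "  ?dept ex:name \"{entity}\" .\n"
            "}}"
        ),
    },
    {
        "description": "what projects are managed by person",
        "sparql": (
            "PREFIX ex: <http://example.org/onto#>\n"
            "SELECT ?project WHERE {{\n"
            "  ?project ex:managedBy ?person .\n"
            "  ?person ex:name \"{entity}\" .\n"
            "}}"
        ),
    },
]


def tokens(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_query_graph(question):
    """Return the candidate whose description best overlaps the question.

    Token overlap is a stand-in (assumption) for encoder-based similarity.
    """
    q = tokens(question)
    return max(CANDIDATES, key=lambda c: jaccard(q, tokens(c["description"])))


def extract_entity(question):
    """Take the first double-quoted span as the entity (simplifying assumption)."""
    match = re.search(r'"([^"]+)"', question)
    return match.group(1) if match else ""


def to_sparql(question):
    """Build a SPARQL query string from a natural language question."""
    best = select_query_graph(question)
    return best["sparql"].format(entity=extract_entity(question))


if __name__ == "__main__":
    print(to_sparql('Which employees work in the department "Sales"?'))
```

In a fuller system, the overlap score would presumably be replaced by similarity between encoded sentence representations, and the resulting query would be executed against the global ontology before being translated for each underlying source.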
Keywords
Natural language processing, Ontology, Knowledge graph, Heterogeneous data sources
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.