Preprint Technical Note Version 1 Preserved in Portico This version is not peer-reviewed

Natural Language Processing-Based Querying Heterogeneous Data Sources Using Integrated Ontology

Version 1 : Received: 28 November 2023 / Approved: 29 November 2023 / Online: 4 December 2023 (16:20:05 CET)

How to cite: S, S. Natural Language Processing-Based Querying Heterogeneous Data Sources Using Integrated Ontology. Preprints 2023, 2023120228. https://doi.org/10.20944/preprints202312.0228.v1

Abstract

Giving consumers a cohesive view of data requires dealing with data scattered across heterogeneous information sources, which can be structured, semi-structured, or unstructured. This makes information gathering challenging, and one major cause is that each data source is developed to support a specific application. One way to simplify the process is to use an ontology representation as an intermediate step. An ontology is a knowledge structure that reasonably reflects the complexity of the real world, and ontologies are widely used in many industries today. This research aims to enhance the functionality of a global ontology system by implementing and integrating a natural language query system. Using NLP-based approaches, we constructed SPARQL queries for our ontologies so that natural language can be translated into a well-structured query format. A dedicated query component was developed to transform natural language questions into SPARQL queries; it selects the optimal query graph from which the final queries are generated. In a subsequent phase, we explored integrating distinct sentence encoders to improve latent sentence representations during query construction. The generated SPARQL queries were executed on the global ontology output and then translated into source-specific queries. This approach enables unified access to heterogeneous data sources through a user-friendly natural language querying interface.
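The query-construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate query graphs, the `ex:` vocabulary, and the function names `encode`, `cosine`, and `build_query` are all hypothetical, and a simple bag-of-words vector stands in for the sentence encoders mentioned above. The sketch only shows the general idea of scoring candidate query graphs against a question and rendering the best one as SPARQL.

```python
import math
import re
from collections import Counter
from string import Template

# Hypothetical namespace, used only for illustration.
PREFIX = "PREFIX ex: <http://example.org/ontology#>\n"

# Hypothetical candidate query graphs: a short description of what each
# graph answers, plus a SPARQL template with an $entity slot.
CANDIDATES = [
    {
        "description": "author who wrote a book",
        "template": Template(
            'SELECT ?author WHERE { ?book ex:title "$entity" ; '
            "ex:writtenBy ?author . }"
        ),
    },
    {
        "description": "year a book was published",
        "template": Template(
            'SELECT ?year WHERE { ?book ex:title "$entity" ; '
            "ex:publishedIn ?year . }"
        ),
    },
]


def encode(text):
    """Bag-of-words vector; a stand-in for a learned sentence encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_query(question, entity):
    """Pick the best-scoring candidate graph and render it as SPARQL."""
    q_vec = encode(question)
    best = max(CANDIDATES, key=lambda c: cosine(q_vec, encode(c["description"])))
    # Escape backslashes and double quotes so the literal stays well-formed.
    safe = entity.replace("\\", "\\\\").replace('"', '\\"')
    return PREFIX + best["template"].substitute(entity=safe)


if __name__ == "__main__":
    print(build_query("Who is the author that wrote this book?", "Dune"))
```

In a fuller system, the selected query would be run against the global ontology and then rewritten for each underlying source; that step is outside the scope of this sketch.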

Keywords

Natural language processing, Ontology, Knowledge graph, Heterogeneous data sources

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
