Submitted: 16 March 2026
Posted: 17 March 2026
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Literature Review
2.1.1. Key Terms Definitions
2.1.2. Search String Definition
2.1.3. Definition of Inclusion/Exclusion Criteria
2.1.4. Search and Inclusion/Exclusion of Initial Studies
2.1.5. Expansion of the Initial Set of Studies
2.2. Topic Modeling
2.3. Cluster Analysis
2.4. Analysis of Development Trends
3. Results
- The first category of studies, labeled “Topic 1”, appears to represent the core concepts and systems of information modeling. Key terms such as “model”, “information”, “system”, “semantic”, “graph” and “knowledge” point to the foundational elements of representing and managing structured data. The key terms “source”, “web” and “tabular” place these concepts in the context of web and tabular data sources. The key term “energy” is particularly notable: it indicates that a significant share of these studies applies the concepts to one specific domain, the energy sector. In essence, this topic captures the foundational aspects of semantic enrichment.
- The second category of studies, labeled “Topic 2”, focuses on domain-specific applications and value generation. The key terms “user”, “city” and “value” indicate a strong emphasis on practical utility and on the end users of these technologies, often in contexts such as smart cities. The combination of the key terms “linked”, “ontology” and “domain” suggests that this topic covers research that uses Linked Data principles and domain-specific ontologies to create tangible value for users. This topic thus highlights the applied side of semantic enrichment and its direct impact in specific areas.
- The third category of studies, labeled “Topic 3”, appears to represent the technical processes and tasks of tabular data annotation. The key terms “entity”, “type”, “annotation”, “matching”, “task”, “table” and “tabular” form a cohesive, highly specialized body of literature that describes the “how-to” of the semantic enrichment process. In essence, this topic concerns the specific, granular operations involved in linking entities and assigning types to data within tables, often with the goal of mapping them to a knowledge “graph”.
- Cluster 1, which includes the key terms “linked”, “value”, “city”, “open”, “existing”, “smart”, “used” and “available”, strongly points to the practical application and use of Open and Linked Data. The key terms “linked”, “open”, “smart” and “city” suggest applications in the context of smart cities and linked open data ecosystems. The key terms “value”, “used” and “available” reinforce the idea of data exploitation and utility, connecting this conceptual structure to practical infrastructure, data publication, and socio-economic impact.
- Cluster 2, which includes the key terms “information”, “datasets”, “label”, “energy”, “schema”, “learning” and “dataset”, appears to focus on data preparation and machine learning for information and data systems. The key terms “datasets”, “dataset”, “information” and “label” are central to preparing data for analysis. The key term “learning” indicates the use of learning algorithms, while “schema” relates to the structure of the data. The presence of the key term “energy” suggests a notable application domain for these concepts.
- Cluster 3, which includes the key terms “knowledge”, “graph”, “table”, “web”, “system”, “task”, “matching” and “technique”, indicates research on Knowledge Representation and Linking, especially concerning Knowledge Graphs. The key terms “knowledge”, “graph”, “web”, “system” and “table” (as a source of tabular data) are central. The key terms “matching” and “technique” point to the specific methods for constructing or using these graphs, which establishes this cluster as a foundational pillar of semantic enrichment and the Semantic Web.
- Cluster 4, which includes the key terms “semantic”, “source”, “model”, “ontology”, “domain”, “lake”, “user”, “structured”, “indicator” and “set”, represents the conceptual core of semantic enrichment and data modeling. The key terms “semantic”, “model”, “ontology”, “domain” and “structured” denote fundamental concepts. The key terms “source” and “lake” likely refer to the data sources or data lakes where enrichment is applied, while the key terms “user” and “indicator” suggest that the purpose of this enrichment is to generate value for end users or to create meaningful indicators.
- Cluster 5, which includes the key terms “tabular”, “annotation”, “column”, “entity” and “type”, is a highly specific cluster focused on the nature of tabular data and its annotation. These key terms refer precisely to the technical process of adding semantic information to tabular data by identifying data types and linking values to known entities. This conceptual structure is very specific to the core techniques of tabular data enrichment.
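
The topic descriptions above rest on the key terms that rank highest within each topic. As a minimal sketch of that ranking step, the code below keeps the highest-weighted terms per topic. It assumes that the topic model assigns a weight to every term in every topic, as is the case for models such as Latent Dirichlet Allocation. The function name `top_terms` and all numeric weights are hypothetical and are not taken from the reviewed corpus; only the term names come from the descriptions above.

```python
from typing import Dict, List


def top_terms(topic_term_weights: Dict[str, Dict[str, float]], n: int = 3) -> Dict[str, List[str]]:
    """Return the n highest-weighted terms of each topic (ties broken alphabetically)."""
    return {
        topic: [
            term
            for term, _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        ]
        for topic, weights in topic_term_weights.items()
    }


# Hypothetical weights for illustration only; they are NOT results of the study.
weights = {
    "Topic 1": {"model": 0.30, "information": 0.25, "energy": 0.10, "table": 0.02},
    "Topic 3": {"entity": 0.28, "annotation": 0.26, "table": 0.20, "model": 0.03},
}

print(top_terms(weights, n=3))
# {'Topic 1': ['model', 'information', 'energy'], 'Topic 3': ['entity', 'annotation', 'table']}
```

A term such as “table” can carry some weight in several topics; what characterizes a topic is the set of terms that rank highest within it, which is why the interpretation above reads the key terms jointly rather than one by one.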
4. Discussion
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Uren, V.; Buckingham Shum, S.; Bachler, M.; Li, G.; Domingue, J. Semantic annotation for knowledge management: Requirements and a survey of the state of the art. J Web Semant. 2006, 4, 14–28.
- Liu, J.; Chabot, Y.; Troncy, R.; Huynh, V.-P.; Labbé, T.; Monnin, P. From tabular data to knowledge graphs: A survey of semantic table interpretation tasks and methods. J Web Semant. 2023, 76, 100761.
- Chen, C.; Song, M. Visualizing a field of research: A methodology of systematic scientometric reviews. PLoS One 2019, 14, e0223994.
- Gusenbauer, M.; Haddaway, N.R. Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res Synth Methods 2020, 11, 181–217.
- Nakagawa, S.; Koricheva, J.; Macleod, M.; Viechtbauer, W. Introducing our series: research synthesis and meta-research in biology. BMC Biol. 2020, 18, 20.
- Börner, K.; Chen, C.; Boyack, K.W. Visualizing knowledge domains. Annu Rev Inf Sci Technol. 2003, 37, 179–255.
- Fortunato, S.; et al. Science of science. Science 2018, 359, eaao0185.
- Kitchenham, B. Procedures for Performing Systematic Reviews. Joint Technical Report TR/SE-0401, 2004.
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent dirichlet allocation. J Mach Learn Res. 2003, 3, 993–1022.
- Hoffman, M.D.; Blei, D.M.; Bach, F. Online learning for Latent Dirichlet Allocation. In Proceedings of the 23rd International Conference on Neural Information Processing Systems, 2010; Volume 1, pp. 856–864.
- Wu, J.; Orlandi, F.; AlSkaif, T.; O’Sullivan, D.; Dev, S. A semantic web approach to uplift decentralized household energy data. Sustain Energy Grids Netw. 2022, 32, 100891.
- Knap, T. Towards Odalic, a Semantic Table Interpretation Tool in the ADEQUATE Project. 2017, pp. 26–37. Available online: http://ceur-ws.org/Vol-1946/#paper-04.
- Orlandi, F.; et al. Leveraging Knowledge Graphs of Movies and Their Content for Web-Scale Analysis. In Proceedings of the 2018 14th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), 2018; pp. 609–616.
- An, J.; Kumar, S.; Lee, J.; Jeong, S.; Song, J. Synapse: Towards Linked Data for Smart Cities using a Semantic Annotation Framework. In Proceedings of the 2020 IEEE 6th World Forum on Internet of Things (WF-IoT), 2020; pp. 1–6.
- Wu, J.; Orlandi, F.; Lee, Y.H.; O’Sullivan, D.; Dev, S. Organizing Decentralized Energy Data Using Semantic Approach. In Proceedings of the 2021 Photonics & Electromagnetics Research Symposium (PIERS), 2021; pp. 2213–2216.
- Rümmele, N. Evaluating Approaches for Supervised Semantic Labeling. 2018, pp. 4–13. Available online: https://ceur-ws.org/Vol-2073/article-04.pdf.
- Chen, Z.; Jia, H.; Heflin, J.; Davison, B.D. Generating Schema Labels through Dataset Content Analysis. In Companion Proceedings of The Web Conference 2018; International World Wide Web Conferences Steering Committee: Republic and Canton of Geneva, CHE, 2018; pp. 1515–1522.
- Ramnandan, S.K.; Mittal, A.; Knoblock, C.A.; Szekely, P. Assigning Semantic Labels to Data Sources. In The Semantic Web. Latest Advances and New Domains; Springer International Publishing: Cham, 2015; pp. 403–417.
- Knap, T. Increasing Quality of Austrian Open Data by Linking Them to Linked Data Sources: Lessons Learned. In The Semantic Web; Springer International Publishing: Cham, 2016; pp. 243–254.
- Neumaier, S.; Umbrich, J.; Parreira, J.X.; Polleres, A. Multi-level Semantic Labelling of Numerical Values. In The Semantic Web – ISWC 2016; Springer International Publishing: Cham, 2016; pp. 428–445.
- Azzi, R.; Diallo, G. AMALGAM: A Matching Approach to Fairfy TabuLar Data with KnowledGe GrAph Model. In Trends and Applications in Information Systems and Technologies; Springer International Publishing: Cham, 2021; pp. 76–86.
- Özcan, F.; Lei, C.; Quamar, A.; Efthymiou, V. Semantic enrichment of data for AI applications. In Proceedings of the Fifth Workshop on Data Management for End-To-End Machine Learning; Association for Computing Machinery: New York, NY, USA, 2021; pp. 1–7.
- Ciavotta, M.; Cutrona, V.; De Paoli, F.; Nikolov, N.; Palmonari, M.; Roman, D. Supporting Semantic Data Enrichment at Scale. In Technologies and Applications for Big Data Value; Springer International Publishing: Cham, 2022; pp. 19–39.
- Bischof, S.; Harth, A.; Kämpgen, B.; Polleres, A.; Schneider, P. Enriching integrated statistical open city data by combining equational knowledge and missing value imputation. J Web Semant. 2018, 48, 22–47.
- Wu, J.; Orlandi, F.; O’Sullivan, D.; Pisoni, E.; Dev, S. Boosting Climate Analysis With Semantically Uplifted Knowledge Graphs. IEEE J Sel Top Appl Earth Obs Remote Sens. 2022, 15, 4708–4718.
- Taheriyan, M.; Knoblock, C.A.; Szekely, P.; Ambite, J.L. Learning the semantics of structured data sources. J Web Semant. 2016, 37–38, 152–169.
- Nguyen, P.; Kertkeidkachorn, N.; Ichise, R.; Takeda, H. MTab: Matching Tabular Data to Knowledge Graph using Probability Models. 2019, pp. 7–14. Available online: https://ceur-ws.org/Vol-2553/paper2.pdf.
- Oliveira, D. ADOG - Annotating Data with Ontologies and Graphs. 2019, pp. 1–6. Available online: https://ceur-ws.org/Vol-2553/paper1.pdf.
- Thawani, A.; Hu, M.; Hu, E. Entity Linking to Knowledge Graphs to Infer Column Types and Properties. 2019, pp. 22–25.
- Shigapov, R.; Zumstein, P.; Kamlah, J.; Oberlander, L.; Mechnich, J.; Schumm, I. bbw: Matching CSV to Wikidata via Meta-lookup. 2020, pp. 17–26. Available online: https://ceur-ws.org/Vol-2775/paper2.pdf.
- Cremaschi, M.; Avogadro, R.; Barazzetti, A. MantisTable SE: an Efficient Approach for the Semantic Table Interpretation. 2020, pp. 75–85. Available online: https://ceur-ws.org/Vol-2775/paper8.pdf.
- Vu, B.; Knoblock, C.; Pujara, J. Learning Semantic Models of Data Sources Using Probabilistic Graphical Models. In Proceedings of The World Wide Web Conference; Association for Computing Machinery: New York, NY, USA, 2019; pp. 1944–1953.
- Steenwinckel, B.; Vandewiele, G.; Turck, F.D.; Ongenae, F. CSV2KG: Transforming Tabular Data into Semantic Knowledge. 2019, pp. 33–40. Available online: https://ceur-ws.org/Vol-2553/paper5.pdf.
- Morikawa, H. Semantic Table Interpretation using LOD4ALL. 2019, pp. 49–56. Available online: https://ceur-ws.org/Vol-2553/paper7.pdf.
- Alobaid, A.; Kacprzak, E.; Corcho, O. Typology-based semantic labeling of numeric tabular data. Semantic Web 2020, 12, 5–20.
- Nguyen, P.; Nguyen, K.; Ichise, R.; Takeda, H. EmbNum+: Effective, Efficient, and Robust Semantic Labeling for Numerical Values. New Gener Comput. 2019, 37, 393–427.
- Pomp, A.; Paulus, A.; Kirmse, A.; Kraus, V.; Meisen, T. Applying Semantics to Reduce the Time to Analytics within Complex Heterogeneous Infrastructures. Technologies 2018, 6.
- Nikolov, N.; Ciavotta, M.; De Paoli, F. Data wrangling at scale: the experience of EW-shopp. In Proceedings of the 12th European Conference on Software Architecture: Companion Proceedings; Association for Computing Machinery: New York, NY, USA, 2018; pp. 1–4.
- Huynh, V.-P.; et al. DAGOBAH: Table and Graph Contexts for Efficient Semantic Annotation of Tabular Data. CEUR Workshop Proceedings, online, 2021; p. 2. Available online: https://hal.science/hal-04170864.
- Bonfitto, S.; Perlasca, P.; Mesiti, M. Easy-to-use interfaces for supporting the semantic annotation of web tables. 2023, pp. 1–10. Available online: https://ceur-ws.org/Vol-3379/DataPlat_2023_601.pdf.
- Alam, M.; et al. Tab2KG: Semantic table interpretation with lightweight semantic profiles. Semantic Web 2022, 13, 571–597.
- Chabot, Y.; Labbé, T.; Liu, J.; Troncy, R. Dagobah: An End-To-End Context-Free Tabular Data Semantic Annotation System. In Proceedings of the SemTab@ISWC, 2019; pp. 41–48. Available online: https://ceur-ws.org/Vol-2553/paper6.pdf.
- Takeoka, K.; Oyamada, M.; Nakadai, S.; Okadome, T. Meimei: an efficient probabilistic approach for semantically annotating tables. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 2019; AAAI Press: Honolulu, Hawaii, USA; pp. 281–288.
- Chen, J.; Jiménez-Ruiz, E.; Horrocks, I.; Sutton, C. Learning semantic annotations for tabular data. In Proceedings of the 28th International Joint Conference on Artificial Intelligence; AAAI Press: Macao, China, 2019; pp. 2088–2094.
- Chen, S.; et al. LinkingPark: An Integrated Approach for Semantic Table Interpretation. 2020, pp. 65–74. Available online: https://ceur-ws.org/Vol-2775/paper7.pdf.
- Huynh, V.-P.; Liu, J.; Chabot, Y.; Labbe, T.; Monnin, P.; Troncy, R. DAGOBAH: Enhanced Scoring Algorithms for Scalable Annotations of Tabular Data. 2020, pp. 27–39. Available online: https://ceur-ws.org/Vol-2775/paper3.pdf.
- Nguyen, P.; Yamada, I.; Kertkeidkachorn, N.; Ichise, R.; Takeda, H. MTab4Wikidata at SemTab 2020: Tabular Data Annotation with Wikidata. 2020, pp. 86–95. Available online: https://ceur-ws.org/Vol-2775/paper9.pdf.
- Bonfitto, S.; Cappelletti, L.; Trovato, F.; Valentini, G.; Mesiti, M. Semi-automatic Column Type Inference for CSV Table Understanding. In SOFSEM 2021: Theory and Practice of Computer Science; Springer-Verlag: Berlin, Heidelberg, 2021; pp. 535–549.
- Gottschalk, S.; Tempelmeier, N.; Kniesel, G.; Iosifidis, V.; Fetahu, B.; Demidova, E. Simple-ML: Towards a Framework for Semantic Data Analytics Workflows. In Semantic Systems. The Power of AI and Knowledge Graphs; Springer International Publishing: Cham, 2019; pp. 359–366.
- Taheriyan, M.; Knoblock, C.A.; Szekely, P.; Ambite, J.L. Leveraging Linked Data to Discover Semantic Relations Within Data Sources. In The Semantic Web – ISWC 2016; Springer-Verlag: Berlin, Heidelberg, 2016; pp. 549–565.
- Cutrona, V.; Bianchi, F.; Jiménez-Ruiz, E.; Palmonari, M. Tough Tables: Carefully Evaluating Entity Linking for Tabular Data. In The Semantic Web – ISWC 2020; Springer-Verlag: Berlin, Heidelberg, 2020; pp. 328–343.
- Bischof, S.; Martin, C.; Polleres, A.; Schneider, P. Collecting, Integrating, Enriching and Republishing Open City Data as Linked Data. In The Semantic Web - ISWC 2015; Springer International Publishing: Cham, 2015; pp. 57–75.
- Bianchini, D.; De Antonellis, V.; Garda, M. A semantics-enabled approach for personalised Data Lake exploration. Knowl Inf Syst. 2024, 66, 1469–1502.
- Mami, M.N.; Graux, D.; Scerri, S.; Jabeen, H.; Auer, S.; Lehmann, J. Squerall: Virtual Ontology-Based Access to Heterogeneous and Large Data Sources. In The Semantic Web – ISWC 2019; Ghidini, C., Hartig, O., Maleshkova, M., Svátek, V., Cruz, I., Hogan, A., et al., Eds.; Springer International Publishing: Cham, 2019; pp. 229–245.
- Diamantini, C.; Lo Giudice, P.; Potena, D.; Storti, E.; Ursino, D. An Approach to Extracting Topic-guided Views from the Sources of a Data Lake. Inf Syst Front. 2021, 23, 243–262.
- Pingos, M.; Andreou, A.S. A Data Lake Metadata Enrichment Mechanism via Semantic Blueprints. In Proceedings of the International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE), 2022.
- Bagozi, A.; Bianchini, D.; De Antonellis, V.; Garda, M.; Melchiori, M. Personalised Exploration Graphs on Semantic Data Lakes. In On the Move to Meaningful Internet Systems: OTM 2019 Conferences; Panetto, H., Debruyne, C., Hepp, M., Lewis, D., Ardagna, C.A., Meersman, R., Eds.; Springer International Publishing: Cham, 2019; pp. 22–39.
- Sarramia, D.; Claude, A.; Ogereau, F.; Mezhoud, J.; Mailhot, G. CEBA: A Data Lake for Data Sharing and Environmental Monitoring. Sensors (Basel) 2022, 22, 2733.





| Type of Criteria | Review Objective | Focus of the Review |
| --- | --- | --- |
| Inclusion | Semantic Enrichment of Tabular Data | Transformation of Flat Tabular Data into Semantically Enriched Data. |
| Exclusion | Publication of Linked Data on the Web | Creation of Domain Ontologies; Construction of Knowledge Graphs; Linking Data on the Web. |

| Topic | Number of studies | Assigned studies |
| --- | --- | --- |
| Topic 1 | 12 | [11,17,22,23,27,40,43,45,48,55,56,58] |
| Topic 2 | 21 | [14,15,16,18,19,20,21,24,36,37,38,41,44,46,47,50,51,52,53,54,57] |
| Topic 3 | 15 | [12,13,25,26,28,29,30,31,32,33,34,35,39,42,49] |

| Topic | Trend (slope coefficient, x1) | p-value (unadjusted) | Durbin-Watson |
| --- | --- | --- | --- |
| Topic 1 | 0.049 | 0.066 | 1.850 |
| Topic 2 | -0.022 | 0.466 | 2.158 |
| Topic 3 | -0.027 | 0.301 | 1.872 |
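
To make the columns of the trend table concrete, the sketch below fits a straight line to a short yearly series by ordinary least squares, computes the Durbin-Watson statistic of the residuals, and screens the reported p-values against a significance threshold. It assumes that each trend was estimated as the slope of a linear regression over time, which the column names suggest but which is not shown here. The yearly values, the helper names `ols_line` and `durbin_watson`, and the 0.05 threshold are illustrative assumptions; only the three p-values are copied from the table.

```python
from typing import Dict, List, Sequence, Tuple


def ols_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares; return (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def durbin_watson(residuals: Sequence[float]) -> float:
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical yearly topic shares (NOT data from the study).
years = [0, 1, 2, 3]
share = [0.2, 0.4, 0.3, 0.5]

slope, intercept = ols_line(years, share)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(years, share)]
dw = durbin_watson(residuals)

# Unadjusted p-values copied from the table; alpha = 0.05 is an assumed threshold.
p_values: Dict[str, float] = {"Topic 1": 0.066, "Topic 2": 0.466, "Topic 3": 0.301}
significant: List[str] = [topic for topic, p in p_values.items() if p < 0.05]
```

With the unadjusted p-values from the table, no topic falls below the assumed 0.05 threshold, so `significant` is empty. The reported Durbin-Watson values (1.850 to 2.158) all lie close to 2, the value expected when residuals show no first-order autocorrelation.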

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.