Submitted: 12 September 2024
Posted: 13 September 2024
Abstract
Keywords:
1. Introduction
2. Methods
2.1. Literature Search
2.2. Terminology Mapping
3. Results
3.1. Data Quality Dimensions
3.2. Classification of Data Quality Dimensions
4. Discussion
4.1. Inherent Data Quality
4.1.1. Accuracy
4.1.2. Completeness
4.1.3. Consistency
4.1.4. Credibility
4.1.5. Currentness
4.2. Contextual Data Quality
4.2.1. Accessibility
4.2.2. Compliance
4.2.3. Confidentiality
4.2.4. Efficiency
4.2.5. Governance
4.2.6. Traceability
4.2.7. Precision
4.2.8. Understandability
4.2.9. Usefulness
4.3. System-Dependent Data Quality
4.3.1. Availability
4.3.2. Portability
4.3.3. Quantity
4.3.4. Recoverability
4.3.5. Semantics
5. Conclusions
6. Future Directions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References


| Dimension | Definition |
|---|---|
| Accuracy | The degree to which data has attributes that correctly represent the true value of the intended attribute of a concept or event in a specific context of use. |
| Completeness | The degree to which subject data associated with an entity has values for all expected attributes and related entity instances in a specific context of use. |
| Consistency | The degree to which data has attributes that are free from contradiction and are coherent with other data in a specific context of use. It can be either or both among data regarding one entity and across similar data for comparable entities. |
| Credibility | The degree to which data has attributes that are regarded as true and believable by users in a specific context of use. Credibility includes the concept of authenticity (the truthfulness of origins, attributions, commitments). |
| Currentness | The degree to which data has attributes that are of the right age in a specific context of use. |
| Accessibility | The degree to which data can be accessed in a specific context of use, particularly by people who need supporting technology or special configuration because of some disability. |
| Compliance | The degree to which data has attributes that adhere to standards, conventions or regulations in force and similar rules relating to data quality in a specific context of use. |
| Confidentiality | The degree to which data has attributes that ensure that it is only accessible and interpretable by authorised users in a specific context of use. |
| Efficiency | The degree to which data has attributes that can be processed and provide the expected levels of performance by using the appropriate amounts and types of resources in a specific context of use. |
| Precision | The degree to which data has attributes that are exact or that provide discrimination in a specific context of use. |
| Traceability | The degree to which data has attributes that provide an audit trail of access to the data and of any changes made to the data in a specific context of use. |
| Understandability | The degree to which data has attributes that enable it to be read and interpreted by users, and are expressed in appropriate languages, symbols and units in a specific context of use. |
| Availability | The degree to which data has attributes that enable it to be retrieved by authorised users and/or applications in a specific context of use. |
| Portability | The degree to which data has attributes that enable it to be installed, replaced or moved from one system to another preserving the existing quality in a specific context of use. |
| Recoverability | The degree to which data has attributes that enable it to maintain and preserve a specified level of operations and quality, even in the event of failure, in a specific context of use. |

| Dimension | Definition |
|---|---|
| Governance | The degree to which data has attributes that adhere to the formalised frameworks of authority and accountability that support harmonised data activities across an organisation. |
| Usefulness | The degree to which data has attributes that meet the specific requirements of users or applications. This includes the data’s adaptability across various contexts, reflecting its potential for diverse applications through aspects such as reusability and interoperability. |
| Quantity | The degree to which data has attributes that represent a sufficient amount or volume, providing a comprehensive view of the intended attribute of a concept or event in a specific context of use. |
| Semantics | The degree to which data accurately and consistently represents the intended meaning, interpretation, and real-world concepts within a specific context of use, ensuring correct semantic understanding by users and applications. |

| Dimension | Associated Terms |
|---|---|
| Accuracy | Accuracy, Accurate, Closely match a real-state, Coincidence, Correct, Corrections made, Correctness, Data value out of range, Errors, Free of error, Free of mistakes, Inaccurate, Incorrect, Positive predicted values, Value accuracy |
| Completeness | Complete, Completeness, Comprehensiveness, Diversity, Entity heterogeneity, Heterogeneity, Incompleteness, Coverage, Min. data capture, Min. sample points, Min. time coverage, Missing values, Representativeness, Study representativeness, Variety, Areas covered, Geographical coverage, Handling of null values, Homogeneity, Missing information, Missingness, Omission, Representativity, Scope, Technological cover, Time-related coverage |
| Consistency | Coherence, Cohesiveness, Comparability, Consistency, Consistent, Consistent representation, Constant representation, Comparable, Duplication, Incompatibility, Inconsistency, Redundancy, Representational consistency, Reproducibility, Spatial stability, Structural consistency, Syntactic accuracy, Thematic accuracy, Variability |
| Credibility | Agreement, Authenticity, Believability, Bias, Coding reliability, Confidence, Corroboration, Credibility, Freedom of bias, Impartiality, Incorrect information, Integrity, Misleading, Plausibility, Popularity, Quality, Reliability, Reputability, Reputation, Robustness, Status, Trust, Trustworthiness, Unambiguity, Unbiased, Valid, Validity, Veracity |
| Currentness | Actuality, Currency, Currentness, Freshness, Outdated information, Rate of recording, Recency, Timeliness, Timely, Up-to-date, Velocity, Vitality, Volatility, Volatability |

| Dimension | Associated Terms |
|---|---|
| Accessibility | Accessibility, Clear definition, Discoverability, Ease of access, Findability |
| Compliance | Compliance, Concordance, Conformance, Conformity, Licensing, Model conformance, Privacy preservation, Value data type, Appropriate use |
| Confidentiality | Confidentiality, Data protection, Privacy, Security, Sensitivity, Statistical disclosure control, Vulnerability |
| Efficiency | Cost effectiveness, Ease of manipulation, Efficiency, Expediency, Minimality, Optimal use of resources, Performance, Viscosity |
| Governance * | Accountability, Alignment, Auditability, Authority, Authorisation, Enduring, Management, Risks |
| Precision | Attribute granularity, Brief representation, Concise representation, Conciseness, Detection limit, Distribution bias, Format precision, Imprecise, Intrinsic approximation, Intrinsic uncertainty, Intrinsic variability, Level of detail, Noiseness, Objectivity, Outliers, Precision, Precision of domains, Representational conciseness, Spatial resolution, Time resolution, Unambiguous, Uncertainty, Variation |
| Traceability | Attributable, Capture, Contemporaneous, Documentation, Fairness, Identifiability, Lineage, Meta-data, Original, Provenance, Quality of methodology, Source, Traceability, Translatability, Transparency, Verifiability |
| Understandability | Characteristic series structure, Clarity, Clean, Complexity, Comprehensibility, Content, Ease of interpretation, Ease of understanding, Format, Formats, Information-to-noise ratio, Informativeness, Interpretable, Interpretability, Legible, Presentation, Presentation quality, Quantitativeness, Readability, Semiotic, Structure, Transformation, Understandability, Understandable, Understanding, Visualisation |
| Usefulness * | Applicability, Appropriateness, Artificiality, Definition, Essentialness, Expandability, Fitness, Fitness for purpose, Fitness for use, Flexibility, Importance, Interoperability, Irrelevant, Meaningful, Naturalness, Relevance, Relevancy, Relevant, Reusability, Suitability, Uniqueness, Useableness, Usability, Usefulness, Utility, Valuation, Value, Value-added, Versatility |

| Dimension | Associated Terms |
|---|---|
| Availability | Access security, Adequacy, Attainability, Availability, Available, Obtainability, Visibility |
| Portability | Controllability, Mobility, Portability, Use of Storage |
| Quantity * | Amount of data, Appropriate amount, Compactness, Data volume, Scalability, Sufficiency, Suitable amount, Volume |
| Recoverability | Back-up, Decay, Recoverability |
| Semantics * | Interlinking, Language, Semantic accuracy, Semantic consistency, Syntactic validity, Syntax |
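
The associated-terms tables act as a many-to-one mapping from terms found in the literature to canonical dimensions, grouped as inherent, contextual and system-dependent. The Python sketch below shows one way such a mapping could be inverted into a lookup for normalising new terms. It encodes only a small excerpt of the tables, and the names `TERMS`, `build_index` and `classify`, as well as the case-folding rule, are illustrative assumptions rather than part of the authors' method.

```python
from typing import NamedTuple, Optional


class Match(NamedTuple):
    dimension: str
    category: str


# Hypothetical excerpt of the associated-terms tables (not the full lists).
# Keys are (category, dimension); values are associated terms as printed.
TERMS: dict[tuple[str, str], list[str]] = {
    ("Inherent", "Accuracy"): ["Correctness", "Free of error", "Inaccurate"],
    ("Inherent", "Completeness"): ["Coverage", "Missing values", "Missingness"],
    ("Inherent", "Currentness"): ["Freshness", "Timeliness", "Up-to-date"],
    ("Contextual", "Confidentiality"): ["Data protection", "Privacy", "Security"],
    ("Contextual", "Traceability"): ["Lineage", "Provenance", "Meta-data"],
    ("Contextual", "Usefulness"): ["Fitness for use", "Relevance", "Reusability"],
    ("System-dependent", "Availability"): ["Obtainability", "Visibility"],
    ("System-dependent", "Quantity"): ["Data volume", "Sufficiency", "Volume"],
}


def normalise(term: str) -> str:
    """Case-fold and collapse whitespace (an assumed matching rule)."""
    return " ".join(term.casefold().split())


def build_index(terms: dict[tuple[str, str], list[str]]) -> dict[str, Match]:
    """Invert the table into a term -> (dimension, category) lookup."""
    index: dict[str, Match] = {}
    for (category, dimension), associated in terms.items():
        # The dimension name itself is also treated as an associated term.
        for term in [dimension, *associated]:
            key = normalise(term)
            # A term mapped to two different dimensions would make the lookup ambiguous.
            if key in index and index[key].dimension != dimension:
                raise ValueError(f"term {term!r} mapped to two dimensions")
            index[key] = Match(dimension, category)
    return index


INDEX = build_index(TERMS)


def classify(term: str) -> Optional[Match]:
    """Return the dimension and category for a term, or None if it is unmapped."""
    return INDEX.get(normalise(term))


if __name__ == "__main__":
    for t in ["Timeliness", "provenance", "Data  Volume", "Usability of maps"]:
        print(t, "->", classify(t))
```

Raising an error on conflicting terms mirrors the requirement that each associated term belongs to exactly one dimension; unmapped terms return `None` so they can be reviewed manually rather than forced into a category.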

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).