Submitted: 17 September 2025
Posted: 19 September 2025
Abstract
Keywords:
1. Introduction
2. Material and Methods
3. Definitions and Frameworks of Data Quality
3.1. Standards and ISO 8000
| Source / Standard | Definition of Data Quality | Key Dimensions / Emphasis | Notes |
|---|---|---|---|
| ISO 8000 (2011–2015) [5,13,14,16] | “Quality is the conformance of characteristics to requirements.” Data is high-quality if it meets stated requirements. | Accuracy, completeness, consistency, timeliness, uniqueness, validity. | Formal international standard; widely adopted in industry. |
| Wang & Strong (1996) [10] | “Data quality is data that is fit for use by data consumers.” | Four categories: Intrinsic (accuracy, objectivity), Contextual (relevance, timeliness, completeness), Representational (interpretability, consistency), Accessibility (access, security). | Highly influential conceptual framework; consumer-oriented. |
| Strong, Lee & Wang (1997) [11] | Emphasizes data quality as “fitness for use” in operations, decision-making, and planning. | Same four categories as above. | Extends the earlier framework with an organizational perspective. |
| Pipino, Lee & Wang (2002) [17] | Defines data quality through measurable attributes that reflect accuracy, completeness, consistency, and timeliness. | Quantitative measures for core dimensions. | Introduces practical tools for data quality assessment. |
| Ehrlinger & Wöß (2022) [18] | Data quality is a multidimensional construct influenced by context and use. | Highlights timeliness, completeness, plausibility, integrity, and multifacetedness. | Extends beyond classical dimensions and focuses on big data. |
| Haug, Zachariassen & van Liempd (2011) [19] | Suggests that “perfect” data quality is neither achievable nor optimal; instead, the right level balances costs of maintenance vs. costs of poor data. | Trade-off between quality maintenance effort and business impact. | Cost-oriented perspective. |
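
The quantitative measures attributed to Pipino, Lee & Wang [17] in the table above are often summarized as three functional forms: a simple ratio, a min (or max) operation, and a weighted average. The following Python sketch illustrates these forms under stated assumptions. The function names, the example counts, and the weights are hypothetical and are not taken from the cited work.

```python
# Illustrative sketch only: names, counts, and weights below are assumptions.

def simple_ratio(undesirable: int, total: int) -> float:
    """Return 1 - (undesirable outcomes / total outcomes)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 1 - undesirable / total


def min_score(scores: dict[str, float]) -> float:
    """Conservative aggregate: the weakest dimension determines the result."""
    return min(scores.values())


def weighted_average(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Aggregate dimension scores using importance weights (normalized here)."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical counts of problem values per 100 checked values.
scores = {
    "accuracy": simple_ratio(5, 100),
    "completeness": simple_ratio(20, 100),
    "timeliness": simple_ratio(10, 100),
}
# Hypothetical weights: accuracy is treated as twice as important.
weights = {"accuracy": 2.0, "completeness": 1.0, "timeliness": 1.0}

print("per-dimension scores:", scores)
print("min aggregate:", min_score(scores))
print("weighted aggregate:", weighted_average(scores, weights))
```

The min operation suits settings where a single weak dimension makes the data unusable, whereas the weighted average depends on a defensible choice of weights, which is a judgment the sketch does not resolve.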
3.2. Conceptual Definitions
3.3. Frameworks
3.4. Scope of Data Quality in Data Analytics
3.4.1. Role in Data Analytics
4. Dimensions of Data Quality
4.1. Accuracy
4.2. Completeness
4.3. Consistency
4.4. Timeliness
4.5. Relevance
4.6. Validity
4.7. Emerging Dimensions (Plausibility, Multifacetedness, Integrity)
5. Consequences of Poor Data Quality
5.1. Impact on Decision-Making and Outcomes
5.2. Flawed Insights and Conclusions
5.3. Wasted Resources
5.4. Damaged Reputation
5.5. Missed Opportunities
6. Case Studies: Illustrating the Consequences of Data Quality Failures and Successes
| Case / Organization | Domain | Data Quality Issue | Consequence |
|---|---|---|---|
| Failures | | | |
| Equifax [72] | Finance / Credit reporting | Inaccurate and poorly managed consumer credit data | Erosion of public trust; legal and financial consequences |
| NASA Mars Climate Orbiter [25] | Aerospace / Engineering | Unit mismatch (imperial vs. metric) not reconciled in data systems | Spacecraft loss (~$125 million) |
| Mid-sized enterprise (CRM migration) [24] | Business / CRM | Data quality challenges during migration from legacy systems | Errors, inconsistent formats, and disruption in customer management |
| Large home appliance business [20] | Retail / CRM | Low completeness, timeliness, and accuracy of customer data | Ineffective campaigns, reduced loyalty, and weak predictive performance |
| University fundraising CRM [33] | Education / Fundraising | Outdated, incomplete, and inaccurate alumni data | Reduced donor identification, inefficient fundraising, wasted resources |
| Target [82] | Retail / CRM | Predictive analytics revealed sensitive customer information | Public backlash over privacy intrusion |
| Google Flu Trends (2008–2013) [79,80,81] | Public health analytics | Overfitting and reliance on biased signals | Overestimation of flu cases; credibility loss |
| Amsterdam Tax Office [24] | Public sector | Duplicate and inconsistent taxpayer records | Inefficient operations; reduced compliance |
| Healthcare organizations [22,83,84] | Healthcare / CRM | Incomplete or inconsistent patient data in electronic health records | Medical errors, patient safety risks |
| Successes of Data Quality | | | |
| Netflix Recommendation System [78] | Entertainment / Business | Leveraging high-quality behavioral data for personalization | Recommendations drive 80% of content consumption; increased engagement and revenue |
| Freight forwarding industry [25] | Logistics / Freight forwarding | Workflow-embedded quality checks across logistics processes | Improved coordination, fewer customs delays, reduced correction costs |
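
The unit mismatch in the Mars Climate Orbiter row above illustrates a failure that an explicit unit check at the point of data exchange can catch. The Python sketch below assumes the exchanged quantity is an impulse reported either in pound-force seconds or in newton-seconds. The function name and the unit labels are hypothetical and do not describe the mission software.

```python
# Illustrative sketch only: the function name and unit labels are assumptions.

# One pound-force second expressed in newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


def to_newton_seconds(value: float, unit: str) -> float:
    """Normalize an impulse value to newton-seconds, rejecting unknown units."""
    if unit == "N*s":
        return value
    if unit == "lbf*s":
        return value * LBF_S_TO_N_S
    # Refusing unlabeled or unknown units is the point of the check.
    raise ValueError(f"unknown impulse unit: {unit!r}")


print(to_newton_seconds(10.0, "lbf*s"))
print(to_newton_seconds(10.0, "N*s"))
```

Requiring every value to carry a unit label, and failing loudly when the label is unknown, turns a silent inconsistency into a visible validation error.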
7. Ensuring Data Quality: Methods and Best Practices
7.1. Data Collection Practices
7.2. Data Cleaning and Validation
7.3. Monitoring and Maintenance
7.4. Governance/Ethics
8. Challenges and Proposed Solutions
8.1. Technical Challenges
8.2. Organizational Challenges
8.3. Solutions
9. Discussion
9.1. Synthesis of Literature: What Is Agreed upon, What Is Debated
9.2. Emerging Trends and Lessons from Failures
9.3. Research Gaps
9.4. Implications for Practitioners and Policymakers
10. Conclusions
Supplementary Materials
Author Contributions
Funding
Conflicts of Interest
Abbreviations
| Abbreviation | Full Term |
|---|---|
| AI | Artificial Intelligence |
| BERT | Bidirectional Encoder Representations from Transformers |
| BI | Business Intelligence |
| CLV | Customer Lifetime Value |
| CRM | Customer Relationship Management |
| DQA | Data Quality Assessment |
| DQI | Data Quality Index |
| DQM | Data Quality Management |
| DSAN | Denoising Self-Attention Network |
| DSS | Decision Support Systems |
| ECCMA | Electronic Commerce Code Management Association |
| FAIR | Findability, Accessibility, Interoperability, and Reusability |
| FAIR4RS | FAIR for Research Software |
| GDPR | General Data Protection Regulation |
| IoT | Internet of Things |
| ISO | International Organization for Standardization |
| KNN | K-Nearest Neighbors |
| LLMs | Large Language Models |
| LSTM | Long Short-Term Memory |
| NATO | North Atlantic Treaty Organization |
| NCS | NATO Codification System |
| RFM | Recency, Frequency, and Monetary |
| SaaS | Software as a Service |
| SMEs | Small and Medium-sized Enterprises |
| WHO | World Health Organization |
References
- Bernardi, F.; Andrade Alves, D.; Crepaldi, N. Y.; Yamada, D. B.; Lima, V.; Costa Rijo, R. P. C. L. Data Quality in Health Research: A Systematic Literature Review [PrePrint]. medRxiv 2022.
- Elahi, E. Data quality in healthcare – Benefits, challenges, and steps for improvement https://dataladder.com/data-quality-in-healthcare-data-systems/ (accessed 2024-08-16).
- Ali, S. M.; Naureen, F.; Noor, A.; Kamel Boulos, M. N.; Aamir, J.; Ishaq, M.; Anjum, N.; Ainsworth, J.; Rashid, A.; Majidulla, A.; Fatima, I. Data Quality: A Negotiator between Paper-Based and Digital Records in Pakistan’s TB Control Program. Data (Basel) 2018, 3 (3), 27. [CrossRef]
- Chen, H.; Hailey, D.; Wang, N.; Yu, P. A Review of Data Quality Assessment Methods for Public Health Information Systems. Int J Environ Res Public Health 2014, 11 (5), 5170–5207. [CrossRef]
- ISO 8000-1:2022(en). Data quality — Part 1: Overview https://www.iso.org/obp/ui/#iso:std:iso:8000:-1:ed-1:v1:en (accessed 2025-08-20).
- ECCMA. What is ISO 8000? https://eccma.org/what-is-iso-8000/ (accessed 2025-08-20).
- Wilkinson, M. D.; Dumontier, M.; Aalbersberg, Ij. J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.-W.; da Silva Santos, L. B.; Bourne, P. E.; Bouwman, J.; Brookes, A. J.; Clark, T.; Crosas, M.; Dillo, I.; Dumon, O.; Edmunds, S.; Evelo, C. T.; Finkers, R.; Gonzalez-Beltran, A.; Gray, A. J. G.; Groth, P.; Goble, C.; Grethe, J. S.; Heringa, J.; ’t Hoen, P. A. C.; Hooft, R.; Kuhn, T.; Kok, R.; Kok, J.; Lusher, S. J.; Martone, M. E.; Mons, A.; Packer, A. L.; Persson, B.; Rocca-Serra, P.; Roos, M.; van Schaik, R.; Sansone, S.-A.; Schultes, E.; Sengstag, T.; Slater, T.; Strawn, G.; Swertz, M. A.; Thompson, M.; van der Lei, J.; van Mulligen, E.; Velterop, J.; Waagmeester, A.; Wittenburg, P.; Wolstencroft, K.; Zhao, J.; Mons, B. The FAIR Guiding Principles for Scientific Data Management and Stewardship. Sci Data 2016, 3 (1), 160018. [CrossRef]
- Wang, R. Y.; Strong, D. M. Beyond Accuracy: What Data Quality Means to Data Consumers. Journal of Management Information Systems 1996, 12 (4), 5–33. [CrossRef]
- Strong, D. M.; Lee, Y. W.; Wang, R. Y. Data Quality in Context. Commun ACM 1997, 40 (5), 103–110. [CrossRef]
- Benson, P. NATO Codification System as the Foundation for ISO 8000, the International Standard for Data Quality. Oil IT Journal 2008, 1 (5), 1–4.
- ISO 8000-8:2015. Data Quality — Part 8: Information and Data Quality: Concepts and Measuring. Geneva, Switzerland 2015.
- ISO 8000-110:2009. Data Quality — Part 110: Master Data: Exchange of Characteristic Data: Syntax, Semantic Encoding, and Conformance to Data Specification. International Organization for Standardization (ISO): Geneva, Switzerland 2009.
- ISO 8000-1:2011. Data Quality — Part 1: Overview. International Organization for Standardization (ISO): Geneva, Switzerland 2011.
- Miller, R.; Whelan, H.; Chrubasik, M.; Whittaker, D.; Duncan, P.; Gregório, J. A Framework for Current and New Data Quality Dimensions: An Overview. Data (Basel) 2024, 9 (12), 151. [CrossRef]
- Pipino, L. L.; Lee, Y. W.; Wang, R. Y. Data Quality Assessment. Commun ACM 2002, 45 (4), 211–218. [CrossRef]
- Ehrlinger, L.; Wöß, W. A Survey of Data Quality Measurement and Monitoring Tools. Front Big Data 2022, 5. [CrossRef]
- Haug, A.; Zachariassen, F.; van Liempd, D. The Costs of Poor Data Quality. Journal of Industrial Engineering and Management 2011, 4 (2), 168–193.
- Suh, Y. Exploring the Impact of Data Quality on Business Performance in CRM Systems for Home Appliance Business. IEEE Access 2023, 11, 116076–116089. [CrossRef]
- Canali, S. Towards a Contextual Approach to Data Quality. Data (Basel) 2020, 5 (4), 90. [CrossRef]
- Payton, F. C.; Zahay, D. Understanding Why Marketing Does Not Use the Corporate Data Warehouse for CRM Applications. Journal of Database Marketing & Customer Strategy Management 2003, 10 (4), 315–326. [CrossRef]
- Henderson, D.; Earley, S.; Sykora, E.; Smith, E. DAMA-DMBOK: Data Management Body of Knowledge, Second Edition; DAMA International: Basking Ridge, NJ, 2017.
- Bidlack, C.; Wellman, M. P. Exceptional Data Quality Using Intelligent Matching and Retrieval. AI Mag 2010, 31 (1), 65–73. [CrossRef]
- Vaknin, M.; Filipowska, A. Information Quality Framework for the Design and Validation of Data Flow Within Business Processes - Position Paper; 2017; pp 158–168. [CrossRef]
- Petrović, M. Data Quality in Customer Relationship Management (CRM): Literature Review. Strategic Management 2020, 25 (2), 40–47. [CrossRef]
- Even, A.; Shankaranarayanan, G.; Berger, P. D. Evaluating a Model for Cost-Effective Data Quality Management in a Real-World CRM Setting. Decis Support Syst 2010, 50 (1), 152–163. [CrossRef]
- Nicholson, N.; Negrao Carvalho, R.; Štotl, I. A FAIR Perspective on Data Quality Frameworks. Data (Basel) 2025, 10 (9), 136. [CrossRef]
- Lamprecht, A.-L.; Garcia, L.; Kuzak, M.; Martinez, C.; Arcila, R.; Martin Del Pico, E.; Dominguez Del Angel, V.; van de Sandt, S.; Ison, J.; Martinez, P. A.; McQuilton, P.; Valencia, A.; Harrow, J.; Psomopoulos, F.; Gelpi, J. Ll.; Chue Hong, N.; Goble, C.; Capella-Gutierrez, S. Towards FAIR Principles for Research Software. Data Science 2020, 3 (1), 37–59. [CrossRef]
- Lopes, C. S.; Silveira, D. S. da; Araujo, J. Business Processes Fragments to Promote Information Quality. International Journal of Quality & Reliability Management 2021, 38 (9), 1880–1901. [CrossRef]
- Oliychenko, I.; Ditkovska, M. Improving Information Quality in E-Government of Ukraine. Electronic Government, an International Journal 2023, 19 (2), 146. [CrossRef]
- Jianfeng, X.; Jun, T.; Xuefeng, M.; Bin, X.; Yanli, S.; Yongjie, Q. Objective Information Theory: A Sextuple Model and 9 Kinds of Metrics. 2014.
- Lian, H.; He, T.; Qin, Z.; Li, H.; Liu, J. Research on the Information Quality Measurement of Judicial Documents. In 2018 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C); IEEE: Lisbon, 2018; pp 177–181.
- Chue Hong, N. P.; Aragon, S.; Hettrick, S.; Jay, C. The Future of Research Software Is the Future of Research. Patterns 2025, 6 (7), 101322. [CrossRef]
- Foote, K. The Impact of Poor Data Quality (and How to Fix It) https://www.dataversity.net/the-impact-of-poor-data-quality-and-how-to-fix-it/ (accessed 2024-08-26).
- Henderson, D.; Earley, S.; Sykora, E.; Smith, E. Data Quality. In DAMA-DMBOK: Data Management Body of Knowledge; Henderson, D., Earley, S., Sykora, E., Smith, E., Eds.; DAMA International: Basking Ridge, NJ, 2017; pp 551–611.
- Schäffer, T.; Beckmann, H. Trendstudie Stammdatenqualität 2013: Erhebung Der Aktuellen Situation Zur Stammdatenqualität in Unternehmen Und Daraus Abgeleitete Trends [Trend Study Master Data Quality 2013: Inquiry of the Current Situation of Master Data Quality in Companies and Derived Trends]; Steinbeis-Edition: Stuttgart, 2014.
- Alshawi, S.; Missi, F.; Irani, Z. Organisational, Technical and Data Quality Factors in CRM Adoption — SMEs Perspective. Industrial Marketing Management 2011, 40 (3), 376–383. [CrossRef]
- Fisher, C. W.; Lauria, E. J. M.; Matheus, C. C. An Accuracy Metric. Journal of Data and Information Quality 2009, 1 (3), 1–21. [CrossRef]
- Yao, X.; Zeng, W.; Zhu, L.; Wu, X.; Li, D. A Novel Proposal for Improving Economic Decision-Making Through Stock Price Index Forecasting. International Journal of Advanced Computer Science and Applications 2024, 15 (4). [CrossRef]
- Kelka, H. Supply Chain Resilience Navigating Disruptions Through Strategic Inventory Management, Metropolia University of Applied Sciences, Helsinki, 2024.
- Al-Harrasi, A. S.; Adarbah, H. Y.; Al-Badi, A. H.; Shaikh, A. K.; Al-Shihi, H.; Al-Barrak, A. Exploring the Adoption of Big Data Analytics in the Oil and Gas Industry: A Case Study. Journal of Business, Communication & Technology 2024, 1–16.
- Joseph, M.; Kumar, D. P.; Keerthana, J. K. Stock Market Analysis and Portfolio Management. In 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO); IEEE, 2024; pp 1–6. [CrossRef]
- Purohit, P.; Al Nuaimi, F.; Nakkolakkal, S. Data Governance, Privacy, Data Sharing Challenges. In Day 2 Wed, May 08, 2024; SPE, 2024. [CrossRef]
- UTradeAlgos. The Importance of Real-Time Data in Algo Trading Software https://utradealgos.com/blog/the-importance-of-real-time-data-in-algo-trading-software/ (accessed 2024-08-27).
- Bernard Owusu Antwi; Beatrice Oyinkansola Adelakun; Augustine Obinna Eziefule. Transforming Financial Reporting with AI: Enhancing Accuracy and Timeliness. International Journal of Advanced Economics 2024, 6 (6), 205–223. [CrossRef]
- Chioma Susan Nwaimo; Ayodeji Enoch Adegbola; Mayokun Daniel Adegbola; Kudirat Bukola Adeusi. Evaluating the Role of Big Data Analytics in Enhancing Accuracy and Efficiency in Accounting: A Critical Review. Finance & Accounting Research Journal 2024, 6 (6), 877–892. [CrossRef]
- Judijanto, L.; Edtiyarsih, D. D. The Effect of Company Policy, Legal Compliance, and Information Technology on Audit Report Accuracy in the Textile Industry in Tangerang. West Science Accounting and Finance 2024, 2 (02), 287–298. [CrossRef]
- Ehsani-Moghaddam, B.; Martin, K.; Queenan, J. A. Data Quality in Healthcare: A Report of Practical Experience with the Canadian Primary Care Sentinel Surveillance Network Data. Health Information Management Journal 2021, 50 (1–2), 88–92. [CrossRef]
- Lorence, D. Measuring Disparities in Information Capture Timeliness Across Healthcare Settings: Effects on Data Quality. J Med Syst 2003, 27 (5), 425–433. [CrossRef]
- Mashoufi, M.; Ayatollahi, H.; Khorasani-Zavareh, D.; Talebi Azad Boni, T. Data Quality in Health Care: Main Concepts and Assessment Methodologies. Methods Inf Med 2023, 62 (01/02), 005–018. [CrossRef]
- Wager, K. A.; Schaffner, M. J.; Foulois, B.; Swanson Kazley, A.; Parker, C.; Walo, H. Comparison of the Quality and Timeliness of Vital Signs Data Using Three Different Data-Entry Devices. CIN: Computers, Informatics, Nursing 2010, 28 (4), 205–212. [CrossRef]
- Alzghoul, A.; Khaddam, A. A.; Abousweilem, F.; Irtaimeh, H. J.; Alshaar, Q. How Business Intelligence Capability Impacts Decision-Making Speed, Comprehensiveness, and Firm Performance. Information Development 2024, 40 (2), 220–233. [CrossRef]
- Kusumawardhani, F. K.; Ratmono, D.; Wibowo, S. T.; Darsono, D.; Widyatmoko, S.; Rokhman, N. The Impact of Digitalization in Accounting Systems on Information Quality, Cost Reduction and Decision Making: Evidence from SMEs. International Journal of Data and Network Science 2024, 8 (2), 1111–1116. [CrossRef]
- GOV.UK. Hidden costs of poor data quality: Tackling data quality saves money and reduces risk.
- Sattler, K.-U. Data Quality Dimensions. In Encyclopedia of Database Systems; Springer New York: New York, NY, 2016; pp 1–5. [CrossRef]
- Black, A.; van Nederpelt, P. Dimensions of Data Quality (DDQ) Research Paper; Herveld, Netherlands, 2020.
- Chen, B. What is Data Relevance? Definition, Examples, and Best Practices https://www.metaplane.dev/blog/data-relevance-definition-examples (accessed 2024-08-27).
- IBM. What is data quality?
- Enterprise Big Data Framework. Understanding Data Quality: Ensuring Accuracy, Reliability, and Consistency https://www.bigdataframework.org/knowledge/understanding-data-quality/ (accessed 2024-08-27).
- Ibrahim, A.; Mohamed, I.; Satar, N. S. M.; Hasan, M. K. Master Data Quality Management Framework: Content Validity. Scalable Computing: Practice and Experience 2024, 25 (3), 2001–2012. [CrossRef]
- Houston, M. B. Assessing the Validity of Secondary Data Proxies for Marketing Constructs. J Bus Res 2004, 57 (2), 154–161. [CrossRef]
- Piedmont, R. L. Construct Validity. In Encyclopedia of Quality of Life and Well-Being Research; Springer International Publishing: Cham, 2023; pp 1332–1332. [CrossRef]
- Van Iddekinge, C. H.; Ployhart, R. E. Developments in the Criterion-related Validation of Selection Procedures: A Critical Review and Recommendations for Practice. Pers Psychol 2008, 61 (4), 871–925. [CrossRef]
- Collibra. The 6 data quality dimensions with examples www.collibra.com/us/en/blog/the-6-dimensions-of-data-quality (accessed 2024-08-27).
- Okembo, C.; Morales, J.; Lemmen, C.; Zevenbergen, J.; Kuria, D. A Land Administration Data Exchange and Interoperability Framework for Kenya and Its Significance to the Sustainable Development Goals. Land (Basel) 2024, 13 (4), 435. [CrossRef]
- Ellul, C.; Reynolds, P.; Vilardo, L. (M)App My Data! Developing a Map-Ability Rating and App to Rapidly Communicate Data Quality and Interoperability Potential of Open Data. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2024, X-4/W4-2024, 49–56. [CrossRef]
- Mohammed, S.; Harmouch, H.; Naumann, F.; Srivastava, D. Data Quality Assessment: Challenges and Opportunities. 2024.
- Terzi, S.; Stamelos, I. Architectural Solutions for Improving Transparency, Data Quality, and Security in EHealth Systems by Designing and Adding Blockchain Modules, While Maintaining Interoperability: The EHDSI Network Case. Health Technol (Berl) 2024, 14 (3), 451–462. [CrossRef]
- Bammidi, T.; Gutta, L.; Kotagiri, A.; Samayamantri, L.; Vaddy, R. The Crucial Role of Data Quality in Automated Decision-Making Systems. International Journal of Management Education for Sustainable Development 2024, 7 (7), 1–22.
- Yandrapalli, V. AI-Powered Data Governance: A Cutting-Edge Method for Ensuring Data Quality for Machine Learning Applications. In 2024 Second International Conference on Emerging Trends in Information Technology and Engineering (ICETITE); IEEE, 2024; pp 1–6. [CrossRef]
- Brahimi, S.; Elhussein, M. Measuring the Effect of Fraud on Data-Quality Dimensions. Data (Basel) 2023, 8 (8), 124. [CrossRef]
- Gartner Inc. Data Quality: Why It Matters and How to Achieve It https://www.gartner.com/en/data-analytics/topics/data-quality (accessed 2025-08-20).
- Ambler, S. W. Data Quality: The Impact of Poor Data Quality https://agiledata.org/essays/impact-of-poor-data-quality.html (accessed 2025-08-20).
- Gregory, D. Data quality issues (causes & consequences) | Ataccama https://www.ataccama.com/blog/data-quality-issues-causes-consequences (accessed 2025-08-20).
- MacIntyre, L. The Cost of Incomplete Data: Businesses Lose $3 Trillion Annually - Enricher.io https://enricher.io/blog/the-cost-of-incomplete-data (accessed 2025-08-20).
- Delpha. The Impacts of Bad Data Quality; Paris, 2021.
- Forrest, S. Study examines accuracy of arrest data in FBI’s NIBRS crime database https://phys.org/news/2022-02-accuracy-fbi-nibrs-crime-database.html (accessed 2024-08-15).
- Gates, S. 5 Examples of Bad Data Quality in Business — And How to Avoid Them https://www.montecarlodata.com/blog-bad-data-quality-examples/ (accessed 2024-08-26).
- Nagle, T.; Redman, T. C.; Sammon, D. Only 3% of Companies’ Data Meets Basic Quality Standards. Harv Bus Rev 2017.
- Elahi, E. The impact of poor data quality: Risks, challenges, and solutions https://dataladder.com/the-impact-of-poor-data-quality-risks-challenges-and-solutions/ (accessed 2024-08-26).
- MacDonald, L. Measuring Data Quality: Key Metrics, Processes, and Best Practices https://www.montecarlodata.com/blog-measuring-data-quality-key-metrics-processes-and-best-practices/ (accessed 2024-08-27).
- Mahendra, P.; Doshi, P.; Verma, A.; Shrivastava, S. A Comprehensive Review of AI and ML in Data Governance and Data Quality. In 2025 3rd International Conference on Inventive Computing and Informatics (ICICI); IEEE, 2025; pp 01–06. [CrossRef]
- Davidson, N. The cost of poor data quality on business operations https://lakefs.io/blog/poor-data-quality-business-costs/#:~:text=What%20is%20the%20business%20cost,lead%20to%20inaccurate%20decision%2Dmaking. (accessed 2024-08-26).
- Biemer, P. P. Data Quality and Inference Errors. In Big Data and Social Science Data Science Methods and Tools for Research and Practice; Foster, I., Ghani, R., Jarmin, R., Kreuter, F., Lane, J., Eds.; CRC: Boca Raton, Florida, 2020.
- Butler, D. When Google Got Flu Wrong. Nature 2013, 494 (7436), 155–156. [CrossRef]
- Lazer, D.; Kennedy, R.; King, G.; Vespignani, A. The Parable of Google Flu: Traps in Big Data Analysis. Science 2014, 343 (6176), 1203–1205. [CrossRef]
- Yang, Y. Applications and Challenges of Big Data in Market Analytics. Transactions on Economics, Business and Management Research 2024, 9, 450–458. [CrossRef]
- Verma, P.; Kumar, V.; Mittal, A.; Rathore, B.; Jha, A.; Rahman, M. S. The Role of 3S in Big Data Quality: A Perspective on Operational Performance Indicators Using an Integrated Approach. The TQM Journal 2023, 35 (1), 153–182. [CrossRef]
- Marzullo, A.; Savevski, V.; Menini, M.; Schilirò, A.; Franchellucci, G.; Dal Buono, A.; Bezzio, C.; Gabbiadini, R.; Hassan, C.; Repici, A.; Armuzzi, A. Collecting and Analyzing IBD Clinical Data for Machine-Learning: Insights from an Italian Cohort. Data (Basel) 2025, 10 (7), 100. [CrossRef]
- Tlouyamma, J.; Mokwena, S. Automated Data Quality Control System in Health and Demographic Surveillance System. Science, Engineering and Technology 2024, 4 (2), 82–91. [CrossRef]
- Pykes, K. 10 Signs of Bad Data: How to Spot Poor Quality Data https://www.datacamp.com/blog/10-signs-bad-data-quality (accessed 2024-08-26).
- Fu, A.; Shen, T.; Roberts, S. B.; Liu, W.; Vaidyanathan, S.; Marchena-Romero, K.-J.; Lam, Y. Y. P.; Shah, K.; Mak, D. Y. F.; Chin, S.; Stern, S. J.; Koppula, R.; Joyce, L. F.; Pellegrino, N.; Harris, N.; Ng, V.; Srivastava, S.; Manikan, N.; Wilkinson, A.; Gastmeier, J.; Kwan, J. C.; Byaruhanga, H.; Shaji, L.; George, S.; Handsor, S.; Roy, R. A.; Kim, C. S.; Mequanint, S.; Razak, F.; Verma, A. A. Optimizing the Efficiency and Effectiveness of Data Quality Assurance in a Multicenter Clinical Dataset. Journal of the American Medical Informatics Association 2025, 32 (5), 835–844. [CrossRef]
- Haverila, M. J.; Haverila, K. C. The Influence of Quality of Big Data Marketing Analytics on Marketing Capabilities: The Impact of Perceived Market Performance! Marketing Intelligence & Planning 2024. [CrossRef]
- Lee, D.-H.; Kim, H. A Self-Attention-Based Imputation Technique for Enhancing Tabular Data Quality. Data (Basel) 2023, 8 (6), 102. [CrossRef]
- Becerra, M. A.; Tobón, C.; Castro-Ospina, A. E.; Peluffo-Ordóñez, D. H. Information Quality Assessment for Data Fusion Systems. Data (Basel) 2021, 6 (6), 60. [CrossRef]
- Sluzki, N. 8 Data Quality Monitoring Techniques & Metrics to Watch https://www.ibm.com/think/topics/data-quality-monitoring-techniques (accessed 2024-08-27).
- Karkošková, S. Data Governance Model To Enhance Data Quality In Financial Institutions. Information Systems Management 2023, 40 (1), 90–110. [CrossRef]
- Woods, C.; Selway, M.; Bikaun, T.; Stumptner, M.; Hodkiewicz, M. An Ontology for Maintenance Activities and Its Application to Data Quality. Semant Web 2024, 15 (2), 319–352. [CrossRef]
- Stepanenko, R. Data Stewardship Explained: The Backbone of Data Management https://recordlinker.com/data-stewardship-explained/ (accessed 2025-09-14).
- Jatin, B. Data Governance for Quality: Policies Ensuring Reliable Data https://www.decube.io/post/data-quality-data-governance (accessed 2024-08-27).
- Khatri, V.; Brown, C. V. Designing Data Governance. Commun ACM 2010, 53 (1), 148–152. [CrossRef]
- Duggireddy, G. B. R. Integrated Data and AI Governance Framework: A Lifecycle Approach to Responsible AI Implementation. Journal of Computer Science and Technology Studies 2025, 7 (7), 771–777.
- Papagiannidis, E.; Mikalef, P.; Conboy, K. Responsible Artificial Intelligence Governance: A Review and Research Framework. The Journal of Strategic Information Systems 2025, 34 (2), 101885. [CrossRef]
- Floridi, L.; Taddeo, M. What Is Data Ethics? Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 2016, 374 (2083), 20160360. [CrossRef]
- Pahune, S.; Akhtar, Z.; Mandapati, V.; Siddique, K. The Importance of AI Data Governance in Large Language Models. Big Data and Cognitive Computing 2025, 9 (6), 147. [CrossRef]
- Labarrère, N.; Costa, L.; Lima, R. M. Data Science Project Barriers—A Systematic Review. Data (Basel) 2025, 10 (8), 132. [CrossRef]
- Illinois Criminal Justice Information Authority. Annual Audit Report for 1982-1983: Data Quality of Computerized Criminal Histories; National Criminal Justice Reference Service (NCJRS): Springfield, 1983.
- Bosse, R. C.; Jino, M.; de Franco Rosa, F. A Study on Data Quality and Analysis in Business Intelligence; 2024; pp 249–253. [CrossRef]
- Sienkiewicz, M. From Data Silos to Data Mesh: A Case Study in Financial Data Architecture; 2026; pp 3–20. [CrossRef]
- Senguttuvan, KR. Multi-Agent Based Automated Data Quality Engineering, Fordham University, New York, 2025.
- Stamkou, C.; Saprikis, V.; Fragulis, G. F.; Antoniadis, I. User Experience and Perceptions of AI-Generated E-Commerce Content: A Survey-Based Evaluation of Functionality, Aesthetics, and Security. Data (Basel) 2025, 10 (6), 89. [CrossRef]
- Vanam, R. R.; Pingili, R.; Myadaboyina, S. G. AI-Based Data Quality Assurance for Business Intelligence and Decision Support Systems. International Journal of Emerging Trends in Computer Science and Information Technology 2025, 6, 21–29. [CrossRef]
- Elouataoui, W.; El Mendili, S.; Gahi, Y. An Automated Big Data Quality Anomaly Correction Framework Using Predictive Analysis. Data (Basel) 2023, 8 (12), 182. [CrossRef]
- Tamm, H. C.; Nikiforova, A. From Data Quality for AI to AI for Data Quality: A Systematic Review of Tools for AI-Augmented Data Quality Management in Data Warehouses. 2025.
- Stoudt, S.; Jernite, Y.; Marshall, B.; Marwick, B.; Sharan, M.; Whitaker, K.; Danchev, V. Ten Simple Rules for Building and Maintaining a Responsible Data Science Workflow. PLoS Comput Biol 2024, 20 (7), e1012232. [CrossRef]
- Mons, B.; Schultes, E.; Liu, F.; Jacobsen, A. The FAIR Principles: First Generation Implementation Choices and Challenges. Data Intell 2020, 2 (1–2), 1–9. [CrossRef]
- Stvilia, B.; Pang, Y.; Lee, D. J.; Gunaydin, F. Data Quality Assurance Practices in Research Data Repositories—A Systematic Literature Review. An Annual Review of Information Science and Technology (ARIST) Paper. J Assoc Inf Sci Technol 2025, 76 (1), 238–261. [CrossRef]
- Korbmacher, M.; Azevedo, F.; Pennington, C. R.; Hartmann, H.; Pownall, M.; Schmidt, K.; Elsherif, M.; Breznau, N.; Robertson, O.; Kalandadze, T.; Yu, S.; Baker, B. J.; O’Mahony, A.; Olsnes, J. Ø.-S.; Shaw, J. J.; Gjoneska, B.; Yamada, Y.; Röer, J. P.; Murphy, J.; Alzahawi, S.; Grinschgl, S.; Oliveira, C. M.; Wingen, T.; Yeung, S. K.; Liu, M.; König, L. M.; Albayrak-Aydemir, N.; Lecuona, O.; Micheli, L.; Evans, T. The Replication Crisis Has Led to Positive Structural, Procedural, and Community Changes. Communications Psychology 2023, 1 (1), 3. [CrossRef]
- Dudda, L.; Kormann, E.; Kozula, M.; DeVito, N. J.; Klebel, T.; Dewi, A. P. M.; Spijker, R.; Stegeman, I.; Van den Eynden, V.; Ross-Hellauer, T.; Leeflang, M. M. G. Open Science Interventions to Improve Reproducibility and Replicability of Research: A Scoping Review. R Soc Open Sci 2025, 12 (4). [CrossRef]
- MacMaster, S.; Sinistore, J. Testing the Use of a Large Language Model (LLM) for Performing Data Quality Assessment. Int J Life Cycle Assess 2024. [CrossRef]
- Batool, A.; Zowghi, D.; Bano, M. AI Governance: A Systematic Literature Review. AI and Ethics 2025, 5 (3), 3265–3279. [CrossRef]
- Gebru, T.; Morgenstern, J.; Vecchione, B.; Vaughan, J. W.; Wallach, H.; III, H. D.; Crawford, K. Datasheets for Datasets. Commun ACM 2021, 64 (12), 86–92. [CrossRef]
- Leslie, D. Understanding Artificial Intelligence Ethics and Safety: A Guide for the Responsible Design and Implementation of AI Systems in the Public Sector; London, 2019. [CrossRef]
- Gautam, A. R. Impact of High Data Quality on LLM Hallucinations. Int J Comput Appl 2025, 187 (4), 35–39. [CrossRef]
- Abhishek, A.; Erickson, L.; Bandopadhyay, T. Data and AI Governance: Promoting Equity, Ethics, and Fairness in Large Language Models. 2025. [CrossRef]
- WHO. Overview of the Data Quality Review (DQR) Framework and Methodology; Geneva, 2020.
- Lighterness, A.; Adcock, M.; Scanlon, L. A.; Price, G. Data Quality–Driven Improvement in Health Care: Systematic Literature Review. J Med Internet Res 2024, 26, e57615. [CrossRef]
- Tolera, A.; Firdisa, D.; Roba, H. S.; Motuma, A.; Kitesa, M.; Abaerei, A. A. Barriers to Healthcare Data Quality and Recommendations in Public Health Facilities in Dire Dawa City Administration, Eastern Ethiopia: A Qualitative Study. Front Digit Health 2024, 6. [CrossRef]
- Tawil, A.-R.; Mohamed, M.; Schmoor, X.; Vlachos, K.; Haidar, D. Trends and Challenges Towards an Effective Data-Driven Decision Making in UK SMEs: Case Studies and Lessons Learnt from the Analysis of 85 SMEs. 2023.
- Patra, P.; Di Pompeo, D.; Di Marco, A. An Evaluation Framework for the FAIR Assessment Tools in Open Science. 2025.


| Dimension | Definition | Practical Example | Measurement Approach |
|---|---|---|---|
| Accuracy | The degree to which data correctly describes the real-world object or event. | Patient’s recorded blood pressure matches the actual measurement. | Comparison against an authoritative source or ground truth. |
| Completeness | The extent to which all required data is present. | The customer database contains contact details for all clients. | Ratio of available values to required values; percentage of missing fields. |
| Consistency | Absence of contradictions within and across datasets. | A patient’s birthdate is consistent across both electronic health records and insurance records. | Cross-field and cross-database validation checks. |
| Timeliness | The degree to which data is up to date and available when needed. | Stock market prices updated in real time. | Lag time between data generation and availability for use. |
| Validity | Degree to which data conforms to defined formats, rules, or ranges. | Postal codes follow the official national standard. | Validation rules, format checks, and range constraints. |
| Relevance | Appropriateness of data for the intended use. | Including clinical trial data when evaluating a new treatment. | Expert judgment; alignment with analytical or decision-making needs. |
| Uniqueness | The degree to which data is free of duplicate records. | Each patient has a single unique medical record number. | Duplicate detection and record linkage algorithms. |
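
Several of the measurement approaches in the table above reduce to simple ratios or time differences. The Python sketch below shows one possible way to compute completeness, uniqueness, validity, and timeliness lag on a small set of records. The field names, the five-digit postal code rule, and the sample records are hypothetical and serve only to make the calculations concrete.

```python
# Illustrative sketch only: records, field names, and rules are assumptions.
import re
from datetime import datetime

records = [
    {"id": "P001", "postal_code": "12345",
     "created": "2025-01-01T10:00:00", "available": "2025-01-01T10:05:00"},
    {"id": "P002", "postal_code": "1234",
     "created": "2025-01-01T11:00:00", "available": "2025-01-01T11:10:00"},
    {"id": "P002", "postal_code": "",
     "created": "2025-01-01T12:00:00", "available": "2025-01-01T12:05:00"},
    {"id": "P004", "postal_code": "54321",
     "created": "2025-01-01T13:00:00", "available": "2025-01-01T13:10:00"},
]


def completeness(rows, required):
    """Ratio of present (non-empty) values to required values."""
    total = len(rows) * len(required)
    filled = sum(1 for r in rows for f in required if r.get(f) not in (None, ""))
    return filled / total if total else 1.0


def uniqueness(rows, key):
    """Ratio of distinct key values to the number of records."""
    values = [r.get(key) for r in rows]
    return len(set(values)) / len(values) if values else 1.0


def validity(rows, field, pattern):
    """Share of non-missing values that match a format rule."""
    values = [r[field] for r in rows if r.get(field)]
    if not values:
        return 1.0
    return sum(1 for v in values if re.fullmatch(pattern, v)) / len(values)


def mean_lag_seconds(rows, created, available):
    """Average delay between data generation and availability for use."""
    lags = [
        (datetime.fromisoformat(r[available])
         - datetime.fromisoformat(r[created])).total_seconds()
        for r in rows
        if r.get(created) and r.get(available)
    ]
    return sum(lags) / len(lags) if lags else 0.0


print("completeness:", completeness(records, ["id", "postal_code"]))
print("uniqueness:", uniqueness(records, "id"))
print("validity:", validity(records, "postal_code", r"\d{5}"))
print("mean lag (s):", mean_lag_seconds(records, "created", "available"))
```

Accuracy and relevance are omitted because, as the table notes, they depend on an authoritative reference source or on expert judgment rather than on the dataset alone.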

| Consequence | Definition | Practical Example | Impact / Cost |
|---|---|---|---|
| Faulty decision-making | Wrong or suboptimal choices based on inaccurate data. | A hospital prescribes inappropriate treatment due to errors in lab data. | Patient harm, liability risks, loss of trust. |
| Financial losses | Direct or indirect costs from incorrect, incomplete, or duplicated data. | A bank suffers multimillion-dollar losses due to flawed credit risk models. | Wasted resources, loss of revenue. |
| Operational inefficiencies | Processes slowed or disrupted due to unreliable information. | Logistics companies misroute deliveries due to inaccurate addresses. | Increased workload, delays, and higher costs. |
| Reputational damage | Erosion of trust from stakeholders, customers, or the public. | Data breaches and reporting errors damage a company’s brand. | Customer attrition, lower market share. |
| Regulatory and legal risks | Non-compliance with laws and standards due to poor data. | A pharmaceutical firm fails an audit due to inconsistent records. | Fines, sanctions, reputational harm. |
| Missed opportunities | Failure to identify insights or innovations. | Retailer loses potential sales due to incomplete CRM data. | Reduced competitiveness, slower growth. |
| Misleading analytics | Models or reports based on flawed inputs lead to invalid results. | Overestimation of flu outbreaks by Google Flu Trends. | Misallocation of resources, loss of credibility. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).