Submitted: 30 October 2024
Posted: 31 October 2024
Abstract
Keywords:
I. Introduction
II. Literature Review
III. Big Data And Its Dimensions
IV. Topic Modelling
| Step | Description |
|---|---|
| 1 | Identify individual words as features in NLP. |
| 2 | Use the Latent Dirichlet Allocation (LDA) method to establish connections between documents. |
| 3 | Employ the Variational Expectation Maximization (VEM) algorithm to estimate similarities within the corpus. |
| 4 | Use a Bag of Words (BoW) representation to extract the initial words for the LDA model. |
| 5 | Represent each document in the corpus as a distribution over topics. |
| 6 | Extract topics, each represented as a distribution over words. |
| 7 | Determine the probabilistic distribution of topics for each document through LDA. |
| 8 | Obtain a comprehensive understanding of the relationships between topics. |
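
The steps in the table can be illustrated with a minimal, dependency-free sketch. It is not the pipeline used in this study: the four toy documents, the function names (`bag_of_words`, `fit_lda`, `top_terms`), and the hyperparameter values `alpha` and `beta` are all hypothetical, and inference is done with collapsed Gibbs sampling rather than the VEM algorithm named in step 3, because Gibbs sampling fits in a few lines of plain Python. The outputs correspond to steps 5 to 7: `theta` holds one topic distribution per document and `phi` holds one word distribution per topic.

```python
import random


def bag_of_words(docs):
    """Steps 1 and 4: split documents into words and map each word to an id."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for doc in tokenized for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    return [[word_id[word] for word in doc] for doc in tokenized], vocab


def fit_lda(corpus, n_vocab, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (assumed stand-in for VEM)."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in corpus]           # topic counts per document
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                           # total words per topic
    assign = []                                            # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(corpus):
        doc_assign = []
        for w in doc:
            z = rng.randrange(n_topics)
            doc_assign.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(doc_assign)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Steps 5 and 7: smoothed topic distribution for each document.
    theta = [
        [(count + alpha) / (len(doc) + n_topics * alpha) for count in row]
        for row, doc in zip(doc_topic, corpus)
    ]
    # Step 6: smoothed word distribution for each topic.
    phi = [
        [(count + beta) / (topic_total[k] + n_vocab * beta) for count in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, phi


def top_terms(phi, vocab, n=3):
    """Return the n most probable words of each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in phi
    ]


# Hypothetical toy corpus, for illustration only.
docs = [
    "data quality assessment method",
    "supply chain data analytics",
    "data quality measure improvement",
    "supply chain performance management",
]
corpus, vocab = bag_of_words(docs)
theta, phi = fit_lda(corpus, n_vocab=len(vocab), n_topics=2)
print(top_terms(phi, vocab))
print([[round(p, 2) for p in row] for row in theta])
```

On a corpus this small the topics are unstable and depend on the seed. A real analysis would use a full set of abstracts, remove stop words, and select the number of topics with a criterion such as coherence or perplexity.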
V. Methodology
A. Data Collection

B. Data Processing
C. Data Analysis
D. Topic Labeling
E. Comparison with Traditional Literature Review Methods
VI. Results
A. Research Area
| No. | Key Terms | Topic Label |
|---|---|---|
| 1. | data, research, study, quality, big, performance, record, firm, process, impact | Data-Driven Decision and Performance Analysis |
| 2. | data, quality, big, method, based, model, proposed, evaluation, process, result | Big Data Evaluation Methods |
| 3. | data, system, analytics, quality, analysis, social, research, medium, based, line | Social Media Data Analytics and Quality Research |
| 4. | data, quality, big, algorithm, database, result, model, performance, processing, process | Big Data Processing and Algorithm Performance |
| 5. | chain, supply, data, performance, big, analytics, model, management, organizational, operational | Big Data Analytics in Supply Chain Management |
| 6. | data, medical, detection, leisure, website, phishing, image, record, big data analytics, agriculture | Big Data Analytics in Medical Applications |
| 7. | data, quality, information, study, model, method, pollution, present, analysis, result | Data Quality Analysis in Pollution Studies |
| 8. | data, decision, big, research, quality, marketing, big data analytics, sensing, technology, analytics | Big Data Analytics for Decision-Making and Marketing |
| 9. | data, health, quality, method, study, big, clinical, research, patient, medical | Big Data in Clinical Research |
| 10. | data, big, quality, study, analysis, research, collection, use, network, using | Network Analysis by Data Quality |
| 11. | data, quality, big, assessment, measure, improvement, label, supplier, decision, using | Data Quality Measures for Supplier Assessment |
| 12. | data, quality, research, management, survey, big, framework, machine, learning, source | Data Quality Management with Machine Learning |
| 13. | adoption, firm, data, big data analytics, study, management, business, theory, perceived, analytics | Business Analytics and Big Data |
| 14. | data, science, big, quality, information, analysis, research, challenge, system, method | Challenges in Data Quality Analysis |
| 15. | data, learning, deep, method, quality, big, analytics, model, machine, approach | Deep Learning Methods for Data Analytics |
| 16. | data, specie, method, area, research, sampling, database, urban, trend, google | Data Sampling Methods |
B. Research Trends
VII. Future Research Directions
| No. | Future Avenues |
|---|---|
| 1. | How can firms use big data to enhance performance and decision-making, and what key factors influence the effectiveness of these data-driven strategies? |
| 2. | What are the most effective methods for evaluating big data quality, and how can these methods be standardized across different industries? |
| 3. | How can social media data analytics ensure high data quality, and what methods can address the challenges of analyzing user-generated content and social trends? |
| 4. | What are the best ways to optimize algorithms in big data processing, and how does data quality affect their results? |
| 5. | How can big data help supply chain management, and what challenges and opportunities are there in keeping data quality high? |
| 6. | How does big data analytics help medical detection and diagnosis, and how can we keep data quality high in various medical applications? |
| 7. | What are the best models and methods for analyzing pollution data accurately? |
| 8. | How can we make marketing analytics data better, and why does good data help make better decisions and marketing plans? |
| 9. | How does good data quality affect clinical research, and how can big data analytics help improve patient care and medical research? |
| 10. | How can we use network analysis to make big datasets better, and what's the best way to collect and use network data in research? |
| 11. | How can we measure data quality when evaluating suppliers, and how does this help in managing them better? |
| 12. | How can we use machine learning to manage and ensure data quality in research and surveys, and what are the main challenges in doing so? |
| 13. | What factors affect how much businesses use big data analytics, and how does this affect their performance and decisions? |
| 14. | What are the main challenges in ensuring data quality in big data analysis, and what methods and systems can be developed to address these challenges effectively? |
| 15. | How can deep learning methods be optimized for data analytics, and what are the critical factors affecting the quality and performance of deep learning models? |
| 16. | What are the most effective data sampling methods for ecological research, and how can these methods be applied to urban and species studies to ensure accurate trend analysis? |
VIII. Discussions
IX. Comparison of Traditional Literature Review Methods with LDA
A. Depth vs. Breadth
B. Fact-based vs. Opinion-based
C. Theoretical vs. Empirical
X. Contributions to Literature and Practitioners
XI. Limitations and Future Research
XII. Conclusion
| Dimension | Description |
|---|---|
| Volume | Refers to the vast amount of data being generated and collected. |
| Velocity | Represents the speed at which data is being produced and processed. |
| Variety | Indicates the diverse types of data, including structured, unstructured, and semi-structured data. |
| Veracity | Refers to the reliability and accuracy of the data. |
| Value | Represents the potential insights and benefits that can be derived from analyzing the data. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).