Submitted: 30 July 2025
Posted: 30 July 2025
Abstract
Keywords:
1. Introduction
- Explain how synthetic data is generated and used in healthcare.
- Explore the utility and limitations of synthetic data.
- Identify the risks and challenges involved.
- Understand the long-term impact of synthetic data on the future of AI in healthcare.
2. Methods
2.1. Search Strategy

2.2. Inclusion and Exclusion Criteria
2.3. Study Selection Process
2.4. Data Extraction
2.5. Synthesis of Results
| Study | Methodology | Application Focus | Challenges Identified | Implications |
|---|---|---|---|---|
| D'Amico et al. (2023)[13] | AI-based synthetic data generation | Precision medicine in hematology | Validation of synthetic data and integration with real datasets | Accelerates research and improves personalized treatment strategies |
| Akpinar et al. (2024)[14] | Systematic review of GAN-based techniques | Healthcare image and signal data generation | GAN stability and performance evaluation | Provides insights into potential applications and gaps in healthcare data synthesis |
| Aravinth et al. (2023)[15] | Comparative analysis of generative AI techniques | Tabular medical record data generation | Scalability and preservation of data utility | Facilitates better understanding of data generation approaches for EHRs |
| Ferreira et al. (2024)[16] | GAN-based systematic review | 3D volumetric data generation | Computational complexity and realism of generated data | Advances 3D data synthesis for medical imaging and diagnostics |
| Rashidian et al. (2020)[17] | SMOOTH-GAN architecture | Synthetic EHR data generation | Maintaining longitudinal consistency in synthetic data | Improves quality and usability of synthetic EHR datasets for research |
| Nikolentzos et al. (2023)[18] | Variational graph autoencoders | Synthetic electronic health records | Complexity in representing relational structures | Enhances data synthesis with relational and temporal context |
| Dos Santos et al. (2024)[19] | VAE and linked data paradigm | Synthetic data generation for medical research | Integration with linked datasets and scalability | Promotes interoperability and broader applications in health research |
| Lenatti et al. (2023)[20] | Rule-based AI models | Characterization of synthetic health data | Incorporating domain-specific rules effectively | Improves reliability and acceptance of synthetic datasets |
| Arora & Arora (2022)[21] | Generative adversarial networks (GANs) | Synthetic patient data generation | Ethical concerns and biases in GAN outputs | Guides the ethical development of AI in healthcare |
| Little et al. (2023)[7] | Federated learning | Synthetic data generation for privacy preservation | Balancing privacy and data utility | Enhances secure data sharing and collaborative research |
| Mosquera et al. (2023)[22] | Methodology for longitudinal data synthesis | Synthetic longitudinal health data | Maintaining temporal trends and relationships | Enables research requiring time-series health data |
| Sun et al. (2021)[23] | Recurrent autoencoders and GANs | Longitudinal synthetic EHR data generation | Complexity of modeling temporal dependencies | Advances the synthesis of realistic time-series health records |
| Kosolwattana et al. (2023)[24] | Self-inspected adaptive SMOTE (SASMOTE) | Imbalanced healthcare data classification | Over-sampling without overfitting minority classes | Improves model performance on rare medical conditions |
| Nicolaie et al. (2023)[25] | Synthetic population construction | Big data applications in public health | Balancing population diversity and representativeness | Supports public health simulations and policy planning |
| Kumichev et al. (2024)[26] | LLM-based synthetic text generation | Medical text generation for research | Preservation of medical context and coherence | Facilitates NLP research and healthcare applications |
| Miletic & Sariyar (2024)[27] | Benchmark study | Tabular health data generation | Performance and accuracy trade-offs | Guides the selection of appropriate synthetic data models |
| Juwara et al. (2024)[28] | Synthetic data augmentation | Mitigation of covariate bias in health data | Overcoming model biases and variability | Improves equity in health data analyses |
| Lomotey et al. (2024)[29] | Digital twins and data trusts | Privacy in health data sharing | Balancing privacy with data usability | Facilitates secure and ethical health data sharing |
| Osorio-Marulanda et al. (2024)[30] | Systematic review | Privacy and evaluation metrics for synthetic data | Standardization of metrics and methods | Improves trust in synthetic data for sensitive domains |
| Nicholas et al. (2024)[31] | Health Gym project | Synthetic datasets in education | Engaging learners without overwhelming complexity | Enriches data science and healthcare education |
| Patil et al. (2024)[32] | Transformer-based DGA integration | Improved ML-based fault identification | Complexity in data integration and scalability | Enhances fault detection in healthcare systems using synthetic data |
| Gonzales et al. (2023)[10] | Narrative review | Synthetic data in healthcare applications | Lack of standardization and ethical considerations | Encourages unified guidelines for healthcare data synthesis |
| Giuffrè & Shung (2023)[1] | Review on synthetic data innovation | Healthcare privacy and innovation | Balancing innovation with ethical responsibilities | Guides the responsible use of synthetic data in health technologies |
| Qian et al. (2024)[8] | Privacy-preserving clinical risk prediction | Synthetic data for clinical applications | Data fidelity and privacy trade-offs | Facilitates secure predictive modeling in clinical research |
| Burgon et al. (2024)[33] | Bias amplification evaluation framework | Bias mitigation in healthcare ML models | Challenges in systematic bias evaluation | Improves fairness and accountability in AI healthcare tools |
| Koetzier et al. (2024)[34] | Medical imaging synthetic data generation | Enhancing medical imaging datasets | Quality and utility of synthetic images | Advances imaging tools for better diagnostic accuracy |
| Rodriguez-Almeida et al. (2022)[35] | Disease prediction on small datasets | Synthetic patient data for imbalanced datasets | Balancing small sample sizes with realistic data generation | Improves disease prediction accuracy in rare conditions |
| Shanley et al. (2024)[9] | Ethics-focused review | Synthetic data ethics in healthcare | Adoption of AI ethics principles | Strengthens ethical frameworks for synthetic data usage |
| Chen et al. (2021)[36] | ML applications in synthetic data | Medicine and healthcare | Reproducibility and validation of synthetic data models | Encourages robustness in AI model development for healthcare |
| Goyal & Mahmoud (2024)[37] | Systematic review of generative AI | Synthetic data generation techniques | Scalability and generalizability | Broadens understanding of generative methods |
| Tucker et al. (2020)[38] | High-fidelity synthetic patient data | Machine learning in healthcare software testing | Achieving realism in synthetic datasets | Enhances model validation and reliability in healthcare AI |
| Hairani et al. (2024)[39] | Review of modified SMOTE strategies | Addressing class imbalance in health data | Adapting SMOTE for healthcare-specific needs | Improves handling of imbalanced datasets |
| Bigi et al. (2024)[40] | Agent-based modeling | Synthetic population for mobility analysis | Accurate parameterization and assumptions | Supports public health and operational planning |
| Guo & Zhao (2023)[41] | Survey of deep generative models for graph generation | Graph learning and representation | Scalability, permutation invariance, evaluation standards | Guides future research on graph-based data generation |
| Iannucci et al. (2017)[42] | Benchmarking of graph-based synthetic data generators | Intrusion detection system benchmarking | Model robustness and generalizability across attacks | Informs IDS design with realistic benchmark datasets |
| Haleem et al. (2023)[43] | Deep learning for multimodal health data synthesis | Real-time multimodal health data generation | Cross-modal consistency and real-time generation | Enables richer datasets for health monitoring systems |
| Pawłowski et al. (2023)[44] | Comparative analysis of multimodal data fusion methods | Sensor fusion and integration strategies | Selecting appropriate fusion strategy for task needs | Supports development of task-specific fusion pipelines |
| Gogoshin et al. (2021)[45] | Bayesian networks for probabilistic data generation | Biological simulation and data reconstruction | High-dimensional sampling and structural accuracy | Validates BNs as interpretable simulation frameworks |
| Kaur et al. (2020)[46] | Bayesian network application to synthetic health data | Synthetic health data generation and evaluation | Preserving associations and rare events | Shows BNs outperform deep models in certain tasks |
| Hosseini & Serag (2025)[47] | Diffusion models for synthetic medical image generation | Synthetic chest X-ray generation and model pretraining | Maintaining clinical fidelity and training stability | Demonstrates strong performance without real data |
| Naseer et al. (2023)[48] | Continuous-time diffusion model using stochastic differential equations | Electronic health record synthesis | Modeling long-term temporal dependencies and clinical coherence | Advances EHR generation with high realism and improved utility for downstream tasks |
3. Results
3.1. How Does Synthetic Data Generation in Healthcare Work?
3.2. Data Collection and Preprocessing
3.3. Model Training
3.4. Data Generation
3.5. Validation and Evaluation
- Fidelity: How well the synthetic data matches the statistical properties of real data.[35]
- Utility: Whether the data performs effectively in tasks like training machine learning models.[36]
- Privacy: Confirmation that no sensitive information from the original dataset can be inferred, often tested using privacy-preserving techniques like differential privacy.[30]
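
The three criteria above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration and does not come from the reviewed studies: the toy "real" cohort, the per-class Gaussian generator, and the helper names (`make_real`, `fit_naive_generator`, `ks_statistic`, `centroids`, `accuracy`, `min_distance_to_real`). Fidelity is approximated with a per-feature two-sample Kolmogorov-Smirnov statistic, utility with a train-on-synthetic, test-on-real comparison, and privacy with a distance-to-closest-record heuristic. That heuristic is a stand-in only: it is not differential privacy and gives no formal guarantee.

```python
import math
import random
from bisect import bisect_right
from statistics import mean, stdev

random.seed(0)  # fixed seed so the toy run is repeatable


def make_real(n):
    """Hypothetical 'real' cohort: two numeric features plus a binary label."""
    rows = []
    for _ in range(n):
        y = int(random.random() < 0.5)
        center = 2.0 * y  # class 1 is shifted away from class 0
        rows.append((random.gauss(center, 1.0), random.gauss(center, 1.0), y))
    return rows


def fit_naive_generator(real):
    """Toy generator (an assumption): independent Gaussians fitted per class."""
    labels = [y for _, _, y in real]
    stats = {}
    for label in set(labels):
        cols = zip(*[(a, b) for a, b, y in real if y == label])
        stats[label] = [(mean(c), stdev(c)) for c in cols]

    def sample():
        y = random.choice(labels)  # keeps the real class balance
        return tuple(random.gauss(m, s) for m, s in stats[y]) + (y,)

    return sample


def ks_statistic(a, b):
    """Fidelity: largest gap between the two empirical CDFs (0 = identical)."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def centroids(rows):
    """Fit a nearest-centroid classifier: one mean point per class."""
    out = {}
    for label in (0, 1):
        pts = [(a, b) for a, b, y in rows if y == label]
        out[label] = (mean(p[0] for p in pts), mean(p[1] for p in pts))
    return out


def accuracy(cents, rows):
    """Utility: share of rows whose nearest centroid matches the true label."""
    hits = 0
    for a, b, y in rows:
        pred = min(cents, key=lambda k: math.dist((a, b), cents[k]))
        hits += pred == y
    return hits / len(rows)


def min_distance_to_real(synth, real):
    """Privacy heuristic: 0 would mean a synthetic row copies a real one."""
    return min(math.dist(s[:2], r[:2]) for s in synth for r in real)


real_train, real_test = make_real(200), make_real(200)
sample = fit_naive_generator(real_train)
synth = [sample() for _ in range(200)]

fidelity = {
    i: ks_statistic([r[i] for r in real_train], [s[i] for s in synth])
    for i in (0, 1)
}
tstr_acc = accuracy(centroids(synth), real_test)       # train synthetic, test real
trtr_acc = accuracy(centroids(real_train), real_test)  # train real, test real
dcr = min_distance_to_real(synth, real_train)

print("KS statistic per feature:", fidelity)
print("Accuracy (synthetic-trained vs real-trained):", tstr_acc, trtr_acc)
print("Minimum distance to a real record:", dcr)
```

In this sketch, a small KS statistic suggests similar marginal distributions, a synthetic-trained accuracy close to the real-trained one suggests preserved utility, and a minimum distance well above zero suggests that no real record was reproduced verbatim. In practice each check would be replaced by the metrics appropriate to the dataset and regulatory context.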
4. Current Synthetic Healthcare Data Generation Techniques
Generative Adversarial Networks (GANs)
5. Variational Autoencoders (VAEs)
5.1. Differential Privacy-Based Methods
5.2. Graph-Based Synthetic Data Generation: GraphGAN and NetGAN
5.3. Multimodal Synthetic Data Generation
5.4. Bayesian Networks for Synthetic Data Generation
5.5. Diffusion Models for Synthetic Data Generation
5.6. Federated Learning for Synthetic Data Generation
5.7. Recurrent Neural Networks (RNNs)
5.8. Synthetic Minority Over-sampling Technique (SMOTE)
5.9. Agent-Based Modeling (ABM)
5.10. Large Language Models for Synthetic Text Generation
6. Current Applications for Synthetic Data
6.1. AI Training and Model Development
6.2. Privacy-Preserving Data Sharing
6.3. Bias Mitigation
6.4. Education and Training
6.5. Operational Optimization
7. Challenges and Potential Risks
7.1. No Established Data Standards
7.2. Realism vs. Privacy Trade-Off
7.3. Bias Amplification
7.4. Computational Complexity
7.5. Overconfidence in Data Utility
7.6. Security Vulnerabilities
7.7. Ethical Concerns
7.8. Heterogeneity Problems and Lack of Clinical Quality
8. The Future of Healthcare AI and the Impact of Synthetic Data
8.1. Risk of Model Overfitting to Synthetic Patterns
8.2. Amplification of Bias and Inequity
8.3. Erosion of Trust in AI Systems
8.4. Challenges in Validation and Benchmarking
8.5. Ethical Concerns About Data Ownership and Consent
8.6. Security Risks and Privacy Paradox
8.7. Overreliance and Misplaced Confidence
9. Discussion
10. Limitations
11. Conclusions
Funding
Conflicts of Interest
Use of AI
References
- Giuffrè M, Shung DL. Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. Npj Digit Med. 2023, 6, 1–8.
- Tucker A, Wang Z, Rotalinti Y, Myles P. Generating high-fidelity synthetic patient data for assessing machine learning healthcare software. Npj Digit Med. 2020, 3, 1–13.
- Rujas M, Herranz RMG del M, Fico G, Merino-Barbancho B. Synthetic Data Generation in Healthcare: A Scoping Review of reviews on domains, motivations, and future applications. Published online August 9, 2024:2024.08.09.24311338.
- Gartner Identifies Top Trends Shaping the Future of Data Science and Machine Learning. Gartner. Accessed January 2, 2025. https://www.gartner.com/en/newsroom/press-releases/2023-08-01-gartner-identifies-top-trends-shaping-future-of-data-science-and-machine-learning.
- Fortune Business. Synthetic Data Generation Market | Forecast Analysis [2030]. Accessed January 2, 2025. https://www.fortunebusinessinsights.com/synthetic-data-generation-market-108433.
- Madden B. Synthetic Data in Healthcare: the Great Data Unlock. Hospitalogy. November 2, 2023. Accessed January 2, 2025. https://hospitalogy.com/articles/2023-11-02/synthetic-data-in-healthcare-great-data-unlock/.
- Little C, Elliot M, Allmendinger R. Federated learning for generating synthetic data: a scoping review. Int J Popul Data Sci. 2023, 8, 2158.
- Qian Z, Callender T, Cebere B, Janes SM, Navani N, van der Schaar M. Synthetic data for privacy-preserving clinical risk prediction. Sci Rep. 2024, 14, 25676.
- Shanley D, Hogenboom J, Lysen F, et al. Getting real about synthetic data ethics. EMBO Rep. 2024, 25, 2152–2155.
- Gonzales A, Guruswamy G, Smith SR. Synthetic data in health care: A narrative review. PLOS Digit Health. 2023, 2, e0000082.
- Munn Z, Peters MDJ, Stern C, Tufanaru C, McArthur A, Aromataris E. Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Med Res Methodol. 2018, 18, 143.
- Tricco AC, Lillie E, Zarin W, et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann Intern Med. 2018, 169, 467–473.
- D’Amico S, Dall’Olio D, Sala C, et al. Synthetic Data Generation by Artificial Intelligence to Accelerate Research and Precision Medicine in Hematology. JCO Clin Cancer Inform. 2023, 7, e2300021.
- Akpinar MH, Sengur A, Salvi M, et al. Synthetic Data Generation via Generative Adversarial Networks in Healthcare: A Systematic Review of Image- and Signal-Based Studies. IEEE Open J Eng Med Biol. 2025, 6, 183–192.
- Aravinth SS, Srithar S, Joseph KP, Gopala Anil Varma U, Kiran GM, Jonna V. Comparative Analysis of Generative AI Techniques for Addressing the Tabular Data Generation Problem in Medical Records. In: 2023 International Conference on Recent Advances in Science and Engineering Technology (ICRASET); 2023:1-5.
- Ferreira A, Li J, Pomykala KL, Kleesiek J, Alves V, Egger J. GAN-based generation of realistic 3D volumetric data: A systematic review and taxonomy. Med Image Anal. 2024, 93, 103100.
- Rashidian S, Wang F, Moffitt R, et al. SMOOTH-GAN: Towards Sharp and Smooth Synthetic EHR Data Generation. In: Artificial Intelligence in Medicine: 18th International Conference on Artificial Intelligence in Medicine, AIME 2020, Minneapolis, MN, USA, August 25–28, 2020, Proceedings. Springer-Verlag; 2020:37-48.
- Nikolentzos G, Vazirgiannis M, Xypolopoulos C, Lingman M, Brandt EG. Synthetic electronic health records generated with variational graph autoencoders. Npj Digit Med. 2023, 6, 1–12.
- Dos Santos R, Aguilar J. A synthetic data generation system based on the variational-autoencoder technique and the linked data paradigm. Progress in Artificial Intelligence. Accessed January 2, 2025.
- Lenatti M, Paglialonga A, Orani V, Ferretti M, Mongelli M. Characterization of Synthetic Health Data Using Rule-Based Artificial Intelligence Models. IEEE J Biomed Health Inform. 2023, 27, 3760–3769.
- Arora A, Arora A. Generative adversarial networks and synthetic patient data: current challenges and future perspectives. Future Healthc J. 2022, 9, 190–193.
- Mosquera L, El Emam K, Ding L, et al. A method for generating synthetic longitudinal health data. BMC Med Res Methodol. 2023, 23, 67.
- Sun S, Wang F, Rashidian S, et al. Generating Longitudinal Synthetic EHR Data with Recurrent Autoencoders and Generative Adversarial Networks. In: Heterogeneous Data Management, Polystores, and Analytics for Healthcare: VLDB Workshops, Poly 2021 and DMAH 2021, Virtual Event, August 20, 2021, Revised Selected Papers. Springer-Verlag; 2021:153-165.
- Kosolwattana T, Liu C, Hu R, Han S, Chen H, Lin Y. A self-inspected adaptive SMOTE algorithm (SASMOTE) for highly imbalanced data classification in healthcare. BioData Min. 2023, 16, 15.
- Nicolaie MA, Füssenich K, Ameling C, Boshuizen HC. Constructing synthetic populations in the age of big data. Popul Health Metr. 2023, 21, 19.
- Kumichev G, Blinov P, Kuzkina Y, et al. MedSyn: LLM-Based Synthetic Medical Text Generation Framework. In: Bifet A, Krilavičius T, Miliou I, Nowaczyk S, eds. Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. Springer Nature Switzerland; 2024:215-230.
- Miletic M, Sariyar M. Large Language Models for Synthetic Tabular Health Data: A Benchmark Study. In: Digital Health and Informatics Innovations for Sustainable Health Care Systems. IOS Press; 2024:963-967.
- Juwara L, El-Hussuna A, El Emam K. An evaluation of synthetic data augmentation for mitigating covariate bias in health data. Patterns. 2024, 5, 100946.
- Lomotey RK, Kumi S, Ray M, Deters R. Synthetic Data Digital Twins and Data Trusts Control for Privacy in Health Data Sharing. In: Proceedings of the 2024 ACM Workshop on Secure and Trustworthy Cyber-Physical Systems. SaT-CPS ’24. Association for Computing Machinery; 2024:1-10.
- Osorio-Marulanda PA, Epelde G, Hernandez M, Isasa I, Reyes NM, Iraola AB. Privacy Mechanisms and Evaluation Metrics for Synthetic Data Generation: A Systematic Review. IEEE Access. 2024, 12, 88048–88074.
- Nicholas KIH, Perez-Concha O, Hanly M, et al. Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project. JMIR Med Educ. 2024, 10, e51388.
- Patil AJ, Naresh R, Jarial RK, Malik H. Optimized Synthetic Data Integration with Transformer’s DGA Data for Improved ML-based Fault Identification. IEEE Trans Dielectr Electr Insul. Published online 2024:1-1.
- Burgon A, Zhang Y, Petrick N, Sahiner B, Cha KH, Samala RK. Bias amplification to facilitate the systematic evaluation of bias mitigation methods. IEEE J Biomed Health Inform. Published online 2024:1-12.
- Koetzier LR, Wu J, Mastrodicasa D, et al. Generating Synthetic Data for Medical Imaging. Radiology. Published online September 10, 2024.
- Hernandez M, Epelde G, Alberdi A, Cilla R, Rankin D. Synthetic data generation for tabular health records: A systematic review. Neurocomputing. 2022, 493, 28–45.
- Chen RJ, Lu MY, Chen TY, Williamson DFK, Mahmood F. Synthetic data in machine learning for medicine and healthcare. Nat Biomed Eng. 2021, 5, 493–497.
- Goyal M, Mahmoud QH. A Systematic Review of Synthetic Data Generation Techniques Using Generative AI. Electronics. 2024, 13, 3509.
- Tucker A, Wang Z, Rotalinti Y, Myles P. Generating high-fidelity synthetic patient data for assessing machine learning healthcare software. NPJ Digit Med. 2020, 3, 147.
- Hairani H, Widiyaningtyas T, Prasetya DD. Addressing Class Imbalance of Health Data: A Systematic Literature Review on Modified Synthetic Minority Oversampling Technique (SMOTE) Strategies. JOIV Int J Inform Vis. 2024, 8, 1310–1318.
- Bigi F, Rashidi TH, Viti F. Synthetic Population: A Reliable Framework for Analysis for Agent-Based Modeling in Mobility. Transp Res Rec. Published online April 15, 2024:03611981241239656.
- Guo X, Zhao L. A Systematic Survey on Deep Generative Models for Graph Generation. IEEE Trans Pattern Anal Mach Intell. 2023, 45, 5370–5390.
- Iannucci S, Kholidy HA, Ghimire AD, Jia R, Abdelwahed S, Banicescu I. A Comparison of Graph-Based Synthetic Data Generators for Benchmarking Next-Generation Intrusion Detection Systems. In: 2017 IEEE International Conference on Cluster Computing (CLUSTER); 2017:278-289.
- Haleem MS, Ekuban A, Antonini A, Pagliara S, Pecchia L, Allocca C. Deep-Learning-Driven Techniques for Real-Time Multimodal Health and Physical Data Synthesis. Electronics. 2023, 12, 1989.
- Pawłowski M, Wróblewska A, Sysko-Romańczuk S. Effective Techniques for Multimodal Data Fusion: A Comparative Analysis. Sensors. 2023, 23, 2381.
- Gogoshin G, Branciamore S, Rodin AS. Synthetic data generation with probabilistic Bayesian Networks. Math Biosci Eng MBE. 2021, 18, 8603–8621.
- Kaur D, Sobiesk M, Patil S, et al. Application of Bayesian networks to generate synthetic health data. J Am Med Inform Assoc JAMIA. 2020, 28, 801–811.
- Hosseini A, Serag A. Self-Supervised Learning Powered by Synthetic Data From Diffusion Models: Application to X-Ray Images. IEEE Access. 2025, 13, 59074–59084.
- Naseer AA, Walker B, Landon C, et al. ScoEHR: Generating Synthetic Electronic Health Records using Continuous-time Diffusion Models. In: Proceedings of the 8th Machine Learning for Healthcare Conference. PMLR; 2023:489-508. Accessed July 29, 2025. https://proceedings.mlr.press/v219/naseer23a.html.
- Wang H, Wang J, Wang J, et al. GraphGAN: Graph Representation Learning With Generative Adversarial Nets. Proc AAAI Conf Artif Intell. 2018;32(1).
- Generating Longitudinal Synthetic EHR Data with Recurrent Autoencoders and Generative Adversarial Networks. Heterogeneous Data Management, Polystores, and Analytics for Healthcare. Accessed December 28, 2024. https://dl.acm.org/doi/10.1007/978-3-030-93663-1_12.
- Hussain L, Lone KJ, Awan IA, Abbasi AA, Pirzada J ur R. Detecting congestive heart failure by extracting multimodal features with synthetic minority oversampling technique (SMOTE) for imbalanced data using robust machine learning techniques. Waves Random Complex Media. 2022, 32, 1079–1102.
- Emdad FB, Ravuri B, Ayinde L, Rahman MI. “ChatGPT, a Friend or Foe for Education?” Analyzing the User’s Perspectives on The Latest AI Chatbot Via Reddit. In: 2024 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI). Vol 2; 2024:1-5.
- Shahul Hameed MA, Qureshi AM, Kaushik A. Bias Mitigation via Synthetic Data Generation: A Review. Electronics. 2024, 13, 3909.
- Olson LK. Ethically Challenged: Private Equity Storms US Health Care. JHU Press; 2022.
- Aysha A. Evaluate synthetic data quality using downstream ML - MOSTLY AI. September 20, 2023. Accessed February 27, 2025. https://mostly.ai/blog/synthetic-data-quality-evaluation.
- Zewe A. In machine learning, synthetic data can offer real performance improvements. MIT News | Massachusetts Institute of Technology. November 3, 2022. Accessed February 27, 2025. https://news.mit.edu/2022/synthetic-data-ai-improvements-1103.
- Draghi B, Wang Z, Myles P, Tucker A. Identifying and handling data bias within primary healthcare data using synthetic data generators. Heliyon. 2024, 10, e24164.
- Holmes M, Kaufman BG, Pink GH. Average Beneficiary CMS Hierarchical Condition Category (HCC) Risk Scores for Rural and Urban Providers. Cecil G. Sheps Center for Health Services Research, University of North Carolina at Chapel Hill; 2018. https://tinyurl.com/3crc98ey.
- Pachouly J, Ahirrao S, Kotecha K, Selvachandran G, Abraham A. A systematic literature review on software defect prediction using artificial intelligence: Datasets, Data Validation Methods, Approaches, and Tools. Eng Appl Artif Intell. 2022, 111, 104773.
- Ali M. Synthetic data is the future of Artificial Intelligence. Medium. January 18, 2023. Accessed February 27, 2025. https://moez-62905.medium.com/synthetic-data-is-the-future-of-artificial-intelligence-6fcfd2ce1a14.
- Shanley D, Hogenboom J, Lysen F, et al. Getting real about synthetic data ethics. EMBO Rep. 2024, 25, 2152–2155.
- Kocoń J, Cichecki I, Kaszyca O, et al. ChatGPT: Jack of all trades, master of none. Inf Fusion. 2023, 99, 101861.
- Purushotham S, Meng C, Che Z, Liu Y. Benchmarking deep learning models on large healthcare datasets. J Biomed Inform. 2018, 83, 112–134.
- González-Sendino R, Serrano E, Bajo J. Mitigating bias in artificial intelligence: Fair data generation via causal models for transparent and explainable decision-making. Future Gener Comput Syst. 2024, 155, 384–401.
- Raji ID, Smart A, White RN, et al. Closing the AI accountability gap: defining an end-to-end framework for internal algorithmic auditing. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. FAT* ’20. Association for Computing Machinery; 2020:33-44.
- Holzinger A, Biemann C, Pattichis CS, Kell DB. What do we need to build explainable AI systems for the medical domain? Published online December 28, 2017.
- GDPR. What is GDPR, the EU’s new data protection law? GDPR.eu. November 7, 2018. Accessed July 23, 2025. https://gdpr.eu/what-is-gdpr/.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).