Submitted: 23 May 2023
Posted: 24 May 2023
Abstract
Keywords:
Background
Population diversity
Ethics and data science
Common methods of synthetic data generation
Machine learning models
Deep learning models
Data augmentation
Classical methods for data generation
Content models
Summary
Author Contributions
Funding
Ethics approval
Consent to participate
Consent for publication
Acknowledgements
Conflicts of interest
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).