Version 1
Received: 17 August 2023 / Approved: 21 August 2023 / Online: 21 August 2023 (11:39:55 CEST)
How to cite:
Cruz Castañeda, W.A.; Bertemes Filho, P. Synthetic Health data Generation for Enhancement of Non-Invasive Diabetes AI-Based Prediction. Preprints 2023, 2023081464. https://doi.org/10.20944/preprints202308.1464.v1
APA Style
Cruz Castañeda, W.A., & Bertemes Filho, P. (2023). Synthetic Health data Generation for Enhancement of Non-Invasive Diabetes AI-Based Prediction. Preprints. https://doi.org/10.20944/preprints202308.1464.v1
Chicago/Turabian Style
Cruz Castañeda, W.A. and Pedro Bertemes Filho. 2023. "Synthetic Health data Generation for Enhancement of Non-Invasive Diabetes AI-Based Prediction." Preprints. https://doi.org/10.20944/preprints202308.1464.v1
Abstract
Continuous glucose monitoring devices support the management of diabetes. However, when only limited data are available, one option is to enlarge the dataset by generating synthetic samples. A real dataset with 18 instances and 53 attributes was created from a homemade wearable prototype; the attributes capture capillary and venous blood glucose, oxygen concentration, pulse rate, skin temperature, and 24 moduli and 24 phases related to bio-impedance. The objective of this article is to generate synthetic datasets and to investigate the ideal feature subset and the optimal model for non-invasive diabetes prediction. Gaussian copula (GC), conditional generative adversarial network (CG), variational autoencoder, and Copula-GAN techniques were used to generate five synthetic datasets. Experiments show that the GC1 and GC2 datasets follow the min/max boundaries of the original data and are not copies of it. A multilayer perceptron regressor outperformed the other models, with train and test scores of 2.17 and 2.51 in MAE, 9.29 and 13.59 in MSE, 3.05 and 3.69 in RMSE, and 0.95 and 0.92 in R² on GC1, and 2.64 and 3.02 in MAE, 11.43 and 15.11 in MSE, 3.38 and 3.89 in RMSE, and 0.94 and 0.92 in R² on GC2, using eight features. Future work is needed to explore autoencoder and generative architectures, datasets with diverse characteristics, and the effect of the number of features.
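As an illustration of how a Gaussian copula can produce synthetic rows that stay within the min/max boundaries of a small real dataset, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the function names, the empirical-quantile marginals, and the toy rows (glucose, pulse rate, skin temperature) are assumptions made up for the example and are not values from the study.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the copula's latent space


def normal_scores(column):
    """Map each value to a standard-normal score via its rank (ties broken by order)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                # Small floor guards against round-off on near-singular inputs.
                lower[i][j] = math.sqrt(max(matrix[i][i] - partial, 1e-12))
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def fit_gaussian_copula(rows):
    """Return (sorted marginal columns, Cholesky factor of the score correlation)."""
    columns = [list(col) for col in zip(*rows)]
    scores = [normal_scores(col) for col in columns]
    d = len(columns)
    corr = [[correlation(scores[i], scores[j]) for j in range(d)] for i in range(d)]
    return [sorted(col) for col in columns], cholesky(corr)


def empirical_quantile(sorted_values, u):
    """Interpolated quantile for u in [0, 1]; the result is clamped to [min, max]."""
    position = u * (len(sorted_values) - 1)
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    value = sorted_values[low] + (position - low) * (sorted_values[high] - sorted_values[low])
    return min(max(value, sorted_values[0]), sorted_values[-1])


def sample_gaussian_copula(model, n_samples, rng):
    """Draw correlated normals, convert to uniforms, then to the real marginals."""
    sorted_columns, lower = model
    d = len(sorted_columns)
    samples = []
    for _ in range(n_samples):
        noise = [rng.gauss(0.0, 1.0) for _ in range(d)]
        latent = [sum(lower[i][k] * noise[k] for k in range(i + 1)) for i in range(d)]
        samples.append(tuple(
            empirical_quantile(sorted_columns[j], STD_NORMAL.cdf(latent[j]))
            for j in range(d)
        ))
    return samples


# Hypothetical toy rows: (glucose in mg/dL, pulse rate in bpm, skin temperature in deg C).
TOY_ROWS = [
    (92.0, 68.0, 33.1),
    (105.0, 72.0, 33.4),
    (118.0, 75.0, 33.0),
    (131.0, 80.0, 33.9),
    (140.0, 77.0, 34.2),
    (156.0, 85.0, 34.0),
]

model = fit_gaussian_copula(TOY_ROWS)
for row in sample_gaussian_copula(model, 5, random.Random(42)):
    print(tuple(round(v, 1) for v in row))
```

Because the marginals are interpolated from the observed values, every synthetic value lies between the observed minimum and maximum of its column, while the rows themselves are generally new combinations rather than copies. A production library would typically fit parametric marginals instead of the empirical quantiles assumed here.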
Keywords
synthetic generation; wearables health data; non-invasive diabetes prediction
Subject
Engineering, Electrical and Electronic Engineering
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.