Preprint
Article

This version is not peer-reviewed.

GenProtect-V: A Variational Inference-based Framework for Privacy-Preserving Synthetic Human Genomic Data Generation

Submitted: 27 December 2025

Posted: 29 December 2025

Abstract
The generation of synthetic human genomic data offers immense potential for biomedical research and data sharing while, in principle, safeguarding individual privacy. However, existing methods, including deep generative models, struggle to balance data utility against privacy protection. State-of-the-art evaluations such as PRISM-G reveal vulnerabilities including proximity leakage, kinship replay, and trait-linked leakage. This paper introduces GenProtect-V, an end-to-end privacy-preserving framework for synthetic human genomic data generation based on a Variational Autoencoder architecture. GenProtect-V integrates multi-layered privacy mechanisms: a Differentially Private Encoder to mitigate proximity leakage, Decoupled Latent Space Learning to address kinship replay, and a Rare Variant Smoother to counter trait-linked leakage. Through extensive experiments on the 1000 Genomes Project dataset, we demonstrate that GenProtect-V consistently achieves significantly lower PRISM-G composite scores than state-of-the-art baselines. Crucially, GenProtect-V simultaneously maintains or improves key utility metrics, including allele frequency fidelity, population structure preservation, and GWAS reproducibility. An ablation study further confirms that each privacy mechanism contributes independently and significantly. GenProtect-V establishes a new benchmark for balancing privacy and utility, offering a more secure and practical paradigm for synthetic genomic data generation.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2025 MDPI (Basel, Switzerland) unless otherwise stated