Submitted: 14 October 2025
Posted: 27 October 2025
Abstract
Keywords:
1. Introduction
- Latent Abstraction: To learn a low-dimensional representation of the data by factorizing the signal into latent variables that are shared across all omics views ($\mathbf{z}_i$) and variables that are specific to each view ($\mathbf{u}_i^{(k)}$). This captures both common and view-specific biological processes.
- View Selection: To automatically determine the relevance of each omics view for predicting the survival outcome. We achieve this through a Bayesian spike-and-slab prior on the view-specific latents ($\mathbf{u}_i^{(k)}$), which can effectively “turn off” uninformative views, enhancing interpretability [4].
- Principled Sparsity Modeling: To explicitly model the excess zeros common in omics data using a zero-inflated mixture likelihood [2]. This distinguishes between true biological absence (a point mass at zero) and low-level biological signal (a continuous component).
- Censored Outcome Modeling: To directly model time-to-event data through an integrated Cox proportional hazards module that operates on the learned latent representations [3].
- Integration of Biological Priors: To leverage prior knowledge in the form of biological networks ($\mathbf{A}^{(k)}$) by using Graph Convolutional Network (GCN) encoders, which regularize the model and encourage biologically meaningful latent representations [5].
2. Methods
2.1. Notation and Core Entities
- $\mathbf{x}_i^{(k)} \in \mathbb{R}^{p_k}$: The feature vector for subject $i$ in view $k$, where $p_k$ is the number of features in that view.
- $\mathbf{A}^{(k)} \in \{0,1\}^{p_k \times p_k}$: An optional adjacency matrix for view $k$, representing prior biological knowledge (e.g., a protein–protein interaction network).
- $(t_i, \delta_i)$: The observed time for subject $i$, where $t_i$ is the survival or censoring time and $\delta_i$ is the event indicator ($\delta_i = 1$ if the event was observed, $\delta_i = 0$ if censored).
- $\mathbf{z}_i \in \mathbb{R}^{d}$: A shared latent vector for subject $i$, capturing variation common to all views.
- $\mathbf{u}_i^{(k)} \in \mathbb{R}^{d_k}$: A view-specific latent vector for subject $i$ in view $k$.
- $\gamma_k \in \{0,1\}$: A binary variable indicating whether view $k$ is active in predicting survival.
- $\pi_k \in [0,1]$: The prior inclusion probability for view $k$.
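The notation above can be mirrored in a small data container. The sketch below is purely illustrative; the class and field names (`SubjectData`, `ViewPrior`) are ours, not the paper's:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SubjectData:
    """Per-subject observations across K omics views (names are illustrative)."""
    x: list          # x[k]: feature vector of length p_k for view k
    t: float         # survival or censoring time t_i
    delta: int       # event indicator: 1 = event observed, 0 = censored

@dataclass
class ViewPrior:
    """Optional prior knowledge for one view."""
    A: np.ndarray    # p_k x p_k adjacency matrix (e.g., a PPI network)

# Example: two views with 5 and 3 features, subject censored at t = 4.2
subj = SubjectData(x=[np.zeros(5), np.zeros(3)], t=4.2, delta=0)
```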
2.2. The BMGCN Generative Model
- Sample View Inclusion Probabilities and Indicators: For each view $k = 1, \dots, K$, we first draw a prior inclusion probability from a Beta distribution and then sample the binary indicator variable:
$$\pi_k \sim \mathrm{Beta}(a_0, b_0), \qquad \gamma_k \mid \pi_k \sim \mathrm{Bernoulli}(\pi_k).$$
- Sample Latent Variables: A shared latent vector is drawn from a standard normal prior. The view-specific latent vectors are drawn from a spike-and-slab prior, conditioned on the inclusion variable $\gamma_k$. If $\gamma_k = 0$, the latent vector is fixed at zero (the “spike”); otherwise, it is drawn from a standard normal (the “slab”):
$$\mathbf{z}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I}), \qquad p(\mathbf{u}_i^{(k)} \mid \gamma_k) = (1 - \gamma_k)\,\delta_0(\mathbf{u}_i^{(k)}) + \gamma_k\,\mathcal{N}(\mathbf{u}_i^{(k)} \mid \mathbf{0}, \mathbf{I}),$$
where $\delta_0$ is the Dirac delta function at zero.
- Generate Omics Features via Decoders: The observed features for each view are generated from the combined shared and view-specific latent variables. A decoder network $f_{\theta_k}$ maps the concatenated latent vector $[\mathbf{z}_i; \gamma_k \mathbf{u}_i^{(k)}]$ to the parameters of a zero-inflated Gaussian distribution:
$$p\!\left(x_{ij}^{(k)} \mid \mathbf{z}_i, \mathbf{u}_i^{(k)}\right) = \rho_{ij}^{(k)}\,\delta_0\!\left(x_{ij}^{(k)}\right) + \left(1 - \rho_{ij}^{(k)}\right)\mathcal{N}\!\left(x_{ij}^{(k)} \mid \mu_{ij}^{(k)}, \sigma_k^2\right).$$
Here, $\boldsymbol{\mu}_i^{(k)}$ is the mean vector, $\boldsymbol{\rho}_i^{(k)}$ is the vector of zero-inflation probabilities (passed through a sigmoid function to ensure they are in $(0,1)$), and $\sigma_k^2$ is a view-specific variance parameter.
- Generate Survival Outcome: A latent risk score $r_i$ for each subject is computed by a survival head network $g_\phi$, which takes the shared latent and the gated view-specific latents as input:
$$r_i = g_\phi\!\left(\mathbf{z}_i, \{\gamma_k \mathbf{u}_i^{(k)}\}_{k=1}^{K}\right).$$
The observed survival data are assumed to follow a Cox proportional hazards model conditioned on this risk score. The likelihood for the Cox model is given by the partial log-likelihood:
$$\ell_{\text{Cox}} = \sum_{i:\,\delta_i = 1} \left[ r_i - \log \sum_{m \in R(t_i)} \exp(r_m) \right],$$
where $R(t_i)$ is the set of subjects at risk at time $t_i$. This semi-parametric form is powerful because it does not require specifying a baseline hazard function $h_0(t)$. When multiple events occur at the same time (ties), approximations such as Breslow’s or Efron’s can be used. The full joint distribution over all observed and latent variables is given by:
$$p(\mathbf{X}, \mathbf{t}, \boldsymbol{\delta}, \mathbf{Z}, \mathbf{U}, \boldsymbol{\gamma}, \boldsymbol{\pi}) = \left[\prod_{k=1}^{K} p(\pi_k)\, p(\gamma_k \mid \pi_k)\right] \left[\prod_{i=1}^{N} p(\mathbf{z}_i) \prod_{k=1}^{K} p(\mathbf{u}_i^{(k)} \mid \gamma_k)\, p(\mathbf{x}_i^{(k)} \mid \mathbf{z}_i, \mathbf{u}_i^{(k)})\right] p(\mathbf{t}, \boldsymbol{\delta} \mid \mathbf{r}).$$
2.3. Variational Inference and the ELBO
- Reconstruction Term ($\mathcal{L}_{\text{rec}}$): The expected log-likelihood of the omics data,
$$\mathcal{L}_{\text{rec}} = \sum_{i=1}^{N} \sum_{k=1}^{K} \mathbb{E}_{q}\!\left[\log p\!\left(\mathbf{x}_i^{(k)} \mid \mathbf{z}_i, \mathbf{u}_i^{(k)}\right)\right].$$
The piecewise nature of the zero-inflated log-likelihood makes this term directly computable without explicit sampling of discrete zero-inflation masks.
- Survival Term ($\mathcal{L}_{\text{surv}}$): The expected Cox partial log-likelihood, $\mathbb{E}_{q}[\ell_{\text{Cox}}]$.
- KL Divergence Regularizer ($\mathcal{L}_{\text{KL}}$): A sum of Kullback–Leibler (KL) divergences that penalize deviations of the variational posterior from the prior. The KL terms for the Gaussian latents have a closed-form solution. The term involving the discrete $\gamma_k$ is approximated as $q(\gamma_k = 1)\,\mathrm{KL}\!\left(q(\mathbf{u}_i^{(k)}) \,\|\, \mathcal{N}(\mathbf{0}, \mathbf{I})\right)$, effectively weighting the KL penalty by the posterior probability that the view is included.
2.4. Graph-Convolutional Encoders
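The encoders referenced here follow the graph-convolutional layer of Kipf and Welling [5]. A single layer with their symmetric normalization, $H = \mathrm{ReLU}\big(\tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2} X W\big)$, can be sketched as follows (a dense NumPy version with fixed weights; the actual encoders are trained networks):

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph-convolutional layer with symmetric normalization [5]:
    H = ReLU( D^{-1/2} (A + I) D^{-1/2} X W ),
    where D is the degree matrix of A + I (self-loops added)."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W)
```

In this model, $A$ would be the prior biological network for a view, so features connected in the network share information before being mapped to the variational parameters.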
2.5. Optimization and Implementation
- Using the log-sum-exp trick for stable computation of the Cox likelihood denominator.
- Applying KL annealing to gradually introduce the KL regularization term, preventing posterior collapse [10].
- Initializing the view-selection logits (for $q(\gamma_k = 1)$) to favor inclusion, promoting stable early learning.
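The first point above, stable evaluation of the Cox denominator, can be sketched as follows (Breslow-style handling of ties; the function name is ours):

```python
import numpy as np

def cox_partial_loglik(r, t, delta):
    """Cox partial log-likelihood with a log-sum-exp stabilized denominator.
    r: risk scores, t: observed times, delta: event indicators (1 = event)."""
    ll = 0.0
    for i in np.flatnonzero(delta):
        risk = r[t >= t[i]]              # risk set R(t_i)
        m = risk.max()                   # shift so no exp() overflows
        ll += r[i] - (m + np.log(np.exp(risk - m).sum()))
    return ll
```

Subtracting the maximum before exponentiating leaves the result unchanged mathematically but prevents overflow when risk scores grow large during training.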
3. Validation and Diagnostics Strategy
- Simulation Studies: We will generate synthetic multi-view data with known ground-truth latent structures, pre-defined informative vs. uninformative views, and known survival dependencies. This will allow us to quantitatively assess the model’s ability to recover the true latent variables (cosine similarity), correctly perform view selection (accuracy of the inferred $\gamma_k$ against ground truth), and accurately predict survival (Concordance Index).
- Cross-Validation: On real-world datasets, we will use a k-fold cross-validation scheme, stratified by event status, to evaluate predictive performance on held-out data. The primary metric will be the Concordance Index (C-index), which measures the fraction of concordant pairs of subjects. A secondary metric will be the integrated Brier score, which assesses the calibration of survival probability predictions over time.
- Posterior Predictive Checks (PPCs): To assess goodness-of-fit, we will sample from the posterior predictive distribution. We will compare the distributions of simulated data (e.g., zero frequencies, means, variances) against the observed data to detect model misspecification.
- Cox Diagnostics: We will examine Schoenfeld residuals to test the proportional hazards assumption of the Cox model, a critical assumption for the validity of the survival module.
- Ablation Studies: To quantify the contribution of each model component, we will perform ablation studies by systematically removing key features: (1) the graph structure (setting $\mathbf{A}^{(k)} = \mathbf{I}$), (2) the zero-inflation module (fixing $\rho_{ij}^{(k)} = 0$), and (3) the spike-and-slab selectors (forcing $\gamma_k = 1$ for all views).
- Latent Space Analysis: We will visualize the learned shared latent space using techniques like UMAP to check if it clusters subjects by known phenotypes, treatment groups, or survival outcomes, providing a qualitative check on its biological relevance [13].
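The C-index used as the primary metric above can be computed by direct pair counting. This is an $O(n^2)$ reference sketch of Harrell's concordance: ties in risk score count as 0.5, and a pair is comparable only when the earlier time is an observed event:

```python
import numpy as np

def concordance_index(r, t, delta):
    """Fraction of comparable subject pairs ordered correctly
    (higher risk score -> earlier event)."""
    num = den = 0.0
    n = len(t)
    for i in range(n):
        for j in range(n):
            if delta[i] == 1 and t[i] < t[j]:   # i's event precedes j's time
                den += 1
                if r[i] > r[j]:
                    num += 1
                elif r[i] == r[j]:
                    num += 0.5
    return num / den

# Perfectly concordant toy example: risk decreases as survival time increases
c = concordance_index(np.array([3.0, 2.0, 1.0]),
                      np.array([1.0, 2.0, 3.0]),
                      np.array([1, 1, 1]))
```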
4. Discussion
5. Conclusions
Appendix A. Supplementary Mathematical Details: Full Derivations
Appendix A.1. Derivation of the Evidence Lower Bound (ELBO)
Appendix A.2. Decomposition of the ELBO
Appendix A.3. Zero-Inflated Gaussian Log-Likelihood and Gradients
Appendix A.4. Gradient of the Cox Partial Log-Likelihood
- When differentiating with respect to $r_m$: if subject $m$ experienced the event ($\delta_m = 1$), then $r_m$ appears directly in the first term.
- For any subject $i$ for whom subject $m$ is in the risk set (i.e., $m \in R(t_i)$), $r_m$ appears in the log-sum-exp term $\log \sum_{l \in R(t_i)} \exp(r_l)$.
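The two cases above combine into a single expression for the gradient of the partial log-likelihood with respect to a risk score; this is a reconstruction consistent with the bullets (the document's own derivation may group terms differently):

```latex
\frac{\partial \ell_{\text{Cox}}}{\partial r_m}
  \;=\; \delta_m \;-\; \sum_{i:\,\delta_i = 1,\; m \in R(t_i)}
      \frac{\exp(r_m)}{\sum_{l \in R(t_i)} \exp(r_l)}
```

The second term sums, over every event whose risk set contains subject $m$, the softmax weight that the Cox denominator assigns to $r_m$.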
Appendix A.5. KL Divergence for Gaussian Distributions
Appendix A.6. KL Divergence for the Spike-and-Slab Prior
Appendix A.7. Derivation of the Variational Objective for the Spike-and-Slab Prior
Appendix A.8. Derivation of the Optimal Variational Distribution for πk
Appendix A.9. Graph Convolutional Layer: Signal Processing Interpretation
Appendix A.10. Hard-Concrete Relaxation for γk
Appendix A.11. Derivation of Gradients for Model Parameters
Appendix A.12. Reparameterization Trick for Gaussian Latents
References
- Ritchie, M.D.; Holzinger, E.R.; Li, R.; Pendergrass, S.A.; Kim, D. Methods of integrating data to uncover genotype–phenotype interactions. Nature Reviews Genetics 2015, 16, 85–97.
- Lambert, D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 1992, 34, 1–14.
- Cox, D.R. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 1972, 34, 187–202.
- George, E.I.; McCulloch, R.E. Variable selection via Gibbs sampling. Journal of the American Statistical Association 1993, 88, 881–889.
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
- Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
- Rezende, D.J.; Mohamed, S.; Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082, 2014.
- Jang, E.; Gu, S.; Poole, B. Categorical reparameterization with Gumbel-Softmax. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
- Maddison, C.J.; Mnih, A.; Teh, Y.W. The concrete distribution: A continuous relaxation of discrete random variables. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
- Bowman, S.R.; Vilnis, L.; Vinyals, O.; Dai, A.M.; Jozefowicz, R.; Bengio, S. Generating sentences from a continuous space. arXiv preprint arXiv:1511.06349, 2015.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
- McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
