Submitted: 01 September 2025
Posted: 02 September 2025
Abstract

Keywords:
1. Introduction
2. Theories and Methods
2.1. Exponential Parametric Hazard Model Combined with Machine Learning
2.2. Causal Inference and ML Estimation
2.3. Debiasing ML Estimators
2.4. Extension to Models with Latent Variables
2.5. Design of Models with Multiple RKHSs
2.6. Estimation Algorithm
Algorithm 1: DML of hazard ratios with cross fitting
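For orientation, the generic cross-fitting idea behind DML (Chernozhukov et al., Ref.[19]) can be sketched on a much simpler problem than the one treated in this paper. The block below is not Algorithm 1. It assumes a partially linear model Y = θD + g(X) + ε instead of the exponential hazard model, a scalar covariate on [0, 1), and a histogram (binned-mean) regressor as a stand-in nuisance learner in place of the RKHS models. All function names and the simulated data are hypothetical and chosen only for illustration.

```python
import math
import random


def fit_binned_mean(xs, ys, n_bins=20):
    """Histogram regression on [0, 1): predict the training mean of ys per bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)
    # Fall back to the overall mean for bins with no training points.
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]


def dml_cross_fit(x, d, y, n_folds=5, seed=0):
    """Cross-fitted residual-on-residual estimate of theta with a sandwich SE."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    res_d = [0.0] * n
    res_y = [0.0] * n
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Nuisance functions are fitted without the held-out fold ...
        m_hat = fit_binned_mean([x[i] for i in train], [d[i] for i in train])
        l_hat = fit_binned_mean([x[i] for i in train], [y[i] for i in train])
        # ... and evaluated only on the held-out fold.
        for i in fold:
            res_d[i] = d[i] - m_hat(x[i])
            res_y[i] = y[i] - l_hat(x[i])
    s_dd = sum(r * r for r in res_d)
    theta = sum(rd * ry for rd, ry in zip(res_d, res_y)) / s_dd
    # Plug-in variance of the orthogonal score psi = res_d * (res_y - theta * res_d).
    psi_sq = sum((rd * (ry - theta * rd)) ** 2 for rd, ry in zip(res_d, res_y))
    se = math.sqrt(psi_sq) / s_dd
    return theta, se


def simulate(n=2000, theta=0.5, seed=1):
    """Toy data: treatment and outcome both depend nonlinearly on x."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    d = [xi ** 2 + rng.gauss(0.0, 1.0) for xi in x]
    y = [theta * di + math.sin(2 * math.pi * xi) + rng.gauss(0.0, 1.0)
         for di, xi in zip(d, x)]
    return x, d, y


x, d, y = simulate()
theta_hat, se = dml_cross_fit(x, d, y)
print(f"theta_hat = {theta_hat:.3f}, se = {se:.3f}")
```

The point of the sketch is the sample-splitting pattern: each observation's residuals are computed from nuisance fits that never saw that observation, which is what allows flexible learners to be used without their overfitting bias leaking into the target estimate.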
3. Results of Numerical Simulations
3.1. Simulation Result 1: Adjustment for Observed Confounders
3.2. Simulation Result 2: Estimation of Treatment Effect in Population with Heterogeneous Risk
4. Discussion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A.
| Symbols | Description |
|---|---|
| , | The sets of natural and real numbers and d-dimensional Euclidean space () |
| | Product set (or product space) |
| | Measure space with product space and product -algebra |
| | Tensor product |
| , | Transposition of vector v and matrix A |
| | The smaller of a and b |
| | Element a of a set |
| | Set inclusion (with possible equality) |
| , | Union and intersection of sets |
| | The number of elements in a set |
| B, | Convergence in probability and in distribution |
| , | Asymptotic notations for (see Ref.[54]) |
| , | Probabilistic asymptotic notations for (see Ref.[54]) |
| (arg) | (Element yielding) minimum of over |
| (arg) | (Element yielding) maximum of over |
| | Supremum of over |
| () | Expectation of the argument random variable (for the specified distribution) |
| | Equation defining the object on the left-hand side |
| , | The norm of the metric vector space and the normed vector space |
| | The operator norm of a linear operator A |
| | Independence between A and B (conditioned on ) |
| , () | Independently and identically distributed (objects drawn from the right-hand side) |
| | Integration of over the set E with respect to the measure on the space for X |
| | Natural mapping of vectors into their product space |
| | Abbreviation of partial derivative |
| | Gateaux derivative |
| | Uniform probability distribution over the interval |
| | Gaussian probability distribution with mean and (co)variance |
Appendix B. Numerical Implementation of Inference with Multiple-Kernel Models
Appendix C. Proof of Proposition 1
Appendix D. Proof of Proposition 2
Appendix E. Gradient Functional and Hessian Operator
Appendix F. The Validity of Assumption 9
Appendix G. Proof of Proposition 3
Appendix H. Consideration on the Score Regularity and the Quality of Estimation of Nuisance Parameters Required for DML
Appendix I. Additional Consideration on Causal Interpretation of Hazard Ratios

References
1. Lin, R.S.; Lin, J.; Roychoudhury, S.; Anderson, K.M.; Hu, T.; Huang, B.; Leon, L.F.; Liao, J.J.; Liu, R.; Luo, X.; et al. Alternative Analysis Methods for Time to Event Endpoints Under Nonproportional Hazards: A Comparative Analysis. Statistics in Biopharmaceutical Research 2020, 12, 187–198.
2. Bartlett, J.W.; Morris, T.P.; Stensrud, M.J.; Daniel, R.M.; Vansteelandt, S.K.; Burman, C.F. The Hazards of Period Specific and Weighted Hazard Ratios. Statistics in Biopharmaceutical Research 2020, 12, 518–519.
3. Hernán, M.A. The Hazards of Hazard Ratios. Epidemiology 2010, 21.
4. Aalen, O.O.; Cook, R.J.; Røysland, K. Does Cox analysis of a randomized survival study yield a causal treatment effect? Lifetime Data Analysis 2015, 21, 579–593.
5. Martinussen, T.; Vansteelandt, S.; Andersen, P.K. Subtleties in the interpretation of hazard contrasts. Lifetime Data Analysis 2020, 26, 833–855.
6. Martinussen, T. Causality and the Cox Regression Model. Annual Review of Statistics and Its Application 2022, 9, 249–259.
7. Prentice, R.L.; Aragaki, A.K. Intention-to-treat comparisons in randomized trials. Statistical Science 2022, 37, 380–393.
8. Ying, A.; Xu, R. On Defense of the Hazard Ratio. arXiv, 2307.
9. Fay, M.P.; Li, F. Causal interpretation of the hazard ratio in randomized clinical trials. Clinical Trials 2024, 21, 623–635.
10. Rufibach, K. Treatment effect quantification for time-to-event endpoints–Estimands, analysis strategies, and beyond. Pharmaceutical Statistics 2019, 18, 145–165.
11. Kloecker, D.E.; Davies, M.J.; Khunti, K.; Zaccardi, F. Uses and Limitations of the Restricted Mean Survival Time: Illustrative Examples From Cardiovascular Outcomes and Mortality Trials in Type 2 Diabetes. Annals of Internal Medicine 2020, 172, 541–552.
12. Snapinn, S.; Jiang, Q.; Ke, C. Treatment effect measures under nonproportional hazards. Pharmaceutical Statistics 2023, 22, 181–193.
13. Cui, Y.; Kosorok, M.R.; Sverdrup, E.; Wager, S.; Zhu, R. Estimating heterogeneous treatment effects with right-censored data via causal survival forests. Journal of the Royal Statistical Society Series B: Statistical Methodology 2023, 85, 179–211.
14. Xu, S.; Cobzaru, R.; Finkelstein, S.N.; Welsch, R.E.; Ng, K.; Shahn, Z. Estimating Heterogeneous Treatment Effects on Survival Outcomes Using Counterfactual Censoring Unbiased Transformations. arXiv, 2401.
15. Frauen, D.; Schröder, M.; Hess, K.; Feuerriegel, S. Orthogonal Survival Learners for Estimating Heterogeneous Treatment Effects from Time-to-Event Data. arXiv, 2505.
16. Leviton, A.; Loddenkemper, T. Design, implementation, and inferential issues associated with clinical trials that rely on data in electronic medical records: a narrative review. BMC Medical Research Methodology 2023, 23, 271.
17. Hernán, M.A.; Brumback, B.; Robins, J.M. Marginal Structural Models to Estimate the Joint Causal Effect of Nonrandomized Treatments. Journal of the American Statistical Association 2001, 96, 440–448.
18. Van der Laan, M.J.; Rose, S. Targeted learning in data science; Springer, 2018.
19. Chernozhukov, V.; Chetverikov, D.; Demirer, M.; Duflo, E.; Hansen, C.; Newey, W.; Robins, J. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal 2018, 21, C1–C68.
20. Ahrens, A.; Chernozhukov, V.; Hansen, C.; Kozbur, D.; Schaffer, M.; Wiemann, T. An Introduction to Double/Debiased Machine Learning. arXiv, 2504.
21. Ren, J.J.; Zhou, M. Full likelihood inferences in the Cox model: an empirical likelihood approach. Annals of the Institute of Statistical Mathematics 2011, 63, 1005–1018.
22. Berlinet, A.; Thomas-Agnan, C. Reproducing kernel Hilbert spaces in probability and statistics; Springer Science & Business Media, 2011.
23. Fukumizu, K.; Song, L.; Gretton, A. Kernel Bayes’ Rule: Bayesian Inference with Positive Definite Kernels. Journal of Machine Learning Research 2013, 14, 3753–3783.
24. Yang, S.; Eaton, C.B.; Lu, J.; Lapane, K.L. Application of marginal structural models in pharmacoepidemiologic studies: a systematic review. Pharmacoepidemiology and Drug Safety 2014, 23, 560–571.
25. Robins, J.M.; Hernán, M.Á.; Brumback, B. Marginal Structural Models and Causal Inference in Epidemiology. Epidemiology 2000, 11.
26. Bishop, C.M.; Nasrabadi, N.M. Pattern recognition and machine learning; Vol. 4, Springer, 2006.
27. van der Laan, M.J.; Petersen, M.L.; Joffe, M.M. History-adjusted marginal structural models and statically-optimal dynamic treatment regimens. The International Journal of Biostatistics 2005, 1.
28. Hille, E.; Phillips, R.S. Functional Analysis and Semi-Groups, 3rd printing of rev. In ed. of 1957. In Proceedings of the Colloq. Publ, Vol. 31. 1974.
29. Lanckriet, G.R.; Cristianini, N.; Bartlett, P.; Ghaoui, L.E.; Jordan, M.I. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research 2004, 5, 27–72.
30. Suzuki, T.; Sugiyama, M. Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness. The Annals of Statistics 2013, 41, 1381–1405.
31. Aronszajn, N. Theory of reproducing kernels. Transactions of the American mathematical society 1950, 68, 337–404.
32. Bach, F.R. Consistency of the group lasso and multiple kernel learning. Journal of Machine Learning Research 2008, 9.
33. Meier, L.; Van de Geer, S.; Bühlmann, P. High-dimensional additive modeling. The Annals of Statistics 2009, 37, 3779–3821.
34. Koltchinskii, V.; Yuan, M. Sparsity in multiple kernel learning. The Annals of Statistics 2010, 38, 3660–3695.
35. Cox, D.R. Regression Models and Life-Tables. Journal of the Royal Statistical Society: Series B (Methodological) 1972, 34, 187–202.
36. Efron, B. The Efficiency of Cox’s Likelihood Function for Censored Data. Journal of the American Statistical Association 1977, 72, 557–565.
37. Oakes, D. The Asymptotic Information in Censored Survival Data. Biometrika 1977, 64, 441–448.
38. Thackham, M.; Ma, J. On maximum likelihood estimation of the semi-parametric Cox model with time-varying covariates. Journal of Applied Statistics 2020, 47, 1511–1528.
39. Luo, J.; Rava, D.; Bradic, J.; Xu, R. Doubly robust estimation under a possibly misspecified marginal structural Cox model. Biometrika 2024, 112, asae065.
40. Zhang, Z.; Stringer, A.; Brown, P.; Stafford, J. Bayesian inference for Cox proportional hazard models with partial likelihoods, nonlinear covariate effects and correlated observations. Statistical Methods in Medical Research 2023, 32, 165–180.
41. Inoue, K.; Adomi, M.; Efthimiou, O.; Komura, T.; Omae, K.; Onishi, A.; Tsutsumi, Y.; Fujii, T.; Kondo, N.; Furukawa, T.A. Machine learning approaches to evaluate heterogeneous treatment effects in randomized controlled trials: a scoping review. Journal of Clinical Epidemiology 2024, 176.
42. Ma, J.; Heritier, S.; Lô, S.N. On the maximum penalized likelihood approach for proportional hazard models with right censored survival data. Computational Statistics and Data Analysis 2014, 74, 142–156.
43. Allman, E.S.; Matias, C.; Rhodes, J.A. Identifiability of parameters in latent structure models with many observed variables 2009.
44. Allman, E.S.; Rhodes, J.A.; Stanghellini, E.; Valtorta, M. Parameter identifiability of discrete Bayesian networks with hidden variables. Journal of Causal Inference 2015, 3, 189–205.
45. Gassiat, E.; Cleynen, A.; Robin, S. Inference in finite state space non parametric Hidden Markov Models and applications. Statistics and Computing 2016, 26, 61–71.
46. Gassiat, E.; Rousseau, J. Nonparametric finite translation hidden Markov models and extensions. Bernoulli 2016, 22, 193–212.
47. Wieland, F.G.; Hauber, A.L.; Rosenblatt, M.; Tönsing, C.; Timmer, J. On structural and practical identifiability. Current Opinion in Systems Biology 2021, 25, 60–69.
48. Watanabe, S. Algebraic geometry and statistical learning theory; Vol. 25, Cambridge university press, 2009.
49. Calderhead, B.; Girolami, M. Estimating Bayes factors via thermodynamic integration and population MCMC. Computational Statistics and Data Analysis 2009, 53, 4028–4045.
50. Watanabe, S. A widely applicable Bayesian information criterion. Journal of Machine Learning Research 2013, 14, 867–897.
51. Drton, M.; Plummer, M. A Bayesian Information Criterion for Singular Models. Journal of the Royal Statistical Society Series B: Statistical Methodology 2017, 79, 323–380.
52. Moral, P. Feynman-Kac formulae: genealogical and interacting particle systems with applications; Springer, 2004.
53. Chopin, N.; Papaspiliopoulos, O.; et al. An introduction to sequential Monte Carlo; Vol. 4, Springer, 2020.
54. Janson, S. Probability asymptotics: notes on notation. arXiv, 1108.
55. Bach, F.; Jordan, M. Kernel independent component analysis. Journal of Machine Learning Research 2003.
56. Liu, D.C.; Nocedal, J. On the limited memory BFGS method for large scale optimization. Mathematical Programming 1989, 45, 503–528.
57. Williams, C.K.; Rasmussen, C.E. Gaussian processes for machine learning; Vol. 2, MIT press Cambridge, MA, 2006.
58. Fukumizu, K.; Bach, F.R.; Gretton, A. Statistical Consistency of Kernel Canonical Correlation Analysis. Journal of Machine Learning Research 2007, 8, 361–383.
59. Kanamori, T.; Suzuki, T.; Sugiyama, M. Theoretical analysis of density ratio estimation. IEICE transactions on fundamentals of electronics, communications and computer sciences 2010, 93, 787–798.


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
