Preprint (article). This version is not peer-reviewed.

Bridging Machine Learning and Clinical Endpoints: A METABRIC-Informed Simulation Study of Missing Data Imputation for RECIST-Based Best Overall Response

Submitted: 29 April 2026 | Posted: 01 May 2026

Abstract
Background: Missing data, particularly progression-driven dropout, introduces substantial bias in longitudinal oncology studies, directly impacting response classification based on RECIST criteria. While machine learning–based imputation methods are increasingly used, their performance is rarely evaluated in a clinically interpretable framework centered on patient-level endpoints such as Best Overall Response (BOR). Methods: We propose a clinically grounded evaluation RECIST 1.1 framework focused on patient-level response classification. Longitudinal tumor trajectories were simulated for 270 patients (1:1 HER2+ and HER2−) across nine follow-up visits using both Gompertz and Stein–Fojo growth models. Realistic missingness was introduced through a combination of random mechanisms and progression-driven dropout. Two machine learning imputation methods, long short-term memory (LSTM) and MissForest, were evaluated under both direct (MAR-based) and Non-responder imputation strategies. Performance was assessed using BOR classification metrics, including accuracy and Cohen’s kappa. Result: Across both simulation frameworks, imputation substantially improved BOR classification performance. Under the Gompertz model, accuracy increased from 0.83–0.87 with direct imputation to 0.93–0.98 with non-responder imputation, with corresponding kappa improvements from 0.71–0.79 to 0.89–0.97. Similar trends were observed under the Stein–Fojo model (accuracy: 0.82–0.84 vs. 0.91–0.96; kappa: 0.69–0.72 vs. 0.86–0.94). Among the evaluated methods, MissForest combined with non-responder imputation demonstrated the most stable and consistently high performance across simulation settings. In contrast, LSTM exhibited greater variability, particularly under complex missingness patterns. Conclusion: Imputation strategies aligned with clinical estimands, such as non-responder imputation, substantially improve patient-level response classification. 
This study establishes a clinically interpretable evaluation framework linking machine learning–based imputation to RECIST-based endpoints, supporting more robust, regulatory-relevant handling of missing data in oncology trials while preserving patient-level interpretability.
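The simulation-and-evaluation pipeline described above can be sketched in miniature. The toy version below is an illustration, not the study's code: all parameter ranges and missingness probabilities are assumptions chosen for the sketch, last-observation-carried-forward (LOCF) stands in for the paper's direct (LSTM/MissForest) imputation, only the Gompertz model is simulated, and the RECIST 1.1 rules are simplified (no CR category, no minimum SD-duration requirement, no confirmation scans).

```python
import numpy as np

rng = np.random.default_rng(42)
N_VISITS = 9
VISITS = np.arange(N_VISITS, dtype=float)

def gompertz_sld(t, s0, k, alpha):
    # Gompertz trajectory for the sum of longest diameters (SLD, mm):
    # S(t) = K * (S0 / K) ** exp(-alpha * t), so S(t) -> K as t grows.
    return k * (s0 / k) ** np.exp(-alpha * t)

def visit_response(s, baseline, nadir):
    # Simplified RECIST 1.1 per-visit response (CR and the minimum
    # SD-duration rule are omitted for brevity).
    if s <= 0.7 * baseline:                   # >= 30% shrinkage vs. baseline
        return "PR"
    if s >= 1.2 * nadir and s - nadir >= 5:   # >= 20% and >= 5 mm growth vs. nadir
        return "PD"
    return "SD"

def best_overall_response(sld):
    # BOR = best response recorded before the first progression.
    baseline = nadir = sld[0]
    best = None
    for s in sld[1:]:
        r = visit_response(s, baseline, nadir)
        if r == "PD":
            return best or "PD"
        if best is None or r == "PR":
            best = r
        nadir = min(nadir, s)
    return best or "SD"

def simulate_patient(responder):
    s0 = rng.uniform(40, 80)                  # baseline SLD in mm (assumed range)
    if responder:                             # shrinks toward K < S0
        k, alpha = s0 * rng.uniform(0.3, 0.6), rng.uniform(0.4, 0.8)
    else:                                     # grows toward K > S0
        k, alpha = s0 * rng.uniform(1.7, 2.5), rng.uniform(0.5, 0.9)
    return gompertz_sld(VISITS, s0, k, alpha)

def apply_missingness(sld):
    # Random (MCAR) missing visits plus progression-driven dropout.
    miss = rng.random(N_VISITS) < 0.10
    miss[0] = False                           # baseline is always observed
    baseline = nadir = sld[0]
    for i in range(1, N_VISITS):
        if visit_response(sld[i], baseline, nadir) == "PD":
            # With probability 0.5 the PD visit itself is still observed.
            start = i + 1 if rng.random() < 0.5 else i
            miss[start:] = True
            break
        nadir = min(nadir, sld[i])
    obs = sld.copy()
    obs[miss] = np.nan
    return obs

def classify(obs, strategy):
    if strategy == "locf":                    # naive direct-imputation stand-in
        filled = obs.copy()
        for i in range(1, N_VISITS):
            if np.isnan(filled[i]):
                filled[i] = filled[i - 1]
        return best_overall_response(filled)
    # Non-responder imputation: trailing missing visits (dropout) count as PD.
    if np.isnan(obs[-1]):
        return "PD"
    return best_overall_response(obs[~np.isnan(obs)])

def cohen_kappa(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    idx = {lab: i for i, lab in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)))
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1
    n = cm.sum()
    po = np.trace(cm) / n                     # observed agreement
    pe = cm.sum(axis=0) @ cm.sum(axis=1) / n**2  # chance agreement
    return (po - pe) / (1 - pe)

truth, pred = [], {"locf": [], "nonresponder": []}
for i in range(270):                          # 1:1 responders / non-responders
    sld = simulate_patient(responder=(i % 2 == 0))
    truth.append(best_overall_response(sld))
    obs = apply_missingness(sld)
    for strat in pred:
        pred[strat].append(classify(obs, strat))

results = {strat: (np.mean([t == p for t, p in zip(truth, pred[strat])]),
                   cohen_kappa(truth, pred[strat]))
           for strat in pred}
for strat, (acc, kap) in results.items():
    print(f"{strat:12s} accuracy={acc:.2f} kappa={kap:.2f}")
```

Even in this stripped-down setting the abstract's qualitative pattern emerges: when dropout hides the progression visit, LOCF flattens the trajectory and misclassifies progressors as stable disease, whereas the estimand-aligned non-responder rule recovers them, at the cost of occasionally mislabeling a responder whose final visit happens to be missing at random.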
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
