Background: Missing data, particularly progression-driven dropout, introduce substantial bias in longitudinal oncology studies, directly impacting response classification based on RECIST criteria. While machine learning–based imputation methods are increasingly used, their performance is rarely evaluated in a clinically interpretable framework centered on patient-level endpoints such as Best Overall Response (BOR).

Methods: We propose a clinically grounded, RECIST 1.1–based evaluation framework focused on patient-level response classification. Longitudinal tumor trajectories were simulated for 270 patients (1:1 HER2+ and HER2−) across nine follow-up visits using both Gompertz and Stein–Fojo growth models. Realistic missingness was introduced through a combination of random mechanisms and progression-driven dropout. Two machine learning imputation methods, long short-term memory (LSTM) networks and MissForest, were evaluated under both direct (MAR-based) and non-responder imputation strategies. Performance was assessed using BOR classification metrics, including accuracy and Cohen's kappa.

Results: Across both simulation frameworks, imputation substantially improved BOR classification performance. Under the Gompertz model, accuracy increased from 0.83–0.87 with direct imputation to 0.93–0.98 with non-responder imputation, with corresponding kappa improvements from 0.71–0.79 to 0.89–0.97. Similar trends were observed under the Stein–Fojo model (accuracy: 0.82–0.84 vs. 0.91–0.96; kappa: 0.69–0.72 vs. 0.86–0.94). Among the evaluated methods, MissForest combined with non-responder imputation demonstrated the most stable and consistently high performance across simulation settings. In contrast, LSTM exhibited greater variability, particularly under complex missingness patterns.

Conclusion: Imputation strategies aligned with clinical estimands, such as non-responder imputation, substantially improve patient-level response classification.
This study establishes a clinically interpretable evaluation framework linking machine learning–based imputation to RECIST-based endpoints, supporting more robust, regulator-relevant handling of missing data in oncology trials while preserving patient-level interpretability.
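The evaluation pipeline described above can be sketched in miniature. The snippet below simulates Gompertz tumor-size trajectories, introduces random and progression-driven missingness, applies a non-responder imputation rule, and scores BOR classification with accuracy and Cohen's kappa. All parameter values, the 50% responder split, the dropout probabilities, and the simplified per-visit RECIST rules (PR/SD/PD only, no CR and no response confirmation) are illustrative assumptions, not the study's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def gompertz_sld(t, s0, k, b):
    # Gompertz curve: size(t) = k * exp(ln(s0 / k) * exp(-b * t))
    # k < s0 gives a shrinking lesion, k > s0 a growing one.
    return k * np.exp(np.log(s0 / k) * np.exp(-b * t))

n_patients, n_visits = 270, 9          # as in the simulated cohort
t = np.arange(n_visits, dtype=float)

# Assumed 50/50 split of shrinking vs. growing tumors (illustrative only).
responder = rng.random(n_patients) < 0.5
s0 = rng.uniform(40.0, 80.0, n_patients)               # baseline SLD, mm
k = np.where(responder,
             s0 * rng.uniform(0.3, 0.6, n_patients),   # plateau below baseline
             s0 * rng.uniform(1.4, 2.0, n_patients))   # plateau above baseline
b = rng.uniform(0.2, 0.5, n_patients)
sld = gompertz_sld(t[None, :], s0[:, None], k[:, None], b[:, None])

def visit_response(traj):
    # Simplified RECIST 1.1 per-visit call: 0=PR, 1=SD, 2=PD.
    base = traj[0]
    nadir = np.minimum.accumulate(traj)
    pr = traj <= 0.7 * base                       # >=30% decrease from baseline
    pd = (traj >= 1.2 * nadir) & (traj - nadir >= 5.0)  # >=20% and >=5 mm from nadir
    return np.where(pd, 2, np.where(pr, 0, 1))

def bor(traj):
    # Best observed category across visits (PR beats SD beats PD).
    r = visit_response(traj)
    return 0 if (r == 0).any() else (1 if (r == 1).any() else 2)

true_bor = np.array([bor(s) for s in sld])

# Missingness: ~10% random (MCAR) visit loss after baseline, plus
# progression-driven dropout for 60% of non-responders from visit 5 on.
observed = np.ones((n_patients, n_visits), dtype=bool)
observed[:, 1:] = rng.random((n_patients, n_visits - 1)) > 0.10
dropped = ~responder & (rng.random(n_patients) < 0.6)
observed[dropped, 5:] = False

# Non-responder imputation: patients with progression-driven dropout are
# classified PD; otherwise BOR is computed on the observed visits.
est_bor = np.array([
    2 if dropped[i] else bor(sld[i, observed[i]])
    for i in range(n_patients)
])

def cohen_kappa(a, b_arr, n_classes=3):
    # Confusion-matrix-based Cohen's kappa, no external dependencies.
    cm = np.zeros((n_classes, n_classes))
    for i, j in zip(a, b_arr):
        cm[i, j] += 1
    po = np.trace(cm) / cm.sum()
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / cm.sum() ** 2
    return (po - pe) / (1 - pe)

accuracy = float((est_bor == true_bor).mean())
kappa = float(cohen_kappa(true_bor, est_bor))
print(f"accuracy={accuracy:.3f} kappa={kappa:.3f}")
```

Because dropout here is triggered by tumor growth, the non-responder rule recovers the true PD label for most dropouts, which is the mechanism behind the accuracy and kappa gains reported in the Results; a direct (MAR-style) analysis of only the observed visits would misclassify early dropouts whose progression occurred after their last scan.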