Background: Missing data, particularly progression-driven dropout, introduce substantial bias in longitudinal oncology studies, directly impacting response classification based on RECIST criteria. While machine learning–based imputation methods are increasingly used, their performance is rarely evaluated in a clinically interpretable framework centered on patient-level endpoints such as Best Overall Response (BOR).

Methods: We propose a clinically grounded, RECIST 1.1–based evaluation framework focused on patient-level response classification. Longitudinal tumor trajectories were simulated for 270 patients (1:1 HER2+ and HER2−) across nine follow-up visits using both Gompertz and Stein–Fojo growth models. Realistic missingness was introduced through a combination of random mechanisms and progression-driven dropout. Two machine learning imputation methods, long short-term memory (LSTM) networks and MissForest, were evaluated under both direct (MAR-based) and non-responder imputation strategies. Performance was assessed using BOR classification metrics, including accuracy and Cohen's kappa.

Results: Across both simulation frameworks, imputation substantially improved BOR classification performance. Under the Gompertz model, accuracy increased from 0.83–0.87 with direct imputation to 0.93–0.98 with non-responder imputation, with corresponding kappa improvements from 0.71–0.79 to 0.89–0.97. Similar trends were observed under the Stein–Fojo model (accuracy: 0.82–0.84 vs. 0.91–0.96; kappa: 0.69–0.72 vs. 0.86–0.94). Among the evaluated methods, MissForest combined with non-responder imputation demonstrated the most stable and consistently high performance across simulation settings. In contrast, LSTM exhibited greater variability, particularly under complex missingness patterns.

Conclusion: Imputation strategies aligned with clinical estimands, such as non-responder imputation, substantially improve patient-level response classification.
This study establishes a clinically interpretable evaluation framework linking machine learning–based imputation to RECIST-based endpoints, supporting more robust, regulator-relevant handling of missing data in oncology trials while preserving patient-level interpretability.
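The evaluation pipeline described above can be sketched in miniature. The snippet below simulates Gompertz tumor-size trajectories, introduces random and progression-driven missingness, applies a non-responder imputation rule, and scores BOR classification with accuracy and Cohen's kappa. All parameter values, the 50% responder split, the dropout probabilities, and the simplified per-visit RECIST rules (PR/SD/PD only, no CR and no response confirmation) are illustrative assumptions, not the study's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def gompertz_sld(t, s0, k, b):
    # Gompertz curve: size(t) = k * exp(ln(s0 / k) * exp(-b * t))
    # k < s0 gives a shrinking lesion, k > s0 a growing one.
    return k * np.exp(np.log(s0 / k) * np.exp(-b * t))

n_patients, n_visits = 270, 9          # as in the simulated cohort
t = np.arange(n_visits, dtype=float)

# Assumed 50/50 split of shrinking vs. growing tumors (illustrative only).
responder = rng.random(n_patients) < 0.5
s0 = rng.uniform(40.0, 80.0, n_patients)               # baseline SLD, mm
k = np.where(responder,
             s0 * rng.uniform(0.3, 0.6, n_patients),   # plateau below baseline
             s0 * rng.uniform(1.4, 2.0, n_patients))   # plateau above baseline
b = rng.uniform(0.2, 0.5, n_patients)
sld = gompertz_sld(t[None, :], s0[:, None], k[:, None], b[:, None])

def visit_response(traj):
    # Simplified RECIST 1.1 per-visit call: 0=PR, 1=SD, 2=PD.
    base = traj[0]
    nadir = np.minimum.accumulate(traj)
    pr = traj <= 0.7 * base                       # >=30% decrease from baseline
    pd = (traj >= 1.2 * nadir) & (traj - nadir >= 5.0)  # >=20% and >=5 mm from nadir
    return np.where(pd, 2, np.where(pr, 0, 1))

def bor(traj):
    # Best observed category across visits (PR beats SD beats PD).
    r = visit_response(traj)
    return 0 if (r == 0).any() else (1 if (r == 1).any() else 2)

true_bor = np.array([bor(s) for s in sld])

# Missingness: ~10% random (MCAR) visit loss after baseline, plus
# progression-driven dropout for 60% of non-responders from visit 5 on.
observed = np.ones((n_patients, n_visits), dtype=bool)
observed[:, 1:] = rng.random((n_patients, n_visits - 1)) > 0.10
dropped = ~responder & (rng.random(n_patients) < 0.6)
observed[dropped, 5:] = False

# Non-responder imputation: patients with progression-driven dropout are
# classified PD; otherwise BOR is computed on the observed visits.
est_bor = np.array([
    2 if dropped[i] else bor(sld[i, observed[i]])
    for i in range(n_patients)
])

def cohen_kappa(a, b_arr, n_classes=3):
    # Confusion-matrix-based Cohen's kappa, no external dependencies.
    cm = np.zeros((n_classes, n_classes))
    for i, j in zip(a, b_arr):
        cm[i, j] += 1
    po = np.trace(cm) / cm.sum()
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / cm.sum() ** 2
    return (po - pe) / (1 - pe)

accuracy = float((est_bor == true_bor).mean())
kappa = float(cohen_kappa(true_bor, est_bor))
print(f"accuracy={accuracy:.3f} kappa={kappa:.3f}")
```

Because dropout here is triggered by tumor growth, the non-responder rule recovers the true PD label for most dropouts, which is the mechanism behind the accuracy and kappa gains reported in the Results; a direct (MAR-style) analysis of only the observed visits would misclassify early dropouts whose progression occurred after their last scan.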