Zhovannik, I.; Bontempi, D.; Romita, A.; Pfaehler, E.; Primakov, S.; Dekker, A.; Bussink, J.; Traverso, A.; Monshouwer, R. Segmentation Uncertainty Estimation as a Sanity Check for Image Biomarker Studies. Cancers 2022, 14, 1288.
Problem. Image biomarker analysis, also known as radiomics, is a tool for tissue characterization and treatment prognosis that relies on routinely acquired clinical images and delineations. Because of uncertainty in image acquisition, processing, and segmentation (delineation) protocols, radiomics often lacks reproducibility. Radiomics harmonization techniques have been proposed to reduce these sources of uncertainty and/or their influence on prognostic model performance. Relevant questions are how to estimate the protocol-induced uncertainty of a specific image biomarker, how this uncertainty affects model performance, and how to optimize the model given that uncertainty. In this manuscript, we show how protocol uncertainty can drastically reduce prognostic model performance. We introduce an effect-size measure, η, that compares the protocol-induced uncertainty with the measurable effect.
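To make the uncertainty-versus-effect idea concrete, the following stdlib Python sketch computes an η-like score per feature and ranks features by it. The formula is an assumption for illustration, not necessarily the paper's definition: η is taken here as the average within-patient standard deviation of a feature over the synthetic contours divided by the between-patient standard deviation of that feature on the original delineations. The names `eta` and `rank_by_eta` and the nested-list data layout are hypothetical.

```python
import statistics


def eta(perturbed, original):
    """Hypothetical uncertainty-to-effect ratio for a single radiomic feature.

    perturbed: perturbed[i] lists the feature values of patient i computed on
        its synthetic contours (assumed layout, at least two per patient).
    original: feature value on the original delineation, one per patient.
    ASSUMPTION: this ratio is an illustrative definition of eta.
    """
    # Protocol-induced uncertainty: mean spread within each patient.
    uncertainty = statistics.mean(statistics.stdev(v) for v in perturbed)
    # Measurable effect: spread of the feature across patients.
    effect = statistics.stdev(original)
    return uncertainty / effect


def rank_by_eta(features):
    """Return feature names ordered from low eta (robust) to high eta.

    features: dict mapping a feature name to a (perturbed, original) pair.
    """
    scores = {name: eta(p, o) for name, (p, o) in features.items()}
    return sorted(scores, key=scores.get)


# Toy data: "stable" barely moves under contour perturbation, "noisy" does.
features = {
    "stable": ([[1.0, 1.1, 0.9], [5.0, 5.1, 4.9], [9.0, 9.1, 8.9]], [1.0, 5.0, 9.0]),
    "noisy": ([[1.0, 3.0, 5.0], [2.0, 4.0, 6.0], [3.0, 5.0, 7.0]], [3.0, 4.0, 5.0]),
}
print(rank_by_eta(features))  # ['stable', 'noisy']
```

Under this assumed definition, selecting the four lowest-η and four highest-η features amounts to taking `ranked[:4]` and `ranked[-4:]` of the returned list.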
Methods. Two non-small cell lung cancer (NSCLC) cohorts, of 421 and 240 patients, were used for training and testing, respectively. For each patient, a Monte Carlo algorithm generated three hundred synthetic contours with a surface Dice tolerance measure of less than 1.18 mm with respect to the original gross tumor volume (GTV) delineation. These contours were then used to derive 104 radiomic features, which were ranked by their relative sensitivity to contour perturbation, expressed by the parameter η. The top four (low η) and the bottom four (high η) features were selected to build two Cox proportional hazards models. To investigate the influence of segmentation uncertainty on the prognostic models, we trained and tested each setup in 5000 augmented realizations generated by Monte Carlo sampling; the log-rank test was used to assess stratification performance and stability to segmentation uncertainty.
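A minimal stdlib Python sketch of two building blocks of this evaluation loop: drawing one augmented realization and scoring a two-group risk split with a log-rank test. We assume, without confirmation from the abstract, that a realization picks one synthetic contour per patient uniformly at random; `sample_realization` and `logrank_test` are hypothetical names, and the Cox model that would assign the risk groups is omitted. In practice a survival library such as lifelines (`CoxPHFitter`, `lifelines.statistics.logrank_test`) would normally be preferred over a hand-rolled test.

```python
import math
import random


def sample_realization(perturbed, rng):
    """Draw one synthetic-contour feature value per patient (assumed scheme).

    perturbed[i] lists the feature values of patient i over its synthetic
    contours; one augmented realization picks one of them at random.
    """
    return [rng.choice(values) for values in perturbed]


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi2 statistic, p-value).

    times: follow-up times; events: 1 if the event was observed, 0 if
    censored; groups: 0/1 risk-group label assigned by the prognostic model.
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group 1
    variance = 0.0
    for t in event_times:
        at_risk = [g for ti, _, g in data if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g in at_risk if g == 1)
        d = sum(1 for ti, e, _ in data if ti == t and e)
        d1 = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        # Expected group-1 events if both groups share the same hazard.
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 at this event time.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups are not comparable")
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, `logrank_test([1, 2, 3, 4, 5, 6], [1] * 6, [0, 0, 0, 1, 1, 1])`, where all three patients in group 0 fail before anyone in group 1, gives a statistic of about 5.05 and p ≈ 0.025. Repeating the test for every augmented realization yields a distribution of test-set p-values rather than a single value.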
Results. Although both the low-η and high-η setups showed significant test-set log-rank p-values (p = 0.01) on the original GTV delineations (without introduced segmentation uncertainty), only around 30% of the augmented realizations of the model with a high uncertainty-to-effect ratio reached p < 0.05 on the test set. In contrast, the low-η setup reached log-rank p < 0.05 in 90% of the augmented realizations. Moreover, the high-η setup's classification was uncertain for 50% of the test-set subjects (at an 80% agreement rate), whereas the low-η setup was uncertain for only 10% of the cases. The code and part of the data are available at https://github.com/Maastro-CDS-Imaging-Group/sure.
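The two stability summaries reported here, the share of realizations with a significant log-rank test and the share of test subjects with an unstable risk-group assignment, can be sketched in stdlib Python as below. The agreement criterion is our reading of "80% agreement rate", not a confirmed definition: a subject is flagged as uncertain when its most frequent risk-group label occurs in fewer than 80% of the realizations. Function names and the data layout are hypothetical.

```python
from collections import Counter


def fraction_significant(p_values, alpha=0.05):
    """Share of augmented realizations whose test-set log-rank p < alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


def uncertain_subjects(labels_per_realization, min_agreement=0.8):
    """Indices of test subjects with an unstable risk-group assignment.

    labels_per_realization: one list of risk-group labels per realization,
    each holding one label per subject (assumed layout).
    ASSUMPTION: a subject is uncertain if its most frequent label appears in
    fewer than min_agreement of the realizations.
    """
    n_realizations = len(labels_per_realization)
    n_subjects = len(labels_per_realization[0])
    uncertain = []
    for j in range(n_subjects):
        counts = Counter(labels[j] for labels in labels_per_realization)
        agreement = counts.most_common(1)[0][1] / n_realizations
        if agreement < min_agreement:
            uncertain.append(j)
    return uncertain


# Toy example with five realizations and three test subjects.
p_values = [0.01, 0.03, 0.2, 0.04, 0.5]
labels = [[0, 1, 1], [0, 1, 0], [0, 1, 1], [0, 0, 0], [0, 1, 1]]
print(fraction_significant(p_values))  # 0.6
print(uncertain_subjects(labels))  # [2]
```

Dividing the number of flagged subjects by the size of the test set gives a percentage of uncertain cases comparable in form to the 50% and 10% figures quoted for the two setups.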
Discussion. Estimating image biomarker model performance based only on the original GTV segmentation, without considering segmentation uncertainty, may be misleading. A model may achieve significant stratification performance yet be unstable under delineation variations, which are inherent to manual segmentation. Simulating segmentation uncertainty with the method described allows for more stable image biomarker estimation, selection, and model development. The segmentation uncertainty estimation method described here is universal and can be extended to estimate other protocol uncertainties (such as those from image acquisition and pre-processing).
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.