Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Reproducibility in Radiomics: A Comparison of Feature Extraction Methods and Two Independent Datasets

Version 1 : Received: 25 May 2023 / Approved: 26 May 2023 / Online: 26 May 2023 (07:15:55 CEST)

A peer-reviewed article of this Preprint also exists.

Thomas, H.M.T.; Wang, H.Y.C.; Varghese, A.J.; Donovan, E.M.; South, C.P.; Saxby, H.; Nisbet, A.; Prakash, V.; Sasidharan, B.K.; Pavamani, S.P.; D., D.; Mathew, M.; Isiah, R.G.; Evans, P.M. Reproducibility in Radiomics: A Comparison of Feature Extraction Methods and Two Independent Datasets. Appl. Sci. 2023, 13, 7291.

Abstract

Radiomics involves the extraction of information from medical images that is not visible to the human eye. There is evidence that these features can be used for treatment stratification and outcome prediction. However, the reproducibility of results between different studies is much debated. This paper studies the reproducibility of CT texture features used in radiomics by comparing two feature extraction implementations, a Matlab toolkit and Pyradiomics, applied to two independent datasets of patient CT scans: (i) the open-access RIDER dataset, containing repeat CT scans taken 15 minutes apart (RIDER Scan 1 and Scan 2, respectively) for 31 patients treated for lung cancer, and (ii) the open-access HN1 dataset, containing 137 patients treated for head and neck cancer. The gross tumor volume (GTV), manually outlined by an experienced observer and available in both datasets, was used. The 43 radiomics features common to Matlab and Pyradiomics were calculated using two intensity-level quantization methods, with and without an intensity threshold. For each feature and each combination of quantization parameters, cases were ranked and Spearman’s rank coefficient, rs, was calculated. A feature was defined as reproducible when it was highly correlated in the RIDER dataset and also highly correlated in the HN1 dataset, and vice versa. 29 of the 43 reported stable features were found to be highly reproducible between the Matlab and Pyradiomics implementations, with consistently high correlation in rank ordering for RIDER Scan 1 and RIDER Scan 2 (rs > 0.8). 18 of the 43 reported features were common to the RIDER and HN1 datasets, suggesting they may be agnostic to disease site. Useful radiomics features should be selected on the basis of reproducibility. This study identified a set of features that meet this requirement and validated the methodology for evaluating reproducibility between datasets.
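
The rank-correlation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The function names (`average_ranks`, `spearman_rs`, `reproducible_features`) and all numeric values are hypothetical, chosen only for illustration. The sketch assumes that each implementation yields one value per case for a given feature, and that "highly correlated" means rs > 0.8 in every dataset. The abstract states that threshold for the RIDER scans, so applying it to HN1 as well is an assumption.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank coefficient: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rs is undefined when a feature is constant across cases")
    return cov / (var_x * var_y) ** 0.5


def reproducible_features(rs_by_dataset, threshold=0.8):
    """Return features whose rs exceeds the threshold in every dataset.

    rs_by_dataset maps a dataset name to {feature name: rs}.
    The 0.8 default follows the abstract; using it for all datasets is an assumption.
    """
    datasets = list(rs_by_dataset.values())
    shared = set(datasets[0]).intersection(*datasets[1:])
    return sorted(f for f in shared if all(d[f] > threshold for d in datasets))


# Hypothetical values of one feature for five cases, from each implementation.
matlab_values = [0.12, 0.45, 0.33, 0.90, 0.51]
pyradiomics_values = [1.3, 4.1, 3.0, 9.7, 5.2]
print(spearman_rs(matlab_values, pyradiomics_values))
```

Because rs depends only on how the cases are ordered, two implementations that scale or offset a feature differently can still agree perfectly, which is why a rank-based measure suits a comparison between toolkits.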

Keywords

radiomics; reproducibility; repeatability; validation; lung cancer; head and neck cancer; CT imaging

Subject

Public Health and Healthcare, Other
