Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Ensemble Learning for Breast Cancer Lesion Classification: A Pilot Validation using Correlated Spectroscopic Imaging and Diffusion-Weighted Imaging

Version 1 : Received: 5 May 2023 / Approved: 8 May 2023 / Online: 8 May 2023 (02:53:22 CEST)

A peer-reviewed article of this Preprint also exists.

Joy, A.; Lin, M.; Joines, M.; Saucedo, A.; Lee-Felker, S.; Baker, J.; Chien, A.; Emir, U.; Macey, P.M.; Thomas, M.A. Ensemble Learning for Breast Cancer Lesion Classification: A Pilot Validation Using Correlated Spectroscopic Imaging and Diffusion-Weighted Imaging. Metabolites 2023, 13, 835.

Abstract

The main objective of this work was to evaluate individual and ensemble machine learning models for classifying malignant and benign breast masses, using features from two-dimensional (2D) correlated spectroscopy spectra extracted from five-dimensional Echo Planar-Correlated Spectroscopic Imaging (5D EP-COSI) and from diffusion-weighted imaging (DWI). Twenty-four metabolite and lipid ratios, taken with respect to the 2D diagonal peaks at 1.4 ppm and 5.4 ppm and to water from one-dimensional non-water-suppressed (NWS) spectra, were used as features. Additionally, the water fraction, fat fraction, and water-to-fat ratio from the NWS spectra and the apparent diffusion coefficient (ADC) from DWI were included. The nine most important features were identified using recursive feature elimination. The best-performing models were XGBoost (AUC: 93.0%, accuracy: 85.7%, F1-score: 87.6%), GradientBoost (AUC: 94.4%, accuracy: 87.0%, F1-score: 89.4%), CatBoost (AUC: 95.2%, accuracy: 86.9%, F1-score: 88.4%), and RandomForest (AUC: 92.2%, accuracy: 85.3%, F1-score: 87.6%). Although conventional biomarkers such as choline, myo-inositol, and glycine were statistically significant predictors, the key features contributing to the classification were ADC; the 2D diagonal peaks at 0.9 ppm, 2.1 ppm, and 2.3 ppm; the cross peaks between 1.4 and 0.9 ppm, 4.3 and 4.1 ppm, and 2.1 and 1.4 ppm; and the triglyceryl-fat cross peak. The results highlight the contribution of the 2D spectral peaks to the model and demonstrate the potential of 5D EP-COSI for early breast cancer detection.
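
For readers less familiar with the three metrics reported above, the following is a minimal sketch of how AUC, accuracy, and F1-score are conventionally computed for a binary classifier. It is not the authors' code and does not reproduce their evaluation protocol. The labels (1 = malignant, 0 = benign), the scores, the 0.5 decision threshold, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = malignant, 0 = benign (assumed encoding).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]  # made-up model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

print(f"AUC:      {roc_auc(y_true, scores):.4f}")
print(f"Accuracy: {accuracy(y_true, y_pred):.4f}")
print(f"F1-score: {f1_score(y_true, y_pred):.4f}")
```

Note that AUC is computed from the continuous scores and is independent of any threshold, whereas accuracy and F1-score depend on the threshold used to turn scores into labels. This is why a model can rank highest on one metric without ranking highest on the others.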

Keywords

Correlated Spectroscopic Imaging; Diffusion weighted imaging; Machine Learning; Breast Cancer; Choline; Myo-inositol; Glycine; Water; Lipids

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
