Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

A Hierarchical Machine Learning Model to Discover Gleason Grade Group-specific Biomarkers in Prostate Cancer

Version 1 : Received: 8 October 2019 / Approved: 24 November 2019 / Online: 24 November 2019 (16:51:25 CET)
Version 2 : Received: 26 November 2019 / Approved: 26 November 2019 / Online: 26 November 2019 (04:01:21 CET)

A peer-reviewed article of this Preprint also exists.

Hamzeh, O.; Alkhateeb, A.; Zheng, J.Z.; Kandalam, S.; Leung, C.; Atikukke, G.; Cavallo-Medved, D.; Palanisamy, N.; Rueda, L. A Hierarchical Machine Learning Model to Discover Gleason Grade-Specific Biomarkers in Prostate Cancer. Diagnostics 2019, 9, 219.

Abstract

1) Background: Prostate cancer is one of the deadliest cancers affecting men worldwide, including men in North America. The disease causes some cells in the prostate to lose control of their growth and division. 2) Methods: We propose a machine learning method that analyzes gene expression in prostate tumors with different Gleason scores and identifies potential genetic biomarkers for each group. A publicly available RNA-Seq dataset from a cohort of 104 prostate cancer patients was retrieved from the National Center for Biotechnology Information's (NCBI) Gene Expression Omnibus (GEO) repository. We categorize patients by their Gleason scores into different groups to create a hierarchy of disease progression. A hierarchical model with standard classifiers at the different Gleason groups (hereinafter called nodes) is used to identify and predict nodes based on their mRNA or gene expression. At each node, patient samples are analyzed via class imbalance and hybrid feature selection techniques to build the prediction model. The outcome of each node is a set of genes that can separate the Gleason group from the remaining groups. To validate the proposed method, the set of identified genes is used to classify a second dataset of 499 prostate cancer patients collected from cBioPortal. 3) Results: Two genes were found to be potential biomarkers of specific Gleason groups: PIAS3 was identified for Gleason score 4+3=7, while UBE2 could be a potential biomarker for Gleason score 6. Other proposed genes that were not found in the literature might also be potential biomarkers. 4) Conclusion: The latest literature supports that the genes predicted by the proposed method are strongly correlated with prostate cancer progression and tumor development processes. Furthermore, pathway analysis shows that both PIAS3 and UBE2 participate in the same protein interaction pathway, JAK/STAT signaling.
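The node-by-node procedure described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the classifiers, the class imbalance technique, or the feature selection method, so this sketch assumes random oversampling, a simple univariate gene ranking, and a nearest-centroid classifier as stand-ins. It also assumes that each node separates one group from the groups that remain after earlier nodes. The gene names (geneA, geneB, geneC), group labels (G1, G2, G3), and expression values are hypothetical toy data.

```python
import random
from statistics import mean, pstdev

def oversample(rows, labels, seed=0):
    """Assumed imbalance handling: duplicate minority rows until classes match."""
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y]
    neg = [r for r, y in zip(rows, labels) if not y]
    if not pos or not neg:
        raise ValueError("each node needs samples from both classes")
    while len(pos) < len(neg):
        pos.append(rng.choice(pos))
    while len(neg) < len(pos):
        neg.append(rng.choice(neg))
    return pos, neg

def rank_genes(pos, neg, genes, k):
    """Assumed feature selection: rank genes by standardized mean difference."""
    def score(g):
        a = [r[g] for r in pos]
        b = [r[g] for r in neg]
        spread = pstdev(a + b) or 1.0  # avoid division by zero for flat genes
        return abs(mean(a) - mean(b)) / spread
    return sorted(genes, key=score, reverse=True)[:k]

def fit_node(rows, labels, genes, k):
    """Fit one node: balance classes, pick top-k genes, store class centroids."""
    pos, neg = oversample(rows, labels)
    top = rank_genes(pos, neg, genes, k)
    c_pos = {g: mean(r[g] for r in pos) for g in top}
    c_neg = {g: mean(r[g] for r in neg) for g in top}
    return top, c_pos, c_neg

def node_accepts(node, row):
    """Nearest-centroid decision on the node's selected genes."""
    top, c_pos, c_neg = node
    d_pos = sum((row[g] - c_pos[g]) ** 2 for g in top)
    d_neg = sum((row[g] - c_neg[g]) ** 2 for g in top)
    return d_pos <= d_neg

def fit_hierarchy(rows, groups, order, genes, k=1):
    """One node per group (except the last): that group vs. the remaining ones."""
    nodes = []
    for grp in order[:-1]:
        labels = [g == grp for g in groups]
        nodes.append((grp, fit_node(rows, labels, genes, k)))
        keep = [i for i, g in enumerate(groups) if g != grp]
        rows = [rows[i] for i in keep]
        groups = [groups[i] for i in keep]
    return nodes, order[-1]

def predict(model, row):
    """Walk the hierarchy; the first node that accepts the sample names its group."""
    nodes, last = model
    for grp, node in nodes:
        if node_accepts(node, row):
            return grp
    return last

def selected_genes(model):
    """Genes chosen at each node, i.e. the candidate group-specific markers."""
    return {grp: node[0] for grp, node in model[0]}

# Hypothetical toy expression data (not from GEO or cBioPortal).
genes = ["geneA", "geneB", "geneC"]
raw = [
    ("G1", 9, 1, 5), ("G1", 8, 2, 4),
    ("G2", 1, 9, 5), ("G2", 2, 8, 4), ("G2", 1, 8, 6),
    ("G3", 1, 1, 5), ("G3", 2, 2, 4), ("G3", 1, 2, 6), ("G3", 2, 1, 5),
]
groups = [r[0] for r in raw]
rows = [dict(zip(genes, r[1:])) for r in raw]

model = fit_hierarchy(rows, groups, ["G1", "G2", "G3"], genes, k=1)
print(selected_genes(model))
print(predict(model, {"geneA": 1.5, "geneB": 8.5, "geneC": 5.0}))
```

In this sketch the per-node gene list returned by selected_genes plays the role of the "set of genes that can separate the Gleason group from the remaining groups"; validating on a second cohort would amount to calling predict on samples that were not used for fitting.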

Keywords

classification; prostate cancer; Gleason score; machine learning; next-generation sequencing

Subject

Medicine and Pharmacology, Oncology and Oncogenics
