Preprint; Article; Version 2; Preserved in Portico; This version is not peer-reviewed

A Hierarchical Machine Learning Model to Discover Gleason Grade Group-specific Biomarkers in Prostate Cancer

Version 1 : Received: 8 October 2019 / Approved: 24 November 2019 / Online: 24 November 2019 (16:51:25 CET)
Version 2 : Received: 26 November 2019 / Approved: 26 November 2019 / Online: 26 November 2019 (04:01:21 CET)

A peer-reviewed article of this Preprint also exists.

Hamzeh, O.; Alkhateeb, A.; Zheng, J.Z.; Kandalam, S.; Leung, C.; Atikukke, G.; Cavallo-Medved, D.; Palanisamy, N.; Rueda, L. A Hierarchical Machine Learning Model to Discover Gleason Grade-Specific Biomarkers in Prostate Cancer. Diagnostics 2019, 9, 219.

Abstract

1) Background: Prostate cancer is one of the most common cancers affecting men worldwide, including men in North America. The Gleason score is a pathological grading system used to assess the potential aggressiveness of the disease in prostate tissue. Advances in computing and next-generation sequencing technology now allow us to study the genomic profiles of patients in association with their Gleason scores more accurately and effectively. 2) Methods: In this study, we used a novel machine learning method to analyze gene expression in prostate tumors with different Gleason scores and to identify potential genetic biomarkers for each Gleason group. We obtained a publicly available RNA-Seq dataset of a cohort of 104 prostate cancer patients from the National Center for Biotechnology Information's (NCBI) Gene Expression Omnibus (GEO) repository, and categorized patients by Gleason score to create a hierarchy of disease progression. A hierarchical model with a standard classifier for each Gleason group (each group being a node of the hierarchy) was developed to predict the node of a sample from its mRNA (gene) expression. In each node, patient samples were processed with class imbalance and hybrid feature selection techniques to build the prediction model. The outcome of the analysis at each node is a set of genes that can differentiate that Gleason group from the remaining groups. To validate the proposed method, the set of identified genes was used to classify a second dataset of 499 prostate cancer patients collected from cBioPortal [1]. 3) Results: The overall accuracy of this method on the first dataset was 93.3%, and validation on the second dataset yielded an accuracy of 87%. The method also identified genes that had not previously been reported as potential biomarkers for specific Gleason groups. In particular, PIAS3 was identified as a potential biomarker for Gleason score 4+3=7, and UBE2V2 for Gleason score 6.
4) Insight: Previous reports show that the genes predicted by the proposed method correlate strongly with prostate cancer development and progression. Furthermore, pathway analysis shows that PIAS3 and UBE2V2 share similar protein interaction pathways within the JAK/STAT signaling process.
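
The hierarchical scheme described under Methods can be illustrated with a minimal sketch. The abstract does not name the classifiers, the class imbalance technique, or the feature selection methods, so everything below is an assumption made for illustration: a nearest-centroid classifier stands in for the "standard classifiers", random oversampling stands in for the class imbalance step, and a single filter-style ranking stands in for the hybrid feature selection. All function names and the toy expression values are hypothetical and are not taken from the GEO or cBioPortal datasets.

```python
import random
from statistics import mean, pstdev

def oversample(rows, labels, rng):
    """Assumed imbalance step: duplicate random rows of the smaller class."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    while len(pos) < len(neg):
        pos.append(rng.choice(pos))
    while len(neg) < len(pos):
        neg.append(rng.choice(neg))
    return pos + neg, [1] * len(pos) + [0] * len(neg)

def top_genes(rows, labels, k):
    """Assumed filter ranking: standardized difference of class means per gene."""
    scores = []
    for g in range(len(rows[0])):
        pos = [r[g] for r, y in zip(rows, labels) if y == 1]
        neg = [r[g] for r, y in zip(rows, labels) if y == 0]
        spread = pstdev(pos) + pstdev(neg) + 1e-9  # avoid division by zero
        scores.append((abs(mean(pos) - mean(neg)) / spread, g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]

def fit_node(rows, labels, k, rng):
    """One node: balance classes, pick k genes, store one centroid per class."""
    rows, labels = oversample(rows, labels, rng)
    genes = top_genes(rows, labels, k)
    centroid = {}
    for cls in (0, 1):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroid[cls] = [mean(r[g] for r in members) for g in genes]
    return genes, centroid

def node_predict(node, row):
    """Return 1 if the row is closer to the node's own-group centroid."""
    genes, centroid = node
    def dist(cls):
        return sum((row[g] - c) ** 2 for g, c in zip(genes, centroid[cls]))
    return 1 if dist(1) < dist(0) else 0

def fit_hierarchy(rows, groups, order, k=2, seed=0):
    """Train one 'this group vs. the remaining groups' node per level."""
    rng = random.Random(seed)
    nodes = []
    for group in order[:-1]:
        labels = [1 if g == group else 0 for g in groups]
        nodes.append((group, fit_node(rows, labels, k, rng)))
        # Samples of the group just handled are removed before the next node.
        keep = [i for i, g in enumerate(groups) if g != group]
        rows = [rows[i] for i in keep]
        groups = [groups[i] for i in keep]
    return nodes, order[-1]

def predict(model, row):
    """Walk the hierarchy; the first node that claims the sample wins."""
    nodes, fallback = model
    for group, node in nodes:
        if node_predict(node, row) == 1:
            return group
    return fallback

# Hypothetical toy expression matrix: 4 genes, 3 Gleason groups.
toy_rows = [
    [9.0, 1.0, 1.2, 5.0], [8.5, 1.3, 0.9, 4.0], [9.2, 0.8, 1.1, 6.0],
    [1.0, 9.1, 1.0, 5.5], [1.2, 8.7, 1.3, 4.5],
    [0.9, 1.1, 9.0, 5.0], [1.1, 0.9, 8.8, 4.8],
    [1.3, 1.2, 9.3, 5.2], [1.0, 1.0, 9.1, 6.1],
]
toy_groups = ["6"] * 3 + ["3+4"] * 2 + ["4+3"] * 4

model = fit_hierarchy(toy_rows, toy_groups, order=["6", "3+4", "4+3"])
for group, (genes, _) in model[0]:
    print(f"node {group}: selected gene indices {genes}")
```

Calling `predict(model, row)` on a new expression vector returns the label of the first node that claims it, or the last group in `order` if none does. The gene indices stored at each node play the role of the group-specific gene sets described in the abstract, although a real analysis would work with thousands of genes and cross-validated classifiers rather than this toy setup.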

Keywords

classification; prostate cancer; Gleason score; machine learning; next-generation sequencing

Subject

Medicine and Pharmacology, Oncology and Oncogenics

Comments (1)

Comment 1
Received: 26 November 2019
Commenter: Abedalrhman Alkhateeb
Commenter's Conflict of Interests: Author
Comment: Added the biological validation of the findings