Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Machine Learning Driven Dashboard for Chronic Myeloid Leukemia Prediction using Protein Sequences

Version 1: Received: 30 November 2023 / Approved: 1 December 2023 / Online: 1 December 2023 (05:39:10 CET)

How to cite: Ahmad, W.; Iqbal, M.; Amin, M.A.; Bangyal, W.H.; Shahzad, A.R. Machine Learning Driven Dashboard for Chronic Myeloid Leukemia Prediction using Protein Sequences. Preprints 2023, 2023120053. https://doi.org/10.20944/preprints202312.0053.v1

Abstract

In Southeast Asia, the incidence of leukemia, a malignant blood cancer originating from hematopoietic progenitor cells, is rising, with a reported mortality rate of 54%. This study aims to improve early-stage prediction and thereby patients' prospects of recovery. Using machine learning and data science, we predict Chronic Myeloid Leukemia (CML) from protein sequence data of frequently mutated genes such as BCL2, HSP90, PARP, and RB. Features are extracted with Dipeptide Composition (DPC), Amino Acid Composition (AAC), and Pseudo Amino Acid Composition (Pse-AAC); outliers are handled beforehand, and feature selection is validated with the Pearson correlation coefficient. Data augmentation is used to obtain a well-rounded dataset for analysis. We train a range of machine learning models, namely Support Vector Machine (SVM), XGBoost, Random Forest (RF), K-Nearest Neighbor (KNN), Decision Tree (DT), and Logistic Regression (LR), and obtain accuracies ranging from 66% to 94%. The classifiers are evaluated using accuracy, sensitivity, specificity, F1-score, and the confusion matrix. The proposed solution includes a user-friendly web application dashboard intended to support practitioners in early CML diagnosis and to be deployable within healthcare institutions and hospitals.
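
As an illustration of two of the feature extraction techniques named above, the following is a minimal Python sketch of Amino Acid Composition (AAC) and Dipeptide Composition (DPC). It is not the authors' implementation. The function names `aac` and `dpc`, the use of fractions rather than percentages, and the choice to skip residues outside the 20 standard amino acids are all assumptions made for this example.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence):
    """Amino Acid Composition: fraction of each standard residue.

    Assumption: non-standard symbols (e.g. 'X') are ignored.
    Returns a dict with 20 entries.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return {a: 0.0 for a in AMINO_ACIDS}
    return {a: residues.count(a) / total for a in AMINO_ACIDS}


def dpc(sequence):
    """Dipeptide Composition: fraction of each adjacent residue pair.

    Assumption: a pair is counted only if both residues are standard.
    Returns a dict with 400 entries (20 x 20).
    """
    seq = sequence.upper()
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return {p: 0.0 for p in counts}
    return {p: c / total for p, c in counts.items()}
```

For a sequence of length N, AAC yields a fixed 20-dimensional vector and DPC a fixed 400-dimensional vector, so sequences of different lengths can be fed to the same classifier.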

Keywords

Protein Sequences; Pseudo-AAC; AAC; Dipeptide-C; Machine Learning Classifiers; Chronic Myeloid Leukemia; Blood Cancer

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
