Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Towards the Gene Profile of Acute Myeloid Leukaemia Using Machine Learning and Blood Transcriptomics

Version 1 : Received: 8 February 2024 / Approved: 9 February 2024 / Online: 12 February 2024 (04:56:38 CET)

How to cite: Angelakis, A.; Nathoe, R.; Filippakis, M. Towards the Gene Profile of Acute Myeloid Leukaemia Using Machine Learning and Blood Transcriptomics. Preprints 2024, 2024020593. https://doi.org/10.20944/preprints202402.0593.v1

Abstract

We apply an iterative methodology for dimensionality reduction and feature selection based on categorical gradient boosted trees, previously defined and successfully applied to similar datasets, to a dataset consisting of 12708 gene expressions from 5052 individuals across 105 studies, in order to classify whether a person has acute myeloid leukaemia (AML) or is healthy. A CatBoost model trained on a reduced set of 72 genes reached a ROC-AUC of 0.9973 under ten-fold cross-validation (10CV) and a ROC-AUC of 0.9988 on an inference dataset. We further investigate using fewer genes, which could potentially be used in clinical practice, and genes that have not yet been associated with AML or with blood cancer in general. On the same 10CV folds and the same inference dataset, the performance of the tuned CatBoost models suggests that not all genes associated with AML have been found yet and that 19 genes could be enough to predict AML: CatBoost63 (ROC-AUC: 0.9941, Test: 0.9942), CatBoost19 (ROC-AUC: 0.9946, Test: 0.9941) and CatBoost15 (ROC-AUC: 0.9922, Test: 0.9900). In addition, our results support the view that a gene-based diagnostic test for AML could be possible in the future, and that further research is needed on these 15 genes, which could lead to new and better drugs.
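
The ROC-AUC values quoted in the abstract are reported under ten-fold cross-validation. As a minimal illustration of how such a figure can be computed, and not the authors' code, the Python sketch below evaluates ROC-AUC as the fraction of (AML, healthy) pairs that are ranked correctly and averages it over stratified folds. The function names `roc_auc`, `stratified_folds` and `cross_validated_auc` are hypothetical, and the scores are placeholders for model outputs; in an actual cross-validation the classifier would be refitted on the training part of every fold before scoring the held-out part.

```python
from statistics import mean


def roc_auc(labels, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k):
    """Distribute sample indices over k folds, round-robin within each class."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        class_indices = [i for i, y in enumerate(labels) if y == cls]
        for position, index in enumerate(class_indices):
            folds[position % k].append(index)
    return folds


def cross_validated_auc(labels, scores, k=10):
    """Mean ROC-AUC over k held-out folds (scores are assumed precomputed)."""
    folds = stratified_folds(labels, k)
    return mean(
        roc_auc([labels[i] for i in fold], [scores[i] for i in fold])
        for fold in folds
    )


if __name__ == "__main__":
    # Toy data only: 1 = AML, 0 = healthy; scores are made-up model outputs.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(toy_labels, toy_scores))
```

The pairwise formulation is quadratic in the number of samples and is chosen here for readability; a rank-based computation gives the same value more efficiently on datasets of the size described above.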

Keywords

acute myeloid leukemia; blood transcriptomics; machine learning; explainable artificial intelligence; CatBoost

Subject

Medicine and Pharmacology, Hematology
