Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Using Machine Learning Algorithms to find Novel Biomarkers for Breast Cancer using RNA-Seq Dataset

Version 1 : Received: 30 August 2023 / Approved: 31 August 2023 / Online: 1 September 2023 (07:40:33 CEST)

How to cite: Abdul, A.; Paudel, R.; Rahman, M.M. Using Machine Learning Algorithms to find Novel Biomarkers for Breast Cancer using RNA-Seq Dataset. Preprints 2023, 2023090006. https://doi.org/10.20944/preprints202309.0006.v1

Abstract

Breast cancer (BC) is the second leading cause of cancer death among women in the United States, and about 1 in 8 women will develop breast cancer in their lifetime. With advances in omics technology, more data are available for diagnosis, treatment, and prognosis. Machine learning, a subdomain of artificial intelligence, could enable better clinical support tools and help answer complex problems in BC. The methodology of this study involves collecting RNA-Seq datasets, selecting and training appropriate machine learning algorithms, and evaluating their performance in predicting BC prognosis. The study also seeks to identify the most relevant genes and pathways through feature selection, feature extraction, and dimensionality reduction techniques. The top 10 genes were selected using Univariate Feature Selection (UFS), specifically the SelectKBest technique. After comparing several machine learning algorithms, including Neural Networks, Random Forest, Linear SVM, Logistic Regression, and Quadratic Discriminant Analysis (QDA), we found that QDA was the best model for classifying the RNA-Seq dataset, with 96% accuracy and a ROC AUC of 94%. This study suggests that AK2 and CD68 are positively correlated with each other and could potentially serve as biomarkers and therapeutic targets for BC.
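
The selection and classification steps named in the abstract can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic (generated with `make_classification`), the gene names are placeholders, and the ANOVA F-test (`f_classif`) is an assumed scoring function, since the abstract does not say which score SelectKBest was run with. Only `k=10` and the choice of QDA come from the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for an expression matrix (assumption, not the study's data):
# rows are samples, columns are genes, y is a binary class label.
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=10, n_redundant=0,
    random_state=0,
)
gene_names = [f"gene_{i}" for i in range(X.shape[1])]  # placeholder names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Univariate feature selection (top 10 features) followed by QDA.
# f_classif is an assumed score function; the abstract does not name one.
model = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=10)),
    ("qda", QuadraticDiscriminantAnalysis()),
])
model.fit(X_train, y_train)

# Recover which columns survived the selection step.
mask = model.named_steps["select"].get_support()
top_genes = [name for name, keep in zip(gene_names, mask) if keep]

# Evaluate on the held-out split with the two metrics the abstract reports.
accuracy = accuracy_score(y_test, model.predict(X_test))
roc_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print("Selected features:", top_genes)
print(f"accuracy={accuracy:.2f}, ROC AUC={roc_auc:.2f}")
```

Placing SelectKBest inside the `Pipeline` means the feature scores are computed from the training split only, so the held-out samples do not influence which genes are chosen. The numbers printed here describe the synthetic data and say nothing about the 96% accuracy or 94% ROC AUC reported above.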

Keywords

breast cancer; machine learning; BRCA1; BRCA2; biomarker; QDA

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
