Submitted: 22 October 2024
Posted: 23 October 2024
Abstract
Keywords:
1. Introduction
2. Literature Review
3. Data and Methods
3.1. Data Introduction

3.2. T-Distributed Stochastic Neighbor Embedding
3.3. Principal Component Analysis
3.4. Extreme Gradient Boosting
4. Model Analysis
5. Conclusions


| | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.50 | 0.67 | 0.57 | 9 |
| 1 | 0.93 | 0.78 | 0.85 | 18 |
| 2 | 0.83 | 1.00 | 0.91 | 5 |
| 3 | 1.00 | 0.77 | 0.87 | 13 |
| 4 | 1.00 | 0.50 | 0.67 | 14 |
| 5 | 1.00 | 1.00 | 1.00 | 2 |
| micro avg | 0.85 | 0.72 | 0.78 | 61 |
| macro avg | 0.88 | 0.79 | 0.81 | 61 |
| weighted avg | 0.89 | 0.72 | 0.78 | 61 |
| samples avg | 0.69 | 0.72 | 0.70 | 61 |
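
The macro and weighted averages in the table can be cross-checked from the per-class rows alone. The following is a minimal sketch, assuming the table follows the usual classification-report conventions (macro avg as the unweighted mean over classes, weighted avg as the support-weighted mean); the names `rows`, `macro_average`, and `weighted_average` are illustrative and not taken from the paper. The micro avg and samples avg rows cannot be recomputed this way, because they depend on the underlying per-sample predictions.

```python
# Per-class rows copied from the table: (precision, recall, f1-score, support).
rows = {
    "0": (0.50, 0.67, 0.57, 9),
    "1": (0.93, 0.78, 0.85, 18),
    "2": (0.83, 1.00, 0.91, 5),
    "3": (1.00, 0.77, 0.87, 13),
    "4": (1.00, 0.50, 0.67, 14),
    "5": (1.00, 1.00, 1.00, 2),
}

METRICS = ("precision", "recall", "f1-score")


def macro_average(rows):
    """Unweighted mean of each metric over all classes."""
    n_classes = len(rows)
    return {
        name: sum(row[i] for row in rows.values()) / n_classes
        for i, name in enumerate(METRICS)
    }


def weighted_average(rows):
    """Mean of each metric weighted by class support (last tuple entry)."""
    total_support = sum(row[3] for row in rows.values())
    return {
        name: sum(row[i] * row[3] for row in rows.values()) / total_support
        for i, name in enumerate(METRICS)
    }


macro = macro_average(rows)
weighted = weighted_average(rows)
total_support = sum(row[3] for row in rows.values())

print("support:", total_support)
print("macro avg:   ", {k: round(v, 2) for k, v in macro.items()})
print("weighted avg:", {k: round(v, 2) for k, v in weighted.items()})
```

Under these assumptions the recomputed values agree with the macro avg and weighted avg rows to two decimals, even though the inputs are themselves rounded. The gap between macro recall (0.79) and weighted recall (0.72) comes mostly from class 4, which has a large support (14) but a recall of only 0.50.
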
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).