Submitted: 13 September 2024
Posted: 14 September 2024
Abstract
Keywords:
1. Introduction
2. State-of-the-art Algorithms for Prediction in Healthcare
2.1. Classification Algorithms
KNN algorithm
Algorithm 1. K-Nearest Neighbors Algorithm [13]
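
The steps of Algorithm 1 are not reproduced in this text. As a rough illustration only, the Python sketch below shows a generic KNN classifier that ranks training points by Euclidean distance and takes a majority vote. The function name `knn_predict`, the toy data, and the tie-breaking rule are assumptions made for this example; they are not taken from [13] or from the implementation used in this study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Rank training points by Euclidean distance to the query (nearest first).
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    # Count the labels of the k nearest points.
    votes = Counter(label for _, label in ranked[:k])
    # Pick the label with the most votes. On a tie, the label that was counted
    # first (the one belonging to the nearer neighbour) wins; this rule is an
    # arbitrary choice made for the sketch.
    return max(votes, key=votes.get)


# Toy data (hypothetical): two well-separated classes in 2-D.
train_X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = [0, 0, 1, 1]
prediction = knn_predict(train_X, train_y, query=(5.0, 6.0), k=3)
```

With an even k in a binary problem, tied votes are possible, so the tie-breaking rule can change predictions; an odd k avoids this.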
LR algorithm
Algorithm 2. Logistic Regression Algorithm [13]
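
The steps of Algorithm 2 are not reproduced in this text. The sketch below is a generic illustration of binary logistic regression fitted by batch gradient descent on the log-loss, under stated assumptions: the names `fit_logistic` and `predict`, the learning rate, the number of epochs, and the toy data are all hypothetical, and no regularization is applied. It is not the implementation from [13] or from this study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample: p(y=1 | x) - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the averaged gradient.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 if the estimated probability reaches the threshold, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


# Toy data (hypothetical): one feature, classes separated around x = 2.
X = [(0.0,), (1.0,), (3.0,), (4.0,)]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
```

On imbalanced data the 0.5 threshold in `predict` is itself a design choice; lowering it trades majority-class recall for minority-class recall.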
RF algorithm
Algorithm 3. Random Forest Algorithm [13]
2.2. Sampling Methods
2.2.1. Undersampling
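
As one concrete example of an undersampling rule, Tomek Links (TL in the abbreviation list) can be sketched as follows: two samples of opposite classes form a Tomek link when each is the other's nearest neighbour, and the majority-class member of each link is then dropped. The function names, the brute-force neighbour search, and the toy data below are assumptions for illustration, not the implementation used in this study.

```python
import math


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours
    with different labels."""
    def nearest(i):
        # Brute-force nearest neighbour of sample i (excluding itself).
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_majority_in_links(X, y, majority_label):
    """Drop the majority-class sample of every Tomek link."""
    drop = {idx for pair in tomek_links(X, y) for idx in pair
            if y[idx] == majority_label}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy 1-D data (hypothetical): label 0 is the majority class.
X = [(0.0,), (1.0,), (1.2,), (3.0,), (5.0,)]
y = [0, 0, 1, 1, 0]
links = tomek_links(X, y)
X_res, y_res = remove_majority_in_links(X, y, majority_label=0)
```

Here the samples at 1.0 and 1.2 form the only link, so only the majority sample at 1.0 is removed. This kind of rule cleans the class boundary rather than balancing class counts.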
2.2.2. Oversampling
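
The core idea behind SMOTE, one of the oversamplers listed in the abbreviations, is to create a synthetic minority sample on the line segment between a minority point and one of its nearest minority neighbours. The sketch below illustrates only that interpolation step. The name `smote_like`, the parameter defaults, and the toy data are hypothetical, and the code is not the implementation used in this study.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        mate = rng.choice(neighbours)
        u = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + u * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Toy minority class (hypothetical): corners of the unit square shifted to [1, 2].
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote_like(minority, n_new=6)
```

Because every synthetic point lies between two existing minority samples, the method never leaves the region spanned by the minority class. The variants in the abbreviation list (Borderline SMOTE, SVM SMOTE, ADASYN) differ mainly in which base points they favour.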
3. Materials and methods
3.1. Dataset description
3.2. Data Pre-processing
3.3. Model training and evaluation
4. Results
5. Discussions
5.1. Benefits of prediction in healthcare
5.2. Scientific literature advances
5.3. Results of this research
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| ADASYN | Adaptive Synthetic Sampling Approach for Imbalanced Learning |
| All-KNN | All K-Nearest Neighbours |
| AUC | Area Under the Curve |
| bLR | Binary Logistic Regression |
| CI | Confidence Interval |
| CNN | Condensed Nearest Neighbour |
| EB | Entropy-Based |
| ENN | Edited Nearest Neighbours |
| IB2 | Instance-Based Learning 2 |
| ICA | Independent Component Analysis |
| IHT | Instance Hardness Threshold |
| IR | Imbalance Ratio |
| KNN | K-Nearest Neighbours |
| LASSO | Least Absolute Shrinkage and Selection Operator |
| LDA | Linear Discriminant Analysis |
| LR | Logistic Regression |
| ML | Machine Learning |
| NCR | Neighbourhood Cleaning Rule |
| NNs | Neural Networks |
| NM | NearMiss |
| RF | Random Forest |
| RENN | Repeated Edited Nearest Neighbours |
| ROC | Receiver Operating Characteristic |
| SMOTE | Synthetic Minority Oversampling Technique |
| SVM | Support Vector Machine |
| TL | Tomek Links |
References
- Newaz, A.; Muhtadi, S.; Haq, F.S. An intelligent decision support system for the accurate diagnosis of cervical cancer. Knowledge-Based Systems 2022, 245, 108634.
- Bowden, S.J.; Doulgeraki, T.; Bouras, E.; et al. Risk factors for human papillomavirus infection, cervical intraepithelial neoplasia and cervical cancer: an umbrella review and follow-up Mendelian randomisation studies. BMC Medicine 2023, 21, 274.
- Machado, D.; Santos Costa, V.; Brandão, P. Using Balancing Methods to Improve Glycaemia-Based Data Mining. In Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) - Volume 5: HEALTHINF; SciTePress, 2023; pp. 188–198.
- Alfakeeh, A.S.; Javed, M.A. Efficient Resource Allocation in Blockchain-Assisted Health Care Systems. Applied Sciences 2023, 13, 9625.
- Jo, W.; Kim, D. OBGAN: Minority oversampling near borderline with generative adversarial networks. Expert Systems with Applications 2022, 197, 116694.
- Lopo, J.A.; Hartomo, K.D. Evaluating Sampling Techniques for Healthcare Insurance Fraud Detection in Imbalanced Dataset. Jurnal Ilmiah Teknik Elektro Komputer dan Informatika (JITEKI) 2023, 9, 223–238.
- Wang, W.; Chakraborty, G.; Chakraborty, B. Predicting the Risk of Chronic Kidney Disease (CKD) Using Machine Learning Algorithm. Applied Sciences 2021, 11, 202.
- Papakostas, M.; Das, K.; Abouelenien, M.; Mihalcea, R.; Burzo, M. Distracted and Drowsy Driving Modeling Using Deep Physiological Representations and Multitask Learning. Applied Sciences 2021, 11, 88.
- Suhas, S.; Manjunatha, N.; Kumar, C.N.; Benegal, V.; Rao, G.N.; Varghese, M.; Gururaj, G. Firth’s penalized logistic regression: A superior approach for analysis of data from India’s National Mental Health Survey, 2016. Indian Journal of Psychiatry 2023, 65, 1208–1213.
- Yang, C.; Fridgeirsson, E.A.; Kors, J.A.; et al. Impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data. Journal of Big Data 2024, 11.
- Awe, O.O.; Ojumu, J.B.; Ayanwoye, G.A.; Ojumoola, J.S.; Dias, R. Machine Learning Approaches for Handling Imbalances in Health Data Classification. In Sustainable Statistical and Data Science Methods and Practices; Awe, O.O., Vance, E.A., Eds.; Springer: Cham, 2023; pp. 19–33.
- Sajana, T.; Rao, K.V.S.N. Machine Learning Algorithms for Health Care Data Analytics Handling Imbalanced Datasets. In Handbook of Artificial Intelligence; 2023; p. 75.
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, 2009; Part of the Springer Series in Statistics (SSS).
- Wang, L.; Han, M.; Li, X.; Zhang, N.; Cheng, H. Review of Classification Methods on Unbalanced Data Sets. IEEE Access 2021, 9, 64606–64628.
- Zhao, H.; Wang, R.; Lei, Y.; Liao, W.-H.; Cao, H.; Cao, J. Severity level diagnosis of Parkinson’s disease by ensemble K-nearest neighbor under imbalanced data. Expert Systems with Applications 2022, 189, 116113.
- Vommi, A.M.; Battula, T.K. A hybrid filter-wrapper feature selection using Fuzzy KNN based on Bonferroni mean for medical datasets classification: A COVID-19 case study. Expert Systems with Applications 2023, 218, 119612.
- Iantovics, L.B.; Enăchescu, C. Method for Data Quality Assessment of Synthetic Industrial Data. Sensors 2022, 22, 1608.
- Lynam, A.L.; Dennis, J.M.; Owen, K.R.; et al. Logistic regression has similar performance to optimised machine learning algorithms in a clinical setting: application to the discrimination between type 1 and type 2 diabetes in young adults. Diagnostic and Prognostic Research 2020, 4, 6.
- Morgado, J.; Pereira, T.; Silva, F.; Freitas, C.; Negrão, E.; de Lima, B.F.; da Silva, M.C.; Madureira, A.J.; Ramos, I.; Hespanhol, V.; et al. Machine Learning and Feature Selection Methods for EGFR Mutation Status Prediction in Lung Cancer. Applied Sciences 2021, 11, 3273.
- Saharan, S.S.; Nagar, P.; Creasy, K.T.; Stock, E.O.; James, F.; Malloy, M.J.; Kane, J.P. Logistic Regression and Statistical Regularization Techniques for Risk Classification of Coronary Artery Disease Using Cytokines Transported by High Density Lipoproteins. In Proceedings of the 2023 International Conference on Computational Science and Computational Intelligence (CSCI); IEEE, 2023; pp. 652–660.
- Ayoub, S.; Mohammed Ali, A.G.; Narhimene, B. Enhanced Intrusion Detection System for Remote Healthcare. In Advances in Computing Systems and Applications; Senouci, M.R., Boulahia, S.Y., Benatia, M.A., Eds.; Lecture Notes in Networks and Systems; Springer: Cham, 2022; Volume 513, pp. 1–11.
- Xin, L.K.; Rashid, N.b.A. Prediction of Depression among Women Using Random Oversampling and Random Forest. In Proceedings of the 2021 International Conference of Women in Data Science at Taif University (WiDSTaif); 2021; pp. 1–5.
- Loef, B.; Wong, A.; Janssen, N.A.H.; et al. Using random forest to identify longitudinal predictors of health in a 30-year cohort study. Scientific Reports 2022, 12, 10372.
- Filippakis, P.; Ougiaroglou, S.; Evangelidis, G. Prototype Selection for Multilabel Instance-Based Learning. Information 2023, 14, 572.
- Khushi, M.; et al. A Comparative Performance Analysis of Data Resampling Methods on Imbalance Medical Data. IEEE Access 2021, 9, 109960–109975.
- AlMahadin, G.; Lotfi, A.; Carthy, M.M.; et al. Enhanced Parkinson’s Disease Tremor Severity Classification by Combining Signal Processing with Resampling Techniques. SN Computer Science 2022, 3.
- Bounab, R.; Zarour, K.; Guelib, B.; Khlifa, N. Enhancing Medicare Fraud Detection Through Machine Learning: Addressing Class Imbalance With SMOTE-ENN. IEEE Access 2024, 12, 54382–54396.
- Bach, M.; Trofimiak, P.; Kostrzewa, D.; Werner, A. CLEANSE – Cluster-based Undersampling Method. Procedia Computer Science 2023, 225, 4541–4550.
- Tumuluru, P.; Daniel, R.; Mahesh, G.; Lakshmi, K.D.; Mahidhar, P.; Kumar, M.V. Class Imbalance of Bio-Medical Data by Using PCA-Near Miss for Classification. In Proceedings of the 2023 5th International Conference on Inventive Research in Computing Applications (ICIRCA); Coimbatore, India; 2023; pp. 1832–1839.
- Iantovics, L.B.; Rotar, C.; Morar, F. Survey on establishing the optimal number of factors in exploratory factor analysis applied to data mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2019, 9, e1294.
- Hassanzadeh, R.; Farhadian, M.; Rafieemehr, H. Hospital mortality prediction in traumatic injuries patients: comparing different SMOTE-based machine learning algorithms. BMC Medical Research Methodology 2023, 23, 101.
- Sinha, N.; Kumar, M.A.G.; Joshi, A.M.; Cenkeramaddi, L.R. DASMcC: Data Augmented SMOTE Multi-Class Classifier for Prediction of Cardiovascular Diseases Using Time Series Features. IEEE Access 2023, 11, 117643–117655.
- Bektaş, J. EKSL: An effective novel dynamic ensemble model for unbalanced datasets based on LR and SVM hyperplane-distances. Information Sciences 2022, 597, 182–192.
- Ahmed, G.; et al. DAD-Net: Classification of Alzheimer’s Disease Using ADASYN Oversampling Technique and Optimized Neural Network. Molecules 2022, 27, 7085.
- Cervical cancer (Risk Factors) Data Set. Available online: https://archive.ics.uci.edu/ml/datasets/Cervical+cancer (accessed on April 2024).
- Pinheiro, V.C.; do Carmo, J.C.; de O. Nascimento, F.A.; Miosso, C.J. System for the analysis of human balance based on accelerometers and support vector machines. Computer Methods and Programs in Biomedicine Update 2023, 4, 100123.
- Iantovics, L.B.; Dehmer, M.; Emmert-Streib, F. MetrIntSimil—An Accurate and Robust Metric for Comparison of Similarity in Intelligence of Any Number of Cooperative Multiagent Systems. Symmetry 2018, 10, 48.
- Darville, J.; Yavuz, A.; Runsewe, T.; Celik, N. Effective sampling for drift mitigation in machine learning using scenario selection: A microgrid case study. Applied Energy 2023, 341, 121048.
- Ibrahim, K.S.M.H.; Huang, Y.F.; Ahmed, A.N.; Koo, C.H.; El-Shafie, A. A review of the hybrid artificial intelligence and optimization modelling of hydrological streamflow forecasting. Alexandria Engineering Journal 2022, 61, 279–303.
- Naidu, G.; Zuva, T.; Sibanda, E.M. A Review of Evaluation Metrics in Machine Learning Algorithms. In Artificial Intelligence Application in Networks and Systems; CSOC 2023. Lecture Notes in Networks and Systems; Silhavy, R., Silhavy, P., Eds.; Springer: Cham, 2023; Volume 724.
- Chen, R.J.; Wang, J.J.; Williamson, D.F.K.; et al. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nature Biomedical Engineering 2023, 7, 719–742.
- Ng, A.P.; Koumchatzky, N. Machine Learning Engineering with Python - Second Edition; Packt Publishing: Birmingham, UK, 2023; EAN 9781837631964; 462 pages, Paperback.
- Edward, J.; Rosli, M.M.; Seman, A. A New Multi-Class Rebalancing Framework for Imbalance Medical Data. IEEE Access 2023, 11, 92857–92874.
- Manchadi, O.; Ben-Bouazza, F.-E.; Jioudi, B. Predictive Maintenance in Healthcare System: A Survey. IEEE Access 2023, 11, 61313–61330.
- Rubinger, L.; Gazendam, A.; Ekhtiari, S.; Bhandari, M. Machine learning and artificial intelligence in research and healthcare. Injury 2023, 54 (Suppl. S3), S69–S73.
- Badawy, M.; Ramadan, N.; Hefny, H.A. Healthcare predictive analytics using machine learning and deep learning techniques: a survey. Journal of Electrical Systems and Information Technology 2023, 10, 40.
- Subrahmanya, S.V.G.; Shetty, D.K.; Patil, V.; et al. The role of data science in healthcare advancements: applications, benefits, and future prospects. Irish Journal of Medical Science 2022, 191, 1473–1483.
- Alsmariy, R.; Healy, G.; Abdelhafez, H. Predicting Cervical Cancer using Machine Learning Methods. International Journal of Advanced Computer Science and Applications (IJACSA) 2020, 11, 7.
- Rajendran, R.; Karthi, A. Heart Disease Prediction using Entropy Based Feature Engineering and Ensembling of Machine Learning Classifiers. Expert Systems with Applications 2022, 207, 117882.
- Toğaçar, M.; Ergen, B.; Cömert, Z.; Özyurt, F. A Deep Feature Learning Model for Pneumonia Detection Applying a Combination of mRMR Feature Selection and Machine Learning Models. IRBM 2020, 41, 212–222.




| Attribute | Type | Attribute | Type |
|---|---|---|---|
| Age | int | STDs:pelvic inflammatory disease | bool |
| Number of sexual partners | int | STDs:genital herpes | bool |
| First sexual intercourse (age) | int | STDs:molluscum contagiosum | bool |
| Number of pregnancies | int | STDs:AIDS | bool |
| Smokes | bool | STDs:HIV | bool |
| Smokes (years) | int | STDs:Hepatitis B | bool |
| Smokes (packs/year) | int | STDs:HPV | bool |
| Hormonal Contraceptives | bool | STDs: Number of diagnosis | int |
| Hormonal Contraceptives (years) | int | STDs: Time since first diagnosis | int |
| IUD | bool | STDs: Time since last diagnosis | int |
| IUD (years) | int | Dx:Cancer | bool |
| STDs | bool | Dx:CIN | bool |
| STDs (number) | int | Dx:HPV | bool |
| STDs:condylomatosis | bool | Dx | bool |
| STDs:cervical condylomatosis | bool | Hinselmann: target variable | bool |
| STDs:vaginal condylomatosis | bool | Schiller: target variable | bool |
| STDs:vulvo-perineal condylomatosis | bool | Cytology: target variable | bool |
| STDs:syphilis | bool | Biopsy: target variable | bool |

| Actual (rows) / Predicted (columns) | Positive | Negative |
|---|---|---|
| Positive | TP | FN |
| Negative | FP | TN |
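
The metrics reported in the results tables (accuracy, balanced accuracy, per-class recall, and macro and weighted averages of precision, recall and F1) can all be derived from the four cells of the confusion matrix above. The sketch below shows one way to compute them, assuming the positive class is the minority class and that an undefined ratio counts as 0. The class `BinaryConfusion`, the function `summarize`, and the example counts are hypothetical and are not taken from this study's code or data.

```python
from dataclasses import dataclass


@dataclass
class BinaryConfusion:
    """Cell counts of a 2x2 confusion matrix, laid out as in the table above."""
    tp: int  # actual positive, predicted positive
    fn: int  # actual positive, predicted negative
    fp: int  # actual negative, predicted positive
    tn: int  # actual negative, predicted negative


def _ratio(num, den):
    # Assumed convention: a ratio with a zero denominator counts as 0.
    return num / den if den else 0.0


def _f1(precision, recall):
    return _ratio(2 * precision * recall, precision + recall)


def summarize(cm):
    """Return the metrics as percentages rounded to two decimals."""
    n_pos, n_neg = cm.tp + cm.fn, cm.fp + cm.tn
    total = n_pos + n_neg
    rec_pos, rec_neg = _ratio(cm.tp, n_pos), _ratio(cm.tn, n_neg)
    prec_pos = _ratio(cm.tp, cm.tp + cm.fp)
    prec_neg = _ratio(cm.tn, cm.tn + cm.fn)
    f1_pos, f1_neg = _f1(prec_pos, rec_pos), _f1(prec_neg, rec_neg)

    def macro(pos, neg):
        # Unweighted mean over the two classes.
        return (pos + neg) / 2

    def weighted(pos, neg):
        # Mean over classes weighted by class support.
        return _ratio(n_pos * pos + n_neg * neg, total)

    raw = {
        "accuracy": _ratio(cm.tp + cm.tn, total),
        "balanced_accuracy": macro(rec_pos, rec_neg),
        "recall_positive": rec_pos,
        "recall_negative": rec_neg,
        "precision_macro": macro(prec_pos, prec_neg),
        "precision_weighted": weighted(prec_pos, prec_neg),
        "recall_macro": macro(rec_pos, rec_neg),
        "recall_weighted": weighted(rec_pos, rec_neg),
        "f1_macro": macro(f1_pos, f1_neg),
        "f1_weighted": weighted(f1_pos, f1_neg),
    }
    return {name: round(100 * value, 2) for name, value in raw.items()}


# Hypothetical counts for an imbalanced test set (14 positives, 175 negatives).
metrics = summarize(BinaryConfusion(tp=6, fn=8, fp=0, tn=175))
```

Two identities follow directly from these definitions and are useful for sanity-checking a results table: macro recall equals balanced accuracy, and weighted recall equals accuracy. In this example accuracy is high while the positive-class recall is below 50%, which is why balanced accuracy and per-class recall are reported alongside accuracy.
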
| Metric/Classifier | KNN2 | KNN3 | bLR | RF |
|---|---|---|---|---|
| Accuracy | 93.12 | 91.53 | 92.59 | 95.76 |
| Ba Accuracy | 53.57 | 52.71 | 50.00 | 71.42 |
| Precision (Mac;Wght) | 96.54;93.59 | 58.98;87.93 | 46.29;85.73 | 97.81;95.95 |
| Recall (Mac;Wght) | 53.57;93.12 | 52.71;91.53 | 50.00;92.59 | 71.42;95.76 |
| MajClass | 100 | 98.28 | 100 | 100 |
| MinClass | 7.14 | 7.14 | 0 | 42.85 |
| F1-Score (Mac;Wght) | 54.87;90.26 | 53.33;89.30 | 48.07;89.03 | 78.88;94.96 |
| ROC_AUC | 0.61 | 0.63 | 0.71 | 0.99 |

| Metric/Sampling | SMOTE | SVM SMOTE | Borderline SMOTE | ADASYN |
|---|---|---|---|---|
| Accuracy | 88.35 | 91.01 | 90.47 | 87.83 |
| Ba Accuracy | 60.85 | 65.57 | 68.57 | 60.57 |
| Precision (Mac;Wght) | 59.60;89.09 | 66.67;90.70 | 66.43;91.08 | 58.85;87.83 |
| Recall (Mac;Wght) | 60.85;88.35 | 65.57;91.00 | 68.57;90.47 | 60.57;87.83 |
| MajClass | 93.14 | 95.43 | 94.29 | 92.57 |
| MinClass | 28.57 | 35.71 | 42.86 | 28.57 |
| F1-Score (Mac;Wght) | 60.17;88.71 | 66.09;90.85 | 67.41;90.76 | 59.58;88.36 |
| ROC_AUC | 0.72 | 0.74 | 0.74 | 0.72 |

| Metric/Sampling | CNN | TL | All-KNN | RENN | ENN | NCR | NM1 | IHT |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 93.65 | 92.59 | 87.30 | 87.30 | 87.30 | 90.47 | 75.66 | 45.50 |
| Ba Accuracy | 60.42 | 53.28 | 50.42 | 50.42 | 50.42 | 55.42 | 77.00 | 64.00 |
| Precision (Mac;Wght) | 84.52;92.64 | 71.52;89.85 | 50.49;86.40 | 50.49;86.40 | 50.49;86.40 | 59.18;88.30 | 59.07;92.04 | 53.99;90.94 |
| Recall (Mac;Wght) | 60.42;93.65 | 53.28;92.59 | 50.42;87.30 | 50.42;87.30 | 50.42;87.30 | 55.42;90.47 | 77.00;75.66 | 64.00;45.50 |
| MajClass | 99.43 | 99.43 | 93.71 | 93.71 | 93.71 | 96.57 | 75.43 | 42.29 |
| MinClass | 21.43 | 7.14 | 7.14 | 7.14 | 7.14 | 14.29 | 78.57 | 85.71 |
| F1-Score (Mac;Wght) | 65.00;91.97 | 54.31;89.93 | 50.43;86.84 | 50.43;86.84 | 50.43;86.84 | 56.56;89.25 | 58.75;81.24 | 38.93;55.99 |
| ROC_AUC | 0.70 | 0.61 | 0.61 | 0.62 | 0.62 | 0.63 | 0.80 | 0.65 |

| Metric/Sampling | SMOTE | SVM SMOTE | Borderline SMOTE | ADASYN |
|---|---|---|---|---|
| Accuracy | 86.24 | 89.94 | 88.35 | 87.30 |
| Ba Accuracy | 72.85 | 74.85 | 74.00 | 73.42 |
| Precision (Mac;Wght) | 62.42;91.25 | 67.26;92.10 | 64.48;91.69 | 63.54;91.46 |
| Recall (Mac;Wght) | 72.85;86.24 | 74.85;89.94 | 74.00;88.35 | 73.42;87.30 |
| MajClass | 88.57 | 92.57 | 90.86 | 89.71 |
| MinClass | 57.14 | 57.14 | 57.14 | 57.14 |
| F1-Score (Mac;Wght) | 65.17;88.24 | 70.08;90.84 | 67.81;89.72 | 66.44;88.98 |
| ROC_AUC | 0.71 | 0.73 | 0.73 | 0.72 |

| Metric/Sampling | CNN | TL | All-KNN | RENN | ENN | NCR | NM1 | IHT |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 90.47 | 91.53 | 86.24 | 86.24 | 87.30 | 88.35 | 69.84 | 43.91 |
| Ba Accuracy | 68.57 | 52.71 | 49.85 | 49.85 | 50.42 | 54.28 | 77.14 | 63.14 |
| Precision (Mac;Wght) | 66.43;91.08 | 58.98;87.93 | 49.85;86.24 | 49.85;86.24 | 50.49;86.40 | 54.94;87.54 | 58.13;92.40 | 53.80;90.82 |
| Recall (Mac;Wght) | 68.57;90.47 | 52.71;91.53 | 49.85;86.24 | 49.85;86.24 | 50.42;87.30 | 54.28;88.35 | 77.14;69.84 | 63.14;43.91 |
| MajClass | 94.29 | 98.29 | 92.57 | 92.57 | 93.71 | 94.29 | 68.57 | 40.57 |
| MinClass | 42.86 | 7.14 | 7.14 | 7.14 | 7.14 | 14.29 | 85.71 | 85.71 |
| F1-Score (Mac;Wght) | 67.41;90.76 | 53.33;89.30 | 49.85;86.24 | 49.85;86.24 | 50.43;86.84 | 54.56;87.94 | 55.21;77.01 | 37.85;54.38 |
| ROC_AUC | 0.67 | 0.63 | 0.60 | 0.60 | 0.61 | 0.62 | 0.79 | 0.64 |

| Metric/Sampling | SMOTE | SVM SMOTE | Borderline SMOTE | ADASYN |
|---|---|---|---|---|
| Accuracy | 86.77 | 88.35 | 86.77 | 85.18 |
| Ba Accuracy | 89.57 | 51.00 | 86.28 | 85.42 |
| Precision (Mac;Wght) | 67.72;94.66 | 50.89;86.50 | 66.49;93.92 | 65.12;93.70 |
| Recall (Mac;Wght) | 89.57;86.77 | 51.00;88.35 | 86.28;86.77 | 85.42;85.18 |
| MajClass | 86.86 | 94.29 | 86.86 | 85.14 |
| MinClass | 92.86 | 7.14 | 85.71 | 85.71 |
| F1-Score (Mac;Wght) | 72.34;89.66 | 50.74;87.15 | 70.69;89.19 | 68.78;88.05 |
| ROC_AUC | 0.92 | 0.87 | 0.89 | 0.91 |

| Metric/Sampling | CNN | TL | All-KNN | RENN | ENN | NCR | NM1 | IHT |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 92.59 | 92.59 | 92.59 | 92.59 | 92.59 | 92.59 | 58.20 | 41.79 |
| Ba Accuracy | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 57.71 | 58.71 |
| Precision (Mac;Wght) | 46.29;85.73 | 46.29;85.73 | 46.29;85.73 | 46.29;85.73 | 46.29;85.73 | 46.29;85.73 | 52.16;88.18 | 52.54;89.37 |
| Recall (Mac;Wght) | 50.00;92.59 | 50.00;92.59 | 50.00;92.59 | 50.00;92.59 | 50.00;92.59 | 50.00;92.59 | 57.71;58.20 | 58.71;41.79 |
| MajClass | 100 | 100 | 100 | 100 | 100 | 100 | 58.29 | 38.86 |
| MinClass | 0 | 0 | 0 | 0 | 0 | 0 | 57.14 | 78.57 |
| F1-Score (Mac;Wght) | 48.07;89.03 | 48.07;89.03 | 48.07;89.03 | 48.07;89.03 | 48.07;89.03 | 48.07;89.03 | 44.46;67.99 | 35.97;67.99 |
| ROC_AUC | 0.61 | 0.73 | 0.71 | 0.71 | 0.72 | 0.72 | 0.66 | 0.59 |

| Metric/Sampling | SMOTE | SVM SMOTE | Borderline SMOTE | ADASYN |
|---|---|---|---|---|
| Accuracy | 98.41 | 98.41 | 98.41 | 98.41 |
| Ba Accuracy | 92.57 | 92.57 | 92.57 | 92.57 |
| Precision (Mac;Wght) | 95.58;98.37 | 95.58;98.37 | 95.58;98.37 | 95.58;98.37 |
| Recall (Mac;Wght) | 92.57;98.41 | 92.57;98.41 | 92.57;98.41 | 92.57;98.41 |
| MajClass | 99.43 | 99.43 | 99.43 | 99.43 |
| MinClass | 85.71 | 85.71 | 85.71 | 85.71 |
| F1-Score (Mac;Wght) | 94.01;98.38 | 94.01;98.38 | 94.01;98.38 | 94.01;98.38 |
| ROC_AUC | 0.99 | 0.99 | 0.99 | 0.99 |

| Metric/Sampling | CNN | TL | All-KNN | RENN | ENN | NCR | NM1 | IHT |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 98.94 | 95.76 | 95.23 | 97.35 | 96.29 | 96.29 | 95.76 | 76.19 |
| Ba Accuracy | 96.14 | 71.42 | 67.85 | 82.14 | 75.00 | 75.00 | 97.71 | 83.85 |
| Precision (Mac;Wght) | 96.14;98.94 | 97.81;95.95 | 96.14;98.94 | 98.07;96.43 | 98.07;96.43 | 98.07;96.43 | 81.81;97.30 | 61.02;93.58 |
| Recall (Mac;Wght) | 96.14;98.94 | 71.42;95.76 | 67.85;95.23 | 82.14;97.35 | 75.00;96.29 | 75.00;96.29 | 97.71;95.76 | 83.85;76.19 |
| MajClass | 99.43 | 100 | 100 | 100 | 100 | 100 | 95.43 | 74.86 |
| MinClass | 92.86 | 42.86 | 35.71 | 64.29 | 50 | 50 | 100 | 92.86 |
| F1-Score (Mac;Wght) | 96.14;98.94 | 78.88;94.96 | 75.06;94.17 | 82.35;95.71 | 82.35;95.71 | 82.35;95.71 | 87.71;96.18 | 60.98;81.73 |
| ROC_AUC | 0.99 | 0.99 | 1.0 | 1.0 | 0.99 | 0.99 | 0.99 | 0.95 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).