Submitted: 09 September 2024
Posted: 09 September 2024
Abstract
Keywords:
MSC: 68U01; 62P10; 62P20
1. Introduction
- Mitigating the impact of extreme class imbalance on the random forest ensemble.
- Generating synthetic data from the minority class observations during the training of the tree forest.
- Exploring the concept of tree selection in combination with data balancing to achieve an overall improved ensemble.
2. Materials and Methods
2.1. Balancing the Training Data
2.2. Enhanced Tree Ensembles via Out-of-Bag (OOB) Observations
2.3. Enhanced Tree Ensembles using Sub-Sample Observations
- Training data consisting of observations and p variables;
- Number of majority class observations;
- Number of minority class observations;
- Balanced data;
- Bootstrap sample;
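- If the number of minority class observations is less than the number of majority class observations:
- for each synthetic observation to be generated do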
- Using the training data, take a bootstrap sample from the minority class;
- If a feature is continuous, find its mean;
- If a feature is categorical, find its mode;
- Concatenate the mean and mode values from the two preceding steps to get a new row arranged in the column order of the original training data;
- Add the new row to the training data;
- Combine the training data with the generated data to obtain the balanced data;
- end for
- for t ← 1 : T do
- Take a bootstrap sample or sub-sample from the balanced training data.
- Store the OOB/out-of-sample observations.
- Grow a classification tree (G(B)) on the bootstrap sample or sub-sample.
- Use the OOB/out-of-sample observations to estimate the prediction error.
- end for
- Arrange the trees in ascending order of their OOB/out-of-sample errors.
- Select the top-ranked trees as the final ensemble.
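The steps above can be sketched end to end in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`synthesize_row`, `balance`, `fit_stump`, `select_trees`, `predict`) are hypothetical, labels are assumed to be binary and coded 0/1, the number of synthetic rows is assumed to equal the difference between the class sizes, and a one-split decision stump stands in for a fully grown classification tree so that the example needs only the standard library.

```python
import random
from statistics import mean, mode


def synthesize_row(minority_rows, is_categorical, rng):
    """One synthetic row from a bootstrap sample of the minority class:
    mean of each continuous feature, mode of each categorical feature."""
    boot = [rng.choice(minority_rows) for _ in minority_rows]
    return [mode(col) if cat else mean(col)
            for col, cat in zip(zip(*boot), is_categorical)]


def balance(X, y, minority_label, is_categorical, rng):
    """Add synthetic minority rows until both classes have the same size.
    Assumption: the number of rows added is the class-size difference."""
    X, y = [list(row) for row in X], list(y)
    n_min = sum(label == minority_label for label in y)
    for _ in range(len(y) - 2 * n_min):
        # Each new row joins the training data before the next bootstrap.
        minority = [row for row, label in zip(X, y) if label == minority_label]
        X.append(synthesize_row(minority, is_categorical, rng))
        y.append(minority_label)
    return X, y


def fit_stump(X, y):
    """Hypothetical stand-in for a classification tree: the single
    feature/threshold split with the fewest training errors (labels 0/1)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for left in (0, 1):
                errs = sum((left if row[j] <= thr else 1 - left) != label
                           for row, label in zip(X, y))
                if best is None or errs < best[0]:
                    best = (errs, j, thr, left)
    _, j, thr, left = best
    return lambda row: left if row[j] <= thr else 1 - left


def select_trees(X, y, n_trees, n_keep, rng):
    """Grow trees on bootstrap samples, rank them by OOB error in
    ascending order, and keep the n_keep top-ranked trees."""
    n, scored = len(X), []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:  # no out-of-bag rows to score this tree, so skip it
            continue
        tree = fit_stump([X[i] for i in boot], [y[i] for i in boot])
        err = sum(tree(X[i]) != y[i] for i in oob) / len(oob)
        scored.append((err, tree))
    scored.sort(key=lambda pair: pair[0])
    return scored[:n_keep]


def predict(ensemble, row):
    """Majority vote of the selected trees (ties go to class 1)."""
    votes = sum(tree(row) for _, tree in ensemble)
    return int(2 * votes >= len(ensemble))


rng = random.Random(0)
# Toy data: feature 0 is continuous, feature 1 is a 0/1-coded categorical.
X = [[float(i), 0] for i in range(12)] + [[20.0, 1], [22.0, 1], [24.0, 1]]
y = [0] * 12 + [1] * 3
X_bal, y_bal = balance(X, y, minority_label=1,
                       is_categorical=[False, True], rng=rng)
ensemble = select_trees(X_bal, y_bal, n_trees=30, n_keep=5, rng=rng)
```

Replacing the bootstrap in `select_trees` with a draw without replacement of fewer than `n` rows would give the sub-sample variant, with the rows left out of the draw playing the role of the OOB observations.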
3. Experiments and Results
4. Simulation
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest

| Dataset (DS) | Instances | Features | Class-based Distribution (counts) | Class-based Distribution (ratio) | Source |
|---|---|---|---|---|---|
| Breast Cancer | 569 | 31 | 357/212 | (1.6839:1) | https://www.kaggle.com/datasets/utkarshx27/breast-cancer-wisconsin-diagnostic-dataset |
| Credit Card | 284807 | 30 | 284807/492 | (578.876:1) | https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud |
| Drug Classification | 200 | 6 | 145/54 | (2.685:1) | https://openml.org/search?type=data&status=active&id=43382 |
| Eeg eye | 5856 | 14 | 5708/148 | (38.567:1) | https://openml.org/search?type=data&status=active&sort=runs&id=1471 |
| Glass Classification | 213 | 9 | 144/69 | (2.086:1) | https://openml.org/search?type=data&status=active&id=43750 |
| Pc4 | 1339 | 37 | 1279/60 | (21.316:1) | https://openml.org/search?type=data&status=active&sort=runs&id=1049 |
| Madelon | 1358 | 500 | 1300/58 | (22.413:1) | https://openml.org/search?type=data&status=active&sort=runs&id=1485 |
| Turing binary | 6384 | 20 | 6260/124 | (50.483:1) | https://www.openml.org/search?type=data&status=active&id=44269 |
| KDD | 2566 | 35 | 2515/51 | (49.313:1) | https://openml.org/search?type=data&status=active&id=45075 |
| Liver disorder | 220 | 6 | 200/20 | (10:1) | https://openml.org/search?type=data&status=active&id=8 |
| Wine | 143 | 13 | 106/37 | (2.864:1) | https://archive.ics.uci.edu/dataset/109/wine |
| Soy bean | 167 | 8 | 160/7 | (22.857:1) | https://archive.ics.uci.edu/dataset/913/forty+soybean+cultivars+from+subsequent+harvests |
| Ionosphere | 350 | 32 | 312/38 | (8.210:1) | https://archive.ics.uci.edu/dataset/52/ionosphere |
| Room Occupancy | 8407 | 14 | 8228/179 | (45.966:1) | https://archive.ics.uci.edu/dataset/864/room+occupancy+estimation |
| Harth | 7269 | 7 | 6771/498 | (13.596:1) | https://archive.ics.uci.edu/dataset/779/harth |
| Rocket League | 3015 | 6 | 2830/185 | (15.297:1) | https://archive.ics.uci.edu/dataset/858/rocket+league+skillshots |
| Sirtuin6 | 54 | 6 | 50/4 | (12.5:1) | https://archive.ics.uci.edu/dataset/748/sirtuin6+small+molecules-1 |
| Toxicity | 123 | 12 | 115/8 | (14.375:1) | https://archive.ics.uci.edu/dataset/728/toxicity-2 |
| Dry bean | 4357 | 16 | 4246/129 | (32.914:1) | https://archive.ics.uci.edu/dataset/602/dry+bean+dataset |
| Kc2 | 520 | 21 | 414/106 | (3.905:1) | https://openml.org/search?type=data&status=active&sort=runs&id=1063 |
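
The ratio reported beside each class-based distribution in the dataset table is the majority count divided by the minority count, written as r:1. A minimal sketch of that arithmetic follows; the helper name `imbalance_ratio` is hypothetical, and the three count pairs are copied from the table rows for Liver disorder, Sirtuin6 and Toxicity.

```python
def imbalance_ratio(n_majority, n_minority):
    """Majority-to-minority ratio, i.e. the r in 'r:1'."""
    if n_minority <= 0:
        raise ValueError("minority class must contain at least one observation")
    return n_majority / n_minority


# (majority, minority) counts copied from the dataset table.
counts = {"Liver disorder": (200, 20), "Sirtuin6": (50, 4), "Toxicity": (115, 8)}
ratios = {name: imbalance_ratio(maj, mino) for name, (maj, mino) in counts.items()}

for name, ratio in ratios.items():
    print(f"{name}: ({ratio:g}:1)")
```

A larger ratio means a more extreme imbalance, so the same helper can be used to sort datasets by how severe the class skew is.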

| Dataset | OTE | RF(smote) | RF(over) | RF(under) | k-NN | SVM | ANN | Tree | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Breast Cancer | 0.0015 | 0.0311 | 0.0427 | 0.0264 | 0.0166 | 0.0422 | 0.0851 | 0.0486 | 0.3625 | 0.0723 |
| Credit Card | 0.0005 | 0.0005 | 0.0010 | 0.0046 | 0.0006 | 0.0083 | 0.1395 | 0.0071 | 0.0016 | 0.0014 |
| Drug Classification | 0.0038 | 0.0461 | 0.0624 | 0.1875 | 0.0296 | 0.0650 | 0.3542 | 0.1771 | 0.4998 | 0.0697 |
| Eeg eye | 0.0035 | 0 | 0.0063 | 0.0942 | 0.0521 | 0.0872 | 0.0553 | 0.9827 | 0.1094 | 0.2947 |
| Glass Classification | 0.0914 | 0.0974 | 0.1330 | 0.0405 | 0.0710 | 0.1518 | 0.2194 | 0.9839 | 0.2879 | 0.1993 |
| Pc4 | 0.0384 | 0.0213 | 0.0945 | 0.0411 | 0.0421 | 0.1519 | 0.1392 | 0.1006 | 0.9555 | 0.1120 |
| Madelon | 0.0222 | 0.0218 | 0.2675 | 0.3459 | 0.1397 | 0.3525 | 0.0416 | 0.0420 | 0.0435 | 0.2260 |
| Turing binary | 0.0102 | 0.0214 | 0.1321 | 0.3173 | 0.0437 | 0.1882 | 0.1392 | 0.1541 | 0.0271 | 0.1303 |
| KDD | 0.0099 | 0.0101 | 0.0355 | 0.1394 | 0.0098 | 0.1191 | 0.0198 | 0.0226 | 0.9804 | 0.0185 |
| Liver disorder | 0.0400 | 0.0453 | 0.0864 | 0.0469 | 0.0512 | 0.1153 | 0.0862 | 0.0915 | 0.7661 | 0.1126 |
| Wine | 0.0227 | 0.0173 | 0.0796 | 0.1029 | 0.0680 | 0.0324 | 0.1605 | 0.0305 | 0.2574 | 0.1235 |
| Soy bean | 0.0093 | 0.0100 | 0.0364 | 0.0706 | 0.0308 | 0.1229 | 0.0416 | 0.0420 | 0.0428 | 0.0402 |
| Ionosphere | 0.0429 | 0.0172 | 0.1168 | 0.2830 | 0.0985 | 0.1531 | 0.1107 | 0.0898 | 0.8913 | 0.1093 |
| Room Occupancy | 0 | 0 | 0.0002 | 0.0034 | 0.0003 | 0.0020 | 0.0001 | 0.00001 | 0.0211 | 0.0002 |
| Harth | 0.0121 | 0.0119 | 0.0438 | 0.2698 | 0.0379 | 0.1214 | 0.0253 | 0.0682 | 0.9318 | 0.0303 |
| Rocket League | 0.0334 | 0.0331 | 0.1032 | 0.4525 | 0.0914 | 0.1158 | 0.0622 | 0.0616 | 0.0614 | 0.0613 |
| Sirtuin6 | 0.0516 | 0.0516 | 0.2433 | 0.2680 | 0.1200 | 0.3275 | 0.0700 | 0.0965 | 0.6519 | 0.0847 |
| Toxicity | 0.0339 | 0.0352 | 0.1241 | 0.3828 | 0.1392 | 0.2750 | 0.0645 | 0.0570 | 0.0694 | 0.0692 |
| Dry bean | 0.0030 | 0.0035 | 0.0102 | 0.0500 | 0.0075 | 0.0134 | 0.0301 | 0.0103 | 0.0295 | 0.0091 |
| Kc2 | 0.0029 | 0.0026 | 0.0639 | 0.0459 | 0.0384 | 0.0236 | 0.0519 | 0.0353 | 0.0250 | 0.0277 |

| Dataset | OTE | RF(smote) | RF(over) | RF(under) | k-NN | SVM | ANN | Tree | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Breast Cancer | 1.0000 | 0.9629 | 0.9364 | 0.9670 | 0.9795 | 0.9517 | 0.9696 | 0.9353 | 0.7200 | 0.9029 |
| Credit Card | 0.9993 | 0.9992 | 0.6503 | 0.9923 | 0.9241 | 0.9505 | 0.9860 | 0.9889 | 0.6123 | 0.7284 |
| Drug Classification | 1.0000 | 0.9378 | 0.8655 | 0.8277 | 0.9714 | 0.9361 | 0.6401 | 0.8328 | 0.2006 | 0.9031 |
| Eeg eye | 0.9929 | 1 | 0.7578 | 0.9216 | 0.9576 | 0.9049 | 0.9327 | 0.0700 | 0.6646 | 0.7257 |
| Glass Classification | 0.9153 | 0.9022 | 0.8050 | 0.9739 | 0.8984 | 0.8166 | 0.6440 | 0.1300 | 0.2665 | 0.7204 |
| Pc4 | 0.9981 | 0.9971 | 0.9755 | 0.9904 | 0.9885 | 0.9059 | 0.8811 | 0.9207 | 0 | 0.9287 |
| Madelon | 0.9554 | 0.9563 | 0.4063 | 0.6586 | 0.8614 | 0.6449 | 0.1750 | 0.0192 | 0 | 0.7171 |
| Turing binary | 0.9796 | 0.9799 | 0.0162 | 0.6648 | 0.9757 | 0.4800 | 0.2519 | 0.2176 | 0.0541 | 0.0140 |
| KDD | 1 | 1 | 0.9987 | 0.8649 | 0.9896 | 0.2506 | 0.9802 | 0.9794 | 0 | 0.9830 |
| Liver disorder | 0.9860 | 0.9830 | 0.9835 | 0.9713 | 0.9835 | 0.9134 | 0.9206 | 0.9135 | 0.1604 | 0.9365 |
| Wine | 0.9584 | 0.9720 | 0.8302 | 0.9111 | 0.9187 | 0.9218 | 0.7678 | 0.9296 | 0.0041 | 0.7702 |
| Soy bean | 0.9888 | 0.9864 | 0.4845 | 0.9309 | 0.6332 | 0.5975 | 0.0286 | 0.0288 | 0.0096 | 0.3968 |
| Ionosphere | 0.9733 | 0.6763 | 0.9292 | 0.7239 | 0.9589 | 0.9209 | 0.8954 | 0.9137 | 0 | 0.9397 |
| Room Occupancy | 1 | 1 | 0.9971 | 0.9962 | 0.9971 | 0.9936 | 0.9943 | 1 | 0 | 0.9948 |
| Harth | 0.9980 | 0.9976 | 0.9750 | 0.7336 | 0.9828 | 0.9190 | 0.9800 | 0.9317 | 0 | 0.9749 |
| Rocket League | 0.9336 | 0.9362 | 0.0759 | 0.5431 | 0.3916 | 0.0890 | 0.1240 | 0 | 0 | 0 |
| Sirtuin6 | 0.9554 | 0.9589 | 0.6373 | 0.7638 | 0.9509 | 0.7683 | 0.9200 | 0.9235 | 0 | 0.9182 |
| Toxicity | 0.9323 | 0.9314 | 0.0386 | 0.5924 | 0.4527 | 0.1049 | 0.0100 | 0 | 0 | 0 |
| Dry bean | 0.9961 | 0.9952 | 0.8161 | 0.9503 | 0.9639 | 0.9702 | 0.0353 | 0.9283 | 0 | 0.8880 |
| Kc2 | 0.9948 | 0.9952 | 0.4885 | 0.9793 | 0.9853 | 0.9925 | 0.8869 | 0.9895 | 0.7053 | 0.9869 |

| Dataset | OTE | RF(smote) | RF(over) | RF(under) | k-NN | SVM | ANN | Tree | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Breast Cancer | 0.0261 | 0.0108 | 0.1827 | 0.0229 | 0.0069 | 0.0072 | 0.0259 | 0.0358 | 0.5105 | 0.0241 |
| Credit Card | 0.0312 | 0.0028 | 0.1343 | 0.0177 | 0.0134 | 0.006 | 0.0224 | 0.033 | 0.4165 | 0.026 |
| Drug Classification | 0.0041 | 0.0003 | 0.0042 | 0.0459 | 0.0267 | 0.0046 | 0.0046 | 0.0042 | 0.042 | 0.0026 |
| Eeg eye | 0.0176 | 0.010 | 0.039 | 0.0494 | 0.0207 | 0.0191 | 0.0379 | 0.0374 | 0.2532 | 0.1108 |
| Glass Classification | 0.0048 | 0.0059 | 0.0188 | 0.0245 | 0.047 | 0.0407 | 0.0749 | 0.3964 | 0.0996 | 0.1457 |
| Pc4 | 0.0009 | 0.0038 | 0.0382 | 0.0327 | 0.0349 | 0.0261 | 0.0373 | 0.0063 | 0.0231 | 0.0256 |
| Madelon | 0.0043 | 0.0001 | 0.0077 | 0.016 | 0.0048 | 0.0318 | 0.0062 | 0.0479 | 0.0406 | 0.0186 |
| Turing binary | 0.0018 | 0.0001 | 0.0225 | 0.0004 | 0.0034 | 0.0064 | 0.0163 | 0.0295 | 0.0182 | 0.0434 |
| KDD | 0.0019 | 0.0009 | 0.0336 | 0.0001 | 0.003 | 0.0128 | 0.0122 | 0.0456 | 0.015 | 0.0464 |
| Liver disorder | 0.0002 | 0.005 | 0.0072 | 0.0008 | 0.0009 | 0.0366 | 0.0231 | 0.0221 | 0.0406 | 0.0282 |
| Wine | 0.0026 | 0.0356 | 0.0217 | 0.0087 | 0.0384 | 0.0091 | 0.0405 | 0.0241 | 0.0146 | 0.0306 |
| Soy bean | 0.0005 | 0.0413 | 0.0137 | 0.033 | 0.0152 | 0.0141 | 0.0454 | 0.2376 | 0.1533 | 0.4282 |
| Ionosphere | 0.0007 | 0.0014 | 0.0017 | 0.0035 | 0.0006 | 0.0004 | 0.0015 | 0.0015 | 0.0008 | 0.001 |
| Room Occupancy | 0.0001 | 0.0046 | 0.0029 | 0.0028 | 0.004 | 0.0002 | 0.0013 | 0.0045 | 0.0048 | 0.0432 |
| Harth | 0.0003 | 0.011 | 0.0231 | 0.0011 | 0.0337 | 0.002 | 0.0294 | 0.0064 | 0.0285 | 0.0094 |
| Rocket League | 0.0032 | 0.0068 | 0.0334 | 0.0104 | 0.0499 | 0.0233 | 0.0169 | 0.0115 | 0.0513 | 0.0423 |
| Sirtuin6 | 0.0021 | 0.0325 | 0.0291 | 0.0162 | 0.0029 | 0.0325 | 0.0172 | 0.0069 | 0.1549 | 0.0219 |
| Toxicity | 0.0219 | 0.0259 | 0.0143 | 0.1353 | 0.0312 | 0.3115 | 0.1628 | 0.0619 | 0.4473 | 0.4854 |
| Dry bean | 0.0005 | 0.0683 | 0.1644 | 0.0418 | 0.3127 | 0.0238 | 0.1366 | 0.437 | 0.2924 | 0.2128 |
| Kc2 | 0.0001 | 0.0499 | 0.042 | 0.0021 | 0.0144 | 0.012 | 0.0076 | 0.0438 | 0.0066 | 0.0265 |

| Dataset | OTE | RF(smote) | RF(over) | RF(under) | k-NN | SVM | ANN | Tree | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Breast Cancer | 0.9947 | 0.9947 | 0.6517 | 0.839 | 0.8804 | 0.838 | 0.8338 | 0.913 | 0.2983 | 0.8999 |
| Credit Card | 0.9896 | 0.9956 | 0.6501 | 0.8423 | 0.8884 | 0.8429 | 0.8354 | 0.9143 | 0.3128 | 0.9013 |
| Drug Classification | 0.9859 | 0.9898 | 0.7349 | 0.8767 | 0.9354 | 0.8716 | 0.8433 | 0.9266 | 0.4999 | 0.9079 |
| Eeg eye | 0.9785 | 0.9896 | 0.7333 | 0.8822 | 0.9386 | 0.8639 | 0.8457 | 0.9136 | 0.5117 | 0.9074 |
| Glass Classification | 0.9999 | 1 | 0.9998 | 0.8915 | 0.9406 | 0.9334 | 0.9542 | 0.97 | 0.7993 | 0.6966 |
| Pc4 | 0.9036 | 0.9139 | 0.8982 | 0.9492 | 0.9718 | 0.8841 | 0.8656 | 0.98 | 0.7396 | 0.8371 |
| Madelon | 1 | 0.9604 | 0.4013 | 0.9645 | 0.9104 | 0.7624 | 0.2103 | 0.6461 | 0.0445 | 0.5508 |
| Turing binary | 1 | 1 | 0.9197 | 0.6508 | 0.8596 | 0.6509 | 0.9591 | 0.6373 | 0.9565 | 0.8013 |
| KDD | 1 | 0.9999 | 0.9962 | 0.7037 | 0.949 | 0.8174 | 0.8721 | 0.8737 | 0.9815 | 0.8699 |
| Liver disorder | 0.9803 | 0.9798 | 0.2393 | 0.8568 | 0.9942 | 0.9056 | 0.1289 | 0.1582 | 0.0196 | 0.6431 |
| Wine | 0.9338 | 0.9264 | 0.2349 | 0.9372 | 0.8929 | 0.6402 | 0.3683 | 0.2322 | 0.1582 | 0.3683 |
| Soy bean | 1 | 0.994 | 0.9536 | 0.8912 | 0.9757 | 0.9665 | 0.8608 | 0.9704 | 0.7479 | 0.9216 |
| Ionosphere | 0.9922 | 0.9937 | 0.8841 | 0.0706 | 0.9889 | 0.9364 | 0.9586 | 0.958 | 0.9572 | 0.9765 |
| Room Occupancy | 0.9413 | 0.9988 | 0.5066 | 0.283 | 0.6602 | 0.5903 | 0.2519 | 0.8307 | 0.1087 | 0.5031 |
| Harth | 0.9999 | 1 | 0.9999 | 0.0034 | 0.9998 | 0.9992 | 1 | 0.9999 | 0.9788 | 0.9999 |
| Rocket League | 0.9775 | 0.9786 | 0.6987 | 0.2698 | 0.8552 | 0.7579 | 0.8854 | 0 | 0.0681 | 0.8709 |
| Sirtuin6 | 0.9995 | 0.9976 | 0.9505 | 0.4525 | 0.958 | 0.9407 | 0.9396 | 0.9383 | 0.9385 | 0.9386 |
| Toxicity | 0.9428 | 0.9394 | 0.09 | 0.268 | 0.6491 | 0.2142 | 0 | 0 | 0.3481 | 0 |
| Dry bean | 0.9994 | 0.9986 | 0.9074 | 0.3828 | 0.9502 | 0.8546 | 0.9369 | 0.9429 | 0.9305 | 0.9313 |
| Kc2 | 0.9999 | 1 | 0.9911 | 0.8849 | 0.9251 | 0.9725 | 0.942 | 0.004 | 0.6507 | 0.5571 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).