Submitted: 24 June 2026
Posted: 25 June 2026
Abstract
Keywords:
1. Introduction
2. Materials and Methods
3. Results
3.1. Comparative Performance of Machine Learning Classifiers
3.2. Controlled Analysis of Class Cardinality Effects
3.3. Real-Variable Validation of Class Cardinality Effects
3.4. Impact of Class Imbalance and Statistical Significance Testing
4. Discussion
5. Business Implications
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References




| Category | Count (%) | Season | Count (%) | Size | Count (%) |
|---|---|---|---|---|---|
| Accessories | 1240 (31.79%) | Fall | 975 (25.00%) | L | 1053 (27.00%) |
| Clothing | 1737 (44.54%) | Spring | 999 (25.62%) | M | 1755 (45.00%) |
| Footwear | 599 (15.36%) | Summer | 955 (24.49%) | S | 663 (17.00%) |
| Outerwear | 324 (8.31%) | Winter | 971 (24.90%) | XL | 429 (11.00%) |

| Payment Method | Count (%) | Shipping Type | Count (%) | Frequency of Purchases | Count (%) |
|---|---|---|---|---|---|
| Bank Transfer | 612 (15.69%) | 2-Day Shipping | 627 (16.08%) | Annually | 572 (14.67%) |
| Cash | 670 (17.18%) | Express | 646 (16.56%) | Bi-Weekly | 547 (14.03%) |
| Credit Card | 671 (17.21%) | Free Shipping | 675 (17.31%) | Every 3 Months | 584 (14.97%) |
| Debit Card | 636 (16.31%) | Next Day Air | 648 (16.62%) | Fortnightly | 542 (13.90%) |
| PayPal | 677 (17.36%) | Standard | 654 (16.77%) | Monthly | 553 (14.18%) |
| Venmo | 634 (16.26%) | Store Pickup | 650 (16.67%) | Quarterly | 563 (14.44%) |
| | | | | Weekly | 539 (13.82%) |
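
The class frequencies above fix a majority-class baseline for each variable. A classifier that always predicts the most frequent class (share p) scores accuracy p; that class has precision p and recall 1, and every other class is never predicted and gets F1 = 0, so macro F1 = 2p/(p + 1)/K for K classes. The Python sketch below computes both quantities from the counts in the table; it illustrates the arithmetic only and is not the authors' evaluation code.

```python
# Class counts copied from the frequency table above (n = 3900 per variable).
COUNTS = {
    "Category": {"Accessories": 1240, "Clothing": 1737, "Footwear": 599, "Outerwear": 324},
    "Season": {"Fall": 975, "Spring": 999, "Summer": 955, "Winter": 971},
    "Size": {"L": 1053, "M": 1755, "S": 663, "XL": 429},
}


def majority_baseline(counts):
    """Accuracy and macro F1 of a classifier that always predicts the most frequent class."""
    n = sum(counts.values())
    k = len(counts)
    p_major = max(counts.values()) / n
    # Only samples of the majority class are classified correctly.
    accuracy = p_major
    # Majority class: precision = p_major, recall = 1, so F1 = 2p / (p + 1).
    # Classes that are never predicted contribute F1 = 0 to the macro average.
    macro_f1 = (2 * p_major / (p_major + 1)) / k
    return accuracy, macro_f1


for name, counts in COUNTS.items():
    acc, mf1 = majority_baseline(counts)
    print(f"{name}: accuracy={acc:.4f}, macro F1={mf1:.4f}")
```

For Category this gives accuracy 0.4454 and macro F1 of about 0.154, essentially the values reported for the GNB, LR and SVM models on that variable, which is the signature of a model that has collapsed to predicting the majority class.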

| Model | Category Accuracy | Category Macro F1 | Category Log Loss | Season Accuracy | Season Macro F1 | Season Log Loss | Size Accuracy | Size Macro F1 | Size Log Loss |
|---|---|---|---|---|---|---|---|---|---|
| GNB | 0.4454 | 0.1540 | 1.2204 | 0.2531 | 0.2357 | 1.3868 | 0.4500 | 0.1551 | 1.2594 |
| LR | 0.4454 | 0.1540 | 36.0437 | 0.2613 | 0.2399 | 36.0437 | 0.4500 | 0.1551 | 36.0437 |
| DT | 0.3608 | 0.2346 | 8.7996 | 0.2559 | 0.2551 | 11.0777 | 0.3415 | 0.2391 | 9.2554 |
| RF | 0.3438 | 0.2386 | 3.8950 | 0.2592 | 0.2586 | 3.6113 | 0.3395 | 0.2617 | 3.6257 |
| SVM | 0.4454 | 0.1540 | 36.0437 | 0.2551 | 0.2497 | 36.0437 | 0.4500 | 0.1551 | 36.0437 |

| Model | Payment Method Accuracy | Payment Method Macro F1 | Payment Method Log Loss | Shipping Type Accuracy | Shipping Type Macro F1 | Shipping Type Log Loss | Frequency of Purchases Accuracy | Frequency of Purchases Macro F1 | Frequency of Purchases Log Loss |
|---|---|---|---|---|---|---|---|---|---|
| GNB | 0.1633 | 0.1208 | 1.7960 | 0.1690 | 0.1398 | 1.7948 | 0.1395 | 0.1030 | 1.9508 |
| LR | 0.1536 | 0.0974 | 36.0437 | 0.1669 | 0.1363 | 36.0437 | 0.1405 | 0.0967 | 36.0437 |
| DT | 0.1662 | 0.1653 | 16.0100 | 0.1692 | 0.1681 | 15.8230 | 0.1456 | 0.1439 | 17.4117 |
| RF | 0.1615 | 0.1609 | 7.1804 | 0.1649 | 0.1648 | 6.8832 | 0.1526 | 0.1522 | 9.0245 |
| SVM | 0.1608 | 0.1368 | 36.0437 | 0.1774 | 0.1707 | 36.0437 | 0.1467 | 0.1343 | 36.0437 |
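
The repeated log-loss value 36.0437 equals -ln(2^-52), the negative logarithm of float64 machine epsilon. This is the penalty a sample incurs when its true class receives probability 0 and that probability is clipped to epsilon before the logarithm is taken (recent scikit-learn releases clip `log_loss` inputs at the machine epsilon of the probability dtype). A mean of exactly this value would require every evaluated sample to hit that floor. The sketch below, with a hypothetical helper `clipped_log_loss`, reproduces the constant under that clipping assumption.

```python
import math
import sys

# float64 machine epsilon (2**-52); assumed here to be the clipping floor
# applied to predicted probabilities before the logarithm is taken.
EPS = sys.float_info.epsilon


def clipped_log_loss(probs, labels, eps=EPS):
    """Mean negative log-probability of the true class, clipped to [eps, 1 - eps]."""
    total = 0.0
    for row, y in zip(probs, labels):
        p = min(max(row[y], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(labels)


# Penalty for one sample whose true class was given probability 0.
print(round(-math.log(EPS), 4))  # 36.0437
print(round(clipped_log_loss([[0.0, 1.0]], [0]), 4))  # 36.0437
```

Because correct hard predictions cost almost nothing after clipping (about 2.2e-16 each), a model that outputs one-hot probabilities would average roughly (1 - accuracy) × 36.0437, not the full constant.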

| Number of Classes | Mean Confidence | Mean Entropy | Log Loss |
|---|---|---|---|
| 2 | 0.576442871 | 0.681177521 | 0.682370808 |
| 4 | 0.296332834 | 1.348275726 | 1.352318492 |
| 6 | 0.284825849 | 1.746593782 | 1.752378223 |
| 7 | 0.156502523 | 1.9436029 | 1.950618192 |
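
The three columns can be read against their ceilings: for K classes the entropy of a predictive distribution is at most ln K (about 0.693, 1.386, 1.792 and 1.946 nats for K = 2, 4, 6 and 7), and the reported mean entropies sit just below these values. The sketch below shows one way to compute the three quantities from a matrix of predicted probabilities; it assumes that "mean confidence" is the mean of the largest predicted probability per sample and that entropy uses the natural logarithm. The function name `calibration_summary` is hypothetical.

```python
import math


def calibration_summary(probs, labels):
    """Return (mean confidence, mean entropy, log loss), in nats, for rows of class probabilities."""
    n = len(probs)
    # Confidence: probability assigned to the predicted (arg-max) class.
    mean_conf = sum(max(row) for row in probs) / n
    # Shannon entropy of each predictive distribution; zero-probability terms contribute 0.
    mean_entropy = sum(-sum(p * math.log(p) for p in row if p > 0) for row in probs) / n
    # Log loss: mean negative log-probability of the true class (no clipping here).
    log_loss = sum(-math.log(row[y]) for row, y in zip(probs, labels)) / n
    return mean_conf, mean_entropy, log_loss


# A model that outputs the uniform distribution over K = 4 classes.
K = 4
uniform_probs = [[1 / K] * K for _ in range(8)]
true_labels = [i % K for i in range(8)]
print(calibration_summary(uniform_probs, true_labels))
```

For uniform predictions the confidence is 1/K and both entropy and log loss equal ln K, so entropies and log losses close to ln K, as in the table, indicate predictions that carry little information beyond the number of classes.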

| Variable | Number of Classes | Mean Confidence | Mean Entropy | Log Loss |
|---|---|---|---|---|
| Category | 4 | 0.445714555 | 1.217489845 | 1.219867201 |
| Season | 4 | 0.271413979 | 1.383760718 | 1.387446215 |
| Size | 4 | 0.449923314 | 1.255456913 | 1.258927761 |
| Payment Method | 6 | 0.182427439 | 1.789578089 | 1.795229106 |
| Shipping Type | 6 | 0.182400476 | 1.789331466 | 1.793801015 |
| Frequency of Purchases | 7 | 0.156502523 | 1.9436029 | 1.950618192 |
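
A useful reference for these rows is a model that ignores the features and outputs the empirical class frequencies for every sample: its expected log loss equals the Shannon entropy of the class distribution. The sketch below computes that entropy from the class counts of the frequency table, assuming those counts describe the evaluated data; a log loss close to this value means the classifier extracted little information beyond the class priors.

```python
import math

# Class counts from the frequency table (n = 3900 per variable).
COUNTS = {
    "Category": [1240, 1737, 599, 324],
    "Season": [975, 999, 955, 971],
    "Size": [1053, 1755, 663, 429],
    "Payment Method": [612, 670, 671, 636, 677, 634],
    "Shipping Type": [627, 646, 675, 648, 654, 650],
    "Frequency of Purchases": [572, 547, 584, 542, 553, 563, 539],
}


def prior_entropy(counts):
    """Shannon entropy (nats) of the empirical class distribution."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts)


for name, counts in COUNTS.items():
    k = len(counts)
    print(f"{name}: K={k}, H={prior_entropy(counts):.4f}, ln K={math.log(k):.4f}")
```

For Category the prior entropy is about 1.219 nats, practically identical to the reported log loss of 1.2199, and the near-uniform variables have entropies just below ln K.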

| Variable | Macro F1 (Original) | Macro F1 (Balanced) | Probability Variance (Original) | Probability Variance (Balanced) | Confusion Entropy (Original) | Confusion Entropy (Balanced) |
|---|---|---|---|---|---|---|
| Category | 0.154035241 | 0.293478374 | 0.000155742 | 0.000165245 | 1.217532035 | 1.382238689 |
| Frequency of Purchases | 0.100641123 | 0.111198672 | 0.0000323 | 0.0000321 | 1.943562775 | 1.944018522 |
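
The table does not state how the balanced versions of the data were produced. One common option, used here purely as an assumed illustration, is random oversampling: rows of each minority class are drawn with replacement until every class matches the majority-class count. The helper `random_oversample` below is hypothetical and uses only the standard library.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class has the majority count."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        # Keep every original row, then top up with draws made with replacement.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


# Category class counts from the frequency table; the row values are placeholders.
labels = ["Accessories"] * 1240 + ["Clothing"] * 1737 + ["Footwear"] * 599 + ["Outerwear"] * 324
rows = list(range(len(labels)))
_, balanced = random_oversample(rows, labels)
print(Counter(balanced))
```

Resampling of this kind should be applied inside each training fold only; oversampling before the split places duplicated rows in both training and test folds and inflates the evaluation scores.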

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.