Submitted: 20 July 2025
Posted: 21 July 2025
Abstract
Keywords:
1. Introduction
2. Related Work
2.1. Ensemble Learning for Fraud Detection
2.2. Class Imbalance Handling in Fraud Detection
2.3. IEEE-CIS Dataset Studies
2.4. Deep Learning and Advanced Approaches
3. Methodology
3.1. Dataset Description and Characteristics
3.2. Data Preprocessing and Feature Engineering
3.2.1. Feature Preprocessing Pipeline
3.2.2. Missing Value Analysis and Treatment
- Feature Removal: Features with >95% missing values are removed to prevent sparse representations that may mislead ensemble learners.
- Strategic Imputation: For categorical features with <95% missing values, we create explicit "missing" categories to capture the informational content of missingness patterns.
- Numerical Imputation: Numerical features employ median imputation within fraud/legitimate groups separately to preserve class-specific distributions.
- Missingness Indicators: Binary indicators are created for features with >20% missing values to capture missingness patterns as potential fraud signals.
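The four treatments above can be sketched as a single pandas routine. This is a minimal illustration under stated assumptions, not the paper's exact pipeline: the `isFraud` target name follows the IEEE-CIS convention, and thresholds match the bullets.

```python
import numpy as np
import pandas as pd

def treat_missing(df: pd.DataFrame, target: str = "isFraud") -> pd.DataFrame:
    """Apply the four missing-value treatments described above."""
    out = df.copy()
    feats = [c for c in out.columns if c != target]

    # 1. Feature removal: drop features with >95% missing values.
    rates = out[feats].isna().mean()
    out = out.drop(columns=rates[rates > 0.95].index)
    feats = [c for c in out.columns if c != target]

    # 2. Missingness indicators for features with >20% missing values.
    for c in rates[(rates > 0.20) & (rates <= 0.95)].index:
        out[f"{c}_missing"] = df[c].isna().astype(int)

    # 3. Explicit "missing" category for categorical features.
    for c in out[feats].select_dtypes(include="object").columns:
        out[c] = out[c].fillna("missing")

    # 4. Median imputation within fraud/legitimate groups separately.
    for c in out[feats].select_dtypes(include="number").columns:
        out[c] = out.groupby(target)[c].transform(lambda s: s.fillna(s.median()))
    return out
```

Imputing medians per class (step 4) preserves class-specific distributions, at the cost of requiring labels; at inference time the global median would be used instead.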
3.2.3. Feature Engineering Strategy
3.3. Handling Class Imbalance
3.3.1. Synthetic Oversampling Techniques
3.4. Ensemble Model Architecture
3.4.1. Base Learner Selection and Configuration
3.4.2. Meta-Learning Strategy
4. Experimental Setup
4.1. Implementation Environment and Tools
4.2. Evaluation Metrics and Performance Assessment
4.2.1. Primary Evaluation Metrics
4.3. Cross-Validation and Model Selection
5. Results and Analysis
5.1. Overall Performance Comparison
5.2. Individual Algorithm Performance Analysis
5.3. Class Imbalance Handling Effectiveness
5.4. Feature Engineering Impact Assessment
5.5. Ensemble Architecture Analysis
6. Discussion
7. Conclusions
References
- Khalid, A.R., Owoh, N., Uthmani, O., Ashawa, M., Osamor, J., and Adejoh, J. Enhancing credit card fraud detection: an ensemble machine learning approach. Big Data and Cognitive Computing, 8(1):6, 2024. MDPI.
- Homaei, M.H., Caro Lindo, A., Sancho Núñez, J.C., Mogollón Gutiérrez, O., and Alonso Díaz, J. The Role of Artificial Intelligence in Digital Twin's Cybersecurity. In Proceedings of the XVII Reunión Española sobre Criptología y Seguridad de la Información (RECSI 2022), 2022. Editorial Universidad de Cantabria.
- Homaei, M., Mogollón-Gutiérrez, O., Sancho, J.C., Ávila, M., and Caro, A. A review of digital twins and their application in cybersecurity based on artificial intelligence. Artificial Intelligence Review, 57(8), 2024. Springer Science and Business Media LLC.
- Gandhar, A., Gupta, K., Pandey, A.K., and Raj, D. Fraud detection using machine learning and deep learning. SN Computer Science, 5(5):453, 2024. Springer.
- Moradi, F., Tarif Hokmabadi, M., and Homaei, M. A Systematic Review of Machine Learning in Credit Card Fraud Detection. Preprints, 2025.
- Mienye, I.D. and Jere, N. Deep learning for credit card fraud detection: A review of algorithms, challenges, and solutions. IEEE Access, 2024. IEEE.
- Chen, Y., Zhao, C., Xu, Y., and Nie, C. Year-over-Year Developments in Financial Fraud Detection via Deep Learning: A Systematic Literature Review. arXiv preprint arXiv:2502.00201, 2025.
- Fernández, A., García, S., Galar, M., Prati, R.C., Krawczyk, B., and Herrera, F. Learning from imbalanced data sets, volume 10, 2018. Springer.
- Talukder, M.A., Khalid, M., and Uddin, M.A. An integrated multistage ensemble machine learning model for fraudulent transaction detection. Journal of Big Data, 11(1), 2024. Springer Science and Business Media LLC.
- Vesta Corporation. IEEE-CIS Fraud Detection Dataset. Kaggle Competition, 2019. https://www.kaggle.com/c/ieee-fraud-detection. Dataset provided for the IEEE Computational Intelligence Society fraud detection competition.
- Suganya, S.S., Nishanth, S., and Mohanadevi, D. Ensemble Learning Approaches for Fraud Detection in Financial Transactions. In 2023 2nd International Conference on Automation, Computing and Renewable Systems (ICACRS), pages 805–810, 2023. IEEE.
- Almalki, F. and Masud, M. Financial Fraud Detection Using Explainable AI and Stacking Ensemble Methods. arXiv preprint arXiv:2505.10050, 2025.
- Chawla, N.V., Bowyer, K.W., Hall, L.O., and Kegelmeyer, W.P. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, 2002.
- Elreedy, D., Atiya, A.F., and Kamalov, F. A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning. Machine Learning, 113(7):4903–4923, 2024. Springer.
- Salehi, A.R. and Khedmati, M. A cluster-based SMOTE both-sampling (CSBBoost) ensemble algorithm for classifying imbalanced data. Scientific Reports, 14(1):5152, 2024. Nature Publishing Group UK London.
- Papers with Code. IEEE CIS Fraud Detection Dataset. Online Repository, 2024. https://paperswithcode.com/dataset/ieee-cis-fraud-detection-1. Community-maintained dataset documentation.
- Zhao, X., Zhang, Q., and Zhang, C. Enhancing Transaction Fraud Detection with a Hybrid Machine Learning Model. In 2024 IEEE 4th International Conference on Electronic Technology, Communication and Information (ICETCI), pages 427–432, 2024. IEEE.
- Saito, T. and Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE, 10(3):e0118432, 2015. Public Library of Science.
| Year/Ref | Ensemble Method | Base Classifiers | Dataset | Key Contribution |
|---|---|---|---|---|
| 2024 [1] | Bagging + Boosting + Individual | SVM, KNN, RF, Bagging, Boosting | Custom | Imbalance-aware ensemble integration across multiple classifiers |
| 2024 [11] | Random Forest | Decision Trees | Financial Transactions | Advanced feature engineering for fraud signal enhancement |
| 2024 [12] | Stacking Ensemble | XGBoost, LightGBM, CatBoost | Financial Dataset | Integration of explainable AI with high-performing ensemble |
| Study | Dataset | Size | Fraud Rate | Features | Best Performance |
|---|---|---|---|---|---|
| [1] | Custom | Not Specified | Imbalanced | Not Specified | Improved accuracy through ensemble and sampling strategies |
| [11] | Financial Transactions | Large Dataset | Not Specified | Engineered Features | Detection enhanced by feature construction |
| [12] | Financial Dataset | Not Specified | Not Specified | Not Specified | 99% Accuracy, 0.99 AUC-ROC with explainable stacking |
| [17] | IEEE-CIS | 590,540 | 3.5% | 431 | Significant improvement via hybrid learning |
| Benchmark | IEEE-CIS | 590,540 | 3.5% (20,663) | 431 | Baseline reference for model validation |
| Preprocessing Step | Features Remaining | Features Removed |
|---|---|---|
| Original IEEE-CIS Dataset | 431 | - |
| Remove Features >95% Missing | 298 | 133 |
| Remove Zero-Variance Features | 276 | 22 |
| Remove Highly Correlated (>0.98) | 203 | 73 |
| Remove Low Information Gain (<0.001) | 167 | 36 |
| Baseline Feature Set | 167 | 264 total |
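The reduction from 431 to 167 features in the table can be sketched as a sequential filter using the stated thresholds. This is an illustrative reconstruction, not the authors' code; scikit-learn's `mutual_info_classif` stands in for the information-gain criterion.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

def filter_features(X: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
    """Apply the four reduction steps from the preprocessing table."""
    # Step 1: remove features with >95% missing values.
    X = X.loc[:, X.isna().mean() <= 0.95]
    # Step 2: remove zero-variance (constant) features.
    X = X.loc[:, X.nunique(dropna=False) > 1]
    # Step 3: remove one feature from each highly correlated pair (|r| > 0.98).
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    X = X.drop(columns=[c for c in upper.columns if (upper[c] > 0.98).any()])
    # Step 4: remove features with mutual information < 0.001 w.r.t. the label.
    mi = mutual_info_classif(X.fillna(X.median()), y, random_state=0)
    return X.loc[:, mi >= 0.001]
```

The order matters: the correlation filter runs on the cleaned matrix, so a constant or near-empty column never masks a genuinely redundant pair.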
| Method | AUC-ROC | AUC-PR | F1-Score | Balanced Acc. | G-Mean |
|---|---|---|---|---|---|
| Proposed Stacking | 0.918 | 0.891 | 0.856 | 0.923 | 0.918 |
| XGBoost | 0.887 | 0.834 | 0.798 | 0.887 | 0.882 |
| LightGBM | 0.882 | 0.828 | 0.791 | 0.881 | 0.876 |
| Random Forest | 0.869 | 0.802 | 0.765 | 0.864 | 0.859 |
| Weighted Voting | 0.901 | 0.847 | 0.812 | 0.898 | 0.893 |
| Simple Voting | 0.878 | 0.823 | 0.784 | 0.876 | 0.871 |
| CatBoost | 0.873 | 0.821 | 0.775 | 0.869 | 0.864 |
| Logistic Regression | 0.829 | 0.743 | 0.701 | 0.821 | 0.815 |
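The five metrics reported above can be computed with scikit-learn; G-mean has no built-in, so it is derived from per-class recall. A sketch of the evaluation, not the authors' exact code (the 0.5 threshold is an assumption):

```python
import numpy as np
from sklearn.metrics import (average_precision_score, balanced_accuracy_score,
                             f1_score, recall_score, roc_auc_score)

def imbalance_metrics(y_true, y_score, threshold=0.5):
    """Compute the five metrics from the comparison table."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    sens = recall_score(y_true, y_pred, pos_label=1)  # true positive rate
    spec = recall_score(y_true, y_pred, pos_label=0)  # true negative rate
    return {
        "auc_roc": roc_auc_score(y_true, y_score),
        "auc_pr": average_precision_score(y_true, y_score),
        "f1": f1_score(y_true, y_pred),
        "balanced_acc": balanced_accuracy_score(y_true, y_pred),
        # geometric mean of the two class-wise recalls
        "g_mean": float(np.sqrt(sens * spec)),
    }
```

AUC-ROC and AUC-PR are threshold-free ranking metrics, while F1, balanced accuracy, and G-mean depend on the chosen operating threshold, which is why both kinds are reported.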
| Algorithm | AUC-ROC | AUC-PR | Training Time | Inference Time |
|---|---|---|---|---|
| XGBoost | 0.887 | 0.834 | 18.3 min | 45 ms |
| LightGBM | 0.882 | 0.828 | 12.7 min | 38 ms |
| CatBoost | 0.873 | 0.821 | 24.1 min | 52 ms |
| Random Forest | 0.869 | 0.802 | 8.9 min | 28 ms |
| Neural Network | 0.841 | 0.786 | 15.6 min | 35 ms |
| K-NN | 0.826 | 0.771 | 2.1 min | 125 ms |
| Logistic Regression | 0.829 | 0.743 | 1.8 min | 12 ms |
| Technique | AUC-ROC | AUC-PR | FPR@95%R | Cost Reduction |
|---|---|---|---|---|
| SMOTE + Stacking | 0.918 | 0.891 | 0.94% | 52.3% |
| Borderline-SMOTE + Stacking | 0.912 | 0.884 | 1.02% | 48.7% |
| ADASYN + Stacking | 0.908 | 0.876 | 1.15% | 45.1% |
| SMOTE + Tomek + Stacking | 0.915 | 0.888 | 0.98% | 50.8% |
| No Sampling + Stacking | 0.863 | 0.812 | 1.68% | 31.2% |
| SMOTE + XGBoost | 0.887 | 0.834 | 1.25% | 42.6% |
| No Sampling + XGBoost | 0.821 | 0.758 | 2.14% | 24.8% |
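The SMOTE step behind the best-performing row interpolates between minority samples and their nearest minority neighbours. A minimal NumPy sketch for intuition only; in practice one would use imbalanced-learn's `SMOTE` rather than this didactic version:

```python
import numpy as np

def smote(X, y, minority=1, k=5, random_state=0):
    """Minimal SMOTE: synthesise minority points along segments between
    each minority sample and one of its k nearest minority neighbours,
    until the classes are balanced."""
    rng = np.random.default_rng(random_state)
    X, y = np.asarray(X, float), np.asarray(y)
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    # pairwise distances within the minority class (self excluded)
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]  # k nearest minority neighbours
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = nn[i, rng.integers(min(k, len(X_min) - 1))]
        gap = rng.random()  # random point on the segment [x_i, x_j]
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack([X, np.array(synth)]), np.concatenate(
        [y, np.full(n_new, minority)])
```

Because synthetic points lie inside the convex hull of existing minority samples, SMOTE densifies the minority region rather than duplicating points, which is what lets the stacking ensemble learn a smoother decision boundary than plain class reweighting.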
| Feature Set | AUC-ROC | AUC-PR | F1-Score | Features | Δ AUC-PR |
|---|---|---|---|---|---|
| Complete Pipeline | 0.918 | 0.891 | 0.856 | 247 | - |
| w/o Interaction Features (25) | 0.905 | 0.873 | 0.834 | 222 | -0.018 |
| w/o Aggregation Features (28) | 0.892 | 0.856 | 0.812 | 219 | -0.035 |
| w/o Temporal Features (15) | 0.883 | 0.847 | 0.801 | 232 | -0.044 |
| w/o Amount Engineering (12) | 0.897 | 0.863 | 0.825 | 235 | -0.028 |
| Baseline Features Only | 0.851 | 0.789 | 0.743 | 167 | -0.102 |
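The aggregation features, whose removal costs 0.035 AUC-PR in the ablation, are typically per-card transaction statistics. A minimal sketch using real IEEE-CIS column names (`card1`, `TransactionAmt`); the paper's exact aggregations are not specified, so the derived columns here are illustrative:

```python
import pandas as pd

def add_aggregation_features(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative card-level aggregation features."""
    out = df.copy()
    grp = out.groupby("card1")["TransactionAmt"]
    # typical spend and spend variability for this card
    out["card1_amt_mean"] = grp.transform("mean")
    out["card1_amt_std"] = grp.transform("std").fillna(0.0)
    # how far this transaction deviates from the card's typical amount
    out["amt_to_card_mean"] = out["TransactionAmt"] / (out["card1_amt_mean"] + 1e-9)
    return out
```

Ratios like `amt_to_card_mean` encode "unusual for this card" rather than "large in absolute terms", which is the kind of relational signal tree ensembles cannot easily construct on their own.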
| Ensemble Method | AUC-ROC | AUC-PR | Training Time | p-value |
|---|---|---|---|---|
| Stacking (Proposed) | 0.918 | 0.891 | 45.7 min | - |
| Weighted Voting | 0.901 | 0.847 | 32.4 min | < 0.001 |
| Blending | 0.895 | 0.842 | 38.9 min | < 0.001 |
| Simple Voting | 0.878 | 0.823 | 31.8 min | < 0.001 |
| Bagging (RF) | 0.869 | 0.808 | 26.3 min | < 0.001 |
| AdaBoost | 0.841 | 0.785 | 41.2 min | < 0.001 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).