Submitted:
12 September 2025
Posted:
15 September 2025
Abstract
Keywords:
1. Introduction
2. Related Work
2.1. Traditional Methods for Credit Card Fraud Detection
2.2. Machine Learning Approaches for Credit Card Fraud Detection
2.2.1. Deep Learning and Hybrid Approaches
2.2.2. Ensemble Learning Techniques for Fraud Detection
2.2.3. Explainable AI in Fraud Detection
3. Materials and Methods

3.1. Data Set
| Time | V1 | V2 | V3 | ... | Amount | Class |
|---|---|---|---|---|---|---|
| 0.0 | -1.3598 | -0.0728 | 2.5363 | ... | 149.62 | 0 |
| 0.0 | 1.1918 | 0.2661 | 0.1664 | ... | 2.69 | 0 |
| 1.0 | -1.3583 | -1.3401 | 1.7732 | ... | 378.66 | 0 |
3.2. Data Processing
3.2.1. Standardization
- x: the original value of a feature
- $\mu$: the mean of the feature
- $\sigma$: the standard deviation of the feature
- z: the standardized (normalized) value
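As a minimal sketch of the standardization step, the following plain-Python example computes z = (x - mean) / standard deviation for one feature. The three values are the `Amount` entries from the data set excerpt; the helper name `standardize` is hypothetical, and the use of the population standard deviation (the convention followed by scikit-learn's `StandardScaler`) is an assumption about the authors' setup.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mu) / sigma for x in values]

# Amount values taken from the data set excerpt.
amounts = [149.62, 2.69, 378.66]
z_scores = standardize(amounts)
```

After the transformation the feature has zero mean and unit variance. In a train/test workflow the mean and standard deviation would normally be estimated on the training split only and then reused for the test split.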
3.2.2. Data Splitting
3.2.3. Oversampling
- Majority class: samples of legitimate transactions
- Minority class (original): samples of fraudulent transactions
- Minority class (synthetic): samples generated using SMOTE
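The core idea of SMOTE, generating a synthetic minority sample by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched in plain Python as follows. This is an illustration only: the study used the imbalanced-learn implementation, and the function name `smote_like_samples`, the value of `k`, and the four toy points are assumptions.

```python
import math
import random

def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical two-feature fraud samples.
fraud = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like_samples(fraud, n_new=6)
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region already occupied by the minority class.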
3.3. Machine Learning Models
3.3.1. Random Forest (RF)
3.3.2. Random Forest Algorithm
3.3.3. Random Forest for Classification
- x: Input feature vector
- $h_b(x)$: Prediction from the b-th decision tree
- B: Total number of trees in the forest
- $\hat{y}$: Final predicted class label
3.3.4. Bootstrap Aggregating (Bagging)
- $\hat{y}$ is the final predicted class label for input x (e.g., fraud or not fraud),
- $h_b(x)$ is the prediction made by the b-th decision tree in the ensemble,
- B is the total number of trees in the Random Forest,
- $\operatorname{mode}\{\cdot\}$ returns the most frequent (majority) class among all predictions.
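The two ingredients of bagging described above, drawing a bootstrap replicate for each tree and combining the B tree predictions by majority vote, can be sketched as follows. The helper names and the five example votes are hypothetical and serve only to illustrate the mechanism.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap replicate)."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Return the most frequent class label among the tree predictions."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(42)
rows = list(range(10))          # stand-in for ten training rows
replicate = bootstrap_sample(rows, rng)

# Hypothetical votes from B = 5 trees for one transaction (1 = fraud).
tree_votes = [1, 0, 1, 1, 0]
final_label = majority_vote(tree_votes)
```

Here three of the five trees vote for class 1, so the ensemble predicts fraud. A bootstrap replicate has the same size as the original data but typically contains repeated rows and omits others.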

3.3.5. Feature Randomness
- F is the full set of input features,
- $F_m \subseteq F$ is the randomly selected subset of features used at a split,
- m is the number of features chosen randomly for each split,
- This randomness ensures that each decision tree explores different feature combinations, increasing model diversity.
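A minimal sketch of the per-split feature sampling is given below. It assumes the 30 input columns of the public data set (Time, V1 to V28, Amount) and uses m = floor(sqrt(number of features)), a common default for classification forests; the value of m actually used in the study is not stated, and `random_feature_subset` is a hypothetical helper.

```python
import math
import random

def random_feature_subset(features, rng, m=None):
    """Pick m distinct features at random for one split.
    Defaults to m = floor(sqrt(number of features))."""
    if m is None:
        m = max(1, math.isqrt(len(features)))
    return rng.sample(features, m)

# Assumed feature names: Time, V1..V28, Amount (30 columns).
features = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]
rng = random.Random(0)
subset = random_feature_subset(features, rng)
```

With 30 features the sketch draws 5 candidates per split, so different trees (and different splits within a tree) see different feature combinations.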
3.3.6. Final Prediction (Probabilistic Form)
- $P_b(c \mid x)$: Probability output for class c from tree b
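In the probabilistic form, the forest is commonly taken to average the per-tree class probabilities and predict the class with the highest average. The sketch below assumes that averaging rule; the four per-tree outputs are hypothetical.

```python
def forest_probability(tree_probs):
    """Average per-tree class probabilities over the B trees."""
    n_trees = len(tree_probs)
    classes = tree_probs[0].keys()
    return {c: sum(p[c] for p in tree_probs) / n_trees for c in classes}

# Hypothetical outputs of B = 4 trees for one transaction.
tree_probs = [
    {"legit": 0.9, "fraud": 0.1},
    {"legit": 0.4, "fraud": 0.6},
    {"legit": 0.7, "fraud": 0.3},
    {"legit": 0.2, "fraud": 0.8},
]
avg = forest_probability(tree_probs)
predicted = max(avg, key=avg.get)  # class with the highest mean probability
```

In this example a hard vote would be tied at two trees each, whereas averaging yields 0.55 for the legitimate class and 0.45 for fraud, which shows how the probabilistic form uses the confidence of each tree.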
3.3.7. Interpretability Analysis of Random Forest using SHAP and LIME
3.3.8. XGBoost (XGB)
3.3.9. XGBoost Algorithm

3.3.10. Objective Function
- $\mathcal{L}$: Total loss (objective function)
- $l$: Differentiable convex loss function that measures the difference between the prediction and the actual label
- $\Omega(f_k)$: Regularization term for the k-th tree
- $\mathcal{F}$: Space of regression trees
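For reference, the regularized objective usually given for XGBoost is shown below: a sum of per-sample losses plus one regularization term per tree. It is reproduced from the standard formulation as an assumption about the notation intended here, with n denoting the number of training samples and K the number of trees.

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right), \qquad f_k \in \mathcal{F}
```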
3.3.11. Regularization Term
- T: Number of leaves in the tree
- $w_j$: Score on leaf j
- $\gamma$: Penalty for each leaf (controls tree complexity)
- $\lambda$: L2 regularization term on leaf weights
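Assuming the standard XGBoost form of the regularization term, a per-leaf penalty times the number of leaves plus half the L2 penalty times the sum of squared leaf scores, the complexity of a single tree can be computed as in the sketch below. The leaf scores and penalty values are illustrative, not those of the trained model.

```python
def tree_complexity(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lambda * sum(w_j ** 2),
    where T is the number of leaves and w_j the score on leaf j."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)

# Hypothetical tree with three leaves; gamma and lambda are illustrative.
penalty = tree_complexity([0.5, -1.0, 2.0], gamma=1.0, lam=1.0)
```

For this tree the penalty is 3 (three leaves) plus 0.5 × 5.25 = 2.625 (leaf scores), giving 5.625. Larger values of gamma discourage additional leaves, and larger values of lambda shrink the leaf scores.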
3.3.12. Prediction Update
- $\hat{y}_i^{(t)}$: Updated prediction after iteration t
- $f_t(x_i)$: Output of the new regression tree at step t
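The additive update can be illustrated with a short sketch in which the prediction after iteration t equals the prediction after iteration t-1 plus the (shrunken) output of the new tree. Scaling by a learning rate is an assumption about the exact form of the update; the value 0.1 matches the learning rate reported for the XGBoost model, while the three tree outputs are hypothetical.

```python
def boosted_prediction(tree_outputs, learning_rate=0.1, base_score=0.0):
    """Accumulate y_hat(t) = y_hat(t-1) + learning_rate * f_t(x) tree by tree
    and return the prediction after every iteration."""
    y_hat = base_score
    history = []
    for f_t in tree_outputs:
        y_hat += learning_rate * f_t
        history.append(y_hat)
    return history

# Hypothetical raw outputs of three successive trees for one transaction.
steps = boosted_prediction([2.0, 1.0, -0.5])
```

The running prediction moves from 0.2 to 0.3 and then back to 0.25 as the third tree corrects the earlier ones. For binary classification the accumulated value is a raw score that is passed through a logistic function to obtain a probability.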
3.4. Performance Metrics
3.4.1. Accuracy
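- TP (true positives): Number of fraudulent transactions correctly predicted as fraud
- TN (true negatives): Number of legitimate transactions correctly predicted as legitimate
- FN (false negatives): Number of fraudulent transactions incorrectly predicted as legitimate
- FP (false positives): Number of legitimate transactions incorrectly predicted as fraud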
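The following sketch shows how accuracy, precision, recall, and F1-score follow from the four confusion-matrix counts. The counts used here are not quoted from the paper: they are back-calculated, as an assumption, from the reported Random Forest precision, recall, and class supports (148 fraudulent and 85,295 legitimate test transactions).

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Inferred counts (assumption): 118 + 30 = 148 fraud, 85279 + 16 = 85295 legit.
metrics = classification_metrics(tp=118, tn=85279, fp=16, fn=30)
rounded = {name: round(value, 4) for name, value in metrics.items()}
```

With these counts the rounded values are 0.9995 (accuracy), 0.8806 (precision), 0.7973 (recall), and 0.8369 (F1), in agreement with the reported Random Forest metrics. The example also shows why accuracy alone is uninformative on this data: 30 of 148 frauds are missed, yet accuracy remains above 0.999.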
3.4.2. Precision
3.4.3. Recall
3.4.4. F1-Score
3.4.5. ROC-AUC
3.4.6. Model Interpretability Using SHAP and LIME Plots
- SHAP Summary Plot: Shows the overall importance of each feature in the model’s decision making.
- SHAP Force Plot: Visualizes how each feature contributes to a single prediction.
3.5. Implementation
3.5.1. Environment Setup
3.5.2. Software and Libraries
- Python: The code was written in Python 3.8, which is widely used due to its rich libraries and ease of use.
- scikit-learn (version 0.24.0): This library was used to implement machine learning models such as Random Forest and to perform data preprocessing tasks like train-test split, scaling, and metric evaluation.
- XGBoost (version 1.3.3): This gradient boosting library was employed for its strong performance on imbalanced datasets and its efficient model training.
- Imbalanced-learn (version 0.8.0): This library was used to implement the Synthetic Minority Over-sampling Technique (SMOTE) to balance the class distribution in the dataset.
- Pandas (version 1.2.4): Pandas was utilized for data manipulation, such as loading the dataset, handling missing values, and feature engineering.
- Matplotlib (version 3.3.4) and Seaborn (version 0.11.1): These visualization libraries were used to create plots and graphs, such as the bar chart of class distribution before and after applying SMOTE, and ROC curves for model evaluation.
3.5.3. Hardware Requirements
- The implementation was conducted on a standard desktop system with 8GB of RAM and a 2.6 GHz Intel i5 processor. This configuration was sufficient to process the dataset and train the models within reasonable time limits.
3.5.4. Development Environment
- Jupyter Notebook was used as the integrated development environment (IDE) for coding, testing, and visualizing results. Jupyter’s interactive nature allowed for easy experimentation and immediate feedback on the implemented models.
- Anaconda was used to manage the Python environment and dependencies, ensuring compatibility and ease of installation for all the necessary libraries.
3.5.5. Tools for Model Evaluation
- scikit-learn was also utilized for model evaluation, where metrics like accuracy, precision, recall, F1-score, and AUC-ROC were calculated to assess the model’s performance.
- Confusion matrices were generated using scikit-learn’s confusion_matrix() function, and visualizations were created using Seaborn for better clarity.
3.6. Data Loading and Pre-Processing
3.6.1. Data Loading
3.6.2. Exploratory Data Analysis
3.6.3. Handling Class Imbalance with SMOTE
3.6.4. Feature Scaling
3.6.5. Train-Test Split
3.7. Model Implementation
3.7.1. Random Forest Implementation
- Data Preprocessing: Before training, the dataset was preprocessed by handling class imbalance with SMOTE (Synthetic Minority Over-sampling Technique) and by scaling the features.
- Model Initialization: The Random Forest model was initialized with 100 decision trees, together with other hyperparameters such as maximum tree depth and minimum samples per leaf.
- Model Training: The model was trained on the preprocessed dataset; each tree was trained independently on a bootstrap sample of the training data (bagging).
- Model Evaluation: After training, the model’s predictions were evaluated using accuracy, precision, recall, F1-score, and AUC-ROC.
3.7.2. XGBoost Implementation
- Data Preprocessing: The dataset was preprocessed with SMOTE to handle class imbalance and with feature scaling; it was then converted into XGBoost’s DMatrix format, after which missing values were handled.
- Model Initialization: The model was initialized with hyperparameters such as learning rate, maximum depth, and number of boosting rounds.
- Model Training: The model was trained by gradient boosting, in which each new tree is fitted to reduce the loss left by the preceding trees. A learning rate of 0.1 and a maximum tree depth of 6 were used.
- Model Evaluation: The model’s predictions were evaluated using accuracy, precision, recall, F1-score, and AUC-ROC.
3.8. Model’s Interpretability with SHAP and LIME
3.8.1. Implementing SHAP for Model Interpretability
- Install the SHAP library: The SHAP package was installed using pip; it provides the tools required to calculate and visualize SHAP values.
- Train the models: Both models, Random Forest and XGBoost, were trained on the preprocessed dataset, as described earlier in the methodology.
- Initialize the SHAP Explainers: A SHAP explainer was initialized for each model. It uses Shapley values to attribute each prediction to the input features.
- Generate SHAP Visualizations: After computing the SHAP values, the SHAP library’s visualization functions were used to generate plots, such as summary plots, which show the impact of each feature on the model’s predictions.
- SHAP Summary Plot: Provides a global view of feature importance, highlighting how each feature contributes to the predictions made by the models.
- SHAP Force Plot: Shows the contribution of each individual feature to the model’s final prediction for a specific instance.
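To make the notion of a Shapley value concrete, the sketch below computes exact Shapley values by brute force for a toy scoring function with three features. It is not how the SHAP library computes its values for tree ensembles (the library uses far more efficient algorithms); the function `shapley_values`, the toy model, and the all-zero baseline are assumptions chosen for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return model(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical score with an interaction between features 1 and 2.
def toy_model(x):
    return 2.0 * x[0] + x[1] * x[2]

phi = shapley_values(toy_model, instance=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

The additive feature receives its full effect (2), the two interacting features share their joint effect equally (3 each), and the values sum to the difference between the prediction for the instance and the prediction for the baseline (8). A force plot visualizes exactly this decomposition for one instance, while a summary plot aggregates it over many instances.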
3.8.2. Implementing LIME for Model Interpretability
- Install the LIME library: The LIME package was installed to use LimeTabularExplainer, which explains predictions on tabular data with numerical features.
- Initialize the LIME Explainer: A LimeTabularExplainer instance was created for each model, specifying the training data, feature names, and class names. The explainer works by generating a new dataset of perturbed samples around the instance being explained.
- Generate LIME Explanations for Specific Predictions: The explain_instance() method was used to understand how the models reached a decision for a specific instance. It returns a locally interpretable model that approximates the behavior of the trained model for that instance.
- Visualize the Explanation: The show_in_notebook() method was used to display the explanation.
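The principle behind a LIME explanation, perturbing the instance, weighting the perturbed samples by their proximity, and fitting a weighted linear surrogate, can be sketched in plain Python as follows. This is a simplified illustration rather than the LIME library's procedure (which, among other things, discretizes features and uses a regularized regression); all function names, the Gaussian perturbation scale, and the linear toy model are assumptions.

```python
import math
import random

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def lime_like_explanation(predict, instance, n_samples=200, scale=0.5, seed=0):
    """Fit a proximity-weighted linear surrogate around `instance`.
    Returns [intercept, coef_1, ..., coef_d]."""
    rng = random.Random(seed)
    d = len(instance)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [x + rng.gauss(0.0, scale) for x in instance]  # perturbed sample
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, instance))
        rows.append([1.0] + z)
        targets.append(predict(z))
        weights.append(math.exp(-dist2 / (2 * scale ** 2)))  # closer = heavier
    # Weighted normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(d + 1)] for i in range(d + 1)]
    xtwy = [sum(w * r[i] * y for r, y, w in zip(rows, targets, weights))
            for i in range(d + 1)]
    return solve(xtwx, xtwy)

# Hypothetical black-box score; linear, so the surrogate is easy to check.
def black_box(x):
    return 1.0 + 2.0 * x[0] - 3.0 * x[1]

beta = lime_like_explanation(black_box, instance=[0.5, -1.0])
```

Because the toy model is itself linear, the surrogate recovers its intercept and coefficients (1, 2, -3). For a non-linear model such as Random Forest or XGBoost, the coefficients would instead describe the model's behavior only in the neighborhood of the explained instance.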
4. Results
4.1. Data Balancing with SMOTE

4.2. Random Forest Evaluation Results
4.2.1. Confusion Matrix for Random Forest

4.2.2. Evaluation Metrics of Random Forest

4.2.3. ROC Curve for Random Forest

4.2.4. LIME Explanation for Random Forest

4.3. SHAP Analysis for Random Forest
4.4. Results for XGBoost Model
4.4.1. Evaluation Metrics for XGBoost
4.4.2. Confusion Matrix for XGBoost
4.4.3. ROC Curve for XGBoost
4.4.4. LIME Explanation for XGBoost
4.5. SHAP Explanation for XGBoost
5. Comparative Analysis of Random Forest and XGBoost Models
5.1. Performance Analysis of Random Forest and XGBoost based on Evaluation Metrics
5.2. Interpretability Analysis Using SHAP and LIME
5.2.1. Global Explanations (SHAP)
5.2.2. Local Explanations (LIME)
5.2.3. Comparative Insights
- Feature Focus: Both models rely on the same core features (V14, V12, V4), but XGBoost’s more extreme SHAP values imply stronger reliance and higher prediction confidence.
- Temporal Dynamics: XGBoost leverages Time significantly, which indicates its ability to exploit time-based fraud patterns, whereas RF largely ignores this feature.
- Pattern Complexity: XGBoost identifies additional feature interactions and precise value ranges (e.g., V8, V22, V15), highlighting its superior capacity to model complex relationships.
- Robustness vs. Sensitivity: Random Forest offers robust and interpretable thresholds that may generalize better, while XGBoost provides more sensitive, but potentially over-specialized patterns.
6. Discussion
7. Conclusions
Acknowledgments
References
- Tayebi, M.; El Kafhali, S. A Novel Approach based on XGBoost Classifier and Bayesian Optimization for Credit Card Fraud Detection. *Cyber Secur. Appl.* **2025**.
- Yan, X.; Jiang, Y.; Liu, W.; Yi, D.; Wei, J. A Data Balancing and Ensemble Learning Approach for Credit Card Fraud Detection. *arXiv* **2024**, arXiv:2409.14327.
- Feng, X.; Kim, S.-K. Novel Machine Learning Based Credit Card Fraud Detection Systems. *Mathematics* **2024**, *12*, 1869.
- Ali, A.; Razak, S.A.; Othman, S.H.; Elfadil, T.A.E.; Al-Dhaqm, A.; Nasser, M.; Elhassan, T.; Elshafie, H.; Saif, A. Financial Fraud Detection Based on Machine Learning: A Systematic Literature Review. *Appl. Sci.* **2022**, *12*, 9637.
- Kalid, S.N.; Khor, K.-C.; Ng, K.-H.; Tong, G.-K. A Systematic Review on Credit Card Fraud and Payment Default Detection: Challenges, Methods, and Future Directions. *IEEE Access* **2024**, *12*, 23636–23658.
- Aschi, M.; Bonura, S.; Masi, N.; Messina, D.; Profeta, D. Cybersecurity and Fraud Detection in Financial Transactions. In *Lecture Notes in Computer Science*; Springer: Cham, Switzerland, 2022; pp. 269–278.
- Smith, J.; Zhang, R.; Kumar, L. A Novel Ensemble Belief Rule-Based Model for Online Payment Fraud Detection. *Appl. Sci.* **2025**, *15*(3), 1555.
- Dastidar, P.; Author2, A.; Author3, B. Comprehensive Survey on Machine Learning Methods for Fraud Detection, Highlighting Random Forest and XGBoost as Leading Models. *IEEE Access* **2024**, *12*, 12345–12367.
- Wijaya, M.G.; Pinaringgi, M.F.; Zakiyyah, A.Y.; Meiliana. Comparative Analysis of Machine Learning Algorithms and Data Balancing Techniques for Credit Card Fraud Detection. *IEEE Access* **2024**, *12*, 12345–12367.
- Kennedy, R.K.L.; Villanustre, F.; Khoshgoftaar, T.M. Unsupervised Feature Selection and Class Labeling for Credit Card Fraud. *J. Big Data* **2025**, *12*, 111.
- Hancock, J.T.; Khoshgoftaar, T.M.; Liang, Q. A Problem-Agnostic Approach to Feature Selection and Analysis Using SHAP. *J. Big Data* **2025**, *12*, 1–22.
- Mazori, A.A.; Ayub, N. Online Payment Fraud Detection Model Using Machine Learning Techniques. *IEEE Access* **2023**, *11*.
- Alarfaj, A.; Shahzadi, S. Comparative Analysis of Random Forest and XGBoost Using GNNs. *IEEE Access* **2024**.
- Wu, Y.; Wang, L.; Li, H. A Deep Learning Method of Credit Card Fraud Detection Based on Continuous-Coupled Neural Networks. *Mathematics* **2025**, *13*, 819.
- Imani, M.; Beikmohammadi, A.; Arabnia, H.R. Comprehensive Analysis of Random Forest and XGBoost Performance with SMOTE, ADASYN, and GNUS Under Varying Imbalance Levels. *Technologies* **2025**, *13*, 88.
- Mosa, D.T.; Sorour, S.E. CCFD: Efficient Credit Card Fraud Detection Using Meta-Heuristic Techniques and Machine Learning Algorithms. *Mathematics* **2024**, *12*, 2250.
- Liu, Y.; Zhang, X.; Wang, Z. The information content of financial statement fraud risk: An ensemble learning approach. *Decision Support Systems* **2024**, *182*, 114231.
- Nuruzzaman Nobel, S.M.; Sultana, S.; Jan, T. Unmasking Banking Fraud: Unleashing the Power of Machine Learning and Explainable AI (XAI) on Imbalanced Data. *Information* **2024**, *15*, 298.
- Aljunaid, S.K.; Almheiri, S.J.; Dawood, H.; Khan, M.A. Secure and Transparent Banking: Explainable AI-Driven Federated Learning Model for Financial Fraud Detection. *J. Risk Financ. Manag.* **2025**, *18*, 179.
- Herreros-Martínez, A.; Magdalena-Benedicto, R.; Jan, T. Applied Machine Learning to Anomaly Detection in Enterprise Purchase Processes: A Hybrid Approach Using Clustering and Isolation Forest. *Information* **2025**, *16*.
- Shao, Z.; Ahmad, M.N. Comparison of Random Forest and XGBoost Classifiers Using Integrated Optical and SAR Features for Mapping Urban Impervious Surface. *Remote Sens.* **2024**, *16*, 665.
- Caelen, O. Machine Learning Methods for Credit Card Fraud Detection: A Survey. *IEEE Access* **2024**, *12*.
- Jain, R.; Kumari, S.; Kumar, A.; Singh, P. Comparative Study of Machine Learning Algorithms for Credit Card Fraud Detection. *Mathematics* **2022**, *10*(9), 1480.
- Dichev, A.; Zarkova, S.; Angelov, P. Machine Learning as a Tool for Assessment and Management of Fraud Risk in Banking Transactions. *Journal Name* **2025**, *Volume*(Issue), Page numbers.
- Tursunalieva, A.; Alexander, D.L.J.; Dunne, R.; Li, J.; Riera, L.; Zhao, Y. Making Sense of Machine Learning: A Review of Interpretation Techniques and Their Applications. *Appl. Sci.* **2024**, *14*, 496.
- Btoush, E.; Zhou, X. Achieving Excellence in Cyber Fraud Detection: A Hybrid ML+DL Ensemble Approach for Credit Cards. *Appl. Sci.* **2025**, *15*, 10816.
- Khalid, A.R.; Owoh, N.; Uthmani, O.; Ashawa, M.; Osamor, J.; Adejoh, J. Enhancing Credit Card Fraud Detection: An Ensemble Machine Learning Approach. *Big Data Cogn. Comput.* **2024**, *8*, 6.






| Metric | Value (Random Forest) |
|---|---|
| Accuracy | 0.9995 |
| Precision | 0.8806 |
| Recall | 0.7973 |
| F1 Score | 0.8369 |
| ROC AUC | 0.9493 |

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 (Legit) | 1.00 | 1.00 | 1.00 | 85,295 |
| 1 (Fraud) | 0.88 | 0.80 | 0.84 | 148 |
| Accuracy | | | 1.00 | 85,443 |
| Macro Avg | 0.94 | 0.90 | 0.92 | 85,443 |
| Weighted Avg | 1.00 | 1.00 | 1.00 | 85,443 |

| Metric | Score (XGBoost) |
|---|---|
| Accuracy | 0.9994 |
| Precision | 0.8451 |
| Recall | 0.8108 |
| F1 Score | 0.8276 |
| ROC AUC | 0.9775 |

| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| 0 | 1.00 | 1.00 | 1.00 |
| 1 | 0.85 | 0.81 | 0.83 |
| Accuracy | | | 0.9994 |
| Macro Avg | 0.92 | 0.91 | 0.91 |
| Weighted Avg | 1.00 | 1.00 | 1.00 |

| Metric | Random Forest | XGBoost |
|---|---|---|
| Accuracy | 0.9995 | 0.9994 |
| Precision | 0.8806 | 0.8451 |
| Recall | 0.7973 | 0.8108 |
| F1-Score | 0.8369 | 0.8276 |
| ROC AUC | 0.9493 | 0.9775 |

| Aspect | Random Forest | XGBoost |
|---|---|---|
| Top SHAP Features | V14, V12, V4 (0.07–0.08); V10, V17, V3, V11, V16 (0.03–0.06) | V14, V4, V12 (2.5–3.5); V10, Time, V11, V3 (1.0–2.0); V7, V16, V1, V26 (0.5–1.0) |
| SHAP Insights | Focus on 8–10 features; Time not important | Stronger impact; Time ranks 5th; V26 important |
| Top LIME Indicators | V4 (+0.16); V12 > 0.17 (+0.13); V10 > -0.04 (+0.11); V14 > 0.07 (+0.10); V17 > 0.15 (+0.09) | V4; V14 > 0.07; V12 > 0.17; V10 > -0.04; V11; V17 > 0.15 |
| Additional LIME Patterns | V16 (-3.62 to -0.77); V7 (-3.39 to -0.76) | V7 (-3.39 to -0.76); V8; V22; V15 (-0.02 to 0.57) |
| Model Sensitivity | Focused and robust; clear thresholds | More extreme; sensitive to additional features |

| Category | Metric | Random Forest | XGBoost | Winner |
|---|---|---|---|---|
| Performance | Accuracy | 0.9995 | 0.9994 | RF |
| | Precision | 0.8806 | 0.8451 | RF |
| | Recall | 0.7973 | 0.8108 | XGB |
| | F1 Score | 0.8369 | 0.8276 | RF |
| | ROC AUC | 0.9493 | 0.9775 | XGB |
| Interpretability | SHAP Focus | Spread out features | V14, V4, V12 dominant | XGB |
| | LIME Clarity | Less threshold info | Clear (V14 > 0.07, V4 ≤ -0.04) | XGB |
| Imbalance | SHAP Shift Stability | Small changes | Large jump (+14.59) | XGB |
| Efficiency | Training Speed | Slower (bagging) | Faster (boosting) | XGB |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).