Submitted: 30 April 2026
Posted: 05 May 2026
Abstract
Keywords:
1. Introduction
- Development of a hybrid predictive framework that integrates LSTM and XGBoost to improve the accuracy of credit default prediction by capturing both temporal dependencies and nonlinear relationships within financial datasets.
- Integration of explainable artificial intelligence (XAI) through SHAP analysis, enabling transparent interpretation of model predictions and identification of key determinants of borrower default risk.
- Provision of policy-relevant insights for financial institutions and regulators, demonstrating how interpretable hybrid AI models can support responsible lending decisions, improve risk governance, and enhance compliance with regulatory requirements related to algorithmic transparency.
- Empirical evaluation of the proposed model using credit risk datasets to demonstrate improvements in predictive performance, interpretability, and decision support compared with conventional machine learning and deep learning approaches.
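The SHAP analysis named in the contributions above rests on Shapley values, which split a prediction's deviation from a baseline across the input features. The following is a minimal sketch of that idea only, assuming a hypothetical three-feature scoring function (`toy_risk_score`) and hypothetical borrower and baseline values. It enumerates every feature coalition exactly, which is practical only for a handful of features; it is not the SHAP library's estimators and not the model evaluated in this paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating all coalitions.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Coalition features come from x, the remaining ones from the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical scoring function (an assumption for illustration):
# a linear part plus one interaction between two features.
def toy_risk_score(f):
    utilization, late_payments, income = f
    return (0.4 * utilization + 0.3 * late_payments - 0.2 * income
            + 0.1 * utilization * late_payments)


borrower = [0.9, 2.0, 0.5]   # hypothetical feature values
reference = [0.3, 0.0, 0.6]  # hypothetical baseline, e.g. a portfolio average

phi = shapley_values(toy_risk_score, borrower, reference)
for name, contribution in zip(["utilization", "late_payments", "income"], phi):
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the borrower's score and the baseline score, which is the additivity property that makes per-feature attributions readable as a decomposition of a single prediction.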
2. Materials and Methods
2.1. Dataset and Data Preprocessing
2.2. Feature Engineering
2.3. Logistic Regression Model
2.4. Cox Proportional Hazards Model
2.5. Extreme Gradient Boosting
2.6. Long Short-Term Memory
2.7. DeepSurv Model
2.8. Hybrid LSTM–XGBoost Framework with SHAP
2.9. Model Evaluation
3. Results
3.1. Exploratory Data Analysis (EDA) Interpretation
3.2. Model Performance Evaluation
3.3. Confusion Matrix Analysis
3.4. Feature Importance Analysis
3.5. Model Interpretability
4. Discussion
5. Conclusions
6. Patents
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| LSTM | Long Short-Term Memory |
| XAI | Explainable Artificial Intelligence |
| XGBoost | Extreme Gradient Boosting |
| SHAP | SHapley Additive exPlanations |
| ROC–AUC | Receiver Operating Characteristic – Area Under the Curve |

| Model | Accuracy | Precision | Recall | F1-score | AUC |
| Logistic Regression | 0.7420 | 0.7124 | 0.7018 | 0.6923 | 0.7815 |
| Cox Model | 0.7615 | 0.7542 | 0.8423 | 0.6675 | 0.8427 |
| XGBoost | 0.9142 | 0.8875 | 0.8721 | 0.8797 | 0.9421 |
| LSTM | 0.9025 | 0.8762 | 0.8618 | 0.8689 | 0.9332 |
| DeepSurv | 0.8931 | 0.8654 | 0.8513 | 0.8583 | 0.9215 |
| LSTM–XGBoost (SHAP) | 0.9510 | 0.9312 | 0.9187 | 0.9249 | 0.9754 |
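The F1-score column can be related to the precision and recall columns through the harmonic mean, F1 = 2PR / (P + R). The sketch below recomputes F1 from the tabulated precision and recall, assuming all three figures in a row refer to the same class and the same averaging scheme, which the table does not state. Under that assumption the recomputed values agree with the reported ones to within rounding for XGBoost, LSTM, DeepSurv, and LSTM–XGBoost (SHAP), while the Logistic Regression and Cox Model rows do not, which may indicate a different averaging scheme for those rows.

```python
# Precision, recall and reported F1-score, copied from the table above.
reported = {
    "Logistic Regression": (0.7124, 0.7018, 0.6923),
    "Cox Model": (0.7542, 0.8423, 0.6675),
    "XGBoost": (0.8875, 0.8721, 0.8797),
    "LSTM": (0.8762, 0.8618, 0.8689),
    "DeepSurv": (0.8654, 0.8513, 0.8583),
    "LSTM–XGBoost (SHAP)": (0.9312, 0.9187, 0.9249),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for every row and print it next to the reported value.
recomputed = {model: f1_from(p, r) for model, (p, r, _) in reported.items()}
for model, (_, _, f1) in reported.items():
    gap = abs(recomputed[model] - f1)
    print(f"{model:22s} reported={f1:.4f} recomputed={recomputed[model]:.4f} gap={gap:.4f}")
```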
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).