Submitted:
10 June 2026
Posted:
11 June 2026
Abstract
Keywords:
1. Introduction
2. Material and Methods
2.1. Data Collection
2.2. Data Preprocessing
2.2.1. Label Creation and Record Filtering
2.2.2. Removal of Data Leakage Features
2.2.3. Feature Engineering
2.2.5. Categorical Encoding
2.2.6. Risk Band Construction
2.2.7. Feature Selection
2.2.8. Min-Max Normalization
2.2.9. Pipeline Summary
2.3. Feature Selection
2.3.1. Overview
2.3.2. Statistical Tests and Composite Score
2.3.3. Selected Features
2.4. Data Balancing
2.4.1. Overview
2.4.2. Strategy A: SMOTE Oversampling
2.4.3. Strategy B: Random Undersampling
2.4.4. Comparison of Strategies
2.5. Feature-to-Image Conversion
2.5.1. Motivation and Design
2.5.2. Image Specification
2.5.3. Encoding Illustration
2.5.4. Dataset Summary
2.6. Classification Models
2.6.1. Overview
2.6.2. Deep Neural Network (DNN)
2.6.3. Support Vector Machine (SVM)
2.6.4. Random Forest (RF)
2.6.5. Decision Tree (DT)
2.6.6. Convolutional Neural Network (CNN)
2.6.7. Hybrid CNN Models (CNN+SVM, CNN+RF, CNN+DT)
2.6.8. Hyperparameter Summary
2.7. Evaluation Protocol
3. Results
4. Discussion
5. Conclusion
Author Contributions
Funding
Data Availability Statement
Conflict of Interest
References
- Milne, A.; Parboteeah, P. The business models and economics of peer-to-peer lending. 2016.
- Global heterogeneous catalyst (metal, chemical, zeolites) market size, share & trends analysis report 2023–2030. Focus on Catalysts, 2024.
- Turiel, J. D.; Aste, T. P2P Loan Acceptance and Default Prediction with Artificial Intelligence. SSRN Electron. J. 2019.
- Giudici, P.; Hadji-Misheva, B.; Spelta, A. Network based credit risk models. Qual. Eng. 2019, vol. 32, 199–211.
- Chang, A.-H.; Yang, L.-K.; Tsaih, R.-H.; Lin, S.-K. Machine learning and artificial neural networks to construct P2P lending credit-scoring model: A case using Lending Club data. In Quantitative Finance and Economics; 2022.
- Serrano-Cinca, C.; Gutiérrez-Nieto, B.; López-Palacios, L. Determinants of Default in P2P Lending. PLoS ONE 2015, vol. 10.
- Atef, M.; Ouf, S.; Seoud, W.; Gabr, M. I. A novel approach using explainable prediction of default risk in peer-to-peer lending based on machine learning models. Neural Comput. Appl. 2025, vol. 37, 21783–21803.
- Malagon, E.; Troncoso, D.; Rubio, A.; Ponce, H. Machine Learning Techniques in Credit Default Prediction. Mexican International Conference on Artificial Intelligence, 2022.
- Shi, S.; Tse, R.; Luo, W.; d’Addona, S.; Pau, G. Machine learning-driven credit risk: a systemic review. Neural Comput. Appl. 2022, vol. 34, 14327–14339.
- Zhu, L.; Qiu, D.; Ergu, D.; Ying, C.; Liu, K. A study on predicting loan default based on the random forest algorithm. International Conference on Information Technology and Quantitative Management, 2019.
- Núñez Mora, J. A.; Moncayo, P.; Franco, C.; Madrazo-Lemarroy, P.; Beltrán, J. Loan Default Prediction: A Complete Revision of LendingClub. Rev. Mex. De Econ. Y Finanz. 2023.
- Monje, L.; Carrasco, R. A.; Sánchez-Montañés, M. Machine Learning XAI for Early Loan Default Prediction. In Computational Economics; 2025.
- Zhang, X.; et al. Data-Driven Loan Default Prediction: A Machine Learning Approach for Enhancing Business Process Management. Syst. 2025, vol. 13, 581.
- Akinjole, A.; Shobayo, O.; Popoola, J.; Okoyeigbo, O.; Ogunleye, B. Ensemble-Based Machine Learning Algorithm for Loan Default Risk Prediction. In Mathematics; 2024.
- Suram, R. Efficient Deep Learning Models for Accurate Default Loan Prediction in Credit Risk Management. Int. J. Emerg. Res. Eng. Technol. 2026.
- Melese, T.; Berhane, T.; Mohammed, A.; Walelgn, A. Credit-Risk Prediction Model Using Hybrid Deep—Machine-Learning Based Algorithms. In Scientific Programming; 2023.
- Kvamme, H.; Sellereite, N.; Aas, K.; Sjursen, S. Predicting mortgage default using convolutional neural networks. Expert Syst. Appl. 2018, vol. 102, 207–217.
- Gür, Y. E.; Toğaçar, M.; Solak, B. Integration of CNN Models and Machine Learning Methods in Credit Score Classification: 2D Image Transformation and Feature Extraction. Comput. Econ. 2025, vol. 65, 2991–3035.
- Li, L.-H.; Sharma, A. K.; Cheng, S.-T. Explainable AI based LightGBM prediction model to predict default borrower in social lending platform. Intell. Syst. Appl. 2025, vol. 26, 200514.
- wordsforthewise. Lending Club Loan Dataset. 2017. Available online: https://www.kaggle.com/datasets/wordsforthewise/lending-club.
- Cai, X.; Dai, W.; Lu, J. Loan Default Prediction Based on Machine Learning Approaches. In Proceedings of the 2025 2nd International Conference on Generative Artificial Intelligence and Information Security, 2025.
- Haque, A.; Mahedi, M.; Lecturer, H. Bank Loan Prediction Using Machine Learning Techniques. ArXiv 2024, vol. abs/2410.08886.
- Alonso, M.; Carbo, J. Understanding the Performance of Machine Learning Models to Predict Credit Default: A Novel Approach for Supervisory Evaluation. SSRN Electron. J. 2021.
- Hancock, J. T.; Khoshgoftaar, T. M. Survey on categorical data for neural networks. J. Big Data 2020, vol. 7.
- Kriebel, J.; Stitz, L. Credit default prediction from user-generated text in peer-to-peer lending using deep learning. Eur. J. Oper. Res. 2021, vol. 302, 309–323.
- Alwateer, M. M.; Atlam, E.; El-Raouf, M. M. A.; Ghoneim, O. A.; Gad, I. Missing Data Imputation: A Comprehensive Review. J. Comput. Commun. 2024.
- Bolikulov, F.; Nasimov, R.; Rashidov, A.; Akhmedov, F.; Cho, Y.-I. Effective Methods of Categorical Data Encoding for Artificial Intelligence Algorithms. In Mathematics; 2024.
- Zhang, L. Using Explainable Machine Learning to Predict Loan Risk in Consumer Finance. In Proceedings of the 2025 4th International Conference on Cyber Security, Artificial Intelligence and the Digital Economy, 2025.
- Sinsomboonthong, S. Performance Comparison of New Adjusted Min-Max with Decimal Scaling and Statistical Column Normalization Methods for Artificial Neural Network Classification. Int. J. Math. Math. Sci. 2022, vol. 2022, 3584406:1–3584406:9.
- Amorim, L. B. V. d.; Cavalcanti, G. D. C.; Cruz, R. M. O. The choice of scaling technique matters for classification performance. Appl. Soft Comput. 2022, vol. 133, 109924.
- Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, vol. 40, 16–28.
- Miao, J.; Niu, L. A Survey on Feature Selection. Procedia Comput. Sci. 2016, vol. 91, 919–926.
- Zhao, G.-G.; Yang, J.; Zhang, L.; Yang, H. ANOVA F Test of Non-Null Hypothesis. Eur. J. Stat. 2024.
- Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2003, vol. 69 (6 Pt 2), 066138.
- Mann, H. B.; Whitney, D. R. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. Ann. Math. Stat. 1947, vol. 18, 50–60.
- Glass, G. V.; Hopkins, K. D. Statistical methods in education and psychology, 3rd ed; 1996.
- Kolmogorov, A. N. Sulla determinazione empirica di una legge di distribuzione. 1933.
- Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, vol. 3, 1157–1182.
- Krawczyk, B. Learning from imbalanced data: open challenges and future directions. Prog. Artif. Intell. 2016, vol. 5, 221–232.
- Chawla, N.; Bowyer, K.; Hall, L. O.; Kegelmeyer, W. P. SMOTE: Synthetic Minority Over-sampling Technique. ArXiv 2002, vol. abs/1106.1813.
- Drummond, C.; Holte, R. C. C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling beats Over-Sampling; 2003.
- Japkowicz, N.; Stephen, S. The class imbalance problem: A systematic study. Intell. Data Anal. 2002, vol. 6, 429–449.
- Fernández, A.; García, S.; Herrera, F.; Chawla, N. SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary. J. Artif. Intell. Res. 2018, vol. 61, 863–905.
- Zhu, Y.; et al. Converting tabular data into images for deep learning with convolutional neural networks. Sci. Rep. 2021, vol. 11.
- Li, X. Financial Fraud Identification and Interpretability Study for Listed Companies Based on Convolutional Neural Network. ArXiv 2025, vol. abs/2512.06648.
- Barboza, F.; Kimura, H.; Altman, E. I. Machine learning models and bankruptcy prediction. Expert Syst. Appl. 2017, vol. 83, 405–417.
- Kingma, D. P.; Ba, J. Adam: A Method for Stochastic Optimization. CoRR 2014, vol. abs/1412.6980.
- Niculescu-Mizil, A.; Caruana, R. Predicting good probabilities with supervised learning. In Proceedings of the 22nd international conference on Machine learning, 2005.
- Breiman, L. Random Forests. Mach. Learn. 2001, vol. 45, 5–32.
- Speybroeck, N. Classification and regression trees. Int. J. Public Health 2012, vol. 57, 243–246.
- Wilcoxon, F. Individual Comparisons by Ranking Methods. Biometrics 1945, vol. 1, 196–202.
- Friedman, M. The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance. J. Am. Stat. Assoc. 1937, vol. 32, 675–701.
- Souadda, L. I.; Halitim, A. R.; Benilles, B.; Oliveira, J. M.; Ramos, P. Optimizing Credit Risk Prediction for Peer-to-Peer Lending Using Machine Learning. Forecasting, 2025.
- Chen, Y.-R.; Leu, J.-S.; Huang, S.-A.; Wang, J.-T.; Takada, J.-i. Predicting Default Risk on Peer-to-Peer Lending Imbalanced Datasets. IEEE Access 2021, vol. 9, 73103–73109.
- Kim, J.-Y.; Cho, S.-B. Towards Repayment Prediction in Peer-to-Peer Social Lending Using Deep Learning. In Mathematics; 2019.
- Yang, R. Machine Learning-Based Loan Default Prediction in Peer-to-Peer Lending. In Highlights in Science, Engineering and Technology; 2024.
- Alenizy, H. A.; Berri, J. Transforming tabular data into images via enhanced spatial relationships for CNN processing. Sci. Rep. 2025, vol. 15.
- Albanesi, S.; Vamossy, D. F. Predicting Consumer Default: A Deep Learning Approach. NBER Working Paper Series; 2019.
- myFico. What's in my FICO® Scores? 2020. Available online: https://www.myfico.com/credit-education/whats-in-your-credit-score.
- Bhardwaj, G.; Sengupta, R. Credit Scoring and Loan Default.
- Miller, S. Risk Factors for Consumer Loan Default: A Censored Quantile Regression Analysis. 2010.
- Gorishniy, Y. V.; Rubachev, I.; Khrulkov, V.; Babenko, A. Revisiting Deep Learning Models for Tabular Data. Neural Information Processing Systems, 2021.
- Addo, P. M.; Guégan, D.; Hassani, B. K. Credit Risk Analysis using Machine and Deep learning models. 2018.
- Ali Shahbazi, S. J.; Şirzad, Nefise; Najafzadeh, Hossein. A Multi-Method Examination of Transformational Leadership and Citizenship Behavior: Insights from Explainable Machine Learning. Preprints 2026.

| Attribute | Detail |
|---|---|
| Dataset Name | ACCEPTED_LOANS |
| Total Records | 2,260,701 |
| Total Features | 151 |
| Source Files | 5 files (~500,000 records each) |
| Extraction Date | February 7, 2026 |
| Processing Speed | 512 rows/second |

| Feature Category | Representative Features |
|---|---|
| Loan Characteristics (9) | loan_amnt, int_rate, term, grade, sub_grade, installment |
| Borrower Information (7) | emp_title, emp_length, home_ownership, annual_inc, addr_state |
| Loan Status and Purpose (11) | loan_status, purpose, dti, issue_d, verification_status |
| Credit Profile (13) | fico_range_low, fico_range_high, delinq_2yrs, revol_bal, revol_util |
| Repayment History (17) | total_pymnt, total_rec_prncp, recoveries, out_prncp, last_pymnt_amnt |
| Advanced Credit Metrics (60) | num_bc_tl, num_il_tl, pct_tl_nvr_dlq, mo_sin_old_il_acct, bc_util |
| Joint Application and Secondary Applicant (13) | annual_inc_joint, dti_joint, sec_app_fico_range_low |
| Hardship and Settlement (21) | hardship_flag, debt_settlement_flag, settlement_amount, settlement_percentage |

| Feature | Formula / Derivation | Description |
|---|---|---|
| term_months | Extracted from term string | Loan term in months (36 or 60) |
| issue_year | $\text{year}(\text{issue\_d})$ | Calendar year of loan issuance |
| issue_month | $\text{month}(\text{issue\_d})$ | Calendar month of loan issuance |
| credit_history_years | $\text{issue\_year} - \text{year}(\text{earliest\_cr\_line})$ | Length of borrower's credit history in years |
| fico_avg | $\dfrac{\text{fico\_range\_low} + \text{fico\_range\_high}}{2}$ | Average FICO score at origination |
| loan_to_income | $\dfrac{\text{loan\_amnt}}{\text{annual\_inc} + 1}$ | Loan amount relative to annual income |
| emp_length_numeric | Ordinal mapping to $[0, 10]$ | Employment length encoded as integer |
| installment_to_income | $\dfrac{\text{installment} \times 12}{\text{annual\_inc} + 1}$ | Annual installment burden relative to income |
| has_delinq | $\mathbb{1}[\text{delinq\_2yrs} > 0]$ | Binary indicator of recent delinquency |
| has_pub_rec | $\mathbb{1}[\text{pub\_rec} > 0]$ | Binary indicator of public record |
| has_inquiries | $\mathbb{1}[\text{inq\_last\_6mths} > 0]$ | Binary indicator of recent credit inquiries |
| dti_high | $\mathbb{1}[\text{dti} > 20]$ | Binary indicator of high debt-to-income ratio |
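
The derivations in the table above can be sketched for a single loan record as follows. This is a minimal illustration rather than the authors' preprocessing code: the `Mon-YYYY` date format, the employment-length label strings, and the helper names `parse_month_year`, `emp_length_to_int`, and `engineer_features` are all assumptions introduced here.

```python
# Month abbreviations assumed to appear in issue_d and earliest_cr_line.
MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}


def parse_month_year(value):
    """Split a 'Mon-YYYY' string (assumed format) into (year, month)."""
    month, year = value.strip().split("-")
    return int(year), MONTHS[month]


def emp_length_to_int(value):
    """Ordinal mapping onto 0..10; the label strings are assumptions."""
    digits = "".join(ch for ch in value if ch.isdigit())
    if value.strip().startswith("<") or not digits:
        return 0  # '< 1 year' and missing labels are mapped to 0 here
    return min(int(digits), 10)


def engineer_features(row):
    """Compute the twelve engineered features for one record (a dict)."""
    issue_year, issue_month = parse_month_year(row["issue_d"])
    earliest_year, _ = parse_month_year(row["earliest_cr_line"])
    income = row["annual_inc"] + 1  # +1 guards against division by zero
    return {
        "term_months": int(row["term"].split()[0]),
        "issue_year": issue_year,
        "issue_month": issue_month,
        "credit_history_years": issue_year - earliest_year,
        "fico_avg": (row["fico_range_low"] + row["fico_range_high"]) / 2,
        "loan_to_income": row["loan_amnt"] / income,
        "emp_length_numeric": emp_length_to_int(row["emp_length"]),
        "installment_to_income": row["installment"] * 12 / income,
        "has_delinq": int(row["delinq_2yrs"] > 0),
        "has_pub_rec": int(row["pub_rec"] > 0),
        "has_inquiries": int(row["inq_last_6mths"] > 0),
        "dti_high": int(row["dti"] > 20),
    }
```

Because the ratio features divide by `annual_inc + 1`, records with zero reported income produce very large values, which may explain the wide spread of the ratio features in the descriptive statistics.
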

| Stage | Item | Value |
|---|---|---|
| Input | Total raw records | 2,260,701 |
| Filtering | Records after outcome filtering | 115,487 |
| | Non-default records (y = 0) | 88,158 (76.34%) |
| | Default records (y = 1) | 27,329 (23.66%) |
| | Default rate | 23.66% |
| | Class imbalance ratio (0:1) | 3.23 : 1 |
| Leakage Removal | Features removed | 24 |
| Feature Engineering | New features added | 12 |
| Encoding | Numerical features | 28 |
| | Categorical features | 8 |
| | Total features after encoding | 184 |
| Feature Selection | Final selected features | 128 |
| Normalization | Scaling method | Min-Max [0, 1] |
| Risk Stratification | Number of risk bands | 3 (Low / Medium / High) |
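
The Min-Max [0, 1] scaling step listed above can be illustrated with a short sketch. It assumes that the minimum and maximum are estimated on the training split only and then reused for held-out data; the source does not state this, and the function names are hypothetical.

```python
def fit_min_max(train_values):
    """Return the (min, max) of a training column."""
    return min(train_values), max(train_values)


def min_max_scale(values, lo, hi):
    """Map values to [0, 1] using bounds fitted elsewhere.

    Values outside the fitted range fall outside [0, 1]; whether to clip
    them is a design choice this sketch leaves open.
    """
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lo) / span for v in values]


# Fit on training data, then apply the same bounds to new data.
lo, hi = fit_min_max([10.0, 20.0, 30.0])
scaled_train = min_max_scale([10.0, 20.0, 30.0], lo, hi)
scaled_new = min_max_scale([40.0], lo, hi)
```
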

| Metric | Original | SMOTE Upsampled | Random Downsampled |
|---|---|---|---|
| Total Samples | 115,487 | 176,316 | 54,658 |
| Non-Default (Label = 0) | 88,158 (76.34%) | 88,158 (50.00%) | 27,329 (50.00%) |
| Default (Label = 1) | 27,329 (23.66%) | 88,158 (50.00%) | 27,329 (50.00%) |
| Imbalance Ratio | 3.23 : 1 | 1.00 : 1 | 1.00 : 1 |
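
The two balancing strategies compared above can be sketched in plain Python. This is an illustrative simplification, not the implementation used in the study: the neighbour count `k=5`, the seed, and the function names are assumptions, and the SMOTE variant shown is the basic interpolation scheme without any of its later refinements.

```python
import random


def _split_indices(y):
    """Return (minority_indices, majority_indices) for binary labels."""
    idx0 = [i for i, label in enumerate(y) if label == 0]
    idx1 = [i for i, label in enumerate(y) if label == 1]
    return (idx1, idx0) if len(idx1) <= len(idx0) else (idx0, idx1)


def random_undersample(X, y, seed=42):
    """Keep all minority rows and an equally sized random majority subset."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def smote_oversample(X, y, k=5, seed=42):
    """Add synthetic minority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    points = [X[i] for i in minority]
    label = y[minority[0]]
    synthetic = []
    for _ in range(len(majority) - len(minority)):
        base = rng.choice(points)
        # k nearest minority neighbours of `base` by squared Euclidean distance
        neighbours = sorted(
            (p for p in points if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return list(X) + synthetic, list(y) + [label] * len(synthetic)
```

Applied to the class counts in the table (88,158 non-default and 27,329 default records), undersampling of this kind yields 2 × 27,329 = 54,658 samples and oversampling yields 2 × 88,158 = 176,316 samples, matching the reported totals.
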

| Dataset | Non-Default (Label=0) | Default (Label=1) | Total Images |
|---|---|---|---|
| Original | 88,158 | 27,329 | 115,487 |
| SMOTE Upsampled | 88,158 | 88,158 | 176,316 |
| Random Downsampled | 27,329 | 27,329 | 54,658 |
| Total | — | — | 346,461 |

| Parameter | DNN | SVM | RF | DT | CNN | CNN+SVM | CNN+RF | CNN+DT |
|---|---|---|---|---|---|---|---|---|
| Input dimension | 64 | 64 | 64 | 64 | 64 × 64 × 1 | 128 | 128 | 128 |
| Architecture | 256→128→64→32→1 | Linear | T = 50 | Depth 6 | Conv(32/64/128/256)+Dense(256/128) | Linear | T = 50 | Depth 6 |
| Regularization | L2 = $10^{-3}$ | c = 0.1 | — | — | L2 = $10^{-4}$ | c = 0.1 | — | — |
| Dropout | 0.35/0.30/0.25/0.20 | — | — | — | 0.10/0.15/0.20/0.40/0.30 | — | — | — |
| Max depth | — | — | 8 | 6 | — | — | 8 | 6 |
| Min samples leaf | — | — | 5 | 10 | — | — | 5 | 10 |
| Feature sampling | — | — | All | — | — | — | All | — |
| Loss | Weighted BCE | Hinge | — | Gini | Weighted BCE | Hinge | — | Gini |
| Optimizer | Adam | L-BFGS | Bagging | CART | Adam | L-BFGS | Bagging | CART |
| Learning rate | $10^{-3}$ | — | — | — | $10^{-3}$ | — | — | — |
| Batch size | 64 | — | — | — | 32 | — | — | — |
| Max epochs | 100 | 2000 iter | — | — | 50 | 2000 iter | — | — |
| Early stopping | 15 (val AUC) | — | — | — | 10 (val AUC) | — | — | — |
| LR scheduler | ×0.5 (p=7) | — | — | — | ×0.5 (p=5) | — | — | — |
| Class balancing | $w_1 = N_0/N_1$ | balanced | balanced | balanced | $w_1 = N_0/N_1$ | balanced | balanced | balanced |
| Probability output | Sigmoid | Isotonic (3-fold) | Leaf avg | Leaf avg | Sigmoid | Isotonic (5-fold) | Leaf avg | Leaf avg |
| Random seed | 42 | 42 | 42 | 42 | 42 | 42 | 42 | 42 |
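
The class-balancing entry $w_1 = N_0/N_1$ and the weighted BCE loss used for the DNN and CNN can be illustrated as follows. The sketch assumes the negative class keeps a weight of 1 and that predicted probabilities are clipped for numerical stability; neither detail is stated in the source, and the function names are hypothetical.

```python
import math


def positive_class_weight(n_negative, n_positive):
    """w1 = N0 / N1: up-weight the minority (default) class."""
    return n_negative / n_positive


def weighted_bce(y_true, y_prob, w1, w0=1.0, eps=1e-7):
    """Mean weighted binary cross-entropy over a batch.

    w0 = 1 for the negative class and the clipping constant eps are
    assumptions of this sketch.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(w1 * y * math.log(p) + w0 * (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Class counts of the original (imbalanced) dataset.
w1 = positive_class_weight(88158, 27329)
```

With the original class counts, `w1` is about 3.23, so a misclassified default contributes roughly three times as much to the loss as a misclassified non-default. On the balanced SMOTE and downsampled sets the same formula gives a weight of 1.
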
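
| Metric | Formula |
|---|---|
| Accuracy | $(TP + TN)/(TP + TN + FP + FN)$ |
| Precision | $TP/(TP + FP)$ |
| Recall | $TP/(TP + FN)$ |
| Specificity | $TN/(TN + FP)$ |
| F1-Score | $2 \cdot \text{Precision} \cdot \text{Recall} / (\text{Precision} + \text{Recall})$ |
| AUC-ROC | Area under the ROC curve |
| AUC-PR | Area under the Precision-Recall curve |
| Brier Score | $\frac{1}{N}\sum_{i=1}^{N}(p_i - y_i)^2$ |
| MCC | $(TP \cdot TN - FP \cdot FN) / \sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}$ |
| G-mean | $\sqrt{\text{Recall} \cdot \text{Specificity}}$ |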

| Feature | Mean (Non-Default) | SD (Non-Default) | Mean (Default) | SD (Default) | Mean Diff | Mann-Whitney U | p-value | Sig. | $r_{pb}$ |
|---|---|---|---|---|---|---|---|---|---|
| loan_amnt | 13,717.21 | 9,135.96 | 15,464.97 | 9,380.39 | +1,747.75 | 1,062,316,670 | $3.14 \times 10^{-192}$ | *** | 0.0805 |
| int_rate | 13.11 | 4.91 | 16.23 | 5.55 | +3.12 | 793,141,197 | $<10^{-300}$ | *** | 0.2528 |
| annual_inc | 80,343.64 | 84,498.68 | 72,422.30 | 60,660.81 | −7,921.34 | 1,318,645,490 | $5.74 \times 10^{-124}$ | *** | −0.0423 |
| dti | 18.08 | 13.76 | 20.38 | 19.12 | +2.30 | 1,038,266,196 | $1.41 \times 10^{-261}$ | *** | 0.0643 |
| fico_avg | 702.48 | 34.65 | 691.89 | 27.96 | −10.59 | 1,420,461,420 | $<10^{-300}$ | *** | −0.1343 |
| credit_history_years | 15.997 | 7.489 | 15.570 | 7.750 | −0.427 | 1,259,306,828 | $6.00 \times 10^{-30}$ | *** | −0.0240 |
| revol_util | 45.35 | 24.92 | 50.26 | 24.32 | +4.91 | 1,064,570,179 | $5.14 \times 10^{-186}$ | *** | 0.0839 |
| installment | 422.11 | 278.64 | 479.82 | 292.96 | +57.71 | 1,051,769,202 | $3.60 \times 10^{-221}$ | *** | 0.0866 |
| loan_to_income | 11.90 | 535.21 | 13.33 | 582.26 | +1.43 | 955,278,711 | $<10^{-300}$ | *** | 0.0011 |
| installment_to_income | 4.23 | 188.09 | 5.16 | 230.97 | +0.93 | 950,059,167 | $<10^{-300}$ | *** | 0.0020 |
| emp_length_numeric | 5.964 | 3.624 | 5.746 | 3.582 | −0.218 | 1,246,436,353 | $7.53 \times 10^{-19}$ | *** | −0.0257 |
| total_acc | 24.47 | 12.20 | 23.40 | 11.97 | −1.065 | 1,270,177,129 | $3.19 \times 10^{-42}$ | *** | −0.0372 |
| open_acc | 11.559 | 5.714 | 11.667 | 5.716 | +0.108 | 1,187,266,921 | $3.01 \times 10^{-4}$ | *** | 0.0081 |
| delinq_2yrs | 0.314 | 0.902 | 0.369 | 0.981 | +0.055 | 1,175,396,754 | $1.29 \times 10^{-18}$ | *** | 0.0253 |

| Dataset | Model | Accuracy | AUC-ROC | F1-Score | MCC |
|---|---|---|---|---|---|
| Original (Imbalanced) | CNN | 0.9849 ± 0.0013 | 1.0000 ± 0.0000 | 0.9887 ± 0.0019 | 0.9704 ± 0.0022 |
| | CNN+SVM | 0.9797 ± 0.0021 | 0.9990 ± 0.0006 | 0.9842 ± 0.0028 | 0.9601 ± 0.0026 |
| | CNN+RF | 0.9872 ± 0.0028 | 1.0000 ± 0.0000 | 0.9914 ± 0.0008 | 0.9772 ± 0.0008 |
| | CNN+DT | 0.9833 ± 0.0027 | 0.9989 ± 0.0007 | 0.9875 ± 0.0017 | 0.9689 ± 0.0051 |
| SMOTE Upsampled | CNN | 0.9602 ± 0.0062 | 0.9913 ± 0.0018 | 0.9604 ± 0.0031 | 0.9146 ± 0.0073 |
| | CNN+SVM | 0.9477 ± 0.0019 | 0.9850 ± 0.0019 | 0.9500 ± 0.0029 | 0.8981 ± 0.0032 |
| | CNN+RF | 0.9707 ± 0.0020 | 0.9946 ± 0.0005 | 0.9725 ± 0.0018 | 0.9481 ± 0.0044 |
| | CNN+DT | 0.9567 ± 0.0041 | 0.9868 ± 0.0032 | 0.9540 ± 0.0030 | 0.9097 ± 0.0110 |
| Random Downsampled | CNN | 0.9383 ± 0.0055 | 0.9840 ± 0.0017 | 0.9437 ± 0.0054 | 0.8759 ± 0.0070 |
| | CNN+SVM | 0.9269 ± 0.0048 | 0.9764 ± 0.0034 | 0.9326 ± 0.0045 | 0.8600 ± 0.0085 |
| | CNN+RF | 0.9557 ± 0.0022 | 0.9894 ± 0.0026 | 0.9566 ± 0.0034 | 0.9106 ± 0.0045 |
| | CNN+DT | 0.9364 ± 0.0104 | 0.9796 ± 0.0043 | 0.9363 ± 0.0049 | 0.8679 ± 0.0138 |

| Dataset | Model Pair | $\lvert\Delta\text{AUC}\rvert$ | Wilcoxon p |
|---|---|---|---|
| Original (Imbalanced) | CNN vs. CNN+SVM | 0.0010 | 0.043* |
| | CNN vs. CNN+RF | 0.0000 | 0.083 |
| | CNN vs. CNN+DT | 0.0011 | 0.157 |
| | CNN+SVM vs. CNN+RF | 0.0010 | 0.037* |
| | CNN+SVM vs. CNN+DT | 0.0001 | 0.021* |
| | CNN+RF vs. CNN+DT | 0.0011 | 0.046* |
| | Friedman: χ²(3) = 8.40, p = 0.038 | | |
| SMOTE Upsampled | CNN vs. CNN+SVM | 0.0063 | 0.032* |
| | CNN vs. CNN+RF | 0.0033 | 0.041* |
| | CNN vs. CNN+DT | 0.0045 | 0.317 |
| | CNN+SVM vs. CNN+RF | 0.0096 | 0.027* |
| | CNN+SVM vs. CNN+DT | 0.0018 | 0.018* |
| | CNN+RF vs. CNN+DT | 0.0078 | 0.029* |
| | Friedman: χ²(3) = 10.20, p = 0.017 | | |
| Random Downsampled | CNN vs. CNN+SVM | 0.0076 | 0.028* |
| | CNN vs. CNN+RF | 0.0054 | 0.044* |
| | CNN vs. CNN+DT | 0.0044 | 0.412 |
| | CNN+SVM vs. CNN+RF | 0.0130 | 0.022* |
| | CNN+SVM vs. CNN+DT | 0.0032 | 0.032* |
| | CNN+RF vs. CNN+DT | 0.0098 | 0.031* |
| | Friedman: χ²(3) = 9.60, p = 0.022 | | |
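
The Friedman statistic reported per dataset compares the four models through their within-fold ranks. The following sketch computes the uncorrected statistic $\chi^2_F = \frac{12}{nk(k+1)}\sum_j R_j^2 - 3n(k+1)$, where $R_j$ is the rank sum of model $j$ over $n$ folds and $k$ models. Treating cross-validation folds as blocks is an assumption, the function names are hypothetical, and no tie correction is applied (statistical libraries typically add one).

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(scores):
    """scores[i][j] is the score of model j on fold i.

    Returns the Friedman chi-square statistic with k - 1 degrees of
    freedom, without tie correction.
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for fold_scores in scores:
        for j, rank in enumerate(average_ranks(fold_scores)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
```

With four models the statistic has three degrees of freedom, as in the χ²(3) values above. A significant result only indicates that at least one model differs, which is why pairwise Wilcoxon tests accompany it.
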

| Ref. | Authors | Year | Dataset / Size | Models | Balancing | Key Metrics |
|---|---|---|---|---|---|---|
| [3] | Turiel & Aste | 2020 | LendingClub (~800K accepted loans) | LR, SVM, DNN | None | AUC up to 0.72 |
| [10] | Zhu et al. | 2019 | LendingClub (2019 Q1, ~15 features) | RF + SMOTE | SMOTE | RF outperforms SVM, DT |
| [53] | Souadda et al. | 2021 | P2P imbalanced dataset | XGBoost | SMOTE, NearMiss, RUS | AUC ~0.78 |
| [54] | Chen et al. (IEEE) | 2021 | P2P LendingClub-style, imbalanced | RF, DT, LR | SMOTE | AUC = 0.73–0.79 |
| [55] | Kim & Cho | 2019 | LendingClub (2007–2018) | CNN (1D), 5-fold CV | None | Repayment prediction improved over LR |
| [5] | Chang et al. | 2022 | LendingClub (multi-year) | LR, SVM, DT, RF, XGBoost, LightGBM, ANN | Grid search / CV | XGBoost best; AUC ~0.70 |
| [17] | Kvamme et al. | 2018 | Norwegian mortgages (20,989) | CNN (deep) | None | CNN outperforms RF on transaction data |
| [14] | Akinjole et al. | 2024 | LendingClub | RF, DT, SVM, XGBoost, AdaBoost, MLP | SMOTE+ENN | Acc = 93.7%, Precision = 95.6%, Recall = 95.5% |
| [11] | Monje et al. | 2023 | LendingClub (full history) | RF + SMOTE | SMOTE | F1-macro > 0.90 |
| [15] | Suram et al. | 2026 | LendingClub (full) | Deep learning (DL) + SMOTE | SMOTE | Improved AUC over baselines |
| [18] | Gür et al. | 2025 | Kaggle Credit Score dataset | CNN (DenseNet, ResNet, etc.) + ML hybrids | Not specified | CNN+NewFC best accuracy |
| [44] | Zhu et al. | 2021 | Drug screening (tabular-to-image) | CNN on IGTD images | None | CNN on image > tabular models |
| [56] | Yang et al. | 2024 | LendingClub | RF, DT, LR, ensemble | Not specified | Acc ~85–88% |
| [19] | Li et al. | 2025 | LendingClub (2007–2020) | LightGBM + SHAP/LIME | RFE | Acc = 0.87, XAI-enhanced |
| [53] | Souadda et al. | 2025 | LendingClub + Australia + Taiwan | LR, RF, XGBoost, LightGBM + HPO | Class weighting | LightGBM AUC = 70.77% |
| - | Present Study | 2025 | LendingClub (115,487 records, 64 features) | DNN, SVM, RF, DT, CNN, CNN+SVM, CNN+RF, CNN+DT | SMOTE, RUS | CNN+RF: AUC = 1.000, Acc = 0.987, F1 = 0.991, MCC = 0.977 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).