Submitted: 10 August 2023
Posted: 11 August 2023
Abstract
Keywords:
1. Introduction
2. Materials and methods
2.1. Aim
2.1.1. Research Questions
2.2. Research Strategy
2.2.1. Research string

2.3. Selection Criteria
2.4. Data extraction from study
2.5. Risk of bias
3. Current Research
4. Results
4.1. Answer to RQ1: What are the algorithms, methods, and models used to predict credit risk?


4.2. Answer to RQ2: What metrics are used to evaluate the performance of algorithms, methods, or models?
4.3. Answer to RQ3: What are these models’ accuracy, precision, F1 measure, and AUC?
4.4. Answer to RQ4: What datasets are used in the prediction of credit risk?
4.5. Answer to RQ5: What variables or features are used to predict credit risk?
4.6. Answer to RQ6: What are the main problems or limitations of predicting credit risk?
5. Additional findings
5.1. Dataset balancing techniques
5.2. Techniques for determination of hyperparameters
6. Discussion
7. Conclusion
- The Boosted Category is the most researched family of ML models in both Ass and N-Ass because it shows the best results, although the trend is toward its use in Ass.
- The five most used metrics are AUC, ACC, Recall, F1 measure, and Precision, although in practice the most appropriate metrics must be chosen according to the problem at hand.
- Public datasets are used more often than private ones; among them, the most used are the UCI German dataset and the Lending Club dataset. Their main use is to validate a model's behavior against other models under the same conditions. Private datasets generate knowledge from application to a specific situation.
- For ML-based credit evaluation, demographic and operational variables are mainly used, since they are oriented to identifying patterns that predict behavior. However, external variables and those related to unstructured data should also be considered, given hyper-connectivity and the development of DDI and big data processing.
- The main problems are the representativeness of reality, the imbalance of training data, and the inconsistency in recording information; these arise from biases, errors, or problems in recording the information.
- SMOTE is the most widely used method to address the imbalance problem and thereby improve the performance of ML models, while K-fold CV and Grid Search are the most used methods to guide hyperparameter optimization.
8. Future Research
Funding
Conflicts of Interest
References
- Lombardo, G.; Pellegrino, M.; Adosoglou, G.; Cagnoni, S.; Pardalos, P.M.; Poggi, A. Machine Learning for Bankruptcy Prediction in the American Stock Market: Dataset and Benchmarks. Future Internet 2022, 14, 244.
- Ziemba, P.; Becker, J.; Becker, A.; Radomska-Zalas, A.; Pawluk, M.; Wierzba, D. Credit decision support based on real set of cash loans using integrated machine learning algorithms. Electronics 2021, 10, 2099.
- Liu, C.; Ming, Y.; Xiao, Y.; Zheng, W.; Hsu, C.H. Finding the next interesting loan for investors on a peer-to-peer lending platform. IEEE Access 2021, 9, 111293–111304.
- Shih, D.H.; Wu, T.W.; Shih, P.Y.; Lu, N.A.; Shih, M.H. A Framework of Global Credit-Scoring Modeling Using Outlier Detection and Machine Learning in a P2P Lending Platform. Mathematics 2022, 10, 2282.
- FED20230403 Consumer Credit - G.19. https://www.federalreserve.gov/releases/g19/current/. Accessed: 2023-02-28.
- Zhang, Z.; Jia, X.; Chen, S.; Li, M.; Wang, F. Dynamic Prediction of Internet Financial Market Based on Deep Learning. Computational Intelligence and Neuroscience 2022, 2022.
- BM Panorama general. https://www.bancomundial.org/es/topic/financialsector/overview. Accessed: 2021-12-22.
- Deloitte The future of retail banking: The hyper-personalisation imperative. https://www2.deloitte.com/content/dam/Deloitte/uk/Documents/financial-services/deloitte-uk-hp-the-future-of-retail-banking-1.pdf. Accessed: 2023-01-20.
- SBS Informe de Estabilidad del Sistema Financiero – mayo 2022. https://www.sbs.gob.pe/Portals/0/jer/pub-InformeEstabilidad/InfEstFin-2022-1-v2.pdf. Accessed: 2023-01-20.
- Hani, U.; Wickramasinghe, A.; Kattiyapornpong, U.; Sajib, S. The future of data-driven relationship innovation in the microfinance industry. Annals of Operations Research 2022, 1–27.
- Zhang, C.; Zhong, H.; Hu, A. A Method for Financial System Analysis of Listed Companies Based on Random Forest and Time Series. Mobile Information Systems 2022, 2022.
- Yıldırım, M.; Okay, F.Y.; Özdemir, S. Big data analytics for default prediction using graph theory. Expert Systems with Applications 2021, 176, 114840.
- Bi, W.; Liang, Y. Risk Assessment of Operator’s Big Data Internet of Things Credit Financial Management Based on Machine Learning. Mobile Information Systems 2022, 2022.
- Hariri, R.H.; Fredericks, E.M.; Bowers, K.M. Uncertainty in big data analytics: survey, opportunities, and challenges. Journal of Big Data 2019, 6, 1–16.
- Chen, Z.; Chen, W.; Shi, Y. Ensemble learning with label proportions for bankruptcy prediction. Expert Systems with Applications 2020, 146, 113155.
- SBS Resolución S.B.S. N° 00053-2023. https://intranet2.sbs.gob.pe/dv_int_cn/2240/v1.0/Adjuntos/0053-2023.R.pdf. Accessed: 2023-04-20.
- Fan, S.; Shen, Y.; Peng, S. Improved ML-based technique for credit card scoring in internet financial risk control. Complexity 2020, 2020, 1–14.
- García, V.; Marques, A.I.; Sánchez, J.S. Exploring the synergetic effects of sample types on the performance of ensembles for credit risk and corporate bankruptcy prediction. Information Fusion 2019, 47, 88–101.
- Wang, M.; Yang, H. Research on personal credit risk assessment model based on instance-based transfer learning. In Proceedings of the Intelligence Science III: 4th IFIP TC 12 International Conference, ICIS 2020, Durgapur, India, February 24–27, 2021, Revised Selected Papers 4. Springer, 2021, pp. 159–169.
- Teles, G.; Rodrigues, J.J.; Rabêlo, R.A.; Kozlov, S.A. Comparative study of support vector machines and random forests machine learning algorithms on credit operation. Software: Practice and Experience 2021, 51, 2492–2500.
- Orlova, E.V. Decision-making techniques for credit resource management using machine learning and optimization. Information 2020, 11, 144.
- Zou, Y.; Gao, C.; Gao, H. Business failure prediction based on a cost-sensitive extreme gradient boosting machine. IEEE Access 2022, 10, 42623–42639.
- Fritz-Morgenthal, S.; Hein, B.; Papenbrock, J. Financial risk management and explainable, trustworthy, responsible AI. Frontiers in Artificial Intelligence 2022, 5, 5.
- Sun, M.; Li, Y.; et al. Credit Risk Simulation of Enterprise Financial Management Based on Machine Learning Algorithm. Mobile Information Systems 2022, 2022.
- Mousavi, M.M.; Lin, J. The application of PROMETHEE multi-criteria decision aid in financial decision making: Case of distress prediction models evaluation. Expert Systems with Applications 2020, 159, 113438.
- Zhao, L.; Yang, S.; Wang, S.; Shen, J. Research on PPP Enterprise Credit Dynamic Prediction Model. Applied Sciences 2022, 12, 10362.
- Pandey, M.K.; Mittal, M.; Subbiah, K. Optimal balancing & efficient feature ranking approach to minimize credit risk. International Journal of Information Management Data Insights 2021, 1, 100037.
- Pławiak, P.; Abdar, M.; Acharya, U.R. Application of new deep genetic cascade ensemble of SVM classifiers to predict the Australian credit scoring. Applied Soft Computing 2019, 84, 105740.
- Cho, S.H.; Shin, K.s. Feature-Weighted Counterfactual-Based Explanation for Bankruptcy Prediction. Expert Systems with Applications 2023, 216, 119390.
- Bao, W.; Lianju, N.; Yue, K. Integration of unsupervised and supervised machine learning algorithms for credit risk assessment. Expert Systems with Applications 2019, 128, 301–315.
- Mitra, R.; Goswami, A.; Tiwari, M.K. Financial supply chain analysis with borrower identification in smart lending platform. Expert Systems with Applications 2022, 208, 118026.
- Jemai, J.; Zarrad, A. Feature Selection Engineering for Credit Risk Assessment in Retail Banking. Information 2023, 14, 200.
- Chen, S.F.; Chakraborty, G.; Li, L.H. Feature selection on credit risk prediction for peer-to-peer lending. In Proceedings of the New Frontiers in Artificial Intelligence: JSAI-isAI 2018 Workshops, JURISIN, AI-Biz, SKL, LENLS, IDAA, Yokohama, Japan, November 12–14, 2018, Revised Selected Papers. Springer, 2019, pp. 5–18.
- Si, Z.; Niu, H.; Wang, W. Credit Risk Assessment by a Comparison Application of Two Boosting Algorithms. In Fuzzy Systems and Data Mining VIII; IOS Press, 2022; pp. 34–40.
- Merćep, A.; Mrčela, L.; Birov, M.; Kostanjčar, Z. Deep neural networks for behavioral credit rating. Entropy 2020, 23, 27.
- Bussmann, N.; Giudici, P.; Marinelli, D.; Papenbrock, J. Explainable machine learning in credit risk management. Computational Economics 2021, 57, 203–216.
- Moscato, V.; Picariello, A.; Sperlí, G. A benchmark of machine learning approaches for credit score prediction. Expert Systems with Applications 2021, 165, 113986.
- Ariza-Garzón, M.J.; Arroyo, J.; Caparrini, A.; Segovia-Vargas, M.J. Explainability of a machine learning granting scoring model in peer-to-peer lending. IEEE Access 2020, 8, 64873–64890.
- Dumitrescu, E.; Hué, S.; Hurlin, C.; Tokpavi, S. Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects. European Journal of Operational Research 2022, 297, 1178–1192.
- Ma, X.; Lv, S. Financial credit risk prediction in internet finance driven by machine learning. Neural Computing and Applications 2019, 31, 8359–8367.
- Karn, A.L.; Sachin, V.; Sengan, S.; Gandhi, I.; Ravi, L.; Sharma, D.K.; Subramaniyaswamy, V. Designing a deep learning-based financial decision support system for fintech to support corporate customer’s credit extension. Malaysian Journal of Computer Science 2022, 116–131.
- Zheng, B. Financial default payment predictions using a hybrid of simulated annealing heuristics and extreme gradient boosting machines. International Journal of Internet Technology and Secured Transactions 2019, 9, 404–425.
- Mancisidor, R.A.; Kampffmeyer, M.; Aas, K.; Jenssen, R. Learning latent representations of bank customers with the variational autoencoder. Expert Systems with Applications 2021, 164, 114020.
- Wang, T.; Liu, R.; Qi, G. Multi-classification assessment of bank personal credit risk based on multi-source information fusion. Expert Systems with Applications 2022, 191, 116236.
- Liu, W.; Fan, H.; Xia, M.; Pang, C. Predicting and interpreting financial distress using a weighted boosted tree-based tree. Engineering Applications of Artificial Intelligence 2022, 116, 105466.
- Andrade Mancisidor, R.; Kampffmeyer, M.; Aas, K.; Jenssen, R. Deep generative models for reject inference in credit scoring. Knowledge-Based Systems 2020.
- Wu, Z. Using machine learning approach to evaluate the excessive financialization risks of trading enterprises. Computational Economics 2021, 1–19.
- Liu, J.; Zhang, S.; Fan, H. A two-stage hybrid credit risk prediction model based on XGBoost and graph-based deep neural network. Expert Systems with Applications 2022, 195, 116624.
- Shu, R. Deep Representations with Learned Constraints; Stanford University, 2022.
- Tripathi, D.; Edla, D.R.; Kuppili, V.; Bablani, A. Evolutionary extreme learning machine with novel activation function for credit scoring. Engineering Applications of Artificial Intelligence 2020, 96, 103980.
- Uj, A.; Nmb, E.; Ks, C.; Skl, D. Financial crisis prediction model using ant colony optimization. International Journal of Information Management 2020, 50, 538–556.
- Feng, Y. Bank Green Credit Risk Assessment and Management by Mobile Computing and Machine Learning Neural Network under the Efficient Wireless Communication. Wireless Communications and Mobile Computing 2022, 2022.
- Tian, J.; Li, L. Digital universal financial credit risk analysis using particle swarm optimization algorithm with structure decision tree learning-based evaluation model. Wireless Communications and Mobile Computing 2022, 2022.
- Chen, X.; Li, S.; Xu, X.; Meng, F.; Cao, W. A novel GSCI-based ensemble approach for credit scoring. IEEE Access 2020, 8, 222449–222465.
- Koç, O.; Başer, F.; Kestel, S.A. Credit Risk Evaluation Using Clustering Based Fuzzy Classification Method. Expert Systems with Applications 2023.
- Rishehchi Fayyaz, M.; Rasouli, M.R.; Amiri, B. A data-driven and network-aware approach for credit risk prediction in supply chain finance. Industrial Management & Data Systems 2021, 121, 785–808.
- Muñoz-Cancino, R.; Bravo, C.; Ríos, S.A.; Graña, M. On the combination of graph data for assessing thin-file borrowers’ creditworthiness. Expert Systems with Applications 2023, 213, 118809.
- Li, Y.; Stasinakis, C.; Yeo, W.M. A hybrid XGBoost-MLP model for credit risk assessment on digital supply chain finance. Forecasting 2022, 4, 184–207.
- Haro, B.; Ortiz, C.; Armas, J. Predictive Model for the Evaluation of Credit Risk in Banking Entities Based on Machine Learning. In Proceedings of the Brazilian Technology Symposium. Springer, 2018, pp. 605–612.
- de Castro Vieira, J.R.; Barboza, F.; Sobreiro, V.A.; Kimura, H. Machine learning models for credit analysis improvements: Predicting low-income families’ default. Applied Soft Computing 2019, 83, 105640.
- Li, D.; Li, L. Research on Efficiency in Credit Risk Prediction Using Logistic-SBM Model. Wireless Communications and Mobile Computing 2022, 2022.
- Qian, H.; Wang, B.; Yuan, M.; Gao, S.; Song, Y. Financial distress prediction using a corrected feature selection measure and gradient boosted decision tree. Expert Systems with Applications 2022, 190, 116202.
- Alam, T.M.; Shaukat, K.; Hameed, I.A.; Luo, S.; Sarwar, M.U.; Shabbir, S.; Li, J.; Khushi, M. An investigation of credit card default prediction in the imbalanced datasets. IEEE Access 2020, 8, 201173–201198.
- Song, Y.; Peng, Y. A MCDM-based evaluation approach for imbalanced classification methods in financial risk prediction. IEEE Access 2019, 7, 84897–84906.
- Li, Z.; Zhang, J.; Yao, X.; Kou, G. How to identify early defaults in online lending: a cost-sensitive multi-layer learning framework. Knowledge-Based Systems 2021, 221, 106963.
- Chrościcki, D.; Chlebus, M. The Advantage of Case-Tailored Information Metrics for the Development of Predictive Models, Calculated Profit in Credit Scoring. Entropy 2022, 24, 1218.
- Biswas, N.; Mondal, A.S.; Kusumastuti, A.; Saha, S.; Mondal, K.C. Automated credit assessment framework using ETL process and machine learning. Innovations in Systems and Software Engineering 2022, 1–14.
- Machado, M.R.; Karray, S. Assessing credit risk of commercial customers using hybrid machine learning algorithms. Expert Systems with Applications 2022, 200, 116889.


| Research Topics | Motivation |
| The algorithms, methods, and models used to predict credit risk. | We wish to know what models industry and academia use to predict credit risk. |
| The metrics used to evaluate the performance of algorithms, methods, or models. | We wish to know what metrics industry and academia use to evaluate the performance of algorithms, methods, or models that predict credit risk. |
| The models’ accuracy, precision, F1 measure, and AUC. | We wish to know the accuracy, precision, F1 measure, and AUC of the algorithms, methods, or models that predict credit risk. |
| The datasets used in the prediction of credit risk. | We wish to know what datasets industry and academia use to predict credit risk. |
| The variables or features used to predict credit risk. | We wish to know what variables or features industry and academia use to predict credit risk. |
| The main problems or limitations of predicting credit risk. | We wish to know the main problems or limitations in predicting credit risk. |

| Inclusion criteria | Exclusion criteria | # | % |
| Conference article | | 2 | 0.73% |
| Journal article | | 50 | 18.18% |
| | Duplicated article | 77 | 28.00% |
| | Not related | 15 | 5.45% |
| | Review article | 1 | 0.36% |
| | Without access to the full document | 57 | 20.73% |
| | Without rank in Scimagojr | 73 | 26.55% |
| Total | | 275 | 100.00% |

| It. | Family | # Ass | # N-Ass | # Total | % Ass | % N-Ass | % Total |
| 1 | Boosted Category | 36 | 46 | 82 | 11.96% | 15.28% | 27.24% |
| 2 | Collective Intelligence | 7 | | 7 | 2.33% | 0.00% | 2.33% |
| 3 | Fuzzy Logic | 10 | | 10 | 3.32% | 0.00% | 3.32% |
| 4 | NN / DL | 8 | 28 | 36 | 2.66% | 9.30% | 11.96% |
| 5 | Other Model | 3 | 10 | 13 | 1.00% | 3.32% | 4.32% |
| 6 | Traditional | 18 | 135 | 153 | 5.98% | 44.85% | 50.83% |
| | Total | 82 | 219 | 301 | 27.24% | 72.76% | 100.00% |
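
The percentage columns in the model-family table are each count divided by the grand total of 301 model uses. A minimal Python sketch of that arithmetic follows; the counts are copied from the table, the empty N-Ass cells are assumed to be zero (consistent with the 0.00% shown), and the names `counts` and `share` are illustrative only.

```python
# Counts per model family as (Ass, N-Ass), copied from the table.
# Assumption: an empty N-Ass cell means zero, matching the 0.00% shown.
counts = {
    "Boosted Category": (36, 46),
    "Collective Intelligence": (7, 0),
    "Fuzzy Logic": (10, 0),
    "NN / DL": (8, 28),
    "Other Model": (3, 10),
    "Traditional": (18, 135),
}

# Grand total over both groups and all families.
grand_total = sum(ass + n_ass for ass, n_ass in counts.values())


def share(count, total=grand_total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Per family: (% Ass, % N-Ass, % Total).
shares = {
    family: (share(ass), share(n_ass), share(ass + n_ass))
    for family, (ass, n_ass) in counts.items()
}

for family, (pct_ass, pct_n_ass, pct_total) in shares.items():
    print(f"{family:<24} {pct_ass:6.2f}% {pct_n_ass:6.2f}% {pct_total:6.2f}%")
```

Recomputing the shares this way is a quick consistency check on the table: the counts should sum to the stated totals and every percentage should follow from the same denominator.
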

| It. | Metric | # | % | It. | Metric | # | % |
| 1 | AUC | 34 | 16.11% | 9 | KS | 7 | 3.32% |
| 2 | ACC | 30 | 14.22% | 10 | BS | 6 | 2.84% |
| 3 | F1 Measure | 24 | 11.37% | 11 | GINI | 5 | 2.37% |
| 4 | Precision | 22 | 10.43% | 12 | RMSE | 2 | 0.95% |
| 5 | RECALL | 19 | 9.00% | 13 | KAPPA | 1 | 0.47% |
| 6 | TPR | 14 | 6.64% | 14 | MAE | 1 | 0.47% |
| 7 | TNR | 13 | 6.16% | 15 | Other | 24 | 11.37% |
| 8 | GMEAN | 9 | 4.27% | | | | |
| | | | | | Total | 211 | 100.00% |
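
For readers less familiar with the metrics counted above, the following is a minimal Python sketch of how the most frequent ones can be computed for a binary default / non-default classifier. It uses the standard textbook definitions, not code from any reviewed study; the toy labels and scores are invented for illustration, and the function names are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute ACC, Precision, Recall (TPR), TNR, F1 and GMEAN from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called TPR
    tnr = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    gmean = (recall * tnr) ** 0.5
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "tnr": tnr, "f1": f1, "gmean": gmean}


def auc_score(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as one half. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): 1 = default, 0 = non-default.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cut-off is an assumption

metrics = classification_metrics(y_true, y_pred)
auc = auc_score(y_true, scores)
gini = 2 * auc - 1  # Gini coefficient derived from AUC

print(metrics)
print(f"AUC={auc:.4f} Gini={gini:.4f}")
```

Note that ACC, Precision, Recall, and F1 depend on the chosen cut-off, whereas AUC and Gini are computed from the ranking of scores alone, which is one reason the choice of metric should follow the problem being solved.
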

| It. | Dataset | Author | ACC | Precision | F1 | Recall | AUC |
| 1 | UCI Taiwan | [31] | 85.00 | 70.00 | 50.00 | 62.00 | |
| 2 | UCI German | [63] | 83.50 | 82.10 | 84.40 | 86.80 | 91.00 |
| 3 | UCI German | [27] | 82.80 | 91.20 | |||
| 4 | UCI German | [50] | 81.18 | 85.38 | |||
| 5 | UCI German | [51] | 76.60 | 84.74 | |||
| 6 | UCI German | [30] | 75.80 | 54.20 | 82.00 | 85.90 | |
| 7 | UCI German | [55] | 74.90 | 75.80 | |||
| 8 | UCI German | [18] | 79.40 | ||||
| 9 | Lending Club | [34] | 92.60 | 97.90 | 92.20 | 97.00 | |
| 10 | Lending Club | [67] | 84.40 | 88.99 | 91.42 | 93.98 | |
| 11 | Lending Club | [32] | 76.10 | 75.98 | 75.95 | 76.35 | 76.80 |
| 12 | Lending Club | [48] | 88.77 | 94.14 | |||
| 13 | Lending Club | [33] | 74.90 | ||||
| 14 | Lending Club | [37] | 64.00 | 71.70 | |||
| 15 | Lending Club | [38] | 63.60 | 85.30 | 73.50 | 64.50 | 67.40 |
| 16 | Lending Club | [46] | 18.25 | 46.88 | 63.63 | ||
| 17 | Lending Club | [65] | 2.72 | 75.86 | |||
| 18 | K Prosper | [3] | 78.50 | 54.70 | |||
| 19 | K Prosper | [19] | 79.00 | 71.00 | 65.00 | 80.00 | |
| 20 | K Give Me | [59] | 88.30 | 78.50 | 77.60 | 76.70 | 93.30 |
| 21 | RenRenDai | [54] | 93.35 | 73.12 | 82.64 | ||
| 22 | BR | [60] | 96.68 | 89.63 | |||
| 23 | AVG Used | [12] | 92.80 | 31.60 | 33.40 | 35.50 | 82.80 |
| 24 | AVG Used | [64] | 91.89 | 96.19 | |||
| 25 | UCI Austr... | [28] | 97.39 | ||||
| 26 | Tsinghua | [52] | 91.23 | ||||
| 27 | Tsinghua | [62] | 77.20 | 75.90 | 77.54 | 79.38 | 85.01 |
| 28 | Private Data | [20] | 98.34 | 100.00 | 96.00 | ||
| 29 | Private Data | [53] | 98.00 | ||||
| 30 | Private Data | [58] | 97.80 | 98.90 | 98.70 | 98.90 | |
| 31 | Private Data | [17] | 90.10 | ||||
| 32 | Private Data | [29] | 84.29 | 82.63 | 84.68 | 86.83 | 84.29 |
| 33 | Private Data | [44] | 84.15 | 82.15 | 83.40 | 84.68 | |
| 34 | Private Data | [56] | 83.00 | 83.50 | 83.00 | 83.00 | 83.30 |
| 35 | Private Data | [61] | 77.49 | 79.87 | 85.59 | 92.18 | 79.00 |
| 36 | Private Data | [26] | 87.15 | 84.56 | 83.91 | 83.59 | |
| 37 | Private Data | [66] | 46.10 | ||||
| 38 | Private Data | [1] | 75.40 | ||||
| 39 | Private Data | [39] | 85.68 | ||||
| 40 | Private Data | [35] | 93.39 | ||||
| 41 | Private Data | [36] | 93.00 | ||||
| 42 | Private Data | [42] | 42.81 | 52.00 | 67.01 | 78.00 | |
| 43 | Private Data | [40] | 71.32 | ||||
| 44 | Private Data | [2] | 91.40 | ||||
| 45 | Private Data | [41] | 88.00 | 88.00 | 88.00 | 93.00 | |
| 46 | Private Data | [43] | 77.56 | ||||
| 48 | Private Data | [22] | 95.50 |

| It. | Features Group | # | % |
| 1 | Demographic | 291 | 54.09% |
| 2 | Operation | 157 | 29.18% |
| 3 | Payment behavior | 41 | 7.62% |
| 4 | External factors | 36 | 6.69% |
| 5 | Unstructured data | 7 | 1.30% |
| 6 | Transaction | 6 | 1.12% |
| | Total | 538 | 100.00% |

| It. | Features Group | Feature | # | % |
| 1 | Demographic | External Debt Value / historical | 27 | 5.02% |
| 2 | Demographic | Domestic Debt Value / historical | 27 | 5.02% |
| 3 | Operation | Loan value | 24 | 4.46% |
| 4 | Demographic | Average / Total revenue | 20 | 3.72% |
| 5 | Demographic | Residence / Registered Assets | 19 | 3.53% |
| 6 | Demographic | Economic Activity / Experience | 18 | 3.35% |
| 7 | Demographic | Family Income | 18 | 3.35% |
| 8 | Payment behavior | Days in arrears / Range Days in arrears | 17 | 3.16% |
| 9 | Operation | Historical use of debt | 16 | 2.97% |
| 10 | Operation | Destination of the Credit / Purpose | 16 | 2.97% |
| 11 | Operation | Interest Rate | 16 | 2.97% |
| 12 | External factors | Debt Profitability | 16 | 2.97% |
| 13 | Demographic | Total Debt / Income / DTI | 15 | 2.79% |
| 14 | Demographic | Gender / Sex | 14 | 2.60% |
| 15 | Demographic | Risk Segment / Bureau Rating / Score | 14 | 2.60% |
| 16 | Demographic | Age / Date of Birth | 13 | 2.42% |
| 17 | Operation | Checking / Savings Account | 13 | 2.42% |
| 18 | Operation | Credit Line Limit | 13 | 2.42% |
| 19 | Demographic | Civil Status | 12 | 2.23% |
| 20 | Demographic | Mortgage Debt | 12 | 2.23% |
| 21 | Operation | Monthly Fees | 12 | 2.23% |
| 22 | Payment behavior | Collection status | 11 | 2.04% |
| 23 | Payment behavior | Unpaid Installment Number | 11 | 2.04% |
| 24 | Demographic | Financial maturity | 9 | 1.67% |
| 25 | Demographic | Residence type | 9 | 1.67% |
| 26 | Demographic | Fee value | 9 | 1.67% |
| 27 | External factors | Inventory turnover | 9 | 1.67% |
| 28 | Demographic | Job tenure | 7 | 1.30% |
| 29 | Demographic | Education Level | 7 | 1.30% |
| 30 | Others | Others | 114 | 21.21% |
| | | Total | 538 | 100.00% |

| It. | Limits Identified | # | % |
| 1 | Representativeness of reality | 39 | 31.71% |
| 2 | Unbalanced data | 35 | 28.46% |
| 3 | Inconsistency in information recording | 21 | 17.07% |
| 4 | Lack of ability to explain the proposed results | 16 | 13.01% |
| 5 | Availability of information and centralized processing | 7 | 5.69% |
| 6 | Adaptability in processing struct. and unstruct. information | 5 | 4.07% |
| | Total | 123 | 100.00% |

| It. | Method | # | % | It. | Method | # | % |
| 1 | SMOTE | 24 | 28.24% | 8 | CC | 2 | 2.35% |
| 2 | KFold | 17 | 20.00% | 9 | CS-Classifiers | 2 | 2.35% |
| 3 | ROS | 10 | 11.76% | 10 | KN-SMOTE | 2 | 2.35% |
| 4 | RUS | 10 | 11.76% | 11 | NMISS | 2 | 2.35% |
| 5 | ADASYN | 4 | 4.71% | 12 | RESAMPLE | 2 | 2.35% |
| 6 | SMOTEBoost | 4 | 4.71% | 13 | SMOTE-T | 2 | 2.35% |
| 7 | B-SMOTE | 2 | 2.35% | 14 | Under-Bagging | 2 | 2.35% |
| | | | | | Total | 85 | 100.00% |
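
SMOTE, the most frequent balancing method in the table, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that idea in plain Python under simplifying assumptions (Euclidean distance, numeric features only, a fixed random seed); it is not the implementation used by any reviewed study, and the function name `smote_like` and the toy points are hypothetical.

```python
import random


def smote_like(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (the minority class).
    k: number of nearest minority neighbours to choose from.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a base sample at random.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy imbalanced data (invented): 4 defaults versus 12 non-defaults.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
majority_size = 12

new_points = smote_like(minority, majority_size - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority), "minority samples after oversampling")
```

Because every synthetic point lies on a segment between two real minority samples, oversampling of this kind should be applied to the training folds only, so that validation data contain no synthetic records.
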

| It. | Method | # | % |
| 1 | KFold CV | 21 | 58.33% |
| 2 | Grid Search Method | 8 | 22.22% |
| 3 | LightGBM Bayesian Optimisation | 2 | 5.56% |
| 4 | Genetic Algorithm (GA) | 2 | 5.56% |
| 5 | Random Search | 1 | 2.78% |
| 6 | Ant Colony Optimization (ACO) | 1 | 2.78% |
| 7 | Other | 1 | 2.78% |
| | Total | 36 | 100.00% |
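
K-fold cross-validation and grid search, the two most frequent entries in the table, are commonly combined: each candidate hyperparameter value is scored by its average validation accuracy over the folds, and the best-scoring value is kept. The following plain-Python sketch shows that combination on a deliberately tiny example; the one-feature k-nearest-neighbour classifier, the toy data, and all function names are assumptions made for illustration, not a method taken from the reviewed studies.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop


def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors closest training points (1-D feature)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    votes = sum(train_y[i] for i in order[:n_neighbors])
    return 1 if 2 * votes > n_neighbors else 0


def cv_accuracy(xs, ys, n_neighbors, k_folds):
    """Mean validation accuracy of the k-NN classifier over k_folds folds."""
    fold_scores = []
    for train, validation in k_fold_indices(len(xs), k_folds):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        correct = sum(
            knn_predict(train_x, train_y, xs[i], n_neighbors) == ys[i]
            for i in validation
        )
        fold_scores.append(correct / len(validation))
    return sum(fold_scores) / len(fold_scores)


def grid_search(xs, ys, grid, k_folds=5):
    """Score every candidate in grid by cross-validation and return the best."""
    results = {value: cv_accuracy(xs, ys, value, k_folds) for value in grid}
    best = max(results, key=results.get)
    return best, results


# Toy data (invented): one risk score per borrower, 1 = default.
xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.35, 0.65, 0.4, 0.6]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

grid = [1, 3, 5]  # candidate numbers of neighbours
best_k, cv_results = grid_search(xs, ys, grid, k_folds=5)
print("best n_neighbors:", best_k)
print(cv_results)
```

The contiguous folds used here assume the records are not ordered by class; with real, imbalanced credit data a shuffled or stratified split would usually be preferable.
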

| It. | Family | 2019 | 2020 | 2021 | 2022 | 2023 | Total |
| 1 | Boosted Category | 4 | 4 | 5 | 10 | 1 | 24 |
| 2 | Traditional | 4 | 1 | 5 | 4 | 1 | 15 |
| 3 | NN / DL | 1 | 1 | 2 | 2 | 1 | 7 |
| 4 | Collective Intelligence | | 2 | | 2 | | 4 |
| 5 | Fuzzy Logic | | 1 | | | 1 | 2 |
| | Total | 9 | 9 | 12 | 18 | 4 | 52 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
