Submitted: 23 January 2025
Posted: 24 January 2025
Abstract
Keywords:
1. Introduction
2.1. Data Description
| Features | Feature Description |
|---|---|
| id | A unique LC assigned ID for the loan listing. |
| acc_now_delinq | The number of accounts on which the borrower is now delinquent. |
| acc_open_past_24mths | Number of trades opened in past 24 months. |
| addr_state | The state provided by the borrower in the loan application. |
| annual_inc | The self-reported annual income provided by the borrower during registration. |
| application_type | Indicates whether the loan is an individual application or a joint application with two co-borrowers. |
| avg_cur_bal | Average current balance of all accounts. |
| bc_open_to_buy | Total open to buy on revolving bankcards. |
| bc_util | Ratio of total current balance to high credit/credit limit for all bankcard accounts. |
| chargeoff_within_12_mths | Number of charge-offs within 12 months. |
| collections_12_mths_ex_med | Number of collections in 12 months excluding medical collections. |
| delinq_2yrs | The number of 30+ days past-due incidences of delinquency in the borrower's credit file for the past 2 years. |
| delinq_amnt | The past-due amount owed for the accounts on which the borrower is now delinquent. |
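| dti | A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income. |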
| emp_length | Employment length in years. Possible values are between 0 and 10 where 0 means less than one year. |
| emp_title | The job title supplied by the Borrower when applying for the loan. |
| fico_range_high | The upper boundary of the range the borrower’s FICO score at loan origination belongs to. |
| fico_range_low | The lower boundary of the range the borrower’s FICO score at loan origination belongs to. |
| funded_amnt | The total amount committed to that loan at that point in time. |
| funded_amnt_inv | The total amount committed by investors for that loan at that point in time. |
| grade | LC assigned loan grade. |
| home_ownership | The home ownership status provided by the borrower during registration. |
| initial_list_status | The initial listing status of the loan. Possible values are W and F. |
| inq_last_6mths | The number of inquiries in past 6 months (excluding auto and mortgage inquiries). |
| installment | The monthly payment owed by the borrower if the loan originates. |
| int_rate | Interest Rate on the loan. |
| last_fico_range_high | The upper boundary of the range the borrower’s last pulled FICO score belongs to. |
| last_fico_range_low | The lower boundary of the range the borrower’s last pulled FICO score belongs to. |
| last_pymnt_amnt | Last total payment amount received. |
| loan_amnt | The listed amount of the loan applied for by the borrower. |
| mo_sin_old_il_acct | Months since oldest bank installment account opened. |
| mo_sin_old_rev_tl_op | Months since oldest revolving account opened. |
| mo_sin_rcnt_rev_tl_op | Months since most recent revolving account opened. |
| mo_sin_rcnt_tl | Months since most recent account opened. |
| mort_acc | Number of mortgage accounts. |
| mths_since_recent_bc | Months since most recent bankcard account opened. |
| mths_since_recent_inq | Months since most recent inquiry. |
| num_accts_ever_120_pd | Number of accounts ever 120 or more days past due. |
| num_actv_bc_tl | Number of currently active bankcard accounts. |
| num_actv_rev_tl | Number of currently active revolving trades. |
| num_bc_sats | Number of satisfactory bankcard accounts. |
| num_bc_tl | Number of bankcard accounts. |
| num_il_tl | Number of installment accounts. |
| num_op_rev_tl | Number of open revolving accounts. |
| num_rev_accts | Number of revolving accounts. |
| num_rev_tl_bal_gt_0 | Number of revolving trades with balance >0. |
| num_sats | Number of satisfactory accounts. |
| num_tl_120dpd_2m | Number of accounts currently 120 days past due (updated in past 2 months). |
| num_tl_30dpd | Number of accounts currently 30 days past due (updated in past 2 months). |
| num_tl_90g_dpd_24m | Number of accounts 90 or more days past due in last 24 months. |
| num_tl_op_past_12m | Number of accounts opened in past 12 months. |
| open_acc | The number of open credit lines in the borrower's credit file. |
| out_prncp | Remaining outstanding principal for total amount funded. |
| out_prncp_inv | Remaining outstanding principal for portion of total amount funded by investors. |
| pct_tl_nvr_dlq | Percent of trades never delinquent. |
| percent_bc_gt_75 | Percentage of all bankcard accounts > 75% of limit. |
| policy_code | Publicly available products have policy_code=1; new products not publicly available have policy_code=2. |
| pub_rec | Number of derogatory public records. |
| pub_rec_bankruptcies | Number of public record bankruptcies. |
| purpose | A category provided by the borrower for the loan request. |
| pymnt_plan | Indicates if a payment plan has been put in place for the loan. |
| revol_bal | Total credit revolving balance. |
| revol_util | Revolving line utilization rate. |
| sub_grade | LC assigned loan subgrade. |
| tax_liens | Number of tax liens. |
| term | The number of payments on the loan. Values are in months and can be either 36 or 60. |
| tot_coll_amt | Total collection amounts ever owed. |
| tot_cur_bal | Total current balance of all accounts. |
| tot_hi_cred_lim | Total high credit limit. |
| total_acc | The total number of credit lines currently in the borrower's credit file. |
| total_bal_ex_mort | Total credit balance excluding mortgage. |
| total_bc_limit | Total bankcard high credit/credit limit. |
| total_il_high_credit_limit | Total installment high credit/credit limit. |
| total_pymnt | Payments received to date for total amount funded. |
| total_pymnt_inv | Payments received to date for portion of total amount funded by investors. |
| total_rec_int | Interest received to date. |
| total_rec_late_fee | Late fees received to date. |
| total_rec_prncp | Principal received to date. |
| total_rev_hi_lim | Total revolving high credit/credit limit. |
| verification_status | Indicates if income was verified by LC, not verified, or if the income source was verified. |
| hardship_flag | Flags whether or not the borrower is on a hardship plan. |
| disbursement_method | The method by which the borrower receives their loan. Possible values are: CASH, DIRECT_PAY. |
| debt_settlement_flag | Flags whether or not the borrower, who has charged-off, is working with a debt-settlement company. |
| loan_status | Status of the loan (Current/Ongoing or Late Payments). |
2.2. Descriptive Statistics
2.2.1. Missing Values
2.2.2. Descriptive Analysis of Numerical Features
| Features | Mean | Std. Dev | Min Value | 25 %ile | 50 %ile | 75 %ile | Max Value |
|---|---|---|---|---|---|---|---|
| loan_amnt | 16472 | 9776 | 1000 | 9600 | 15000 | 23450 | 40000 |
| funded_amnt | 16472 | 9776 | 1000 | 9600 | 15000 | 23450 | 40000 |
| funded_amnt_inv | 16468 | 9774 | 750 | 9600 | 15000 | 23431 | 40000 |
| int_rate | 14 | 5 | 5 | 10 | 14 | 17 | 31 |
| installment | 479 | 283 | 8 | 271 | 405 | 640 | 1715 |
| annual_inc | 78547 | 81756 | 0 | 46000 | 65000 | 94000 | 9573072 |
| dti | 20 | 18 | 0 | 12 | 19 | 26 | 999 |
| delinq_2yrs | 0 | 1 | 0 | 0 | 0 | 0 | 36 |
| fico_range_low | 698 | 32 | 660 | 670 | 690 | 715 | 845 |
| fico_range_high | 702 | 32 | 664 | 674 | 694 | 719 | 850 |
| inq_last_6mths | 1 | 1 | 0 | 0 | 0 | 1 | 6 |
| open_acc | 12 | 6 | 0 | 8 | 11 | 15 | 62 |
| pub_rec | 0 | 1 | 0 | 0 | 0 | 0 | 45 |
| revol_bal | 16182 | 22897 | 0 | 5595 | 11061 | 19773 | 1392002 |
| revol_util | 49 | 25 | 0 | 30 | 48 | 68 | 143 |
| total_acc | 23 | 12 | 2 | 14 | 21 | 29 | 148 |
| out_prncp | 10837 | 8385 | 0 | 4276 | 8749 | 15418 | 40000 |
| out_prncp_inv | 10835 | 8384 | 0 | 4275 | 8747 | 15412 | 40000 |
| total_pymnt | 8533 | 7694 | 0 | 2971 | 6102 | 11682 | 60511 |
| total_pymnt_inv | 8530 | 7691 | 0 | 2971 | 6101 | 11679 | 60424 |
| total_rec_prncp | 5635 | 5344 | 0 | 1867 | 3906 | 7609 | 40000 |
| total_rec_int | 2891 | 3089 | 0 | 822 | 1788 | 3882 | 27394 |
| total_rec_late_fee | 8 | 34 | 0 | 0 | 0 | 0 | 1484 |
| last_pymnt_amnt | 512 | 739 | 0 | 264 | 402 | 646 | 38341 |
| last_fico_range_high | 659 | 75 | 0 | 599 | 669 | 714 | 850 |
| last_fico_range_low | 647 | 109 | 0 | 595 | 665 | 710 | 845 |
| collections_12_mths_ex_med | 0 | 0 | 0 | 0 | 0 | 0 | 6 |
| policy_code | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| acc_now_delinq | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| tot_coll_amt | 220 | 1712 | 0 | 0 | 0 | 0 | 197765 |
| tot_cur_bal | 137087 | 160456 | 0 | 27349 | 68989 | 204094 | 4170862 |
| total_rev_hi_lim | 34290 | 34783 | 0 | 14500 | 25600 | 43100 | 1656900 |
| acc_open_past_24mths | 5 | 3 | 0 | 2 | 4 | 6 | 43 |
| avg_cur_bal | 12847 | 15910 | 0 | 2940 | 6581 | 17599 | 370743 |
| bc_open_to_buy | 11577 | 16591 | 0 | 2021 | 5798 | 14476 | 284588 |
| bc_util | 56 | 29 | 0 | 33 | 57 | 81 | 216 |
| chargeoff_within_12_mths | 0 | 0 | 0 | 0 | 0 | 0 | 3 |
| delinq_amnt | 12 | 689 | 0 | 0 | 0 | 0 | 112524 |
| mo_sin_old_il_acct | 124 | 55 | 1 | 91 | 129 | 153 | 686 |
| mo_sin_old_rev_tl_op | 176 | 101 | 2 | 109 | 157 | 227 | 800 |
| mo_sin_rcnt_rev_tl_op | 14 | 18 | 0 | 4 | 8 | 17 | 368 |
| mo_sin_rcnt_tl | 8 | 9 | 0 | 3 | 6 | 11 | 368 |
| mort_acc | 1 | 2 | 0 | 0 | 1 | 2 | 23 |
| mths_since_recent_bc | 24 | 32 | 0 | 6 | 14 | 28 | 487 |
| mths_since_recent_inq | 7 | 6 | 0 | 2 | 5 | 10 | 24 |
| num_accts_ever_120_pd | 1 | 1 | 0 | 0 | 0 | 0 | 34 |
| num_actv_bc_tl | 4 | 2 | 0 | 2 | 3 | 5 | 25 |
| num_actv_rev_tl | 6 | 4 | 0 | 3 | 5 | 7 | 44 |
| num_bc_sats | 5 | 3 | 0 | 3 | 4 | 6 | 37 |
| num_bc_tl | 7 | 5 | 0 | 4 | 6 | 9 | 49 |
| num_il_tl | 8 | 7 | 0 | 3 | 6 | 11 | 99 |
| num_op_rev_tl | 8 | 5 | 0 | 5 | 7 | 11 | 60 |
| num_rev_accts | 13 | 8 | 2 | 8 | 12 | 17 | 116 |
| num_rev_tl_bal_gt_0 | 6 | 3 | 0 | 3 | 5 | 7 | 41 |
| num_sats | 12 | 6 | 0 | 7 | 11 | 15 | 62 |
| num_tl_120dpd_2m | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| num_tl_30dpd | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| num_tl_90g_dpd_24m | 0 | 0 | 0 | 0 | 0 | 0 | 36 |
| num_tl_op_past_12m | 2 | 2 | 0 | 1 | 2 | 3 | 18 |
| pct_tl_nvr_dlq | 94 | 10 | 0 | 91 | 98 | 100 | 100 |
| percent_bc_gt_75 | 40 | 36 | 0 | 0 | 33 | 67 | 100 |
| pub_rec_bankruptcies | 0 | 0 | 0 | 0 | 0 | 0 | 7 |
| tax_liens | 0 | 0 | 0 | 0 | 0 | 0 | 45 |
| tot_hi_cred_lim | 172769 | 179167 | 0 | 49018 | 104864 | 249270 | 4562297 |
| total_bal_ex_mort | 52443 | 52269 | 0 | 20842 | 38610 | 66228 | 1392002 |
| total_bc_limit | 23092 | 23117 | 0 | 8300 | 16400 | 30000 | 711400 |
| total_il_high_credit_limit | 45790 | 46841 | 0 | 15443 | 34084 | 61867 | 976075 |
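The columns of the table above (mean, standard deviation, minimum, quartiles, maximum) can be reproduced for any numeric feature. The following is a minimal sketch using only the Python standard library; the `describe` helper and the five sample values are hypothetical, and the interpolation method for the quartiles is an assumption, since the source does not state which tool or convention produced the table.

```python
import statistics


def describe(values):
    """Return the summary statistics reported per feature in the table."""
    data = sorted(values)
    # "inclusive" interpolates between order statistics and treats the
    # sample minimum and maximum as the 0th and 100th percentiles.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation (n - 1)
        "min": data[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": data[-1],
    }


# Hypothetical loan amounts for illustration only, not dataset values.
sample_loan_amnt = [5000, 10000, 15000, 20000, 35000]
summary = describe(sample_loan_amnt)
```

Different quartile conventions give slightly different values on small samples, so exact agreement with the table would depend on matching the convention the authors used.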
2.2.3. Loan Amount vs Interest Rate vs Loan Status
2.2.4. Loan Status Distribution
2.3. Method
2.3.1. SXI Score Calculation
2.3.2. Correlation of SXI w.r.t Late Payments
- $y$ is the target outcome,
- $x$ is the SXI score,
- $\beta_0, \beta_1, \ldots, \beta_n$ are the coefficients representing the weights of the polynomial terms,
- $n$ is the degree of the polynomial, and
- $\varepsilon$ is the error term.
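The terms listed above match a standard polynomial regression of the target outcome on the SXI score. Written out under assumed notation ($y$ for the target outcome, $x$ for the SXI score, $\beta_k$ for the coefficients, $n$ for the degree, $\varepsilon$ for the error term), the model would take the following general form. The symbols are an assumption, since the original equation is not available here.

```latex
y = \beta_0 + \beta_1 x + \beta_2 x^{2} + \cdots + \beta_n x^{n} + \varepsilon
  = \sum_{k=0}^{n} \beta_k x^{k} + \varepsilon
```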
2.3.3. Model Training and Evaluation
- $\alpha$ is the tuning parameter, ranging from 0.5 to 1.5 in increments of 0.1,
- $SXI_{\text{benchmark}}$ is the benchmark SXI score, and
- $SXI_{\text{new}}$ is the new SXI score after applying the $\alpha$ adjustment.
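The tuning-parameter range described above (0.5 to 1.5 in increments of 0.1) can be enumerated as a small grid. This is only a sketch of how the candidate values might be generated; the `alpha_grid` helper is hypothetical, and the adjustment applied to the SXI score for each value is not reproduced here.

```python
def alpha_grid(start=0.5, stop=1.5, step=0.1):
    """Return tuning-parameter values from start to stop, inclusive."""
    n_steps = round((stop - start) / step)
    # Multiply by an integer index and round to one decimal place so that
    # repeated float addition cannot drift (for example 0.7999999...).
    # The rounding precision assumes a step of one decimal place.
    return [round(start + i * step, 1) for i in range(n_steps + 1)]


alphas = alpha_grid()
```

Each value in `alphas` would then be evaluated in turn, keeping the one that gives the best validation metric.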
- TP is the number of true positives,
- TN is the number of true negatives,
- FP is the number of false positives, and
- FN is the number of false negatives.
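The four counts above feed the usual confusion-matrix metrics. The sketch below assumes the standard definitions, accuracy = (TP + TN) / (TP + TN + FP + FN) and precision = TP / (TP + FP); the function names and the example counts are hypothetical and are not values from this study.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def precision(tp, fp):
    """Share of predicted positives that are true positives."""
    if tp + fp == 0:
        raise ValueError("no positive predictions")
    return tp / (tp + fp)


# Hypothetical counts for illustration only.
acc = accuracy(tp=90, tn=85, fp=10, fn=15)
prec = precision(tp=90, fp=10)
```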
- $TPR_i$ and $TPR_{i+1}$ are the true positive rates at consecutive thresholds, and
- $FPR_i$ and $FPR_{i+1}$ are the false positive rates at consecutive thresholds.
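Rates at consecutive thresholds are what the trapezoidal rule needs to approximate the area under the ROC curve. The sketch below assumes that rule, summing (FPR at i+1 minus FPR at i) times the mean of the two TPR values; the `trapezoid_auc` helper and the ROC points are hypothetical, and the source's exact formula is not available here.

```python
def trapezoid_auc(fpr, tpr):
    """Approximate the area under a ROC curve with the trapezoidal rule.

    fpr and tpr are equal-length sequences of rates at consecutive
    thresholds, ordered by increasing false positive rate.
    """
    if len(fpr) != len(tpr):
        raise ValueError("fpr and tpr must have the same length")
    area = 0.0
    for i in range(len(fpr) - 1):
        width = fpr[i + 1] - fpr[i]
        mean_height = (tpr[i] + tpr[i + 1]) / 2
        area += width * mean_height
    return area


# Hypothetical ROC points for illustration only.
auc_value = trapezoid_auc([0.0, 0.1, 0.5, 1.0], [0.0, 0.6, 0.9, 1.0])
```

A classifier no better than chance traces the diagonal, for which this function returns 0.5.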
2.3.4. Actionable Insights for Reducing Late Payments
3. Results and Discussions





Comparison of performance metrics: other studies vs. this study.

| Performance Metrics | NN | Random Forest | BPNN | MOE | SVM Linear | XGBoost | SXI |
|---|---|---|---|---|---|---|---|
| Accuracy | 78.60% | 78.80% | 84.05% | 92.10% | NA | 84.52% | 99.80% |
| Precision | NA | NA | NA | NA | NA | 86.29% | 98.95% |
| AUC | NA | NA | 0.79 | NA | 0.935 | 0.92 | 0.998 |
4. Conclusions
Abbreviations
| P2P: | Peer-to-Peer Lending |
| FICO: | Fair Isaac Corporation |
| SXI: | Sriya Expert Index |
| SVM: | Support Vector Machines |
| RBF: | Radial Basis Function |
| MOE: | Mixture of Experts |
| AUC: | Area Under Curve |
| LC: | Lending Club |
| ML: | Machine Learning |
| AI: | Artificial Intelligence |
| NN: | Neural Network |
| BPNN: | Back Propagation Neural Network |
| DNN: | Deep Neural Network |
| SEC: | Securities and Exchange Commission |
| TP: | True Positives |
| TN: | True Negatives |
| FP: | False Positives |
| FN: | False Negatives |
| FPR: | False Positive Rate |
| TPR: | True Positive Rate |
| ROC: | Receiver Operating Characteristic |
References
- Liu D., Brass D. J., Lu Y., Chen D. Friendship in online peer-to-peer lending: pipes, prisms, and relational herding. MIS Quarterly. 2015;39(3):729–742. doi: 10.25300/misq/2015/39.3.11.
- Ge R., Feng J., Gu B., Zhang P. Predicting and deterring default with social media information in peer-to-peer lending. Journal of Management Information Systems. 2017;34(2):401–424. doi: 10.1080/07421222.2017.1334472.
- Magee, Jack R. Peer-to-Peer Lending in the United States: Surviving after Dodd-Frank. Notes & Comments: I. The Dodd-Frank Wall Street Reform and Consumer Protection Act. Page 139.
- Zhang, DunGang, et al. “The Credit Risk Assessment of P2P Lending Based on BP Neural Network.” Industrial Engineering and Management Science, 1st ed., CRC Press, 2014, pp. 90–94.
- Fu, Y. (2017) Combination of Random Forests and Neural Networks in Social Lending. Journal of Financial Risk Management, 6, 418-426. doi: 10.4236/jfrm.2017.64030.
- Chawla, N.V., Japkowicz, N., and Kotcz, A.: ‘Special issue on learning from imbalanced data sets’, ACM Sigkdd Explorations Newsletter, 2004, 6, (1), pp. 1-6.
- Chawla, N., Bowyer, K., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. (JAIR), 16, 321-357. doi: 10.1613/jair.95.
- Milad Malekipirbazari, Vural Aksakalli, Risk assessment in social lending via random forests, Expert Systems with Applications, Volume 42, Issue 10, 2015, Pages 4621-4631, ISSN 0957-4174.
- Freedman S, Jin GZ. Do social networks solve information problems for peer-to-peer lending? Evidence from Prosper.com. SSRN Electron. J. 2008; 15:15. doi: 10.2139/ssrn.1936057.
- Li B. Online Loan Default Prediction Model Based on Deep Learning Neural Network. Computational Intelligence and Neuroscience. 2022; 2022:9. doi: 10.1155/2022/4276253.
- Feller, J., Gleasure, R., & Treacy, S. (2017). Information sharing and user behavior in internet-enabled peer-to-peer lending systems: an empirical study. Journal of Information Technology, 32(2), 127–146. doi:10.1057/jit.2016.1.
- Saputra, O., Faturohman, T., & Kaderi Wiryono, S. (2021). Social Media Data to Improve Credit Scoring Accuracy with A Data Mining Approach Based on Support Vector Machine: Case Study of An Online Peer to Peer Lending in Indonesia. International Journal of Accounting, Finance and Business (IJAFB), 6 (32), 1 -14.
- Miller-Janny Ariza-Garzón, Javier Arroyo, María-Jesús Segovia-Vargas, Antonio Caparrini. Profit-sensitive machine learning classification with explanations in credit risk: The case of small businesses in peer-to-peer lending. Electronic Commerce Research and Applications, Volume 67, 2024, 101428, ISSN 1567-4223. doi: 10.1016/j.elerap.2024.101428.
- Machado, M.R. and Karray, S. (2022) Assessing Credit Risk of Commercial Customers Using Hybrid Machine Learning Algorithms. Expert Systems with Applications, 200, 116889. doi: 10.1016/j.eswa.2022.116889.
- Makokha, C. W., Kube, A. and Ngesa, O. (2024) A Hybrid Approach for Predicting Probability of Default in Peer-to-Peer (P2P) Lending Platforms Using Mixture-of-Experts Neural Network. Journal of Data Analysis and Information Processing, 12, 151-162. doi: 10.4236/jdaip.2024.122009.
- S. Kilambi (2024) AI Square Enabled by Sriya Expert Index (SXI): Method of Determining and Use. U.S. Provisional Patent Application No. 63/549,553, 4 Feb 2024.
- S. Kilambi (2024) AI Square Enabled by Sriya Expert Index (SXI) Plus Reinforcement Learning. U.S. Provisional Patent Application No. 63/553,335, 14 Feb 2024.
- S. Kilambi (2024) AI Square Enabled by Sriya Expert Index (SXI): Generative AI. U.S. Provisional Patent Application No. 63/554,252, 16 Feb 2024.
- S. Kilambi (2024) Processing of Large Numerical Models (LNM) by AI2 enabled SXI. U.S. Provisional Patent Application No. 63/575,991, 8 Apr 2024.










| Loan Status | Training | Testing | Validation | Total |
|---|---|---|---|---|
| Late Payments | 18146 (50.03%) | 5088 (49.10%) | 2582 (49.83%) | 25816 (49.82%) |
| Current Payments | 18125 (49.97%) | 5275 (50.90%) | 2600 (50.17%) | 26000 (50.18%) |
| Total | 36271 | 10363 | 5182 | 51816 |
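The percentages in the table above follow directly from the counts. As a minimal sketch, the snippet below recomputes each split's share of the full dataset and the late-payment share within each split; the variable names are hypothetical, and the counts are copied from the table.

```python
# Counts per split, copied from the table above.
counts = {
    "Training": {"late": 18146, "current": 18125},
    "Testing": {"late": 5088, "current": 5275},
    "Validation": {"late": 2582, "current": 2600},
}

grand_total = sum(sum(c.values()) for c in counts.values())


def pct(part, whole):
    """Percentage of whole represented by part, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Share of all loans that falls into each split.
split_share = {name: pct(sum(c.values()), grand_total) for name, c in counts.items()}

# Share of late-payment loans inside each split.
late_share = {name: pct(c["late"], sum(c.values())) for name, c in counts.items()}
```

With these counts the split shares come out at roughly 70%, 20%, and 10% for training, testing, and validation, and the two classes stay close to balanced in every split.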
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).