Preprint
Article

This version is not peer-reviewed.

High-Dimensional Multi-Source Feature Fusion for Early Default Prediction in Consumer Credit Portfolios

Submitted: 10 January 2026

Posted: 12 January 2026

Abstract
This study develops a multi-source feature-fusion framework that combines transaction histories, mobile-behavior data, credit-bureau information, and merchant-level attributes. The feature space contains over 4,800 engineered variables derived from 3.5 million customer records. A three-stage selection pipeline—correlation filtering, mutual-information ranking, and stability-selection LASSO—reduces dimensionality by 92%. The selected features train a LightGBM model optimized for early-stage (0–30 day) delinquency prediction. The model achieves an ROC-AUC of 0.91 and reduces false-negative early defaults by 37.5% compared with baseline logistic regression. Feature-importance patterns reveal strong interactions between merchant category instability and device-behavior anomalies. The results show the effectiveness of multi-source feature fusion for fine-grained default prediction.
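The first two stages of the selection pipeline described in the abstract can be illustrated with a minimal sketch on synthetic data. This is not the authors' implementation: the function names, the 0.95 correlation threshold, the equal-width binning with four bins, and the toy columns are all assumptions made for illustration. The stability-selection LASSO stage and the LightGBM model are not covered here.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(vx * vy)


def correlation_filter(columns, threshold=0.95):
    """Stage 1 (assumed form): greedily keep a feature only if its absolute
    correlation with every already-kept feature is below the threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def mutual_information(x, y, bins=4):
    """Stage 2 helper: plug-in estimate of mutual information (in nats)
    between a numeric feature and a discrete label, using equal-width bins."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant columns
    binned = [min(int((v - lo) / width), bins - 1) for v in x]
    n = len(x)
    joint = Counter(zip(binned, y))
    px, py = Counter(binned), Counter(y)
    return sum((c / n) * math.log(c * n / (px[b] * py[l]))
               for (b, l), c in joint.items())


def rank_by_mi(columns, labels, names, top_k):
    """Stage 2 (assumed form): keep the top_k features by estimated MI."""
    scores = {name: mutual_information(columns[name], labels) for name in names}
    return sorted(names, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: one informative feature, one redundant copy, one noise column.
random.seed(0)
n = 500
labels = [random.randint(0, 1) for _ in range(n)]
signal = [l + random.gauss(0, 0.5) for l in labels]
duplicate = [2 * s + 1 for s in signal]  # linear copy of `signal`
noise = [random.gauss(0, 1) for _ in range(n)]
columns = {"signal": signal, "duplicate": duplicate, "noise": noise}

kept = correlation_filter(columns)                    # drops the redundant copy
ranked = rank_by_mi(columns, labels, kept, top_k=1)   # keeps the informative feature
```

In this toy setting the filter discards the redundant column and the ranking retains the informative one. As a rough check on the scale reported in the abstract, a 92% reduction of about 4,800 variables leaves about 4,800 × 0.08 ≈ 384 features.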
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated