Recommender System for Term Deposit Likelihood Prediction using Cross-validated Neural Network

Term deposits benefit both banks and their customers, so knowing in advance who is likely to subscribe helps a bank direct its marketing effort. This paper focuses on the likelihood that a customer subscribes to a term deposit. Bank campaign efforts and customer details both influence that likelihood. The paper presents an automated system that predicts term deposit subscription in advance. A neural network evaluated with stratified 10-fold cross-validation is proposed as the predictive model and is compared with benchmark classifiers: k-Nearest Neighbor (k-NN), Decision Tree (DT), and Multi-layer Perceptron (MLP). In the experimental study, the proposed model outperforms the baseline models, with an accuracy of 88.32% and an MSE of 0.1168.


INTRODUCTION
The banking sector plays a significant role in strengthening a country's socio-economic structure. Banks generally provide numerous products and services to clients. Deposits are one such product, and from the bank's perspective they are essential to its financing. Bank campaigns are run either as direct marketing or as mass campaigns. Mass campaigns target the general public indiscriminately, whereas direct marketing campaigns target a specific group. The problem with direct marketing is its very low number of positive responses. Direct marketing is also unpopular because it can be perceived as an intrusion on privacy, which may elicit a negative attitude towards the bank. With the evolution of telemarketing through Computer-Telephony-Integration techniques, it has become common and easy to generate a wide variety of reports from marketing campaigns and to combine them with other information available to the organization. Banks offer several deposit schemes, such as term deposits, recurring deposits, fixed deposits, and deposits in savings and current accounts [1]. In this paper, the term deposit is considered one of the important investment schemes since it offers profit to both the bank and its customers. The paper examines the impact of telemarketing campaigns on term deposit subscription.
A recommender system is proposed in this paper that automatically predicts how likely a client is to place a term deposit. A term deposit account is held at a bank, where money is locked up for a definite period of time at a higher interest rate than a traditional savings account. This benefits customers through higher returns and benefits the bank in terms of investment. Term deposit subscription is often an outcome of bank marketing campaigns. Data mining and knowledge discovery processes play an important role in analyzing and identifying hidden patterns and relationships in large amounts of data. Bank campaign data can be analyzed using data mining techniques so that clients' likelihood of placing a term deposit is determined beforehand. Knowing this likelihood at an early stage helps the bank approach clients differently and attract them towards its term deposit schemes. Recommender systems analyze the significant dependencies between user-centric and item-centric activity. The entity to which the recommendation is provided is referred to as the user, and the product being recommended is referred to as the item. In this context, the dependencies between the customer (user) and the term deposit (item) are analyzed, and the result is recommended to the bank. Predicting the likelihood of term deposit subscription may therefore help the bank make informed decisions in order to attract more customers. Because of this user-item dependency, the system can be regarded as a recommender system.
A neural network [2] based framework is proposed in this paper for determining the probability that a customer agrees to a term deposit. To address this problem, a neural network evaluated with stratified 10-fold cross-validation is implemented as the recommender system. The strength of campaign results, customer loan history, job profile, marital status, and similar attributes are considered influential factors in identifying whether a customer will place a term deposit. All these features are given as input to the recommender system implemented in this paper. The implemented classifier is compared with baseline models: k-Nearest Neighbor (k-NN) [3], Decision Tree classifier (DT) [4], and Multi-layer Perceptron classifier (MLP) [5].

RELATED WORKS
Using SPSS Modeler, both classification and clustering models are established in [1]. In classification, the boosted C5.0 model shows the best performance, with the highest accuracy. Clustering algorithms are then applied to the clients who subscribed to a term deposit in order to discover and understand customers' behaviours and characteristics, as well as social and economic context attributes. Safia Abbas in [6] focused on improving the efficiency of marketing campaigns and helping decision makers by reducing the number of features and predicting deposit customer retention criteria from potential predictive rules. Predictive results are obtained by applying decision tree (DT) and rough set theory (RST) classification modules. The experimental results conclude that, because of its feature reduction process, RST obtains a better summarization of the data set.
Three classification models, Decision Trees, Naïve Bayes, and Support Vector Machines, are compared in [7] using ROC and Lift curve analysis. The Support Vector Machine obtained the best results, and a further analysis was applied to extract useful knowledge from this classification model [7]. A logistic regression model is constructed in [8] by considering the relationship between campaign success and other factors. The classifier predicts the success of bank telemarketing in order to identify the top consumer set. To measure the effectiveness of the prediction, several basic classifiers, including Bayes, Support Vector Machine, Neural Network, and Decision Tree, are implemented and compared in that study. The prediction accuracy and the area under the ROC curve show that the logistic regression model classifies better than the other models [8].
Moro et al. in [9] used the concept of lifetime value (LTV) to improve the return on investment of bank marketing. Several parameters, including recency, frequency, and monetary value, are considered for this purpose. The results in [9] show improved predictive performance and are particularly useful for contact center companies.
A comparative study of four models, logistic regression, decision trees (DT), neural network (NN), and support vector machine, is reported in [10] in terms of two metrics, AUC and ALIFT. The study concludes that the NN gave the best results, with AUC=0.8 and ALIFT=0.7. Knowledge extraction confirmed the obtained model as credible and valuable for telemarketing campaign managers.

PROPOSED METHODOLOGY
The objective of this study is to determine customers' term deposit subscription behaviour in advance. Supervised classification algorithms help establish a predictive model by learning the relationship between a set of feature variables and a target variable. The feature variables include the customer's age, job profile, marital status, and education; whether the customer has taken a personal loan, has taken a home loan, and has credit details; contact details; details of the last contact of the current campaign (day, month, and contact duration); other contact-related details; the number of days passed; and the outcome of the previous campaign. These factors are used to identify whether the customer subscribes to a term deposit, which is the target variable of the classification. The framework implemented in this paper proceeds through the following series of steps.

Dataset Used
To fulfill the objective of the study, the results of a Portuguese bank's marketing campaigns are obtained from Kaggle [11] as a collection of 45,211 records, each having 17 attributes. The attributes describe the factors that affect campaign results. The target variable identifies whether a customer places a term deposit or not, so the paper addresses a binary classification problem. Histogram representations of the attributes present in the dataset are provided in Fig. 1. Table 1 provides details of the attributes in terms of their types and usage.
The collected data are preprocessed, and a multistep procedure is followed to obtain a balanced dataset. Preprocessing includes the handling of missing values, which appear in the data as unknown values. To fit the data to the classifier, non-numeric data is transformed into numeric data. The values of every feature are then scaled, since the features span widely different ranges. Feature scaling lets the classifier work on normalized data with improved efficiency. Once feature scaling is performed, the feature vector is fitted to the classifier model for training.
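The preprocessing steps described above can be illustrated with a minimal sketch. The paper does not state which imputation, encoding, or scaling method was used, so the choices below (replacing "unknown" with the most frequent category, integer label encoding, and min-max scaling) are assumptions, and the records and helper names are hypothetical.

```python
from collections import Counter

# Hypothetical toy records; the real dataset has 45,211 records.
records = [
    {"age": 30, "marital": "married"},
    {"age": 45, "marital": "single"},
    {"age": 60, "marital": "unknown"},
    {"age": 50, "marital": "married"},
]

def impute_unknown(values, placeholder="unknown"):
    """Replace the placeholder with the most frequent known category (assumed strategy)."""
    known = [v for v in values if v != placeholder]
    mode = Counter(known).most_common(1)[0][0]
    return [mode if v == placeholder else v for v in values]

def label_encode(values):
    """Map each distinct category to an integer code, in sorted order."""
    codes = {category: i for i, category in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]

def min_max_scale(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

marital_codes = label_encode(impute_unknown([r["marital"] for r in records]))
age_scaled = min_max_scale([r["age"] for r in records])
print(marital_codes, age_scaled)
```

Other reasonable choices, such as one-hot encoding or standardization to zero mean and unit variance, would fit the same description in the text.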

Methodology
A classification procedure is applied to the Portuguese bank marketing campaign dataset in order to predict term deposit subscription in advance. The classification strategy is implemented by designing a neural network model that is evaluated with 10-fold cross-validation. A neural network mimics the operation of the human brain in order to solve complex problems. It recognizes underlying relationships in a set of data and can adapt to changing input so that it generates the best possible result without the output criteria being redesigned [12]. The neural network proposed in this paper comprises several neurons. Each neuron accepts its input parameters and applies an activation function to produce an output. Activation functions [13] perform diverse computations and produce outputs within a certain range; in other words, an activation function maps an input signal to an output signal. Among the several types of activation function, sigmoid and ReLU are two popular choices, briefly described as follows. The sigmoid activation function [13] transforms input data into the range 0 to 1, as shown in equation (1).
$$ \sigma(x) = \frac{1}{1 + e^{-x}} \tag{1} $$
The ReLU activation function [13] is a fast-learning activation function and one of the most widely used. It applies a threshold operation to each input element: values less than zero are set to zero, whereas values greater than or equal to zero are kept intact, as shown in equation (2).
$$ f(x) = \max(0, x) \tag{2} $$
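As a small illustration of the two activation functions described above, the following sketch implements them for scalar inputs. The function names are chosen for illustration; the branch in `sigmoid` is a numerical-stability detail added here and is not part of the paper's description.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: maps any real input into the range 0 to 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use an equivalent form that avoids overflow in exp().
    e = math.exp(x)
    return e / (1.0 + e)

def relu(x):
    """ReLU: values below zero become zero, other values are kept intact."""
    return max(0.0, x)

for value in (-2.0, 0.0, 3.0):
    print(value, sigmoid(value), relu(value))
```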
After the neural model is configured, the training process is executed. One full pass over the training dataset is known as an epoch. Within an epoch, the dataset is partitioned into smaller sections (batches), and the model iterates over these batches, each covering a subsection of the training dataset, until the epoch is complete.
Table 1 (recovered portion). Attributes: age; duration (last contact duration, in seconds); campaign (number of contacts performed during this campaign and for this client); pdays (number of days that passed after the client was last contacted in a previous campaign); previous (number of contacts performed before this campaign and for this client). Output variable (categorical): y (has the client subscribed to a term deposit or not).

Implementation
When designing this model, hyper-parameters must be tuned in order to achieve the best efficiency. This section describes the specification of the model along with its hyper-parameters. The model consists of three Dense layers with 32, 16, and 1 nodes respectively. The first two layers apply ReLU as the activation function, and the final layer applies the sigmoid activation function.
These layers are compiled with the Adam solver [14], and the model is trained for 30 epochs with a batch size of 10. Adjusting the hyper-parameters helps the model obtain its best predictive result. The neural network has a total of 1,089 parameters, which are trained in order to obtain predictions. The components of the model, in terms of layers, the shape of the output data from each layer, and the number of parameters in each layer, are described in Fig. 1.
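The reported total of 1,089 parameters can be checked with a short calculation. The sketch below assumes 16 input features (the 17 attributes minus the target variable), which the text does not state explicitly; with that assumption the three Dense layers account for exactly 1,089 weights and biases. The training-fold size used for the batch count is likewise an approximation derived from the 45,211 records and 10 folds.

```python
import math

def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: one weight per input per unit, plus one bias per unit."""
    return n_inputs * n_units + n_units

n_features = 16           # assumption: 17 attributes minus the target variable
layer_units = [32, 16, 1]  # layer sizes stated in the text

counts = []
previous_width = n_features
for units in layer_units:
    counts.append(dense_params(previous_width, units))
    previous_width = units
total_params = sum(counts)

# Weight updates per epoch with batch size 10, assuming roughly nine tenths
# of the 45,211 records are used for training in each fold.
n_train = 45211 - 45211 // 10
steps_per_epoch = math.ceil(n_train / 10)

print(counts, total_params, steps_per_epoch)
```

Under these assumptions the per-layer counts are 544, 528, and 17, and each epoch performs 4,069 batch updates.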
The implementation is followed by the 10-fold cross-validation method [15] for estimating the skill of the model. It is a resampling methodology in which the dataset is partitioned into 10 groups (folds); in each iteration one fold is used as test data and the remaining nine folds are used as training data. The model described above is fitted to the training data and evaluated against the test data separately for each fold. The evaluation scores of all iterations are then collected and their mean is calculated. Rather than splitting the data randomly with plain k-fold validation, the stratified k-fold mechanism [15] is employed in this framework. In stratified k-fold cross-validation, class distributions are managed so that each fold approximately preserves the proportion of each label in the original data.
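The stratified splitting and score averaging described above can be sketched as follows. This is a simplified illustration rather than the paper's implementation: indices of each class are dealt round-robin into the folds without shuffling, and `evaluate` is a hypothetical callback standing in for training and scoring the model on one fold.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Deal the indices of each class round-robin into k folds."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

def cross_validate(labels, k, evaluate):
    """Use each fold once as test data and return the mean of the fold scores."""
    folds = stratified_folds(labels, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)

# Toy labels with a 20% positive rate (hypothetical data).
labels = [1] * 20 + [0] * 80
folds = stratified_folds(labels, 10)

# Stand-in "score": the positive rate of the test fold, to show it is preserved.
mean_rate = cross_validate(
    labels, 10, lambda train, test: sum(labels[i] for i in test) / len(test)
)
print([sum(labels[i] for i in fold) for fold in folds], mean_rate)
```

In practice the indices would usually be shuffled within each class before being dealt into folds.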

Other baseline classifier models
Classification is a supervised machine learning technique that analyses a specified set of features and identifies each data instance as belonging to a particular class. Different classification algorithms, such as decision trees and the k-nearest neighbour classifier, are used to predict the target class. For these classifier models, the pre-processed and transformed data are partitioned into training and testing datasets in the ratio 7:3.
k-Nearest Neighbor [3] is an instance-based classifier that is often described as a lazy learner [16], because during the training phase it only stores the training samples. It classifies an object according to the closest training examples in the feature space. The classifier considers the k nearest objects when determining the class. The main challenge of this technique lies in choosing an appropriate value of k [16] [3].
A Decision Tree (DT) [4] is a classifier that uses a tree-like structure. Each target class is denoted by a leaf node of the DT, and each non-leaf node is a decision node that represents a certain test. The outcomes of a test are represented by the branches of that decision node. A classification result is obtained by starting at the root of the tree and following the branches until a leaf node is reached [4] [16].
Multi-layer perceptron (MLP) [5] can be used as a supervised classification tool by incorporating optimized training parameters. It has a neural network structure: an MLP is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. For a given problem, the number of hidden layers in a multi-layer perceptron and the number of nodes in each layer can differ. The choice of these parameters depends on the training data and the network architecture [5].
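A minimal sketch of the k-nearest neighbour rule described above is given below. It assumes Euclidean distance and a simple majority vote; the text does not specify the distance metric, and the toy points and the name `knn_predict` are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Training" is only storage: all work happens here, at prediction time.
    neighbours = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = [label for _, label in neighbours[:k]]
    return Counter(votes).most_common(1)[0][0]

# Two small clusters of hypothetical, already scaled feature vectors.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 0.8), (0.8, 1.0)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.15, 0.1)))
print(knn_predict(train_X, train_y, (0.95, 0.9)))
```

An odd value of k, such as 3, avoids tied votes in a binary problem.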
These classifiers are implemented in this framework with the necessary parameter tuning. The decision tree classifier uses the Gini index as the criterion for choosing splits. The nodes of the decision tree are expanded until all leaves are pure or until all leaves contain fewer than the minimum number of samples, which is set to 2 in this case. The k-NN classifier gives promising results for k=3 across all the evaluation metrics. The MLP classifier is implemented with five hidden layers of sizes 128, 64, 32, 16, and 8 respectively.
The proposed neural network with stratified 10-fold cross-validation, as well as the decision tree, k-NN, and MLP classifiers, are implemented and evaluated in terms of pre-defined metrics. These metrics support the comparison and the identification of the best problem-solving approach.
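To make the Gini criterion mentioned above concrete, the following sketch computes the Gini impurity of a set of labels and the weighted impurity of a candidate split. It uses the standard definition (one minus the sum of squared class proportions); the function names are chosen for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure node, 0.5 for a balanced binary node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

print(gini([0, 0, 1, 1]))            # balanced node
print(split_gini([0, 0], [1, 1]))    # split that separates the classes
print(split_gini([0, 1], [0, 1]))    # split that separates nothing
```

A tree builder would choose, at each decision node, the test whose split has the lowest weighted impurity.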

Performance Measure Metrics
To assess the performance of a model, evaluation metrics are required. The following metrics are used to identify the best problem-solving approach. Accuracy [17] is the ratio of correct predictions to the total number of instances considered. Mathematically, accuracy is defined in equation (3), where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives.
$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3} $$
A model that exhibits a lower MSE and a higher accuracy and F1-score is considered the best problem-solving approach. The comparison with the other classifiers is reported in Table 3. It indicates that the proposed method performs better than the other classifiers. The accuracy and MSE obtained in each fold on the training and testing datasets are shown in Fig. 2.
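The metrics named above can be sketched in a few lines using their standard definitions. The labels below are hypothetical. For hard 0/1 predictions the MSE equals the fraction of misclassified instances, so accuracy and MSE sum to 1; the reported figures of 88.32% and 0.1168 are consistent with this, although the paper does not state how the MSE was computed.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error between true labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical labels: 1 = subscribed, 0 = did not subscribe.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
err = mse(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(acc, err, f1)
```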

CONCLUSIONS
This study applies data mining techniques to forecast customers' term deposit subscription behaviour and to understand customers' features, in order to improve the effectiveness and accuracy of bank marketing. A neural network evaluated with 10-fold cross-validation is implemented on a single platform to determine term deposit subscription behaviour. The implementation incorporates the necessary parameter tuning as well as data-oriented operations. The method is compared with baseline classifiers, namely MLP, k-NN, and DT. The comparative study concludes that the implemented method gives superior results, with an accuracy of 88.32% and an MSE of 0.1168. The predictions provided by the proposed method can assist banks in taking informed decisions when attracting customers towards term deposit subscription.