A New Stacking-Based Ensemble Predictive Model to Predict Mortality After Heart Surgery on a Highly Imbalanced Dataset

Abstract: Nowadays, owing to spectacular improvements in health care and biomedicine, hospitals record tremendous amounts of data. In addition, the most effective approach to reducing disease mortality is to diagnose it as early as possible. As a result, data mining with machine learning in the field of diseases provides a good opportunity to examine the hidden patterns in these collections. An exact forecast of mortality after heart surgery leads to successful medical treatment and lower costs. This research proposes a new stacking predictive model that applies the random forest feature importance method to select the most practical features and foresee mortality after heart surgery on a highly imbalanced dataset. To solve the imbalanced data problem, a combination of the SVM-SMOTE over-sampling algorithm and the Edited-Nearest-Neighbor under-sampling algorithm is used. To ensure efficiency, the introduced model is compared with several other machine learning classifiers through shuffle hold-out and 10-fold cross-validation strategies; both validation schemes indicated that our model had the highest efficiency among the compared models. Furthermore, the Friedman statistical test is applied to survey the differences between models. The result demonstrates that the introduced stacking model reached the most accurate predicting performance after Logistic Regression.


INTRODUCTION
Today, in medical knowledge, collecting an immense amount of data about different diseases is very important, because it can lead to saving many lives and improving quality of life by discovering relations and hidden patterns among a disease's features [1]. The healthcare industry permanently generates large amounts of data, and there is a wide gap between data collection and data interpretation. Machine learning is a helpful tool that can benefit the industry, from in-depth data analysis to the development of medical research and scientific decisions in the field of diagnosis and treatment. Diagnosis and the determination of appropriate treatment for patients are of great importance in medical science: in addition to wasting time and money, choosing the wrong treatment can have detrimental effects and, in some cases, can even lead to the death of the patient. Therefore, it is compulsory to provide a model for finding a suitable treatment [2]. Machine learning is a powerful tool that uses previously recorded data as input to predict future incidents [3]. Risk prediction plays a critical role in clinical prescription for patients undergoing heart surgery. Hospital heart surgery departments produce remarkable amounts of recorded data every day, which should be utilized by data scientists to quantify a patient's health and foresee future incidents [4]. The most important parameter after heart surgery is mortality, because earlier diagnosis raises the probability of the patient's survival [5]. For this reason, different machine learning methods have been developed over the past decades to predict mortality, and scholars have made myriad endeavors to enhance mortality prediction accuracy [6].
In 1999, J. Martinez-Alario et al. investigated and assessed the performance of general severity systems and compared them with the Parsonnet score for predicting mortality after cardiac surgery [5]. In 2021, Dimitris Bertsimas et al. used a decision tree, chosen for its interpretability and accuracy, to predict mortality, postoperative mechanical ventilatory support time (MVST), and hospital length of stay (LOS) for patients who underwent congenital heart surgery. They compared it to Logistic Regression, Random Forest, and Gradient Boosting, and optimal classification trees outperformed all three models in terms of area under the curve [7]. The EuroSCORE is a measure for predicting mortality and the probability of heart failure. Although it often causes an overestimation, it continues to be used in the United Kingdom owing to the lack of alternative validated models. Machine learning models can be used to improve this overestimation, so the EuroSCORE is considered one of the essential features in the dataset [4]. As mentioned, mortality prediction is a critical topic after surgery because, with this information, many efforts can be made to save lives. Xia et al. [8] recommended an Artificial Neural Network (ANN) approach using information recorded in the first two days of an ICU stay to foresee the mortality risk state.
In 2000, Thomas G. Dietterich performed an experimental comparison of three methods, bagging, boosting, and randomization, for improving the performance of the C4.5 algorithm. His results indicate that under little or no classification noise, randomization is competitive with bagging but not as accurate as boosting, while under substantial classification noise, bagging performs better than boosting and sometimes better than randomization. This motivated us to compare different ML models with our new model to enhance its performance [9]. In addition to this comparison, we wanted to know whether machine learning can predict mortality after heart surgery, as Umberto Benedetto et al. have done recently using a neural network, random forest, naïve Bayes, and retrained LR based on the features included in the EuroSCORE [4]. Some research has also been done on accurate diagnosis and effective treatment of heart disease with the application of ML. In 2017, Amin Poorieh et al. surveyed and compared different ML methods, including Bayesian, KNN, SVM, Bagging, Boosting, and Stacking, to predict heart disease [10]. Improving the prediction accuracy of mortality can be crucial, and several studies have shown that ensemble methods can improve prediction outcomes. In 2017, Awad et al. [11] recommended an ensemble learning random forest (RF) and concluded that the proposed ensemble model predicts better than other classification models. Ghose et al. [12] and Darabi et al. [13] reached similar outcomes by using ensemble methods to foresee the risk of mortality in the ICU. These studies demonstrate that a combination of classifiers can enhance prediction outcomes.
Besides, according to Ghorbani and Ghousi's review paper [14] on predictive models in medical diagnosis, scholars have achieved better accuracy and prediction results when using ensemble approaches, and other machine learning classifiers have also been refined and compared with various models. One of the most efficient ensemble methods for predicting depression in older adults, according to research by Dehkordi and Sajedi in 2017, is the stacking technique: their proposed stacking model, a combination of K-Nearest-Neighbor, Logistic Regression, Support Vector Machine, and Decision Tree, had higher accuracy than each of its members [15]. Furthermore, this research aims to compare different single and ensemble classifiers with the proposed new stacking model, in line with Rosaida Rosly's research comparing different models under 10-fold cross-validation, namely three popular ensemble methods (boosting, bagging, and stacking) and some single classifiers such as Naïve Bayes, Multi-Layer Perceptron, and Decision Tree [16]. In 2019, Yoshihiko Raita et al. used several classification models, including Lasso regression, random forest, gradient-boosted decision tree, and deep neural network, to obtain clinical prediction outcomes and then compared their performance [17]. Besides the scarcity of ensemble models for predicting mortality after heart surgery, two other factors that improve prediction accuracy should be managed: feature importance and the imbalanced class distribution problem. Muhammad Waqar et al. [18] used SMOTE to handle the imbalanced dataset problem for heart attack prediction; since class imbalance is especially common in mortality prediction, this efficient approach is also used in our research.
Furthermore, our research shows that this problem can affect model performance [19]. Jale Bektas et al. worked on the classification of real imbalanced cardiovascular data using feature selection and resampling methods [20]. Given the importance of these factors, Roumani et al. [21] compared the performance of some prevalent machine learning approaches on the imbalanced dataset problem. If a dataset has too many features, a high-dimensional problem may appear; in this situation, some features are not essential or influential enough, so dimensionality reduction is required, as Anna Karen Garate et al. [22] did in their research on heart disease prediction with PCA and random forest. Mohammad Al Khaldy and Chandrasekhar Kambhampati focused on different feature selection methods and resampling of imbalanced classes for a heart failure dataset [23]. There is an apparent lack of work that uses ensemble methods while also handling the imbalanced data problem. A combination of classifiers can improve prediction results and accuracy compared to a simple classifier. This paper makes an effort to propose a new ensemble model using the stacking approach to develop a robust early mortality forecasting model while solving the highly imbalanced data problem with a combination of two resampling strategies.
Compared to similar research, this paper offers the following innovations and important processes:
- Predicting mortality after heart surgery with a new stacking model.
- Proposing a new ensemble classifier using the stacking approach.
- Solving the problem of a highly imbalanced dataset using a combination of the SVM-SMOTE and Edited-Nearest-Neighbor approaches.
- Applying both a simple hold-out validation and the 10-fold cross-validation method to implement the validation test.
- Comparing the new stacking classifier with different single and ensemble machine learning classifiers.
- Measuring the performance of the proposed stacking model with different evaluation criteria, such as accuracy, area under the ROC curve, recall, precision, and F1-score.
- Using the Friedman test as a statistical test to analyze the differences among models, determine the best one, and validate the results.
The remainder of this article is organized as follows: Section 2 explains the data cleaning and preprocessing methods, including handling the imbalanced class problem and feature selection. Section 3 details the new stacking ensemble model. Section 4 explains the evaluation approaches used to interpret the efficiency of the models. Section 5 presents the results and analysis, showing the acceptable performance of the recommended stacking model compared to the other classifiers. Finally, Section 6 draws conclusions and proposes some directions for future research.

MATERIAL & METHODS
This preprint does not need approval from a research ethics committee. A vital ingredient of the research is problem perception. This research aims to develop an ensemble model using a stacking approach to predict mortality after heart surgery. It should be mentioned that all the coding has been done in Python, a constructive and beneficial language in the field of machine learning. In addition, all the practical experiments have been implemented on a 2.40 GHz Intel Core i7 Lenovo with 16 GB of RAM. The procedure applied to achieve the target of this research is portrayed in Figure 1 and includes data understanding, data preprocessing, feature selection, stacking model usage, and analysis of results.

DATA SET INFORMATION
In this paper, the chosen dataset was gathered and recorded manually from hospitals related to Shahid Beheshti University of Medical Sciences and Health Services in Iran between 2015 and 2020. The data were recorded during and after the heart surgery (recovery) process in which the patient was hospitalized; accordingly, the features are considered constant during the entire study. The dataset encompasses 1,632 records and 46 attributes after the data cleaning step. Forty-five features are used in mortality prediction, and the remaining one is the response variable, named mortality.

DATA PREPROCESSING
Data preprocessing is a necessary part of data mining and machine learning. The learning efficiency of a classifier depends markedly on the quality of the dataset; consequently, preparing the data is a critical step before passing it to a classifier [24]. It should be noted that the introduced dataset had some missing data, which were handled during the data cleaning process. In a high-dimensional dataset, some features are more essential than others, and they can be determined by the feature selection process in the data preprocessing step; this essential concept affects a machine learning model's performance [25]. With this technique, important and influential features can be identified for use in prediction by machine learning models. Besides, feature selection approaches can reduce the dimensionality of a high-dimensional dataset by dropping the less effective or noisy features, so that the classifiers can be more efficient and precise. Feature selection techniques are divided into wrapper and filter techniques; the wrapper method uses a classifier to select an optimal subset of features. In this paper, Random Forest feature importance is used for feature selection according to its popularity in medicine. It is a nonlinear technique that can be interpreted well, and it provides feature importance measures by calculating the Gini importance [26]. The selected dataset had 45 features related to mortality prediction after heart surgery. After applying Random Forest feature importance, the importance of each feature was determined and listed in Table 1. Only the first 18 features are given in the table, because the other 27 features had less than 1 percent importance and had only a slight effect on the performance of the machine learning classifiers.
Recognizing the essential features leads to dimension reduction, which is helpful in mortality prediction. These 18 features are used to predict mortality status after heart surgery with the different machine learning models and the new stacking model.
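As a sketch of how this Gini-based ranking can be obtained with scikit-learn, the snippet below trains a Random Forest on synthetic stand-in data (the clinical features themselves are not public) and keeps only features above the 1 percent importance threshold described above; all names and sizes are illustrative assumptions:

```python
# Sketch: ranking predictors with Random Forest (Gini) feature importance.
# Data and feature names are synthetic placeholders, not the paper's dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Importances sum to 1; sort from most to least influential
ranking = sorted(zip(feature_names, rf.feature_importances_),
                 key=lambda t: t[1], reverse=True)
# Keep only features contributing more than 1% importance
selected = [name for name, imp in ranking if imp > 0.01]
```

The `selected` list would then define the reduced feature set passed to the downstream classifiers.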

IMBALANCED DATA PROBLEM
A balanced dataset is essential for creating an efficient training set. Imbalanced class distribution is a prevalent problem in real-world medical data and specifically in mortality prediction.
In mortality prediction, one of the two classes is significantly underrepresented in the dataset and forms a minority class [27]. Moreover, the majority class heavily dominates the minority class; because of this, machine learning models tend to assign new observations to the majority class [28]. Consequently, this problem can cause poor performance of machine learning models on minority class prediction [29], [30]. It should be mentioned that the heart surgery dataset is markedly imbalanced: it comprises far more samples from the majority class of surviving patients (1,583 cases of survival), while the minority class is much smaller (only 49 cases of deceased patients). Given this highly imbalanced dataset, it is imperative to handle the problem so that classifiers can perform more efficiently and accurately. There are different resampling approaches, including over-sampling and under-sampling strategies, to handle this problem [32], [33]. The Synthetic Minority Over-sampling Technique (SMOTE) is one of the most prevalent and helpful over-sampling methods for solving the imbalanced data problem [34]. This method produces synthetic data according to the feature-space similarities among the minority class samples [35], generating artificial minority class examples to balance the dataset [36]. This paper tries to solve the imbalanced data problem using a combination of the SVM-SMOTE over-sampling and Edited-Nearest-Neighbor under-sampling algorithms, which are extensively used in machine learning with imbalanced high-dimensional data and are increasingly used in medicine. The SVM-SMOTE method has shown more acceptable performance than other resampling methods [37]; it helps predictive models perform more accurately and reliably [38]. The SMOTE resampling technique has different extended variants for handling the imbalanced data problem.
In 2020, Ghorbani and Ghousi compared these variants, and the results of their research show that SVM-SMOTE performs better than the others and can improve the performance of classifiers [39]. SVM-SMOTE concentrates on generating new minority class examples near the borderline, using an SVM to help establish boundaries among classes based on data and density information, which is crucial for synthesizing minority class samples. This paper combines SVM-SMOTE with the Edited-Nearest-Neighbor method to overcome the imbalanced data problem and its destructive influence on machine learning algorithms, and shows how a blend of resampling techniques, including over-sampling and under-sampling, can ameliorate the performance of a model [40].
The K-fold cross-validation technique is a helpful method for evaluating the performance of classification models, and it determines how well the statistical analysis outcomes generalize to an independent dataset [41]. This paper uses random hold-out and shuffle 10-fold cross-validation as two general forms of validation. In the hold-out strategy, 75% of the data is randomly assigned to the training set and the remaining 25% to the test set. In 10-fold cross-validation, the original sample is randomly divided into ten equal-size subsamples; a single subsample is kept as the test set, the remaining subsamples are used as the training set, and this hold-out strategy is repeated ten times. It is fundamental that all resampling is implemented on the training set only, because otherwise the synthetic observations can be seen by the classifier, leading to overfitting and unreliable outcomes; models must only be tested on unseen data. Therefore, the proposed combination of the two resampling strategies, SVM-SMOTE and Edited-Nearest-Neighbor, is applied only to the training set in both the hold-out and 10-fold cross-validation procedures.

FEATURE SCALING
As most real-world datasets contain features of markedly diverse sizes, units, and ranges, feature scaling is applied to standardize the independent variables; this step is called normalization. It is needed because the performance of many machine learning algorithms can be affected by feature scale diversity [42], [43]. Therefore, in this paper, the features are rescaled so that all of them follow a standard normal distribution with µ = 0 and σ = 1, where µ is the average and σ is the standard deviation. The formula of this type of rescaling is given below:
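Assuming the standard z-score transformation implied by the description of µ and σ above, the rescaling formula can be written as:

```latex
z = \frac{x - \mu}{\sigma}
```

where x is the original feature value, µ is the mean of that feature over the training set, and σ is its standard deviation, so that each rescaled feature has zero mean and unit variance.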

MACHINE LEARNING ALGORITHMS
Many machine learning algorithms have been developed to solve various classification problems in different fields of science [44], such as Logistic Regression [31], Naive Bayes [45], Decision Tree [46], [47], Random Forest [48], K-Nearest-Neighbor [49], Gradient Boosting (XG-Boost) [50], [51], and Artificial Neural Network [52]. These algorithms are applied after the preprocessing step, when the dataset is ready to be learned by the computer. In this paper, some of them are used and compared with the stacking model; all of these machine learning models are given in Table 2 with their efficient parameters.

NEW STACKING MODEL
This research proposes a new combination of models based on the stacking algorithm, which uses a meta-model to combine predictions from contributing members. This new model is a combination of five single and ensemble classifiers, developed to improve the performance of predicting mortality after heart surgery. Stacking is an efficient ensemble technique because it can balance bias and variance to reduce the total error and achieve better prediction; the aim of ensemble machine learning classifiers is to reduce the bias and variance of classifiers to build a robust classifier with better performance. In the stacking method, the output of the base classifiers (Level 0) is used as training data for another classifier, named the meta-classifier (Level 1), to estimate the same target function. The goal of stacking is to combine various strong sets of models. In this paper, the base classifiers of the proposed stacking model are Logistic Regression, Bagging, Decision Tree, Random Forest, and XG-Boost, and the meta-classifier is Logistic Regression, selected according to its better performance compared to the other classifiers. Although the original classifiers predict the minority class weakly, they work efficiently on some parts of the data, so each model acts as a booster that sharpens the efficiency of the ensemble [53]. In the stacking ensemble approach, the meta-learner tries to discover how best to combine the outputs of the base learners. Figure 2 demonstrates the levels and outputs of the proposed stacking model.
Figure 2. Proposed stacking ensemble model.
As can be seen, this paper links two simple classifiers, Logistic Regression and Decision Tree, as the best single classifiers, with Random Forest, XG-Boost, and Bagging as the three best ensemble classifiers, to build a powerful ensemble machine learning model.
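A minimal sketch of this two-level architecture with scikit-learn's StackingClassifier is shown below. GradientBoostingClassifier stands in for XG-Boost so the example needs only scikit-learn, and all hyperparameters and data are illustrative assumptions, not the paper's settings:

```python
# Sketch: Level-0 base learners (LR, DT, RF, Bagging, gradient boosting)
# feeding a Level-1 Logistic Regression meta-learner via StackingClassifier.
# Data are synthetic; GradientBoosting stands in for XG-Boost here.
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

base_learners = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("bag", BaggingClassifier(random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)  # internal CV builds the Level-1 training set
stack.fit(X_tr, y_tr)
test_accuracy = stack.score(X_te, y_te)
```

The `cv=5` argument matters: the meta-learner is trained on out-of-fold base predictions, which prevents the Level-1 model from simply memorizing the base learners' training-set outputs.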

EVALUATION METHODS
Evaluating machine learning algorithms is an inseparable part of applying classification models. Various measures can be used to assess the performance and validity of machine learning models. In this paper, accuracy, area under the ROC curve (AUC), recall, precision, and F1-score are used as evaluation metrics. Besides, a statistical significance test is implemented to survey the differences among the models.
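The five metrics named above can be computed with scikit-learn as sketched below; the model and data are placeholders, and macro averaging is used for recall, precision, and F1, matching the reporting choice described later in the paper:

```python
# Sketch: computing accuracy, AUC, macro recall, macro precision, and
# macro F1 for a fitted classifier. Model and data are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
y_prob = clf.predict_proba(X_te)[:, 1]  # AUC needs probabilities, not labels

metrics = {
    "accuracy":  accuracy_score(y_te, y_pred),
    "auc":       roc_auc_score(y_te, y_prob),
    "recall":    recall_score(y_te, y_pred, average="macro"),
    "precision": precision_score(y_te, y_pred, average="macro"),
    "f1":        f1_score(y_te, y_pred, average="macro"),
}
```

Note that AUC is computed from the predicted probabilities rather than the hard 0/1 labels, which is why it can discriminate between models that achieve similar accuracy.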

STATISTICAL EVALUATION
Evaluation methods are the tools for comparing different classification methods. In this research, the Shapiro normality test is used to check the normality of the data; because its p-value is less than α (α = 0.05), the null hypothesis (that the data are normal) is rejected, so the data do not follow a normal distribution. Therefore, the Friedman test, a nonparametric equivalent of the repeated-measures ANOVA, can be used to survey the differences among the machine learning models [54]. The null hypothesis of this test is that all models perform similarly, and its rejection demonstrates that one or more of the paired classifiers perform differently. This paper uses the accuracies obtained by 10-fold cross-validation. The Friedman test ranks the data within each fold, producing a set of rank values for each model [55]. As a result, this test provides a sum of ranks for each model, which can be used to detect the most effective classifiers.
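The statistical workflow (a Shapiro normality check followed by the Friedman test over per-fold accuracies) can be sketched with SciPy as follows; the accuracy values below are invented placeholders, not the paper's results:

```python
# Sketch: Shapiro-Wilk normality check, then the Friedman test across
# per-fold accuracies of several models. Accuracies are made-up values.
import numpy as np
from scipy.stats import friedmanchisquare, shapiro

rng = np.random.default_rng(0)
# Hypothetical 10-fold accuracies for three classifiers
acc_stacking = rng.uniform(0.95, 0.97, size=10)
acc_rf = rng.uniform(0.94, 0.96, size=10)
acc_nb = rng.uniform(0.85, 0.90, size=10)

# Normality check on the pooled accuracy samples
_, p_normal = shapiro(np.concatenate([acc_stacking, acc_rf, acc_nb]))

# Friedman test: ranks the models within each fold
stat, p_friedman = friedmanchisquare(acc_stacking, acc_rf, acc_nb)
# If p_friedman < 0.05, at least one model performs differently
```

If the Friedman null hypothesis is rejected, the per-model rank sums it produces indicate which classifiers tend to outperform the others across folds.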

MACHINE LEARNING MODEL RESULTS
There are various classification models; in this paper, the most common and useful ones are compared with the proposed stacking ensemble model. Most of the assessment criteria indicate that the stacking model had better performance and improved the ability to predict mortality after heart surgery. The model validation procedures used in this research are based on the random hold-out and shuffle 10-fold cross-validation techniques. Table 3 presents these results; to highlight the differences among the model performances, Figure 3 compares the test accuracy results.

HOLD-OUT METHOD RESULTS
In this paper, the hold-out method assigns 75% of the dataset to the training set and the remaining 25% to the test set. In addition, as mentioned, a combination of SVM-SMOTE as an over-sampling strategy and Edited-Nearest-Neighbor as an under-sampling strategy is introduced to solve the highly imbalanced data problem in this case. Accuracy is the most prevalent evaluation criterion for showing a classifier's performance. Although it is easy to understand, it is not sufficient on its own to judge a model, because many critical factors should be considered when evaluating performance. According to the results in Table 4, the proposed stacking model performs well on test-set accuracy and improves on the other models. Furthermore, the introduced combination of resampling methods solved the imbalanced data problem efficiently, so the accuracy is more reliable. However, accuracy works only with hard ones and zeros, so it leaves uncertainty about quantifying a probability value. AUC is a crucial evaluation metric that indicates how well a model distinguishes between the two classes; the higher the AUC, the better this capability. The stacking model, with an AUC of 93.10%, is the best classifier among the models. This result means that the introduced stacking model can distinguish between surviving and deceased patients with a 93.10% chance, which is better than the other classifiers, even Random Forest and Logistic Regression. Figure 4 shows a multiple ROC curve that illustrates this outperformance of the stacking model compared to the base and meta classifiers. Regarding precision, when the model ascertains a patient as surviving or deceased, it is correct 86.5% of the time on average. It should be mentioned that the macro average is used to report the models' performance results because, when the dataset was imbalanced, models did not predict the minority class well.
Therefore, the macro average is used, which imposes a bigger penalization and yields a fairer judgement. The F1-score metric enables a fairer analysis and comparison among the models than recall or precision alone, since it is the harmonic average of both metrics; it ascertains how accurate and authoritative a model's prediction is. In this paper, the macro-averaged F1-score of the introduced stacking model was the highest among the used models. These results indicate the desirable efficiency of the stacking model. Using Random Forest feature importance shows the effect and importance of each feature in predicting the mortality and survival of patients. The results demonstrate that it is not necessary to record the 27 other medical features, so recording time can be optimized. The maximum amount of cardiac rehabilitation (Max-CR), the EuroSCORE, and Day-Max-CR are the most essential features; thus, the EuroSCORE, as a criterion of heart attack probability, and cardiac rehabilitation deserve more attention. As expected, age is vital for determining the chance of survival or death after heart surgery.
The information about these features is provided in Table1.

K-FOLD CROSS VALIDATION METHOD RESULTS
K-fold cross-validation is another method of validation [57]. Shuffle 10-fold cross-validation is used in this research, which splits the dataset into ten subsets. The results of applying 10-fold cross-validation with the machine learning models are shown in Table 4, which indicates the accuracy obtained in each fold by the different models. According to these results, the introduced stacking model achieved the highest average accuracy across the 10 folds with low variance, 96.97%; the slight difference from the hold-out strategy means its accuracy is reliable and the model performed acceptably. The lowest fold accuracy, 96.88%, came from the first fold, and the highest, 97.03%, from the third fold. The Random Forest model achieved the second-highest average accuracy, 96.87%, and according to all the assessment and validation results, it is the second best-performing model. It should be mentioned that the proposed stacking model was the best in almost all validation and assessment metrics; it achieved the best rank in all of them except Recall, where it ranked second after the LR model. In addition, Figure 5 compares the machine learning classifiers under the 10-fold cross-validation and hold-out strategies. The difference between the Naïve Bayes results under the two strategies indicates that this model's hold-out accuracy is not reliable, and it may not perform as well as it seemed. The other models' 10-fold CV accuracies confirm their hold-out accuracies, especially for the stacking, Random Forest, and Logistic Regression models. Cross-validation is often the preferred method because it gives a better indication of how well a model will perform on unseen data.

STATISTICAL TEST RESULTS
Statistical significance tests can be used as a valuable way to select the best model. The first assumption of the ANOVA test is that the 10-fold CV samples are drawn from a normal distribution. This assumption is violated here because the Shapiro statistical normality test reached a p-value of 0, which is less than α = 0.05; therefore, the null hypothesis of normality is rejected, and the ANOVA test cannot be used. Table 5 represents the result of the Shapiro test. The Friedman test is suitable for comparing the machine learning classifiers in this case, and Table 6 indicates its results. The null hypothesis of this test is rejected because the p-value of 0 is less than the significance level of 0.05, which means that at least one of the classifiers performs differently. Table 7 demonstrates the median and sum of ranks achieved by the Friedman test. The median indicates the midpoint value, the point where half the data points are above it and the other half below; the median of all data points is shown as the overall median. Additionally, the median response for the stacking model is higher than the overall median. The result of the sum of ranks shows that the stacking model is better than all classifiers except Logistic Regression.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 10 September 2021 | doi:10.20944/preprints202109.0181.v1

CONCLUSION
Nowadays, a vast amount of data is collected and produced through the development of health care systems and biomedical equipment, and processing these data to uncover valuable patterns and information is a way to save many lives. Predicting mortality after heart surgery is one of the essential concepts in medical data mining: an accurate prediction of the mortality status of patients awaiting heart surgery provides helpful information that can save lives and reduce costs and time, so it should be utilized for patients as soon as possible. This paper proposes a stacking ensemble model, which uses features selected by the Random Forest feature importance technique to provide an early mortality prediction while solving the imbalanced data problem with a combination of the SVM-SMOTE over-sampling strategy and the Edited-Nearest-Neighbor under-sampling technique. Random hold-out and shuffle 10-fold cross-validation are used as two validation procedures to assess the machine learning models' stability and performance. Moreover, the Friedman statistical test is used as another measure of performance. After solving the highly imbalanced data problem, well-known models and the introduced stacking ensemble model are applied, using the random hold-out method, to the balanced data. According to the assessment outcomes, the introduced stacking model has acceptable performance and outperforms all other models across various assessment metrics. It should be mentioned that Random Forest and Logistic Regression have acceptable performance as well.
The evaluation results obtained using the shuffle 10-fold cross-validation method point to the same choice of best model: the introduced stacking ensemble model achieved higher accuracy than the other models with acceptably low variance. Using the stacking model, a combination of single and ensemble classifiers, improved the ability to predict mortality after heart surgery according to the different validation and assessment techniques. The stacking model achieved the best performance in all measures except Recall, in which it ranked second; therefore, the new model had the best overall performance. It should be noted that RF and LR were the two next most efficient models after the stacking model. Many ways exist to refine and sharpen this research: future work can build more powerful models, and other combinations of resampling techniques can be implemented to handle the imbalanced data problem.