Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Stacking Ensemble of Machine Learning Methods for Landslide Susceptibility Mapping in Zhangjiajie City, Hunan Province, China

Version 1 : Received: 24 March 2022 / Approved: 25 March 2022 / Online: 25 March 2022 (03:43:32 CET)

How to cite: Huan, Y.; Song, L.; Khan, U.; Zhang, B. Stacking Ensemble of Machine Learning Methods for Landslide Susceptibility Mapping in Zhangjiajie City, Hunan Province, China. Preprints 2022, 2022030337 (doi: 10.20944/preprints202203.0337.v1). Huan, Y.; Song, L.; Khan, U.; Zhang, B. Stacking Ensemble of Machine Learning Methods for Landslide Susceptibility Mapping in Zhangjiajie City, Hunan Province, China. Preprints 2022, 2022030337 (doi: 10.20944/preprints202203.0337.v1).

Abstract

The current study aims to apply and compare the performance of six machine learning algorithms, including three basic classifiers: random forest (RF), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGB), as well as their hybrid classifiers, using the logistic regression (LR) method (RF+LR, GBDT+LR, and XGB+LR), in order to map the landslide susceptibility of Zhangjiajie City, Hunan Province, China. First, a landslide inventory map was created with 206 historical landslide points and 412 non-landslide points, which was randomly divided into two datasets for model training (80%) and model testing (20%). Second, 15 landslide conditioning factors (i.e., altitude, slope, aspect, plane curvature, profile curvature, relief, roughness, rainfall, topographic wetness index (TWI), normalized difference vegetative index (NDVI), distance to roads, distance to rivers, land use/land cover (LULC), soil texture, and lithology) were initially selected to establish a landslide factor database. Thereafter, the multicollinearity test and information gain ratio (IGR) technique were applied to rank the importance of the factors. Subsequently, we used a series of metrics (e.g., accuracy, precision, recall, f-measure, area under the ROC (receiver operating characteristic) curve (AUC), kappa index, mean absolute error (MAE), and root mean square error (RMSE)) to evaluate the accuracy and performance of the six models. Based on the AUC values derived from the models, the GBDT+LR model with the highest AUC value (0.8168) was identified as the most efficient model for mapping landslide susceptibility, followed by the XGB+LR, XGB, RF+LR, GBDT, and RF models, which achieved AUC values of 0.8124, 0.8118, 0.8060, 0.7927, and 0.7883, respectively. The results from this study suggest that the stacking ensemble machine learning method is promising for use in landslide susceptibility mapping in the Zhangjiajie area and is capable of targeting the areas prone to landslides.

Keywords

landslide susceptibility; stacking ensemble; machine learning; random forest; gradient boosting decision tree; extreme gradient boosting

Subject

EARTH SCIENCES, Geoinformatics

Comments (0)

We encourage comments and feedback from a broad range of readers. See criteria for comments and our diversity statement.

Leave a public comment
Send a private comment to the author(s)
Views 0
Downloads 0
Comments 0
Metrics 0


×
Alerts
Notify me about updates to this article or when a peer-reviewed version is published.
We use cookies on our website to ensure you get the best experience.
Read more about our cookies here.