Preprint
Article

This version is not peer-reviewed.

Comparison of Liquefaction Potential Prediction Results Using Machine Learning Methods Based on CPT Data

Submitted: 13 April 2026
Posted: 14 April 2026


Abstract
Soil liquefaction is a significant geotechnical hazard that can lead to severe structural damage during seismic events. Traditional liquefaction assessment methods, such as those based on the Standard Penetration Test (SPT) and Cone Penetration Test (CPT), rely on empirical correlations but often struggle to capture the complex, nonlinear interactions between soil properties and seismic parameters. Recent advancements in machine learning (ML) offer data-driven approaches that can improve liquefaction prediction accuracy. This study evaluates and compares the performance of Random Forest (RF) and Artificial Neural Networks (ANNs) for liquefaction potential prediction using a dataset containing 480 field observations derived from CPT-based studies. The dataset was preprocessed using min-max normalization, and models were trained and optimized through hyperparameter tuning. Model performance was assessed using accuracy, precision, recall, F-measure, Cohen’s kappa, and AUC-ROC analysis. The results show that RF achieved the highest accuracy (89%), outperforming both ANN (86%) and the traditional CPT-based liquefaction assessment method (87%). Additionally, ROC-AUC values of 0.932 for RF and 0.872 for ANN indicate the superior classification capability of machine learning models. Feature importance analysis in RF revealed that cone tip resistance (qc), cyclic stress ratio (CSR), and peak ground acceleration (amax) are the most influential factors in liquefaction prediction. These findings demonstrate that machine learning techniques, particularly RF, provide more reliable liquefaction predictions compared to conventional empirical methods. The study highlights the potential of ML models in improving seismic risk assessments and guiding engineering decision-making processes.

1. Introduction

Soil liquefaction is an earthquake-induced phenomenon in which saturated granular soils lose their strength and stiffness under seismic loading and behave more like a viscous fluid than a solid medium. This phenomenon, brought to prominence by the 1964 Niigata and Great Alaska events, has been observed and reported repeatedly in many earthquakes over roughly half a century. Since liquefaction became widely recognized, a significant portion of earthquake damage has been attributed to it, and many studies and procedures have been added to the literature to estimate liquefaction potential and mitigate liquefaction hazards [1,2,3,4].
At the core of liquefaction lies the mechanism of cyclic shearing under seismic forces. As strong ground shaking occurs, cyclic loading causes a progressive buildup of pore water pressure in the saturated soils. When the pore pressure nearly equals the initial confining stress, the effective stress, and hence the soil's shear strength, drops dramatically. While loose sands often experience complete loss of strength and significant deformations, medium-density sands, silty sands, and sandy silts may not fail entirely but still undergo considerable softening and cyclic strain [5,6,7].
Traditional liquefaction potential assessments are semi-empirical methods based on both field tests and past liquefaction observations, owing to the difficulty of taking undisturbed samples from the loose sandy soils in which liquefaction frequently occurs. These methods rely on key parameters such as soil properties (e.g., relative density and gradation), environmental conditions (e.g., groundwater depth), and earthquake characteristics (e.g., peak ground acceleration) to classify a site as susceptible or resistant to liquefaction. One of the seminal approaches in this area is based on the standard penetration test (SPT) or cone penetration test (CPT) combined with the computation of the cyclic stress ratio (CSR), which utilizes parameters like peak ground acceleration to quantify the potential for liquefaction [6,8,9].
Despite their widespread use and proven track record, traditional methods face inherent challenges. In particular, the influence of various in situ state parameters, such as fines content, plasticity index, cementation, and overconsolidation, on q_c and V_s is complex and not fully captured by deterministic approaches. For example, while q_c may decrease with increasing fines content and increase with cementation and overconsolidation, V_s tends to behave inversely once the fines content exceeds a critical threshold. Such complexities underscore the limitations of conventional methods in accurately modeling the multifaceted nature of soil behavior under seismic loading [9].
Although these empirical techniques have formed the backbone of liquefaction hazard assessment for decades, in recent years machine learning (ML) methods have emerged as transformative tools in civil and geotechnical engineering, offering innovative solutions to complex problems that traditional techniques often struggle to address. One of the key advantages of machine learning techniques over conventional empirical methods is their ability to model nonlinear and multidimensional interactions without the need for a predetermined mathematical form between input and output variables. Traditional approaches typically rely on simplified assumptions and empirical correlations that may overlook the intricate interplay among various soil properties, environmental conditions, and earthquake characteristics. In contrast, ML algorithms can flexibly adapt to the complex behavior of soils under seismic loading, capturing subtle variations that influence liquefaction potential and thereby enhancing prediction accuracy [10]. ML methods can automatically learn from large datasets, such as those derived from standard penetration tests (SPT), cone penetration tests (CPT), and shear wave velocity (V_s) measurements, by adjusting internal parameters to detect underlying patterns and relationships in the data [5].
Moreover, the adaptive learning process inherent in ML techniques offers a dynamic framework that evolves with emerging research and changing site conditions. This continuous updating capacity makes ML-based approaches particularly well-suited for capturing the transient and variable nature of soil behavior under seismic loading, ultimately leading to more resilient and cost-effective engineering designs. Consequently, machine learning is paving the way for a new era of innovation in geotechnical engineering by overcoming traditional limitations through advanced data analytics and computational intelligence [10,11].

2. Materials and Methods

2.1. Dataset

Within the scope of the study, the dataset was prepared for loose sandy soils, which are known to be susceptible to liquefaction; soils such as gravels, clays, and unsaturated soils were excluded. The dataset used in the analyses consists of both liquefied and non-liquefied cases caused by earthquakes [12,13,14]. The selected sources, which reported observed liquefaction and non-liquefaction cases for 22 earthquake events, are listed in Table 1. The number of cases reported as liquefied and non-liquefied for each earthquake event is given in Figure 1. The total number of data points is 480; the proportions of observed liquefaction and non-liquefaction cases are 66% (316 cases) and 34% (164 cases), respectively, as shown in Figure 2.
Each data point in the dataset contains nine quantities: earthquake moment magnitude (M_w), peak ground acceleration (a_max), measurement depth of q_c and f_s (d), cone tip resistance (q_c), cone friction resistance (f_s), cyclic stress ratio (CSR), soil behavior type index (I_c), total stress (σ_v) and effective stress (σ′_v). The quantities M_w, a_max, d, q_c, f_s, σ_v and σ′_v are determined by measurement, while CSR and I_c are derived from the measured quantities.
Researchers study cases where liquefaction is observed after earthquakes and contribute to the literature by reporting their results. Some of these studies include laboratory results such as dynamic triaxial, monotonic triaxial, consistency-limit and sieve analyses. Even simple tests such as sieve analysis and consistency limits require laboratory work, whereas studies based on field data can be completed faster and more economically. Consequently, some studies in the literature contain no laboratory results; approximately 40% of the studies evaluated within the scope of this study lack fines-content data [6,9,10,11,14,15,16]. Field tests, on the other hand, are standard practice in post-earthquake liquefaction case studies, so almost all of the case studies evaluated here include field test data. The number of data points is one of the major factors governing the accuracy and reliability of machine learning methods [17]. For this reason, maximizing the number of usable data points was prioritized in this study, and the soil behavior type index (I_c) was used in place of the fines content (FC) parameter, which is missing in some studies. I_c indicates the soil type (gravel, sand, silt, silty sand, etc.) and can be determined from CPT results for all data points [3].
The following parameters were evaluated as inputs to the prediction models: earthquake moment magnitude (M_w), peak ground acceleration (a_max), depth (d), cone tip resistance (q_c), cone friction resistance (f_s), total stress (σ_v), effective stress (σ′_v), cyclic stress ratio (CSR) and soil behavior type index (I_c). The liquefaction status in the database, evaluated as the output parameter, is treated as a binary classification problem that separates cases where liquefaction occurred (positive) from those where it did not (negative). The histograms and scatter plots of the input parameters used in the study are given in Figure 3.
In machine learning models, variables with different scales can negatively impact model performance. Specifically, some algorithms may be biased toward variables with larger values, causing the model to become unbalanced. To prevent this issue, a normalization process is applied during the data preprocessing stage. In this study, the min-max normalization method given in Equation 1 was used to scale the variables to a certain range. Min-max normalization ensures that all values fall between 0 and 1, allowing variables of different magnitudes to contribute equally.
X_normalized = (X − X_min) / (X_max − X_min)
Here X represents the original value, X_min represents the lowest value in the dataset, and X_max represents the highest value [18,19,20,21].
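As a minimal sketch, Equation 1 can be implemented directly in Python (the function name and the constant-feature fallback are illustrative choices, not from the study):

```python
def min_max_normalize(values):
    """Scale a sequence of values to the [0, 1] range (Equation 1)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Constant feature: Equation 1 is undefined, so map every value to 0.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]
```

For example, `min_max_normalize([2, 4, 6])` returns `[0.0, 0.5, 1.0]`.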
As part of this study, min-max normalization was applied to all independent variables used for liquefaction prediction. This balanced the effect of variables with different scales, increased the learning efficiency of the models, and minimized errors due to scale differences between variables [7,17,22]. The traditional liquefaction potential prediction method [3], used here as the baseline for comparison, is summarized below.
Liquefaction potential has been computed for several decades by evaluating field tests such as SPT, CPT and shear wave velocity tests [23]. The cyclic stress ratio (CSR), calculated using Equation 2, was proposed by Seed and Idriss [24] and is a widely accepted and frequently preferred index for liquefaction computations. The cyclic resistance ratio (CRR) can be computed from the CPT data [3].
CSR = 0.65 (a_max / g) (σ_v0 / σ′_v0) r_d
Here a_max is the maximum horizontal ground surface acceleration, g is the gravitational acceleration, σ_v0 is the total vertical stress, σ′_v0 is the effective vertical stress, and r_d is the stress reduction coefficient.
The stress reduction coefficient r_d, which appears in Equation 2 and accounts for the flexibility of the soil profile, is calculated by Equation 3, where z denotes the depth in metres.
r_d = (1.000 − 0.4113 z^0.5 + 0.04052 z + 0.001753 z^1.5) / (1.000 − 0.4177 z^0.5 + 0.05729 z − 0.006205 z^1.5 + 0.001210 z^2)
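Equations 2 and 3 can be sketched as follows (the function names, the default g = 9.81 m/s², and the unit assumptions, stresses in kPa and depth in metres, are illustrative choices):

```python
def stress_reduction_coefficient(z):
    """Stress reduction coefficient r_d for depth z in metres (Equation 3)."""
    num = 1.000 - 0.4113 * z**0.5 + 0.04052 * z + 0.001753 * z**1.5
    den = 1.000 - 0.4177 * z**0.5 + 0.05729 * z - 0.006205 * z**1.5 + 0.001210 * z**2
    return num / den

def cyclic_stress_ratio(a_max, sigma_v0, sigma_v0_eff, z, g=9.81):
    """Earthquake-induced CSR (Equation 2); a_max and g must share units."""
    return 0.65 * (a_max / g) * (sigma_v0 / sigma_v0_eff) * stress_reduction_coefficient(z)
```

At the ground surface (z = 0) the coefficient reduces to 1, so the CSR there is simply 0.65 (a_max/g)(σ_v0/σ′_v0).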
The curve separating the regions where liquefaction is and is not observed, shown in Figure 4, was obtained by combining studies in the literature for estimating CRR from CPT tip resistance and is approximated by Equation 10. This chart is known to give more valid results for clean sands [3], so the values used in the calculations must first be converted to equivalent clean-sand values.
The soil behavior chart given in Figure 5 was developed using the value n = 1.0, which is appropriate for clayey soil types. However, a value of n = 0.5 is appropriate for clean sands, and a value between 0.5 and 1.0 for silts and sandy silts [3]. The soil behavior type boundaries shown in Figure 5 were determined according to the I_c value ranges given in Table 2. The soil behavior type index I_c is not valid for zones 1, 8 or 9. Throughout the normally consolidated zone, the soil behavior type index increases with fines content and plasticity. This relationship is also shown in Figure 6.
The cyclic resistance ratio (CRR) is calculated by following the steps explained below and summarized in the flow chart given in Figure 8 [3].
First, soil types defined as clay are distinguished from sands and silts. This distinction is made by assuming n = 1.0 in Equation 4 and calculating the Q value.
Q = [(q_c − σ_v0) / P_a] (P_a / σ′_v0)^n
Here, Q is the dimensionless normalized CPT tip resistance, q_c is the CPT tip resistance, σ_v0 is the vertical total stress, P_a is the atmospheric pressure (100 kPa ≈ 1 atm), σ′_v0 is the vertical effective stress, and n is an exponent ranging from 0.5 to 1.0 that reflects the soil type.
F = [f_s / (q_c − σ_v0)] × 100%
The soil behavior type index I_c given by Equation 6 is calculated using the Q value from Equation 4 and the F value from Equation 5. Here, F is the dimensionless friction ratio and f_s is the CPT sleeve friction resistance.
I_c = [(3.47 − log Q)^2 + (1.22 + log F)^2]^0.5
If the calculated I_c value is greater than 2.6, the soil is classified as clayey, and the analysis is completed by assuming that it is too rich in clay to liquefy. However, soil samples should be taken and tested in the laboratory to verify the soil type and liquefaction resistance. If the calculated I_c value is less than 2.6, the soil is classified as granular, and the Q and I_c values are recalculated with n = 0.5 in Equation 4.
If the recalculated I_c value is less than 2.6, the soil is classified as non-plastic and granular, and this I_c value is used to estimate the liquefaction resistance in the following steps. If the recalculated I_c value is greater than 2.6, the soil is very silty and probably plastic. In this case, the normalized CPT tip resistance q_c1N for silty sands, given by Equation 7, should be recalculated with n = 0.7.
q_c1N = (q_c / P_a) (P_a / σ′_v0)^n
The I_c value is then recalculated using the recalculated q_c1N value in place of the Q value in Equation 6. This I_c value is then used to calculate the liquefaction resistance.
For silty sands, the normalized CPT tip resistance q_c1N is corrected to the equivalent clean-sand value q_c1Ncs using Equation 8.
q_c1Ncs = K_c × q_c1N
Here K_c is the correction factor for the grain characteristics, calculated by Equation 9.
K_c = 1.0, if I_c ≤ 1.64
K_c = −0.403 I_c^4 + 5.581 I_c^3 − 21.63 I_c^2 + 33.75 I_c − 17.88, if I_c > 1.64
The K_c curve given by Equation 9 is shown in Figure 7. For I_c > 2.6, the curve is shown as a dashed line, indicating that soils in this I_c range are generally too rich in clay or too plastic to liquefy.
The obtained q_c1Ncs value is used in Equation 10 to calculate CRR_7.5. The cyclic resistance ratio (CRR) required for liquefaction of the soil and the cyclic stress ratio (CSR) induced by the earthquake are then compared, and the liquefaction potential is estimated from their ratio, the factor of safety FS = CRR/CSR. If FS is less than 1.0, the layer is considered at risk of liquefaction. Here q_c1Ncs is the clean-sand CPT tip resistance normalized to 100 kPa (1 atm).
CRR_7.5 = 0.833 (q_c1Ncs / 1000) + 0.05, if q_c1Ncs < 50
CRR_7.5 = 93 (q_c1Ncs / 1000)^3 + 0.08, if 50 ≤ q_c1Ncs < 160
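The flow-chart steps (Equations 4 through 10) can be sketched end-to-end as below. This is a simplified illustration, not the study's code: function names are invented, stresses are assumed in kPa, and treating q_c1Ncs ≥ 160 (outside the range of Equation 10) as non-liquefiable is an added assumption.

```python
import math

PA = 100.0  # atmospheric pressure in kPa (1 atm)

def q_and_f(qc, fs, sigma_v0, sigma_v0_eff, n):
    Q = ((qc - sigma_v0) / PA) * (PA / sigma_v0_eff) ** n  # Equation 4
    F = fs / (qc - sigma_v0) * 100.0                       # Equation 5, in %
    return Q, F

def soil_behavior_index(Q, F):
    """Soil behavior type index I_c (Equation 6)."""
    return math.sqrt((3.47 - math.log10(Q)) ** 2 + (1.22 + math.log10(F)) ** 2)

def kc(Ic):
    """Grain-characteristic correction factor K_c (Equation 9)."""
    if Ic <= 1.64:
        return 1.0
    return -0.403 * Ic**4 + 5.581 * Ic**3 - 21.63 * Ic**2 + 33.75 * Ic - 17.88

def crr_75(qc1Ncs):
    """Cyclic resistance ratio for M = 7.5 (Equation 10); valid for qc1Ncs < 160."""
    if qc1Ncs < 50:
        return 0.833 * qc1Ncs / 1000.0 + 0.05
    return 93.0 * (qc1Ncs / 1000.0) ** 3 + 0.08

def cpt_liquefaction_check(qc, fs, sigma_v0, sigma_v0_eff, csr):
    """Classify the soil, normalize tip resistance, correct to clean sand, compare CRR with CSR."""
    Q, F = q_and_f(qc, fs, sigma_v0, sigma_v0_eff, n=1.0)   # step 1: assume clay, n = 1.0
    if soil_behavior_index(Q, F) > 2.6:
        return "clay-like, assumed non-liquefiable", None    # verify with sampling in practice
    Q, F = q_and_f(qc, fs, sigma_v0, sigma_v0_eff, n=0.5)   # step 2: granular, n = 0.5
    Ic = soil_behavior_index(Q, F)
    if Ic > 2.6:                                             # very silty: renormalize with n = 0.7
        qc1N = (qc / PA) * (PA / sigma_v0_eff) ** 0.7        # Equation 7
        Ic = soil_behavior_index(qc1N, F)
    else:
        qc1N = (qc / PA) * (PA / sigma_v0_eff) ** 0.5
    qc1Ncs = kc(Ic) * qc1N                                   # Equation 8
    if qc1Ncs >= 160:
        return "non-liquefiable (outside Equation 10 range)", None
    fs_factor = crr_75(qc1Ncs) / csr                         # factor of safety FS = CRR / CSR
    return ("liquefiable" if fs_factor < 1.0 else "non-liquefiable"), fs_factor
```

For a loose sand with q_c = 3 MPa where σ_v0 = 100 kPa and σ′_v0 = 50 kPa, a CSR of 0.2 yields FS well below 1, i.e. a liquefiable layer.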

2.2. Machine Learning Methods

Liquefaction, defined as the loss of soil strength during earthquakes, poses a significant risk to engineering structures. While traditional empirical and experimental methods can be used for liquefaction prediction, machine learning methods have recently emerged as a powerful alternative in this field.
Machine learning enables the development of data-driven models that can learn patterns from large datasets to make predictions. Various machine learning algorithms, such as logistic regression, decision trees, support vector machines (SVM), random forest, and artificial neural networks (ANN), have been utilized for liquefaction prediction.
Machine learning-based approaches can predict liquefaction potential by utilizing different variables, including soil properties (e.g., grain size, void ratio), groundwater level, and seismic parameters (e.g., acceleration, magnitude). The advantages of these methods include:
  • More flexible and faster than traditional methods.
  • Capable of making high-accuracy predictions with large datasets.
  • Easily updatable with new data.
In this study, different models were created using the Random Forest (RF) and Artificial Neural Network (ANN) methods for liquefaction prediction, and their performances were evaluated and compared with the traditional method. The dataset used in the modeling includes field observations with and without liquefaction, and the algorithms aim to learn these patterns and make accurate predictions. Machine learning-based predictions are expected to be a useful tool in engineering decision-making processes.
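Since the 480-case dataset itself is not reproduced here, the modeling workflow can be sketched with scikit-learn on a synthetic stand-in (the generated data, class weights, and network size are illustrative assumptions, not the study's values):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in: 480 cases, 9 features, roughly the study's 66/34 class split.
X, y = make_classification(n_samples=480, n_features=9, n_informative=5,
                           weights=[0.34, 0.66], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

# Min-max normalization fitted on the training split only, to avoid leakage.
scaler = MinMaxScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)
ann = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000,
                    random_state=42).fit(X_tr, y_tr)
rf_acc, ann_acc = rf.score(X_te, y_te), ann.score(X_te, y_te)
```

The same train/evaluate pattern applies to the real CPT dataset once it is loaded in place of the synthetic data.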

2.2.1. Random Forest (RF)

The Random Forest method was developed by Leo Breiman in 2001 [25]. While developing the Random Forest method, Breiman was inspired by several previous studies and combined the ideas in these studies to create a new approach. As Breiman stated in his article [25], this method is based on the combination of two main concepts: Bagging (Bootstrap Aggregating) [26] and Random Subspace Method [27].
Random Forest is a machine learning algorithm used for both classification and regression problems that works by combining multiple decision trees. It is based on the basic principle that individual trees may be weak, but when they work together, they can form a strong predictive model.
One of the significant differences between decision trees and Random Forest is the management and reduction of overfitting. Overfitting is the situation where the model fits the training data too closely, in other words, it memorizes it. In this case, the model makes highly accurate predictions on the training data but much less accurate predictions on the test data. While this problem is evident in the decision tree method, it is controlled in Random Forest by the "Bagging" [26] and "Random Subspace" [27] techniques. These techniques reduce the variance of random forests, increase the generalization ability of the model, and reduce the risk of overfitting. As a result, the Random Forest method generally produces high-performance and reliable models.

2.2.1.1. Bagging (Bootstrap Aggregating) [26]

Bagging is an ensemble learning technique used particularly in statistical learning (Figure 9).
The main purpose of Bagging is to increase generalization power and reduce overfitting by combining many independent models. In this context, Bagging consists of Bootstrap Sampling, Training of Independent Models and Aggregating Model Results steps.
Bootstrap Sampling is the process of creating a new training dataset by drawing random samples from the original training data. Each sample is drawn with replacement, so the same data point can be selected more than once.
Training of Independent Models is the process of training a separate model on each bootstrap sample, so that each model learns from a different dataset.
Combining Model Results is the process of combining the predictions of each model after the training phase is completed. For classification problems, the final prediction is the class that receives the most votes among the individual models' predictions; for regression problems, it is the arithmetic average of all models' predictions.
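The three Bagging steps above can be sketched with a toy base learner (the 1-nearest-neighbour learner, the labels, and all names are invented for illustration):

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Step 1: draw len(data) points with replacement (Bootstrap Sampling)."""
    return [rng.choice(data) for _ in data]

def one_nn(sample):
    """Toy base learner: predict the label of the nearest training point."""
    return lambda x: min(sample, key=lambda p: abs(p[0] - x))[1]

def bagging_predict(train, x, base_learner, n_models=25, seed=0):
    """Steps 2-3: train independent models on bootstrap samples, then majority-vote."""
    rng = random.Random(seed)
    votes = [base_learner(bootstrap_sample(train, rng))(x) for _ in range(n_models)]
    return Counter(votes).most_common(1)[0][0]

train = [(0.1, "no-liq"), (0.2, "no-liq"), (0.9, "liq"), (1.0, "liq")]
```

Here `bagging_predict(train, 0.95, one_nn)` votes across 25 bootstrap-trained models; individual models may err when their sample misses the nearby points, but the majority vote is stable.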

2.2.1.2. Random Subspace Method [27]

The Random Subspace Method is also used to reduce overfitting and increase model generalization power, like the Bagging technique. This method ensures that each model is trained on a random subset of all available features (input parameters). The main differences between the two methods are shown in Table 3.

2.2.1.3. Structure of Random Forest Method

Instead of splitting each node using the best split among all variables, Random Forest splits each node using the best split among a randomly selected subset of variables (Figure 10). Each dataset is generated from the original dataset by sampling with replacement. Trees are then developed using random feature (input parameter) selection. In simple terms, Random Forest creates multiple decision trees and combines them to obtain a more accurate and stable prediction.
Decision trees consist of three main structures: root, decision node and leaf (Figure 11).
Root is the top node of the decision tree and is where the analysis of the entire data set begins. Each branch of the decision tree is assumed to start from this root.
Decision Node is the node where a decision rule is applied and branches. Decision nodes divide the data set by means of a feature and a threshold value determined for this feature. (For example, a decision node can check whether a feature value is greater or less than a certain threshold value. This divides the data set into two subsets.)
Leaves are the final nodes of the decision tree and contain the results. A leaf node holds a class label (for classification problems) or a prediction value (for regression problems). Leaves are where a branch ends and the results are obtained; in other words, a leaf node is reached at the end of each branch.

2.2.1.4. Splitting Criteria

Splitting criteria are a crucial step in the construction of decision trees. These criteria decide which feature and which threshold value are used at each node. The goal is to maximize the information gain at each node by splitting the dataset into purer (more homogeneous) subgroups. Different algorithms apply different splitting criteria. The most commonly used are:
Gini impurity is a metric that measures the mixing of classes in a node and is calculated by Equation 11. Here C is the number of classes and p_i is the proportion of observations belonging to class i. Gini impurity is widely used by decision trees (e.g., CART, Classification and Regression Trees). When a node is pure (i.e., contains a single class), its Gini value is 0.
Gini = 1 − Σ_{i=1}^{C} p_i^2
Information gain measures the difference in entropy before and after a split. Entropy measures the confusion or uncertainty in the dataset (Equation 12 and Equation 13). Here D is the dataset and D_k are the subsets after the split.
Entropy = − Σ_{i=1}^{C} p_i · log2(p_i)
Information Gain = Entropy(D) − Σ_k (|D_k| / |D|) · Entropy(D_k)
Information gain can be biased toward features (input parameters) with many distinct values. For example, if a feature has a different value for each observation, its information gain can be maximal. The information gain ratio corrects for such biases by normalizing the information gain, dividing it by the split entropy (Equation 14). This method is widely used in the C4.5 algorithm.
Information Gain Ratio = Information Gain / Entropy_split
In this study, since the Random Forest method, which is based on decision trees, was preferred, Gini impurity was used as the splitting criterion applied at the nodes.
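Equations 11 through 13 can be sketched directly (the function names are illustrative):

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity (Equation 11): 0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits (Equation 12)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Parent entropy minus the weighted entropy of the split subsets (Equation 13)."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)
```

A perfectly separating split of a balanced two-class node yields an information gain of 1 bit, and each pure subset has a Gini impurity of 0.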

3. Results

3.1. Hyperparameter Optimization

Hyperparameters are predefined settings that directly affect the learning process and overall accuracy of the model. Wrongly selected hyperparameters can lead to problems such as overfitting or underfitting, causing the model to perform poorly on real-world data. Therefore, determining the optimal hyperparameters is a critical step to increase the generalization ability of the model and maximize its accuracy.
In this study, hyperparameter optimization was applied to improve the performance of RF and ANN models. The optimized hyperparameters used in the RF model are shown in Table 4, and the optimized hyperparameters used in the ANN model are shown in Table 5. This optimization process increased the generalization ability of the models and provided more reliable results in liquefaction prediction.
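A hyperparameter search of this kind can be sketched with scikit-learn's GridSearchCV (the search space and synthetic data are illustrative; the tuned values actually used in the study are those reported in Table 4 and Table 5):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the study's 480-case dataset.
X, y = make_classification(n_samples=480, n_features=9, random_state=42)

param_grid = {
    "n_estimators": [50, 200],
    "max_depth": [None, 10],
    "max_features": ["sqrt", 0.5],
}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
best_params = search.best_params_  # combination with the best cross-validated accuracy
```

Each candidate combination is scored by 5-fold cross-validation, which guards against selecting hyperparameters that merely overfit a single split.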

3.2. Correct and Incorrect Estimated Values

True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN) are the basic concepts used when evaluating the prediction results of a model in classification problems. These concepts, summarized in Table 6, are obtained by comparing the results predicted by the model with the actual results.
Classification models have four key outcomes. A True Positive (TP) occurs when the model correctly identifies a positive class as positive, accurately recognizing the presence of a condition. Conversely, a False Positive (FP) happens when the model incorrectly predicts a negative class as positive, mistakenly indicating the presence of a condition when it is absent. Similarly, a True Negative (TN) is observed when the model accurately classifies a negative class as negative, correctly acknowledging the absence of a condition. Lastly, a False Negative (FN) takes place when the model incorrectly predicts a positive class as negative, thereby failing to detect a condition that is present.

3.3. Performance Metrics

The accuracy and effectiveness of prediction results need to be measured with various metrics.

3.3.1. Recall

Recall expresses the rate at which the model correctly predicts positive classes (Equation 15). It is also known as sensitivity.
Recall = TP / (TP + FN)

3.3.2. Precision

Precision measures how accurate the positive predictions of the model are. It is expressed as the ratio of true positive predictions to all positive predictions (Equation 16).
Precision = TP / (TP + FP)

3.3.3. Specificity

Specificity refers to the rate at which the model correctly predicts negative classes as negative (Equation 17).
Specificity = TN / (TN + FP)

3.3.4. F-Measure

It represents the harmonic mean of precision and recall (Equation 18). It is preferred especially in cases where the positive class is more important.
F-measure = (2 · Precision · Recall) / (Precision + Recall)

3.3.5. Accuracy

The proportion of classes that the model predicts correctly (Equation 19) is defined as accuracy.
Accuracy = (TP + TN) / (TP + TN + FP + FN)

3.3.6. Cohen's Kappa

Cohen's kappa compares the classification performance of the model with the probability of making a random guess, eliminating the effect of chance agreement. It is calculated by Equation 20, where P_o is the observed accuracy rate ((TP + TN) / Total) and P_e is the expected accuracy rate under random guessing.
κ = (P_o − P_e) / (1 − P_e)
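Equations 15 through 20 can be computed together from the four confusion-matrix counts (the function name and the dictionary layout are illustrative):

```python
def classification_metrics(tp, fp, tn, fn):
    """Metrics of Equations 15-20 from the confusion-matrix counts."""
    total = tp + fp + tn + fn
    recall = tp / (tp + fn)                                    # Equation 15
    precision = tp / (tp + fp)                                 # Equation 16
    specificity = tn / (tn + fp)                               # Equation 17
    f_measure = 2 * precision * recall / (precision + recall)  # Equation 18
    accuracy = (tp + tn) / total                               # Equation 19
    p_o = accuracy                                             # observed agreement
    # Chance agreement: product of positive rates plus product of negative rates.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (p_o - p_e) / (1 - p_e)                            # Equation 20
    return {"recall": recall, "precision": precision, "specificity": specificity,
            "f_measure": f_measure, "accuracy": accuracy, "kappa": kappa}
```

For example, 50 TP, 10 FP, 30 TN and 10 FN give an accuracy of 0.80 and a kappa of about 0.58.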

3.4. Evaluation of Performance Metrics

The number of correct and incorrect predictions and performance metrics obtained as a result of the study are shown in Table 7 for RF, Table 8 for ANN, and Table 9 for the traditional method; all results are summarized in Table 10.
The accuracy values obtained for RF, ANN and the traditional method are given in Figure 12. The highest accuracy value was reached with the RF model with 89%. Accuracy rates of 87% were obtained for the traditional method and 86% for ANN.

3.5. ROC Curve and AUC

ROC curve (Receiver Operating Characteristic Curve) is a graph used to analyze how a classification model performs at different threshold values. It is drawn by calculating the True Positive Rate (TPR) and False Positive Rate (FPR) values. TPR (Sensitivity) is the rate at which the model correctly predicts true positives (Equation 21). FPR (False Positive Rate - Fall-out) is the rate at which the model falsely predicts true negatives as positives (Equation 22).
TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
If the ROC curve is close to the upper left corner, it indicates that the model is performing well. If the curve is close to the 45-degree diagonal line, it means that the model is making random predictions. The ROC curve is given in Figure 13 for RF and in Figure 14 for ANN.
AUC (Area Under the Curve) refers to the area under the ROC curve and measures the overall classification performance of the model. If the AUC value is close to 1, the model discriminates excellently; if AUC = 0.5, the model makes random predictions; and if AUC < 0.5, the model classifies poorly. The AUC value is one of the most important metrics showing how well the model distinguishes positive and negative classes. A value between 0.7 and 0.8 is considered good, between 0.8 and 0.9 very good, and greater than 0.9 excellent. The ROC curve and AUC are important for comparing different models, evaluating models on datasets with class imbalance, and determining the optimal threshold value. The ROC curves given in Figure 13 and Figure 14 were evaluated, and the area under the curve was calculated as 0.932 for RF and 0.872 for ANN.
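AUC can also be computed without drawing the curve, as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties counted as half); this rank formulation is equivalent to the area under the ROC curve. A minimal sketch with illustrative names:

```python
def auc_score(labels, scores):
    """AUC as the positive-over-negative ranking probability (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfect ranking gives 1.0, while identical scores for every case give 0.5, the random-guess baseline.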

3.6. Feature Importance

The split number indicates how many times a feature is used for the split operation in the nodes in the trees. In each decision tree, the nodes divide the data into two subsets. During this splitting operation, one of the features is selected and splits the data according to a certain threshold value. The split number indicates how often the model uses that feature in the decision processes. A high split number indicates that the feature plays an important role in the classification or prediction process of the model. If the split number of a feature is very low (or even zero), this may indicate that the feature is generally unimportant to the model or is used less than other features.
The number of candidates indicates how many times a feature is considered a candidate for splitting. At each node, the model makes the best split decision from a randomly selected subset of features (candidate features). The number of candidates indicates how many times a feature is included in this subset. This number usually depends on the total number of nodes and the size of the randomly selected subset. A high number of candidates indicates that the feature is frequently included in randomly selected subsets and therefore frequently considered for splitting. However, being a candidate does not necessarily mean that a feature will be used; the model selects the feature that produces the best split.
  • Candidate > Split: A feature is not necessarily used for splitting every time it is selected as a candidate. A feature may be frequently nominated at nodes but used less often because it does not outperform the other candidates.
  • Split ≈ Candidate: If a feature is both frequently nominated and frequently used for splitting, the feature is very important to the model.
  • Split ≪ Candidate: If a feature's candidate number is high but its split number is low, the feature generally fails to satisfy the splitting criterion.
The split and candidate numbers in the first three levels of the Random Forest trees are shown in Table 11. The ratio of the split numbers in the first three levels to the total number of splits for each feature (input parameter) is significant in terms of feature importance. When the split-number ratios given in Figure 15 are evaluated, the parameters qc, amax, fs, CSR and Ic emerge as more important than the other parameters in the Random Forest model, in that order. Since feature importance is not directly calculated in ANN models, unlike in the RF model, only the feature importance of the RF model was evaluated within the scope of this study.
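Split counts like those in Table 11 can be extracted programmatically from a fitted forest. The sketch below is a hypothetical implementation assuming scikit-learn (the paper does not name its software stack) and synthetic data in place of the CPT dataset; it walks each fitted tree and tallies how often each feature is used for a split in the first three levels.

```python
# Assumption: scikit-learn forest; synthetic stand-in for the 480-case dataset.
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=480, n_features=9, random_state=0)
rf = RandomForestClassifier(n_estimators=190, max_depth=11,  # Table 4 values
                            random_state=0).fit(X, y)

counts = Counter()  # (feature index, tree level) -> number of splits
for est in rf.estimators_:
    tree = est.tree_
    stack = [(0, 0)]  # (node id, depth), starting at the root
    while stack:
        node, depth = stack.pop()
        if tree.children_left[node] != tree.children_right[node]:  # internal node
            if depth < 3:  # levels 0, 1 and 2, as in Table 11
                counts[(tree.feature[node], depth)] += 1
            stack.append((tree.children_left[node], depth + 1))
            stack.append((tree.children_right[node], depth + 1))

for (feat, depth), n in sorted(counts.items()):
    print(f"feature {feat}, level {depth}: {n} splits")
```

By construction the level-0 counts sum to the number of trees (one root split per tree), which is why the Level 0 column of Table 11 sums to 190.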

4. Discussion

In this study, the potential of machine learning (ML) techniques for predicting seismic soil liquefaction was evaluated using Random Forest (RF) and Artificial Neural Network (ANN) models, and their performances were compared with the traditional Cone Penetration Test (CPT) based method. The findings clearly demonstrate the capability of ML algorithms in analyzing complex soil behaviors under seismic loading and highlight their advantages over conventional empirical approaches.
The analyses conducted to compare the ML models and the traditional method reveal significant differences in classification success:
  • Accuracy Rates: According to the performance metrics, the RF model achieved the highest accuracy rate at 89%. This was followed by the traditional CPT-based method with 87% and the ANN model with 86%.
  • ROC-AUC Analysis: When evaluating the Area Under the Curve (AUC) values, the RF model yielded an AUC of 0.932, whereas the ANN model scored 0.872. An AUC value greater than 0.9 indicates that the RF model possesses an "excellent" classification capability in distinguishing between liquefied and non-liquefied cases. Although the ANN model is considered a "very good" model with an AUC between 0.8 and 0.9, it fell behind the RF model.
  • Overfitting Control: The overfitting problem, frequently encountered in decision tree methodologies, was successfully managed in the RF model through the use of "Bagging" and "Random Subspace" techniques. This structural advantage effectively reduces the variance of the model and increases its generalization capability, explaining why RF produced more robust predictions than the ANN model.
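As an illustration of the two variance-reduction mechanisms listed above, the sketch below configures a random forest with the hyperparameters of Table 4 and evaluates it with 10-fold cross-validation (the scheme underlying Table 7). scikit-learn and the synthetic dataset are assumptions for illustration, not the study's actual pipeline.

```python
# Assumption: scikit-learn; synthetic data stand in for the CPT dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=480, n_features=9, random_state=42)

rf = RandomForestClassifier(
    n_estimators=190,     # number of trees (Table 4)
    max_depth=11,         # tree depth (Table 4)
    bootstrap=True,       # bagging: each tree trains on a bootstrap sample of rows
    max_features="sqrt",  # random subspace: random candidate features at each node
    random_state=42,
)

scores = cross_val_score(rf, X, y, cv=10)  # 10-fold CV, as in Table 7
print(f"mean CV accuracy: {scores.mean():.2f}")
```

Setting `bootstrap=True` implements bagging over the training rows, while `max_features` restricts each node to a random feature subset; together these decorrelate the trees and lower the ensemble's variance.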
The fact that the traditional method achieved a relatively high accuracy of 87% attests to its long-standing empirical validity. However, traditional deterministic approaches often struggle to fully capture the complex, nonlinear interactions between soil properties and seismic parameters.
Conventional methods typically rely on simplified assumptions and deterministic correlations to account for multifaceted parameters such as fines content or plasticity.
In contrast, machine learning algorithms can model nonlinear and multidimensional interactions flexibly, without the need for a predetermined mathematical form between input and output variables.
Furthermore, the dynamic framework of ML techniques allows them to continuously update and adapt as new data emerges, offering a distinct advantage over static traditional models in capturing the transient and variable nature of soil behavior.
Identifying which parameters play a more significant role in prediction models directly impacts engineering decision-making processes. The feature importance analysis applied to the RF model revealed the parameters that the model relied on most heavily:
The analysis demonstrated that the most influential variables used in the splitting operations of the trees were, respectively, cone tip resistance (qc), peak ground acceleration (amax), sleeve friction resistance (fs), cyclic stress ratio (CSR), and soil behavior type index (Ic).
Considering that the fundamental mechanism underlying liquefaction is the cyclic shearing induced by seismic forces, it is theoretically consistent that amax and CSR are among the most critical parameters for the model.
Similarly, the high importance of direct field measurement metrics such as qc and fs, which characterize the in-situ physical state of the soil, provides strong physical validation that reinforces the model's accuracy.

Author Contributions

Methodology, S.T. and E.B.; Validation, E.B.; Formal analysis, S.T.; Investigation, E.B.; Resources, S.T.; Writing—original draft, S.T.; Writing—review & editing, E.B.; Visualization, S.T.; Supervision, E.B.; Project administration, E.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Baziar, M.H.; Nilipour, N. Evaluation of Liquefaction Potential Using Neural-Networks and CPT Results. Soil Dynamics and Earthquake Engineering 2003, 23, 631–636.
  2. Liu, B.Y.; Ye, L.Y.; Xiao, M.L.; Miao, S. Artificial Neural Network Methodology for Soil Liquefaction Evaluation Using CPT Values. Lecture Notes in Computer Science 2006, 4113, 329–336.
  3. Robertson, P.K.; Wride, C.E. Evaluating Cyclic Liquefaction Potential Using the Cone Penetration Test. Canadian Geotechnical Journal 1998, 35, 442–459.
  4. Seed, R.; Cetin, K.; Moss, R.; Kammerer, A.; Wu, J.; Pestana, J.; Riemer, M.; Sancio, R.; Bray, J.; Kayen, R.; et al. Recent Advances in Soil Liquefaction Engineering: A Unified and Consistent Framework. In Proceedings of the 26th Annual ASCE Los Angeles Geotechnical Spring Seminar, Long Beach, CA, USA, 2003.
  5. Hanna, A.M.; Ural, D.; Saygili, G. Neural Network Model for Liquefaction Potential in Soil Deposits Using Turkey and Taiwan Earthquake Data. Soil Dynamics and Earthquake Engineering 2007, 27, 521–540.
  6. Tung, A.T.Y.; Wang, Y.Y.; Wong, F.S. Assessment of Liquefaction Potential Using Neural Networks. Soil Dynamics and Earthquake Engineering 1993, 12, 325–335.
  7. Wang, J.; Rahman, M.S. A Neural Network Model for Liquefaction-Induced Horizontal Ground Displacement. Soil Dynamics and Earthquake Engineering 1999, 18, 555–568.
  8. Erzin, Y.; Ecemis, N. The Use of Neural Networks for CPT-Based Liquefaction Screening. Bulletin of Engineering Geology and the Environment 2015, 74, 103–116.
  9. Livingston, G.; Piantedosi, M.; Kurup, P.; Sitharam, T.G. Using Decision-Tree Learning to Assess Liquefaction Potential from CPT and Vs. In Geotechnical Earthquake Engineering and Soil Dynamics IV; 2008; pp. 1–10.
  10. Kohestani, V.R.; Hassanlourad, M.; Ardakani, A. Evaluation of Liquefaction Potential Based on CPT Data Using Random Forest. Natural Hazards 2015, 79, 1079–1089.
  11. Demir, S.; Sahin, E.K. Comparison of Tree-Based Machine Learning Algorithms for Predicting Liquefaction Potential Using Canonical Correlation Forest, Rotation Forest, and Random Forest Based on CPT Data. Soil Dynamics and Earthquake Engineering 2022, 154, 107130.
  12. Boulanger, R.W.; Idriss, I.M. CPT and SPT Based Liquefaction Triggering Procedures; 2014.
  13. Boulanger, R.W.; Mejia, L.H.; Idriss, I.M. Liquefaction at Moss Landing during Loma Prieta Earthquake. Journal of Geotechnical and Geoenvironmental Engineering 1997, 123, 453–467.
  14. Juang, C.H.; Yuan, H.; Lee, D.-H.; Lin, P.-S. Simplified Cone Penetration Test-Based Method for Evaluating Liquefaction Resistance of Soils. Journal of Geotechnical and Geoenvironmental Engineering 2003, 129, 66–80.
  15. Nejad, A.S.; Guler, E.; Ozturan, M. Evaluation of Liquefaction Potential Using Random Forest Method and Shear Wave Velocity Results; Institute of Electrical and Electronics Engineers Inc.: Budapest, Hungary, 2018.
  16. Juang, C.H.; Chen, C.J.; Tien, Y.-M. Appraising Cone Penetration Test Based Liquefaction Resistance Evaluation Methods: Artificial Neural Network Approach. Canadian Geotechnical Journal 1999, 36, 443–454.
  17. Pacheco, V.L.; Bragagnolo, L.; Dalla Rosa, F.; Thomé, A. Cone Penetration Test Prediction Based on Random Forest Models and Deep Neural Networks. Geotechnical and Geological Engineering 2023, 41, 4595–4628.
  18. Kumar, D.R.; Samui, P.; Burman, A.; Wipulanusat, W.; Keawsawasvong, S. Liquefaction Susceptibility Using Machine Learning Based on SPT Data. Intelligent Systems with Applications 2023, 20, 200281.
  19. Sui, Q.R.; Chen, Q.H.; Wang, D.D.; Tao, Z.G. Application of Machine Learning to the Vs-Based Soil Liquefaction Potential Assessment. J. Mt. Sci. 2023, 20, 2197–2213.
  20. Ozsagir, M.; Erden, C.; Bol, E.; Sert, S.; Özocak, A. Machine Learning Approaches for Prediction of Fine-Grained Soils Liquefaction. Comput. Geotech. 2022, 152, 105014.
  21. Huang, S.; Huang, M.; Lyu, Y. An Improved KNN-Based Slope Stability Prediction Model. Advances in Civil Engineering 2020, 2020, 8894109.
  22. Kuran, F.; Tanırcan, G.; Pashaei, E. Developing Machine Learning-Based Ground Motion Models to Predict Peak Ground Velocity in Turkiye. J. Seismol. 2024, 28, 1183–1204.
  23. Youd, T.L.; Idriss, I.M. Liquefaction Resistance of Soils: Summary Report from the 1996 NCEER and 1998 NCEER/NSF Workshops on Evaluation of Liquefaction Resistance of Soils. Journal of Geotechnical and Geoenvironmental Engineering 2001, 127, 297–313.
  24. Seed, H.B.; Idriss, I.M. Ground Motions and Soil Liquefaction during Earthquakes; Oakland, CA, USA, 1982.
  25. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  26. Breiman, L. Bagging Predictors. Mach. Learn. 1996, 24, 123–140.
  27. Ho, T.K. The Random Subspace Method for Constructing Decision Forests. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 832–844.
Figure 1. Earthquake events and number of observed liquefaction and non-liquefaction cases.
Figure 2. Proportions of observed liquefaction and non-liquefaction cases.
Figure 3. Scatter and histogram of input parameters.
Figure 4. Recommended clean sand curve for use in CRR calculations [3].
Figure 5. Determination of soil behavior and type with CPT data [3].
Figure 6. Changes in soil behavior type index according to fines ratio and soil regions (Table 2) [3].
Figure 7. Recommended grain characteristic correction to achieve clean sand equivalent CPT tip resistance in sandy soils [3].
Figure 8. Calculating CRR using CPT data [3].
Figure 9. Bagging.
Figure 10. Randomly Selected Variables at Each Node.
Figure 11. Basic Components of a Decision Tree.
Figure 12. Comparison of accuracy values obtained with all models.
Figure 13. ROC Curve of RF.
Figure 14. ROC Curve of ANN.
Figure 15. Feature importance ratios.
Table 1. Selected data sources.
Source Location
Boulanger et al., 1997 USA (Loma Prieta)
Juang et al., 2003 China (Haicheng), Taiwan (Chi-Chi), USA (Imperial Valley, Loma Prieta, San Fernando Valley)
Boulanger et al., 2014 China (Haicheng, Tangshan), Japan (Hyogo-Ken Nanbu, Nihonkai-Chubu, Niigata, Tohoku), Mexico (Victoria), New Zealand (Christchurch, Darfield, Edgecumbe, Inangahua), Taiwan (Chi-Chi), Türkiye (Kocaeli), USA (Borah Peak, Imperial Valley, Loma Prieta, Northridge, Superstition Hills, Westmorland)
Table 2. Soil behavior type limits.
Soil Behavior Type Index (Ic)	Region	Soil Behavior Type
Ic < 1.31	7	Gravelly sand to dense sand
1.31 < Ic < 2.05	6	Sands: clean sand to silty sand
2.05 < Ic < 2.60	5	Sand mixtures: silty sand to sandy silt
2.60 < Ic < 2.95	4	Silt mixtures: clayey silt to silty clay
2.95 < Ic < 3.60	3	Clays: silty clay to clay
Ic > 3.60	2	Organic soils: peats
Table 3. Comparison of bagging and random subspace methods.
Stages Bootstrap Aggregating Random Subspace Method
I. Data Sampling Data subsets are created using bootstrap sampling. Feature sampling is performed to select a different subset of features for each tree.
II. Training the Models Each tree is trained with all features. Each tree is trained with a subset of features.
III. Combining Results The prediction with the most votes is determined for classification, and the average of the predictions is determined for regression. The prediction with the most votes is determined for classification, and the average of the predictions is determined for regression.
Table 4. Hyperparameters for RF model.
Number of trees 190
Tree depth 11
Table 5. Hyperparameters for ANN model.
Number of iterations 160
Number of hidden layers 5
Number of hidden neurons per layer 10
Table 6. Comparing model predicted results with actual results.
	Positive Prediction	Negative Prediction
Positive Observation	True Positive (TP)	False Negative (FN)
Negative Observation	False Positive (FP)	True Negative (TN)
Table 7. Random forest accuracy statistics.
Fold No TP FP TN FN Recall Precision Sensitivity Specificity F-measure Cohen's Kappa Accuracy
0 30 4 12 2 0.94 0.88 0.94 0.75 0.91 0.71 0.88
1 29 4 12 3 0.91 0.88 0.91 0.75 0.89 0.67 0.85
2 30 5 11 2 0.94 0.86 0.94 0.69 0.90 0.66 0.85
3 30 2 14 2 0.94 0.94 0.94 0.88 0.94 0.81 0.92
4 31 4 12 1 0.97 0.89 0.97 0.75 0.93 0.75 0.90
5 31 6 10 1 0.97 0.84 0.97 0.63 0.90 0.64 0.85
6 29 0 17 2 0.94 1.00 0.94 1.00 0.97 0.91 0.96
7 29 5 12 2 0.94 0.85 0.94 0.71 0.89 0.67 0.85
8 29 3 14 2 0.94 0.91 0.94 0.82 0.92 0.77 0.90
9 30 3 14 1 0.97 0.91 0.97 0.82 0.94 0.81 0.92
Mean 0.94 0.89 0.94 0.78 0.92 0.74 0.89
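The fold-level statistics in Table 7 follow directly from the confusion-matrix counts defined in Table 6. As a worked check, the sketch below reproduces the fold 0 row (TP = 30, FP = 4, TN = 12, FN = 2) using the standard metric definitions.

```python
def fold_metrics(tp, fp, tn, fn):
    """Classification metrics from confusion-matrix counts (Table 6)."""
    n = tp + fp + tn + fn
    recall = tp / (tp + fn)            # = sensitivity
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / n
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_chance = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"recall": round(recall, 2), "precision": round(precision, 2),
            "specificity": round(specificity, 2), "f_measure": round(f_measure, 2),
            "kappa": round(kappa, 2), "accuracy": round(accuracy, 2)}

print(fold_metrics(30, 4, 12, 2))
# matches the fold 0 row of Table 7:
# recall 0.94, precision 0.88, specificity 0.75, F 0.91, kappa 0.71, accuracy 0.88
```

The same function applied to the aggregate counts of Table 9 (TP = 290, FP = 38, TN = 126, FN = 26) reproduces the traditional method's statistics.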
Table 8. Artificial neural network accuracy statistics.
Fold No TP FP TN FN Recall Precision Sensitivity Specificity F-measure Cohen's Kappa Accuracy
0 27 3 13 5 0.84 0.90 0.84 0.81 0.87 0.64 0.83
1 29 5 11 3 0.91 0.85 0.91 0.69 0.88 0.61 0.83
2 28 7 9 4 0.88 0.80 0.88 0.56 0.84 0.46 0.77
3 29 2 14 3 0.91 0.94 0.91 0.88 0.92 0.77 0.90
4 30 3 13 2 0.94 0.91 0.94 0.81 0.92 0.76 0.90
5 30 8 8 2 0.94 0.79 0.94 0.50 0.86 0.48 0.79
6 30 5 12 1 0.97 0.86 0.97 0.71 0.91 0.71 0.88
7 30 6 11 1 0.97 0.83 0.97 0.65 0.90 0.66 0.85
8 27 1 16 4 0.87 0.96 0.87 0.94 0.92 0.78 0.90
9 30 3 14 1 0.97 0.91 0.97 0.82 0.94 0.81 0.92
Mean 0.92 0.88 0.92 0.74 0.89 0.67 0.86
Table 9. Traditional method accuracy statistics.
TP FP TN FN Recall Precision Sensitivity Specificity F-measure Cohen's Kappa Accuracy
290 38 126 26 0.92 0.88 0.92 0.77 0.90 0.70 0.87
Table 10. Comparison of accuracy statistics obtained with all models.
Model Recall Precision Sensitivity Specificity F-measure Cohen's Kappa Accuracy
RF 0.94 0.89 0.94 0.78 0.92 0.74 0.89
ANN 0.92 0.88 0.92 0.74 0.89 0.67 0.86
Traditional 0.92 0.88 0.92 0.77 0.90 0.70 0.87
Table 11. Numbers of splits and candidates.
	Split			Candidate
	Level 0	Level 1	Level 2	Level 0	Level 1	Level 2
Mw	0	20	23	60	139	246
amax	36	80	77	70	153	251
d	7	13	53	72	135	248
qc	52	76	105	52	115	249
fs	34	55	101	62	124	265
CSR	29	53	100	80	113	266
Ic	25	36	85	55	115	255
σv	4	20	59	53	131	248
σ′v	3	21	47	66	115	216