Submitted: 15 May 2025
Posted: 16 May 2025
Abstract
Keywords:
1. Introduction
2. Background and State-of-the-Art
2.1. Mining Tailing and Geotechnical Challenge
2.2. Soil Stratigraphic Profiling
- $q_t$ is the corrected cone tip resistance,
- $f_s$ is the sleeve friction,
- $\sigma_{v0}$ is the total vertical overburden stress,
- $\sigma'_{v0}$ is the effective vertical overburden stress.
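The normalized parameters built from these quantities can be sketched as follows. This is a minimal sketch that assumes the commonly used Robertson-type definitions $Q_t = (q_t - \sigma_{v0})/\sigma'_{v0}$ and $F_r = 100\, f_s/(q_t - \sigma_{v0})$ (in %), together with the widely used form $I_c = \sqrt{(3.47 - \log_{10} Q_t)^2 + (\log_{10} F_r + 1.22)^2}$. The exact expressions adopted in this paper are not reproduced in this section, and the function name `normalized_cptu` is hypothetical.

```python
import numpy as np


def normalized_cptu(qt, fs, sigma_v0, sigma_v0_eff):
    """Return (Qt, Fr, Ic) arrays for CPTu readings.

    All inputs share one stress unit (e.g. kPa). Readings with
    qt <= sigma_v0, sigma_v0_eff <= 0 or fs <= 0 are returned as NaN.
    """
    qt, fs = np.asarray(qt, dtype=float), np.asarray(fs, dtype=float)
    sigma_v0 = np.asarray(sigma_v0, dtype=float)
    sigma_v0_eff = np.asarray(sigma_v0_eff, dtype=float)

    net = qt - sigma_v0                                   # net cone resistance
    valid = (net > 0) & (sigma_v0_eff > 0) & (fs > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        Qt = np.where(valid, net / sigma_v0_eff, np.nan)  # normalized cone resistance
        Fr = np.where(valid, 100.0 * fs / net, np.nan)    # normalized friction ratio, %
        Ic = np.sqrt((3.47 - np.log10(Qt)) ** 2 + (np.log10(Fr) + 1.22) ** 2)
    return Qt, Fr, Ic
```

For example, `normalized_cptu([2000.0], [20.0], [200.0], [100.0])` gives $Q_t = 18$, $F_r \approx 1.11\%$ and $I_c \approx 2.55$.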
2.3. Clustering Analysis
2.4. Related Work
3. Materials and Methods
3.1. Site Overview
3.2. Dataset Characterization
3.3. Clustering-Based Stratigraphic Profile
3.4. Implementation Details
- Pandas. Pandas is an open-source data analysis and manipulation library widely used in machine learning projects [36]. We used it for data management and processing.
- Scikit-learn. This library provides implementations of many machine learning algorithms [37]. We used Scikit-learn version 1.6.1 to implement the clustering algorithms in this study.
- Matplotlib. This library provides a comprehensive set of tools for creating visualizations in Python [38]. We used Matplotlib to generate most of the figures in this paper, including the stratigraphic profiles.
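A minimal sketch of how these libraries fit together when clustering CPTu readings is shown here. The column names (`depth`, `qt`, `fs`, `u2`), the choice of standardized features, and the default `n_clusters` are assumptions for illustration, not the configuration reported in this paper; `StandardScaler` and `KMeans` are actual scikit-learn classes, while `cluster_cptu` is a hypothetical helper.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical feature columns; the paper's actual feature set may differ.
FEATURES = ["qt", "fs", "u2"]


def cluster_cptu(df: pd.DataFrame, n_clusters: int = 4, seed: int = 0) -> pd.DataFrame:
    """Return a depth-sorted copy of df with a k-means 'cluster' label per reading."""
    # Order readings along the sounding so labels can later be read as a profile
    out = df.sort_values("depth").reset_index(drop=True)
    # Standardize features so columns with large kPa values do not dominate distances
    X = StandardScaler().fit_transform(out[FEATURES].to_numpy(dtype=float))
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    out["cluster"] = model.fit_predict(X)
    return out
```

The resulting `cluster` column can then be plotted against `depth` with Matplotlib (for example, `ax.scatter(profile["cluster"], profile["depth"])`) to draw a clustering-based profile.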
4. Results
4.1. Model Selection and Tuning
4.2. Stratigraphic Profile Through Clustering
4.3. Stratigraphic Profile Based on Soil Behavior Index ($I_c$)
5. Discussion
6. Conclusions
- k-means and MeanShift were the most effective methods for detecting geotechnically significant stratigraphic layers;
- DBSCAN and Affinity Propagation showed limitations in dealing with vertical CPTu data, resulting in either under- or over-segmentation;
- The $I_c$ index, although widely used, may overlook internal variations linked to depositional history or consolidation, which clustering methods can identify;
- When aligned by depth and construction phase, clustered profiles from temporally distinct tests revealed consistent stratigraphic patterns.
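Under- and over-segmentation are easiest to judge once per-reading cluster labels are collapsed into depth intervals. The following sketch does this with a simple run-length pass over depth-sorted labels; the `min_thickness` merging rule and the function name `labels_to_layers` are hypothetical and are not the procedure used in this paper.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[float, float, int]  # (depth of first reading, depth of last reading, label)


def labels_to_layers(depth: Sequence[float], labels: Sequence[int],
                     min_thickness: float = 0.0) -> List[Layer]:
    """Collapse depth-sorted per-reading cluster labels into contiguous layers.

    Runs thinner than min_thickness are absorbed into the layer above
    (a hypothetical smoothing rule, not the paper's procedure).
    """
    if len(depth) != len(labels):
        raise ValueError("depth and labels must have the same length")

    # Run-length pass: extend the current layer while the label repeats
    layers: List[Layer] = []
    for d, lab in zip(depth, labels):
        if layers and layers[-1][2] == lab:
            layers[-1] = (layers[-1][0], d, lab)
        else:
            layers.append((d, d, lab))

    if min_thickness <= 0:
        return layers

    # Merge thin runs upward, then fuse neighbours that now share a label
    merged: List[Layer] = []
    for top, bottom, lab in layers:
        if merged and (bottom - top < min_thickness or merged[-1][2] == lab):
            merged[-1] = (merged[-1][0], bottom, merged[-1][2])
        else:
            merged.append((top, bottom, lab))
    return merged
```

For labels `[0, 0, 1, 0, 0, 2, 2]` at depths 0 to 6, the plain pass returns four layers, while `min_thickness=0.5` absorbs the single-reading run of label 1 and returns two.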
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| CPTu | Cone Penetration Tests with pore pressure measurements |
| DBCVI | Density-Based Clustering Validation Index |
| $I_c$ | Soil Behavior Type Index |
| DBSCAN | Density-Based Spatial Clustering of Applications with Noise |
| PCA | Principal Component Analysis |
| PDF | Probability Distribution Function |
References
- Lunne, T.; Powell, J.; Robertson, P. Cone Penetration Testing in Geotechnical Practice, 1st ed.; CRC Press, 1997.
- White, D. CPT equipment: Recent advances and future perspectives. Cone Penetration Testing 2022, 66–80.
- Adamo, N.; Al-Ansari, N.; Sissakian, V.; Laue, J.; Knutsson, S. Dam safety: The question of tailings dams. Journal of Earth Sciences and Geotechnical Engineering 2020, 11, 1–26.
- Robertson, P. Soil classification using the cone penetration test. Canadian Geotechnical Journal 1990, 27, 151–158.
- Ekeanyanwu, C.V.; Obisakin, I.F.; Aduwenye, P.; Dede-Bamfo, N. Merging GIS and machine learning techniques: A paper review. Journal of Geoscience and Environment Protection 2022, 10, 61–83.
- Haghshenas, S.S.; Haghshenas, S.S.; Geem, Z.W.; Kim, T.H.; Mikaeil, R.; Pugliese, L.; Troncone, A. Application of harmony search algorithm to slope stability analysis. Land 2021, 10.
- Zhou, B.; Li, C.; Andrade, J.E. Autonomous particle-shape-based classification and identification of calcareous soils through machine learning. Géotechnique 2024.
- Nierwinski, H.P.; Pfitscher, R.J.; Barra, B.S.; Menegaz, T.; Odebrecht, E. A practical approach for soil unit weight estimation using artificial neural networks. Journal of South American Earth Sciences 2023, 131, 104648.
- Vick, S. Planning, Design, and Analysis of Tailings Dams; BiTech, 1990.
- Chropeňová, D.; Slávik, I. Raising of Embankment of an Ore Tailings Pond and an Analysis of its Stability. Slovak Journal of Civil Engineering 2023, 31, 24–33.
- Lyu, Z.; Chai, J.; Xu, Z.; Qin, Y.; Cao, J. A Comprehensive Review on Reasons for Tailings Dam Failures Based on Case History. Advances in Civil Engineering 2019, 2019, 4159306.
- Robertson, P.K.; Melo, L.; Williams, D.J.; Wilson, G.W. Report of the Expert Panel on the Technical Causes of the Failure of Feijão Dam I. Technical report, B1 Technical Investigation Panel, 2019. Accessed: 2025-05-12.
- Liu, L.L.; Wang, Y. Quantification of stratigraphic boundary uncertainty from limited boreholes and its effect on slope stability analysis. Engineering Geology 2022, 306, 106770.
- Jewell, R.J.; Fourie, A.B. (Eds.) Paste and Thickened Tailings: A Guide, 3rd ed.; Australian Centre for Geomechanics, University of Western Australia: Nedlands, Western Australia, 2015.
- Wroth, C.P. The interpretation of in situ soil tests. Géotechnique 1984, 34, 449–489.
- Ezugwu, A.E.; Ikotun, A.M.; Oyelade, O.O.; Abualigah, L.; Agushaka, J.O.; Eke, C.I.; Akinyelu, A.A. A comprehensive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects. Engineering Applications of Artificial Intelligence 2022, 110, 104743.
- Nazareth, A.F.D.V.; Lana, M.S. A methodology for the definition of geotechnical mine sectors based on multivariate cluster analysis. Geotechnical and Geological Engineering 2021, 39, 4405–4426.
- Dueck, D. Affinity propagation: Clustering data by passing messages. PhD thesis, 2009.
- Min, E.; Guo, X.; Liu, Q.; Zhang, G.; Cui, J.; Long, J. A survey of clustering with deep learning: From the perspective of network architecture. IEEE Access 2018, 6, 39501–39514.
- MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability; 1967; Vol. 1, pp. 281–297.
- Celebi, M.E.; Kingravi, H.A.; Vela, P.A. A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Systems with Applications 2013, 40, 200–210.
- Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96); 1996; pp. 226–231.
- Schubert, E.; Sander, J.; Ester, M.; Kriegel, H.P.; Xu, X. DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Transactions on Database Systems (TODS) 2017, 42, 1–21.
- Comaniciu, D.; Meer, P. Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 2002, 24, 603–619.
- Cheng, Y. Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 1995, 17, 790–799.
- Frey, B.J.; Dueck, D. Clustering by passing messages between data points. Science 2007, 315, 972–976.
- Wang, K.; Zhang, J.; Li, D.; Zhang, X.; Guo, T. Adaptive affinity propagation clustering. arXiv preprint 2008.
- San Roman Iturbide, O.; Botero Jaramillo, E. Identification of geotechnical units in soil exploration through principal component analysis and clustering. International Journal for Numerical and Analytical Methods in Geomechanics 2024, 48, 1681–1699.
- Sottile, M.G.; Crocker, J.A.; Roldan, L. Interpretation of CPTu data using machine learning techniques to develop the ground model of a dam. In Proceedings of the 7th International Conference on Geotechnical and Geophysical Site Characterization (ISC 24); 2024.
- Cho, S.; Cho, B.; Kang, S.; Kim, H. Development of locally specified soil stratification method with CPT data based on machine learning techniques. In Geotechnics for Sustainable Infrastructure Development; Springer, 2020; pp. 1287–1294.
- Shi, C.; Wang, Y. Nonparametric and data-driven interpolation of subsurface soil stratigraphy from limited data using multiple point statistics. Canadian Geotechnical Journal 2021, 58, 261–280.
- Nierwinski, H.P.; Custodio, L.A.; Barbosa, A.S.; Pfitscher, R.J. Use of Artificial Intelligence to Obtain a Stratigraphic Profile of Tailings Dams from CPTu Tests. In Geo-EnvironMeet 2025; ASCE Library, 2025; pp. 315–323. Available online: https://ascelibrary.org/doi/pdf/10.1061/9780784485699.034.
- Dauda, U.; Ismail, B. A study of normalization approach on K-means clustering algorithm. Int. J. Appl. Math. Stat. 2013, 45, 439–446.
- Abdulnassar, A.; Nair, L.R. A Comprehensive Study on the Importance of the Elbow and the Silhouette Metrics in Cluster Count Prediction for Partition Cluster Models. REVISTA GEINTEC-GESTAO INOVACAO E TECNOLOGIAS 2021, 11, 3792–3806.
- Moulavi, D.; Jaskowiak, P.A.; Campello, R.J.; Zimek, A.; Sander, J. Density-based clustering validation. In Proceedings of the 2014 SIAM International Conference on Data Mining; SIAM, 2014; pp. 839–847.
- McKinney, W. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference; van der Walt, S., Millman, J., Eds.; 2010; pp. 56–61.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 2011, 12, 2825–2830.
- Hunter, J.D. Matplotlib: A 2D graphics environment. Computing in Science & Engineering 2007, 9, 90–95.
- Vendramin, L.; Campello, R.J.; Hruschka, E.R. Relative clustering validity criteria: A comparative overview. Statistical Analysis and Data Mining: The ASA Data Science Journal 2010, 3, 209–235.
| $I_c$ range | Soil Behavior Type |
|---|---|
| | Gravelly sand to sand |
| – | Sand to silty sand |
| – | Silty sand to sandy silt |
| – | Clayey silt to silty clay |
| | Clay |
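Assigning a soil behavior type from $I_c$ amounts to a lookup against the range boundaries. Because the numeric ranges in the table above did not survive extraction, this sketch assumes Robertson's commonly cited boundaries (1.31, 2.05, 2.60, 2.95) and lumps every value above 2.95 into "Clay"; `soil_behavior_type` is a hypothetical helper, not part of the paper's code.

```python
import bisect
import math

# Assumed boundaries (commonly cited Robertson I_c chart values); the paper's
# own ranges were lost in extraction. Everything above 2.95 is lumped into "Clay".
IC_BOUNDS = [1.31, 2.05, 2.60, 2.95]
SBT_LABELS = [
    "Gravelly sand to sand",
    "Sand to silty sand",
    "Silty sand to sandy silt",
    "Clayey silt to silty clay",
    "Clay",
]


def soil_behavior_type(ic: float) -> str:
    """Map one I_c value to a soil behavior type label.

    A value exactly on a boundary is assigned to the finer-grained class.
    """
    if math.isnan(ic):
        raise ValueError("I_c is NaN")
    return SBT_LABELS[bisect.bisect_right(IC_BOUNDS, ic)]
```

Using `bisect_right` keeps the lookup O(log n) and makes the boundary convention explicit in one place.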
| Test | readings | max depth (mm) | (kPa) | (kPa) | u (kPa) |
|---|---|---|---|---|---|
| CPTU04-21-1-2005 | 753 | -15040 | 1138.03 | 10.80 | 27.55 |
| CPTU05-15-1-2005 | 903 | -18040 | 1931.26 | 17.11 | 55.72 |
| CPTU06-1-2-2005 | 1015 | -20280 | 1324.05 | 13.26 | 138.00 |
| CPT00190-11-01-24 | 6305 | -31810 | 5441.29 | 69.99 | 203.04 |
| CPT00190B-13-01-24 | 5209 | -26290 | 4185.61 | 50.40 | 37.79 |
| CPT00190C-15-01-24 | 6315 | -31810 | 3644.32 | 49.69 | 79.49 |
| CPT00190G-06-02-24 | 6348 | -31805 | 5157.49 | 43.32 | 113.02 |
| CPT00191-31-07-24 | 6347 | -31805 | 3339.42 | 105.94 | 461.91 |
| CPT00191A-03-08-24 | 5158 | -25860 | 3588.67 | 99.79 | 273.65 |
| CPT00191B-17-08-24 | 904 | -4595 | 3641.19 | 161.87 | 24.48 |
| CPT00191C-19-08-24 | 1643 | -8290 | 4006.88 | 128.28 | 23.00 |
| CPT00191D-20-08-24 | 5441 | -27280 | 3696.87 | 114.33 | 297.06 |
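A per-test summary like the one above can be produced from raw readings with a pandas `groupby`. Reading the second column as the number of readings and assuming depth starts at 0 (both assumptions), the first three rows imply a uniform spacing: 15040 / (753 − 1) = 18040 / (903 − 1) = 20280 / (1015 − 1) = 20 depth units per reading. The column names and the use of maxima for the kPa columns are assumptions, since the statistic behind those columns is not stated; `summarize_tests` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def summarize_tests(readings: pd.DataFrame) -> pd.DataFrame:
    """Per-test summary of raw CPTu readings (hypothetical column names).

    Expects columns 'test', 'depth' (negative downward, as in the table),
    'qt', 'fs' and 'u2'. Maxima for the stress columns are an assumption.
    """
    g = readings.groupby("test")
    summary = g.agg(
        num=("depth", "count"),       # number of readings per test
        max_depth=("depth", "min"),   # most negative value = deepest reading
        qt_max=("qt", "max"),
        fs_max=("fs", "max"),
        u2_max=("u2", "max"),
    )
    # Median spacing between consecutive readings, in the same unit as depth
    summary["step"] = g["depth"].apply(
        lambda d: float(np.median(np.diff(np.sort(d.to_numpy()))))
    )
    return summary
```

Comparing `num * step` with `|max_depth|` is a quick consistency check on the depth units of each sounding.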
| Model | Parameter | Values |
|---|---|---|
| k-means | $k$ | |
| MeanShift | bandwidth | |
| DBSCAN | $\varepsilon$; minPts | ; |
| Affinity Propagation | preference | |
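The search over the hyperparameters listed above could be organized as a simple grid, scoring each fit with one common criterion. This sketch uses scikit-learn's `KMeans`, `MeanShift`, `DBSCAN` (with `min_samples` playing the role of minPts) and `AffinityPropagation`, scored by the silhouette coefficient. The grids are placeholders because the table's values were not preserved, and using silhouette for every model is an assumption (DBCVI, listed in the abbreviations, is not computed here); `candidate_models` and `score_models` are hypothetical helpers.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AffinityPropagation, KMeans, MeanShift
from sklearn.metrics import silhouette_score


def candidate_models(k_values, bandwidths, eps_values, min_pts_values, preferences, seed=0):
    """Yield (name, params, estimator) for every point of the (placeholder) grids."""
    for k in k_values:
        yield "k-means", {"k": k}, KMeans(n_clusters=k, n_init=10, random_state=seed)
    for h in bandwidths:
        yield "MeanShift", {"bandwidth": h}, MeanShift(bandwidth=h)
    for eps in eps_values:
        for m in min_pts_values:
            # min_samples counts the point itself, matching the usual minPts
            yield "DBSCAN", {"eps": eps, "minPts": m}, DBSCAN(eps=eps, min_samples=m)
    for p in preferences:
        yield ("Affinity Propagation", {"preference": p},
               AffinityPropagation(preference=p, random_state=seed))


def score_models(X, **grids):
    """Fit each candidate on X; return (name, params, n_clusters, silhouette) rows."""
    X = np.asarray(X, dtype=float)
    rows = []
    for name, params, model in candidate_models(**grids):
        labels = model.fit_predict(X)
        mask = labels != -1                       # -1 marks DBSCAN noise (or AP non-convergence)
        n_clusters = len(set(labels[mask].tolist()))
        if 2 <= n_clusters < mask.sum():          # silhouette needs 2..n-1 clusters
            score = silhouette_score(X[mask], labels[mask])
        else:
            score = np.nan
        rows.append((name, params, n_clusters, score))
    return rows
```

Picking the best-scoring row per model gives a shortlist, which should still be checked against the stratigraphic profiles rather than accepted on the score alone.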
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).