Submitted: 12 February 2025
Posted: 13 February 2025
Abstract

Keywords:
1. Introduction
2. Research Area and Data Sources
2.1. Overview of the Study Area
2.2. Data Sources and Processing
2.2.1. Soil Sample Data
2.2.2. Obtaining Environmental Covariates
- (1) Topographical factors
- (2) Climate factors
- (3) Biological factors
- (4) Soil texture
- (5) Soil type and land use data
3. Research Method
3.1. Ordinary Kriging
3.2. Random Forest
3.3. Genetic Algorithm
3.4. SHAP Driving Force Analysis
3.5. Model Evaluation Indicators
4. Experimental Results and Analysis
4.1. Basic Statistics of Soil Organic Matter Content
4.2. Assessment of the Importance of Environmental Variables in RF Models
4.3. Comparative Analysis of Mapping Accuracy
4.4. SHAP Overlay Explanation
4.5. Spatial Distribution of Soil Organic Matter
5. Discussion
5.1. Advantages of RF-GA Model
5.2. Explanation of Environmental Variables
5.3. Limitations and Potential Improvements
5.3.1. Insufficient Data Scale and Representativeness
5.3.2. Improvement Directions for Model Optimization
5.3.3. Lack of Applicability and Interaction Analysis of Explanatory Methods
6. Conclusions and Prospects
- (1) The distribution of SOM in the study area is influenced by terrain, climate, and biological factors and shows clear spatial differentiation. SOM content is higher in the northern and eastern mountainous areas and lower in the flat central area. A few high values also occur in the southern urban and mixed forest areas.
- (2) The RF-GA model, a random forest whose combination of input variables is optimized by a genetic algorithm, selects environmental variables more effectively than the RF model that uses all variables. It also predicts SOM more accurately (RMSE of 3.49 versus 5.86 and R² of 0.49 versus 0.21 on the validation set), which makes it a promising tool for SOM prediction in complex areas.
- (3) Further analysis with the RFGA-SHAP model indicates that the key factors influencing the spatial distribution of surface SOM in the hilly basin area of Lanxi City include CNBL, DEM, P_m, NDWI, CI, T_m, SCD, and BSI. Knowledge of these factors can inform soil management practices and support decision-making that promotes sustainable land use and agricultural productivity.
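
Conclusion (2) describes a random forest whose covariate subset is chosen by a genetic algorithm. The search loop behind that idea can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `ga_select`, `toy_fitness`, and `INFORMATIVE`, as well as all parameter values, are hypothetical. In the study the fitness would presumably be the validation accuracy of a random forest trained on the selected covariates; here a cheap stand-in score is used so the block runs with the standard library only.

```python
import random


def ga_select(n_features, fitness, pop_size=30, generations=40,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search binary masks (1 = keep covariate) that maximise `fitness`."""
    rng = random.Random(seed)
    # Seed the population with the full-variable mask plus random masks.
    population = [[1] * n_features] + [
        [rng.randint(0, 1) for _ in range(n_features)]
        for _ in range(pop_size - 1)
    ]
    best_score, best_mask = float("-inf"), None

    for _ in range(generations):
        scored = [(fitness(mask), mask) for mask in population]
        top_score, top_mask = max(scored, key=lambda pair: pair[0])
        if top_score > best_score:
            best_score, best_mask = top_score, top_mask[:]

        def pick():
            # Tournament selection: best of three random individuals.
            return max(rng.sample(scored, 3), key=lambda pair: pair[0])[1]

        children = [best_mask[:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            a, b = pick(), pick()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children

    return best_mask, best_score


# Hypothetical stand-in for "accuracy of an RF trained on the masked set":
# reward three "useful" covariates and penalise every selected variable.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


mask, score = ga_select(10, toy_fitness)
print(mask, round(score, 2))
```

Because the all-ones mask is placed in the initial population and the best mask is carried forward unchanged, the returned subset can never score worse than the full-variable baseline under the chosen fitness, which mirrors the RF versus RF-GA comparison.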
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest








| Soil-Forming Factors | Input Variables | Spatial Resolution |
|---|---|---|
| Topographic factors | Analytical hillshading (AH), aspect (ASP), closed depressions (CDs), convergence index (CI), channel network base level (CNBL), channel network distance (CND), elevation (DEM), coefficient of variation of elevation (ECV), LS factor (LS), mass balance index (MBI), multiresolution ridge top flatness (MRRTF), multiresolution valley bottom flatness (MRVBF), plan curvature (PLC), profile curvature (PRC), relative slope position (RSP), surface cutting depth (SCD), slope (SLP), total catchment area (TCA), topographic position index (TPI), terrain ruggedness index (TRI), topographic wetness index (TWI), terrain undulation (TU), valley depth (VD), wind exposition index (WEI) | 12.5 m |
| Biological factors | Bare soil index (BSI), enhanced vegetation index (EVI), global environment monitoring index (GEMI), green normalized difference vegetation index (GNDVI), modified normalized difference water index (MNDWI), modified soil-adjusted vegetation index (MSAVI), normalized difference moisture index (NDMI), normalized difference vegetation index (NDVI), normalized difference water index (NDWI), net primary production (NPP), soil-adjusted vegetation index (SAVI), simple ratio (SR), visible atmospherically resistant index (VARI) | 10 m |
| Soil texture | Sand content (sand), silt content (silt), clay content (clay) | 900 m |
| Climate factors | Evaporation (E_m), humidity mean (H_m), land surface temperature mean (LST_m), precipitation mean (P_m), temperature mean (T_m) | 1000 m |
| Land use (LU) | Vector data | |
| Soil type (ST) | Vector data | |

| Sentinel-2 Bands | Bandwidth (nm) | Central Wavelength (nm) |
|---|---|---|
| Band 1—coastal aerosol | 21 | 442.7 |
| Band 2—blue | 66 | 492.4 |
| Band 3—green | 36 | 559.8 |
| Band 4—red | 31 | 664.6 |
| Band 5—vegetation red edge | 2 | 704.1 |
| Band 6—vegetation red edge | 15 | 740.5 |
| Band 7—vegetation red edge | 20 | 782.8 |
| Band 8—NIR | 106 | 832.8 |
| Band 8A—narrow NIR | 21 | 864.7 |
| Band 9—water vapor | 20 | 945.1 |
| Band 10—SWIR-Cirrus | 3 | 1373.5 |
| Band 11—SWIR | 91 | 1613.7 |
| Band 12—SWIR | 175 | 2202.4 |
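
Several of the biological covariates listed earlier (NDVI, NDWI, BSI) are simple band ratios of the Sentinel-2 bands in the table above. The sketch below shows how such indices are commonly computed per pixel. The band assignments are assumptions based on the usual Sentinel-2 conventions (NDVI from bands 8 and 4, the green/NIR form of NDWI from bands 3 and 8, BSI from bands 11, 4, 8, and 2); the source does not state which formulas were used, and the reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or NaN when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else float("nan")


def ndvi(nir, red):
    # Assumed mapping: NIR = Band 8, red = Band 4.
    return normalized_difference(nir, red)


def ndwi(green, nir):
    # Assumed green/NIR form: green = Band 3, NIR = Band 8.
    return normalized_difference(green, nir)


def bsi(blue, red, nir, swir1):
    # Assumed mapping: blue = Band 2, red = Band 4, NIR = Band 8,
    # SWIR1 = Band 11.
    return normalized_difference(swir1 + red, nir + blue)


# Hypothetical surface reflectances for one vegetated pixel.
pixel = {"blue": 0.05, "green": 0.10, "red": 0.10, "nir": 0.50, "swir1": 0.25}

print(round(ndvi(pixel["nir"], pixel["red"]), 3))
print(round(ndwi(pixel["green"], pixel["nir"]), 3))
print(round(bsi(pixel["blue"], pixel["red"], pixel["nir"], pixel["swir1"]), 3))
```

All three indices fall in the range from −1 to 1, so they can be compared across scenes without rescaling.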

| Dataset | Data | Samples | Max (g·kg⁻¹) | Min (g·kg⁻¹) | Mean (g·kg⁻¹) | SD (g·kg⁻¹) | CV (%) | Skewness | Kurtosis | K-S |
|---|---|---|---|---|---|---|---|---|---|---|
| Training set | Raw data | 1249 | 66.20 | 3.91 | 22.25 | 8.40 | 37.77 | 0.85 | 1.89 | 0.000 |
| Training set | Box–Cox | 1249 | 10.87 | 1.81 | 6.01 | 1.31 | 21.84 | −0.01 | 0.44 | 0.081 |
| Validation set | Raw data | 311 | 58.60 | 5.21 | 22.50 | 8.58 | 38.15 | 0.86 | 1.40 | 0.006 |
| Validation set | Box–Cox | 311 | 10.24 | 2.34 | 6.05 | 1.30 | 21.54 | 0.12 | 0.29 | 0.200 |
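
The statistics above compare raw SOM values with their Box–Cox transforms. A minimal sketch of the one-parameter Box–Cox transform and the coefficient of variation follows. The exponent λ is not reported in this section; the value 0.4 used below is an assumption, chosen only because it reproduces the tabulated training-set extremes (66.20 to about 10.87 and 3.91 to about 1.81). The function names and the small sample are hypothetical.

```python
import math
import statistics


def box_cox(x, lam):
    """One-parameter Box-Cox transform of a strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


LAMBDA = 0.4  # assumed; not stated in the source

# Training-set maximum and minimum from the table (g/kg).
print(round(box_cox(66.20, LAMBDA), 2))
print(round(box_cox(3.91, LAMBDA), 2))

# Hypothetical SOM sample (g/kg) to illustrate the CV calculation.
sample = [12.4, 18.9, 22.3, 25.1, 40.7]
transformed = [box_cox(v, LAMBDA) for v in sample]
print(round(coefficient_of_variation(sample), 1))
print(round(coefficient_of_variation(transformed), 1))
```

The transform compresses large values more than small ones, which is consistent with the drop in skewness and CV reported for the Box–Cox rows.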

| Method | MAE | RMSE | R² | LCCC |
|---|---|---|---|---|
| OK | 6.31 | 8.33 | 0.06 | 0.16 |
| RF | 4.60 | 5.86 | 0.21 | 0.38 |
| RF-GA | 3.02 | 3.49 | 0.49 | 0.67 |
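
The four accuracy indicators in the table above can be computed from paired observed and predicted values as sketched below. This is a hedged illustration: the function names are hypothetical, R² is assumed to be the coefficient of determination (1 − SS_res/SS_tot) rather than a squared correlation, and LCCC follows Lin's concordance correlation coefficient with population (1/n) variances. The source does not state which variants were used.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def lccc(obs, pred):
    """Lin's concordance correlation coefficient with 1/n moments."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


# Hypothetical observed and predicted SOM values (g/kg).
observed = [18.2, 25.4, 30.1, 12.7, 22.0]
predicted = [20.0, 24.1, 27.5, 15.3, 21.2]

for name, fn in [("MAE", mae), ("RMSE", rmse), ("R2", r2), ("LCCC", lccc)]:
    print(name, round(fn(observed, predicted), 3))
```

Unlike R², LCCC also penalizes a systematic offset between predictions and observations, which is why the two indicators can rank models differently.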

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).