Submitted: 01 September 2025
Posted: 02 September 2025
Abstract
Keywords:
1. Introduction
1.1. Background and Significance of Pollution Source Detection
1.2. Role of Machine Learning in Environmental Monitoring
1.3. Rationale for the Review
1.4. Research Questions or Review Objectives
- a) What are the most common machine learning algorithms used for source identification of soil contamination? This requires identifying the ML models most frequently employed in this application and the reasons for their prevalence.
- b) What forms of data are most commonly used with ML models for source identification of soil contamination, and why? Answering this requires understanding the nature of the input data (e.g., geospatial, physicochemical, or remote sensing) and how it is preprocessed for ML modeling, knowledge that is needed to assess model reliability and generalizability.
- c) How are the efficiency and effectiveness of these ML models evaluated for source identification of soil pollution? This entails comparing the techniques used to assess accuracy, stability, and interpretability, as well as the benchmarking practices used in the literature.
- d) What are the current trends, challenges, and future directions in applying ML to identify the sources of soil pollution, with particular reference to model interpretability, integration with emerging technologies, and policy? This question places ML in context: how it measures up against practical needs, existing policy, and future developments in data science and AI.
1.5. Scope and Delimitations of the Review
2. Review Methodology
2.1. Review Protocol
- Inclusion Criteria
- a) Relevance to Soil Pollution Source Detection: The study must directly address the detection or classification of soil pollution sources, not general soil monitoring.
- b) Machine Learning Algorithm Application: Inclusion was limited to studies that applied supervised, unsupervised, semi-supervised, or reinforcement learning algorithms, ranging from standard ML models such as Decision Trees and Support Vector Machines to more sophisticated approaches such as Deep Neural Networks and ensemble methods.
- c) Type of Pollution: The soil pollutants considered were heavy metals (e.g., lead, cadmium, arsenic), organic pollutants (e.g., PAHs, VOCs), agrochemicals (e.g., herbicides, pesticides), hydrocarbons, and microplastics.
- d) Peer-Reviewed Publications: Only full-text, peer-reviewed journal articles or academic conference proceedings from credible sources were included.
- e) Publication Year Window: 2013 to 2024.
- f) Language: English-language publications only.
- Exclusion Criteria
- a) Inadequate Application of Machine Learning: Studies using only statistical, rule-based, or deterministic modeling (e.g., linear regression without any machine learning method) were excluded.
- b) Non-Pollution Theme: Articles focusing solely on soil issues unrelated to pollution, such as erosion, salinity, nutrient levels, or overall fertility, were excluded unless they related to determining pollution sources.
- c) Non-Primary Literature: Editorials, opinion pieces, technical comments, book chapters, and non-peer-reviewed grey literature were excluded on grounds of academic quality.
- d) Methodological Opacity: Studies lacking an explicit description of the ML model, the data employed, or the evaluation process were excluded to ensure replicability and quality.
2.2. Search Strategy
- Databases Used
- a) Scopus – Broad coverage of scientific fields ranging from environmental science to computer science.
- b) Web of Science – Known for indexing high-impact, peer-reviewed journals.
- c) IEEE Xplore – Specialized in technology and engineering, making it well suited to machine learning and AI-based research.
- d) ScienceDirect (Elsevier) – One of the main sources of research in the applied and environmental sciences.
- e) SpringerLink – Multidisciplinary journals with a strong emphasis on environmental modeling and AI.
- f) Google Scholar – Used to retrieve additional relevant or recently published material not indexed in the databases above.
- Keyword Strategy and Boolean Operators
- a) Concept 1: Soil Pollution/Contamination: (soil OR "soil quality" OR "soil health") AND (pollut* OR contaminat* OR heavy-metal* OR pesticide* OR microplastic* OR chemical* OR toxic*)
- b) Concept 2: Source Detection/Identification: ("source detection" OR "source identification" OR "pollutant origin" OR "pollution tracking" OR "attribution" OR "fingerprinting" OR "hotspot detection")
- c) Concept 3: Machine Learning: ("machine learning" OR "deep learning" OR "artificial intelligence" OR "AI" OR "neural network*" OR "support vector machine*" OR "random forest*" OR "ensemble learning" OR "clustering" OR "classification" OR "regression")
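The three concept blocks can be combined into a single database query. The sketch below assumes, since the text does not state it explicitly, that the concepts were intersected with AND. It wraps each block in parentheses so that the AND inside Concept 1 keeps its grouping, and it checks that the parentheses balance. The names `CONCEPTS`, `parentheses_balanced`, and `build_query` are hypothetical, and database-specific field tags are deliberately omitted.

```python
# Hypothetical helper: combine the review's three concept blocks into one query.
# Assumption: the concepts are intersected with AND (not stated explicitly in the text).

CONCEPTS = [
    # Concept 1: soil pollution/contamination (already contains an internal AND)
    '(soil OR "soil quality" OR "soil health") AND (pollut* OR contaminat* OR '
    'heavy-metal* OR pesticide* OR microplastic* OR chemical* OR toxic*)',
    # Concept 2: source detection/identification
    '("source detection" OR "source identification" OR "pollutant origin" OR '
    '"pollution tracking" OR "attribution" OR "fingerprinting" OR "hotspot detection")',
    # Concept 3: machine learning
    '("machine learning" OR "deep learning" OR "artificial intelligence" OR "AI" OR '
    '"neural network*" OR "support vector machine*" OR "random forest*" OR '
    '"ensemble learning" OR "clustering" OR "classification" OR "regression")',
]


def parentheses_balanced(query: str) -> bool:
    """Return True if every '(' is closed by a later ')'."""
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def build_query(blocks: list) -> str:
    """Wrap each concept block in parentheses and intersect the blocks with AND."""
    query = " AND ".join(f"({block})" for block in blocks)
    if not parentheses_balanced(query):
        raise ValueError("unbalanced parentheses in search query")
    return query


if __name__ == "__main__":
    print(build_query(CONCEPTS))
```

Wrapping every block, rather than only multi-clause ones, costs a pair of redundant parentheses but makes the grouping explicit regardless of how a given database ranks AND against OR.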
- Search Strategies Used:
- a) Truncation (*) was used to capture variant word forms (e.g., pollut* matches both pollutant and pollution).
- b) Phrase searching (enclosing terms in double quotation marks) enabled exact matching of multi-word phrases such as "machine learning" or "source identification".
- c) Wildcards were not used because their syntax varies across databases; truncation provided adequate term coverage.
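To make the truncation and phrase-search rules concrete, the sketch below translates a single search term into a regular expression and screens a piece of text against it. It is only an approximation: each database applies its own rules for stemming, hyphens, and minimum stem length, and the function names here are hypothetical.

```python
import re


def _word_pattern(word: str) -> str:
    """Escape one word; a trailing '*' becomes right-hand truncation."""
    if word.endswith("*"):
        return re.escape(word[:-1]) + r"\w*"
    return re.escape(word)


def term_to_regex(term: str) -> re.Pattern:
    """Compile one search term into a case-insensitive regular expression.

    - "..." : exact phrase; words may be separated by any run of whitespace,
              and a '*' on a word inside the phrase still truncates it.
    - stem* : matches any word that starts with `stem` (pollut* -> pollution).
    - other : matched as a whole word.
    """
    term = term.strip()
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        words = term[1:-1].split()
    else:
        words = [term]
    body = r"\s+".join(_word_pattern(w) for w in words)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def matches_any(terms: list, text: str) -> bool:
    """True if the text matches at least one term (the OR within a concept block)."""
    return any(term_to_regex(t).search(text) for t in terms)


if __name__ == "__main__":
    print(matches_any(["pollut*", "contaminat*"], "Soil pollution sources"))
    print(matches_any(['"machine learning"'], "learning machine"))
```

Because truncation here is right-hand only and terms are matched as whole words, `pollut*` does not match "unpolluted" and `soil` does not match "topsoil"; such compounds would need their own search terms.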
- Duplicate Management and Filtering
2.3. Selection Process
2.4. Quality Assessment
- a) Properly Framed Problem: Whether the identification of soil pollution sources and their extent was properly framed.
- b) Quality and Quantity of Dataset: Appropriateness and relevance of the dataset, e.g., representativeness, completeness, and handling of missing values.
- c) Data Preprocessing and Feature Engineering: Appropriateness and clarity of the data preprocessing methods, e.g., normalization, feature reduction, and feature selection.
- d) Justification of Model Choice: Explanation of why a particular machine learning approach was chosen given the nature of the problem and the data type.
- e) Performance Metrics and Validation: Appropriate evaluation metrics (e.g., precision, recall, F1-score, accuracy) and adequate validation methods such as k-fold cross-validation or external validation.
- f) Reproducibility of Results: Sufficient methodological detail, code, or data to reproduce the results.
- g) Real-World Validation: Testing of model performance against field data or an external database to establish real-world usability.
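Criterion (e) can be illustrated with a short, dependency-free sketch of k-fold cross-validation that reports accuracy, precision, recall, and F1-score for one "positive" source class. The nearest-centroid classifier, the feature values, and the source labels are hypothetical stand-ins chosen only to keep the example self-contained; they are not taken from any reviewed study. Metrics are computed on the pooled out-of-fold predictions, so that a fold containing no positive samples cannot yield an undefined precision.

```python
import math
import random
from statistics import mean


def kfold_indices(n: int, k: int, seed: int = 0):
    """Shuffle range(n) once, then yield (train, test) index lists for k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def fit_nearest_centroid(X, y):
    """Hypothetical stand-in model: the mean feature vector of each class."""
    return {
        label: [mean(col) for col in zip(*(x for x, t in zip(X, y) if t == label))]
        for label in sorted(set(y))
    }


def predict_nearest_centroid(centroids, X):
    """Assign each sample to the class whose centroid is closest (Euclidean)."""
    return [min(centroids, key=lambda c: math.dist(x, centroids[c])) for x in X]


def cross_val_predict(X, y, k=5, seed=0):
    """Out-of-fold predictions: each sample is predicted by a model that never saw it."""
    predictions = [None] * len(X)
    for train, test in kfold_indices(len(X), k, seed):
        model = fit_nearest_centroid([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, predict_nearest_centroid(model, [X[i] for i in test])):
            predictions[i] = p
    return predictions


def classification_metrics(y_true, y_pred, positive):
    """Accuracy plus precision, recall, and F1-score for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical [Pb, Cd] concentrations (mg/kg) labelled by source type.
    X = [[100 + i, 5 + 0.1 * i] for i in range(10)] + [[10 + i, 0.5 + 0.1 * i] for i in range(10)]
    y = ["industrial"] * 10 + ["agricultural"] * 10
    print(classification_metrics(y, cross_val_predict(X, y, k=5), positive="industrial"))
```

With a distance-based model like this one, the preprocessing asked for in criterion (c) matters: without standardization, the larger Pb values dominate the Euclidean distance. For multi-class source problems, per-class precision, recall, and F1 would typically be macro-averaged.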
- Evaluation Process
2.5. Data Extraction and Synthesis
2.5.1. Data Extraction
- a) Study ID, Author(s), Year of Publication
- b) Pollutant(s) Investigated: Type of soil pollutants investigated (e.g., heavy metals, organic pollutants, pesticides, microplastics)
- c) Type of Pollution Source: Categorization of the pollution sources, such as industrial effluent, agricultural runoff, mining, or combined sources
- d) Location: Country or region in which the study was conducted
- e) Type(s) of Data Used: Categories of input data used for modeling (e.g., sensor data, satellite data, laboratory soil data, GIS data)
- f) ML Algorithm(s) Used: Type of ML techniques employed (e.g., Random Forest, Support Vector Machine (SVM), Convolutional Neural Networks (CNN), k-Nearest Neighbors (k-NN))
- g) Problem Type: Type of ML problem (e.g., classification, regression, clustering)
- h) Key Features/Inputs: Most significant variables or features used for training
- i) Performance Measures Reported: Quantitative performance metrics such as accuracy, precision, recall, F1-score, Root Mean Square Error (RMSE), and coefficient of determination (R²)
- j) Main Results of Significance to Source Identification: Most significant results and conclusions drawn regarding pollution source identification
- k) Strengths and Weaknesses: Strengths and weaknesses as identified by the study
- l) Tools/Software Used: ML modeling environments, languages, or libraries employed (e.g., Python, R, TensorFlow, WEKA)
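Item (i) mixes classification metrics with regression metrics. For studies that predict pollutant concentrations rather than source classes, the two regression measures named there, RMSE and R², can be computed as in the sketch below; mean absolute error (MAE) is added only for comparison. The measured and predicted values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: mean(|y - y_hat|)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical measured vs. predicted concentrations (mg/kg).
    measured = [10.0, 20.0, 30.0, 40.0]
    predicted = [12.0, 18.0, 33.0, 37.0]
    print(rmse(measured, predicted))       # sqrt(26 / 4), about 2.55
    print(mae(measured, predicted))        # 10 / 4 = 2.5
    print(r_squared(measured, predicted))  # 1 - 26 / 500, about 0.948
```

RMSE is always at least as large as MAE, and the gap widens when a few large errors dominate. R² is undefined when all measured values are equal (SS_tot = 0); in this sketch that case raises `ZeroDivisionError`.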
2.5.2. Synthesis Strategy
- a) Machine learning algorithm types and comparative performance
- b) Data type differences and preprocessing strategies
- c) Pollutant types and respective detection challenges
- d) Environmental and geographical conditions of the studies
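Preprocessing strategies are one of the synthesis themes and also underlie criteria (b) and (c) of the quality assessment. A minimal sketch of two common steps, mean imputation of missing values followed by z-score standardization, is given below. The parameters are learned from the training rows only and then reused on held-out rows, which avoids leaking information from the test data. The function names and sample values are hypothetical; individual studies may instead use, for example, median or model-based imputation, or log-transform skewed concentration data.

```python
from statistics import mean, pstdev


def fit_preprocessor(rows):
    """Learn per-column mean and standard deviation, ignoring missing (None) values.

    Assumes every column has at least one observed value.
    """
    means, stds = [], []
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        mu, sd = mean(observed), pstdev(observed)
        means.append(mu)
        stds.append(sd if sd > 0 else 1.0)  # constant column: avoid division by zero
    return means, stds


def transform(rows, means, stds):
    """Replace None with the training mean, then standardize each column."""
    return [
        [((mu if v is None else v) - mu) / sd for v, mu, sd in zip(row, means, stds)]
        for row in rows
    ]


if __name__ == "__main__":
    # Hypothetical two-feature data (arbitrary units); None marks a missing measurement.
    train = [[1.0, 10.0], [3.0, None], [None, 30.0]]
    test = [[5.0, 20.0]]
    means, stds = fit_preprocessor(train)
    print(transform(train, means, stds))  # [[-1.0, -1.0], [1.0, 0.0], [0.0, 1.0]]
    print(transform(test, means, stds))   # [[3.0, 0.0]]
```

Inside a cross-validation loop, `fit_preprocessor` would be called once per fold on that fold's training rows, and `transform` applied to both the training and test rows of the fold.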
3. Results of the Review
3.1. Overview of Selected Studies
3.1.1. Descriptive Statistics
3.1.2. Pollutants Addressed
3.1.3. Geographic Distribution and Environmental Context
3.2. Machine Learning Techniques Applied
3.3. Data Types and Sources
3.4. Performance Evaluation
3.5. Interpretability and Practical Use
4. Discussion
4.1. Trends and Advancements
4.2. Gaps in Existing Literature
4.3. Limitations in Datasets/Methodologies
4.4. Emerging Technologies
4.5. Integration with Policy and Real-time Monitoring Systems
5. Conclusions and Research Gaps
5.1. Summary of Key Findings
5.2. Explicit Research Gaps and Open Problems
5.3. Implications for the Future
5.4. Proposed Direction for Research
| Pollutant Type | Examples | Typical Sources | Study Focus |
|---|---|---|---|
| Heavy Metals | Lead (Pb), Cadmium (Cd), Arsenic (As), Mercury (Hg), Chromium (Cr) | Industrial sites, mining activities, agriculture (fertilizers, pesticides) | Identification and modeling of heavy metal contamination in soil |
| Organic Pollutants | Pesticides, Polycyclic Aromatic Hydrocarbons (PAHs), Volatile Organic Compounds (VOCs) | Agricultural runoff, industrial discharge, fossil fuel residues | Assessment of ecological and health risks from organic soil contaminants |
| Mixed Pollutants | Both inorganic (e.g., heavy metals) and organic (e.g., PAHs, VOCs) | Urban, peri-urban, and agro-industrial zones | Simulation of complex, real-world soil pollution scenarios for better accuracy |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).