Submitted: 10 June 2025
Posted: 11 June 2025
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Business Understanding
2.1.1. Problem Identification
2.1.2. Business Objectives
- Obtain satellite images of the Arequipa region.
- Prepare the satellite images of the Arequipa region so that they can be processed by geospatial analysis software.
- Apply machine learning algorithms to identify lentic continental water bodies in the Arequipa region from multispectral satellite images.
- Evaluate and compare the effectiveness of machine learning algorithms for this task.
2.1.3. Assessment of the Current Situation and Requirements
2.2. Data Understanding
2.2.1. Data Collection
2.2.2. Data Description
2.2.3. Data Exploration
2.2.4. Data Quality Check
- Acquisition date: within the period of interest, January 2020 to December 2022.
- Cloud cover: images with the lowest possible cloud percentage (1%) were preferred.
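The two criteria above amount to a filter over image metadata. The following is a minimal sketch of that filter, not the tool used in the study. The `Scene` record, the `select_scenes` function, and the sample records are hypothetical. Treating 1% as an upper bound on cloud cover is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Scene:
    """Hypothetical metadata record for one satellite image."""
    scene_id: str
    acquired: date
    cloud_pct: float  # cloud cover as a percentage, 0-100


# Date range from the criteria above; the 1% upper bound is an assumption.
START = date(2020, 1, 1)
END = date(2022, 12, 31)
MAX_CLOUD_PCT = 1.0


def select_scenes(scenes):
    """Keep scenes inside the date range and cloud limit, clearest first."""
    kept = [
        s for s in scenes
        if START <= s.acquired <= END and s.cloud_pct <= MAX_CLOUD_PCT
    ]
    return sorted(kept, key=lambda s: (s.cloud_pct, s.acquired))


# Made-up records, for illustration only.
candidates = [
    Scene("A", date(2019, 12, 30), 0.2),   # outside the date range
    Scene("B", date(2021, 6, 14), 0.8),
    Scene("C", date(2021, 7, 2), 12.5),    # too cloudy
    Scene("D", date(2022, 12, 31), 0.1),
]
selected = select_scenes(candidates)
print([s.scene_id for s in selected])
```

Sorting by cloud percentage first means the clearest image in the period is always the first candidate, which matches the stated preference for minimal cloud cover.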
2.3. Data Preparation
2.3.1. Selecting the Data
2.3.2. Data Cleaning
2.3.3. Data Integration
2.4. Modeling
2.4.1. Model Construction
- Random forest [19] is a machine learning algorithm developed by Breiman and Cutler that combines the outputs of multiple decision trees into a single result. It is widely used for regression and classification problems. In this study, we set the maximum number of trees to 50, the maximum tree depth to 30, and the maximum number of samples per class to 1000.
- Support Vector Machine [20] is a non-parametric classification method. The algorithm defines a hyperplane that maximizes the margin between the training samples of two classes and then classifies the remaining pixels and objects according to this hyperplane. It is less sensitive to the number of training samples than other classification algorithms and can yield higher classification accuracy even with relatively few samples. A radial basis function (RBF) kernel, the kernel coefficient gamma = 'scale', and the regularization parameter C = 1 were used for classification.
- K-nearest neighbor [21] is based on the distance between unknown pixels or objects and the training samples in a feature space. The class of an unknown pixel is determined by a majority vote among its nearest training samples. In this research, the 1-nearest neighbor (k=1) and the Euclidean distance (p=2) were used for classification.
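The K-nearest neighbor rule described above can be illustrated with a short sketch. It is not the software used in the study. The function name `knn_predict`, the two-feature layout, and all sample values are made up for illustration. The defaults `k=1` and `p=2` mirror the settings stated in the text.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=1, p=2):
    """Classify x by majority vote among its k nearest training samples.

    Distance is the Minkowski distance of order p (p=2 is Euclidean).
    """
    def dist(a, b):
        return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)

    # Rank training samples from nearest to farthest from x.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-band feature vectors (values invented for this example).
train_X = [(0.06, 0.02), (0.08, 0.03), (0.25, 0.30), (0.28, 0.33), (0.08, 0.45)]
train_y = ["Water", "Water", "Rocky/Desert", "Rocky/Desert", "Crop"]

print(knn_predict(train_X, train_y, (0.07, 0.04)))
print(knn_predict(train_X, train_y, (0.26, 0.31)))
```

With k=1 the vote reduces to copying the label of the single closest training sample, so the result is sensitive to noisy or mislabeled samples.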
2.4.2. Evaluation
2.4.3. Cross-Validation
2.4.4. Performance Metrics
- Accuracy: The proportion of all predictions that were correct; it may mask errors that are unevenly distributed across classes.
- Precision: The proportion of positive predictions that were actually positive.
- Recall: The proportion of actual positives that were correctly classified as positive.
- Error rate: The proportion of predictions that were incorrect.
- F-1 Score: The harmonic mean of precision and recall, which summarizes a model's overall performance in a single value.
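For reference, the metrics listed above are commonly written as follows. These are the standard textbook definitions for a single class treated as positive. The symbols TP, TN, FP, and FN (true positives, true negatives, false positives, false negatives) are assumptions introduced here, and the exact formulas used in the study may differ, for example in which accuracy value the error rate is derived from.

```latex
\begin{align}
\text{Accuracy}   &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision}  &= \frac{TP}{TP + FP} \\
\text{Recall}     &= \frac{TP}{TP + FN} \\
\text{Error rate} &= 1 - \text{Accuracy} \\
F_1               &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```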
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Yin, Z., Wu, P., Li, X., Hao, Z., Ma, X., Fan, R., Liu, C., Ling, F.: Super-resolution water body mapping with a feature collaborative CNN model by fusing Sentinel-1 and Sentinel-2 images. Int. J. Appl. Earth Obs. Geoinf. 134, 104176 (2024).
2. IUCN: IUCN Global Ecosystem Typology 2.0: descriptive profiles for biomes and ecosystem functional groups. IUCN, International Union for Conservation of Nature (2020).
3. Huang, C., Chen, Y., Zhang, S., Wu, J.: Detecting, Extracting, and Monitoring Surface Water From Space Using Optical Sensors: A Review. Rev. Geophys. 56, 333–360 (2018).
4. Vorosmarty, C.J., Green, P., Salisbury, J., Lammers, R.B.: Global water resources: vulnerability from climate change and population growth. Science 289, 284–288 (2000).
5. Ministerio de Desarrollo y Riego: Clasificación de los cuerpos de agua continentales superficiales, https://www.ana.gob.pe/publicaciones/clasificacion-de-los-cuerpos-de-agua-continentales-superficiales.
6. Distefano, T., Kelly, S.: Are we in deep water? Water scarcity and its limits to economic growth. Ecol. Econ. 142 (2017).
7. Sigopi, M., Shoko, C., Dube, T.: Advancements in remote sensing technologies for accurate monitoring and management of surface water resources in Africa: an overview, limitations, and future directions. Geocarto Int. 39, 2347935 (2024).
8. Mahdavi, S., Salehi, B., Granger, J., Amani, M., Brisco, B., Huang, W.: Remote sensing for wetland classification: a comprehensive review. GIScience Remote Sens. 55, 623–658 (2018).
9. Nagaraj, R., Kumar, L.S.: Extraction of Surface Water Bodies using Optical Remote Sensing Images: A Review. Earth Sci. Informatics 17, 893–956 (2024).
10. Pandey, V., Pandey, P.K., Lepcha, P.T., Devi, N.N.: Assessment of surface water dynamics through satellite mapping with Google Earth Engine and Sentinel-2 data in Manipur, India. J. Water Clim. Chang. 15, 1313–1332 (2024).
11. Zafar, Z., Zubair, M., Zha, Y., Fahd, S., Ahmad Nadeem, A.: Performance assessment of machine learning algorithms for mapping of land use/land cover using remote sensing data. Egypt. J. Remote Sens. Sp. Sci. 27, 216–226 (2024).
12. Acharya, T.D., Subedi, A., Lee, D.H.: Evaluation of Machine Learning Algorithms for Surface Water Extraction in a Landsat 8 Scene of Nepal. Sensors 19, 2769 (2019).
13. Wu, X., Kumar, V., Ross Quinlan, J., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J., Ng, A., Liu, B., Yu, P.S., Zhou, Z.-H., Steinbach, M., Hand, D.J., Steinberg, D.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14, 1–37 (2008).
14. Google: Google Earth Engine, https://earthengine.google.com.
15. Esri Inc.: ArcGIS Pro (Version 3.3.1) (2024).
16. Earth Science Data Systems: Sentinel-2 MSI, https://www.earthdata.nasa.gov/data/instruments/sentinel-2-msi.
17. ArcMap: ¿Qué son los datos ráster?, https://desktop.arcgis.com/es/arcmap/latest/manage-data/raster-and-images/what-is-raster-data.htm.
18. Instituto Nacional de Estadística e Informática: Portal de Infraestructura de Datos Espaciales, https://ide.inei.gob.pe/.
19. Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001).
20. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995).
21. Fix, E., Hodges, J.L.: Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties. Int. Stat. Rev. / Rev. Int. Stat. 57 (1989).
22. Everingham, M., Eslami, S.M.A., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The Pascal Visual Object Classes Challenge: A Retrospective. Int. J. Comput. Vis. 111, 98–136 (2015).
23. Wang, M., Mao, D., Wang, Y., Xiao, X., Xiang, H., Feng, K., Luo, L., Jia, M., Song, K., Wang, Z.: Wetland mapping in East Asia by two-stage object-based Random Forest and hierarchical decision tree algorithms on Sentinel-1/2 images. Remote Sens. Environ. 297 (2023).
24. Li, A., Song, K., Chen, S., Mu, Y., Xu, Z., Zeng, Q.: Mapping African wetlands for 2020 using multiple spectral, geo-ecological features and Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 193 (2022).
25. Prasad, P., Loveson, V.J., Kotha, M.: Probabilistic coastal wetland mapping with integration of optical, SAR and hydro-geomorphic data through stacking ensemble machine learning model. Ecol. Inform. 77 (2023).
26. Peyre, G., Osorio, D., François, R., Anthelme, F.: Mapping the páramo land-cover in the Northern Andes. Int. J. Remote Sens. 42 (2021).
27. Kirby, K., Ferguson, S., Rennie, C.D., Cousineau, J., Nistor, I.: Identification of the best method for detecting surface water in Sentinel-2 multispectral satellite imagery. Remote Sens. Appl. Soc. Environ. 36, 101367 (2024).
28. Qian, H., Bao, N., Meng, D., Zhou, B., Lei, H., Li, H.: Mapping and classification of Liao River Delta coastal wetland based on time series and multi-source GaoFen images using stacking ensemble model. Ecol. Inform. 80 (2024).
29. Mahdianpari, M., Granger, J.E., Mohammadimanesh, F., Warren, S., Puestow, T., Salehi, B., Brisco, B.: Smart solutions for smart cities: Urban wetland mapping using very-high resolution satellite imagery and airborne LiDAR data in the City of St. John’s, NL, Canada. J. Environ. Manage. 280 (2021).
30. Jamali, A., Mahdianpari, M., Brisco, B., Granger, J., Mohammadimanesh, F., Salehi, B.: Comparing solo versus ensemble convolutional neural networks for wetland classification using multi-spectral satellite imagery. Remote Sens. 13 (2021).
31. Peña, F.J., Hübinger, C., Payberah, A.H., Jaramillo, F.: DEEPAQUA: Semantic segmentation of wetland water surfaces with SAR imagery using deep neural networks without manually annotated data. Int. J. Appl. Earth Obs. Geoinf. 126 (2024).
32. Hosseiny, B., Mahdianpari, M., Brisco, B., Mohammadimanesh, F., Salehi, B.: WetNet: A Spatial–Temporal Ensemble Deep Learning Model for Wetland Classification Using Sentinel-1 and Sentinel-2. IEEE Trans. Geosci. Remote Sens. 60, 1–14 (2022).






| Country | Work units |
|---|---|
| Venezuela | 5 |
| Colombia | 35 |
| Ecuador | 11 |
| Perú | 60 |
| Bolivia | 40 |
| Argentina | 73 |
| Chile | 49 |

| Work Unit | Country | Latitude | Longitude |
|---|---|---|---|
| U97 | PER | -14.1274610 | -72.553376 |
| | | -15.0312010 | -73.483256 |
| U98 | PER | -14.1274760 | -71.638882 |
| | | -15.0312300 | -72.568750 |
| U99 | PER | -14.1274550 | -70.758722 |
| | | -15.0312340 | -71.654846 |
| U100 | PER | -14.1274530 | -69.812202 |
| | | -15.0312290 | -70.742090 |
| U101 | PER | -14.1274950 | -68.893070 |
| | | -15.0312450 | -69.822948 |
| U102 | PER | -14.9998050 | -73.339385 |
| | | -15.9035050 | -74.269130 |
| U103 | PER | -15.0187220 | -72.425584 |
| | | -15.9224060 | -73.355411 |
| U104 | PER | -15.0197130 | -71.500456 |
| | | -15.9224070 | -72.434329 |
| U105 | PER | -15.0187510 | -70.580027 |
| | | -15.9224500 | -71.509844 |
| U106 | PER | -15.0187470 | -69.654652 |
| | | -15.9224360 | -70.584498 |
| U107 | PER | -15.9125520 | -69.317483 |
| | | -16.8161770 | -70.251337 |
| U108 | PER | -15.9125550 | -70.242837 |
| | | -16.8161780 | -71.176718 |

| Color | Description |
|---|---|
| Water | |
| Rocky / Desert | |
| Crop | |
| Urban | |
| Ice | |

| Class | Random Forest UA | Random Forest PA | Support Vector Machine UA | Support Vector Machine PA | K-nearest Neighbor UA | K-nearest Neighbor PA |
|---|---|---|---|---|---|---|
| Water | 0.6456205 | 0.79353375 | 0.620482 | 0.77018375 | 0.688648 | 0.8031175 |
| Rocky/Desert | 0.97863625 | 0.30275325 | 0.975294 | 0.29107275 | 0.97 | 0.26536325 |
| Crop | 0.17207925 | 0.59821425 | 0.17261925 | 0.56818175 | 0.06799775 | 0.5694445 |
| Urban | 0.014706 | 0.25 | 0.01171875 | 0.10714275 | 0.0108695 | 0.0714285 |
| Ice | 0.44293475 | 0.75 | 0.3282895 | 0.75 | 0.23449525 | 0.73958325 |
| Overall Accuracy (OA) | 0.46457125 (46.46%) | | 0.43555275 (43.56%) | | 0.4091195 (40.91%) | |
| Kappa (k) | 0.31436175 (31.44%) | | 0.284222 (28.42%) | | 0.24825925 (24.83%) | |

| ML Algorithm | Producer Accuracy | User Accuracy | Accuracy Mean | Precision | Recall | Error Rate (%) | F-1 Score |
|---|---|---|---|---|---|---|---|
| Random Forest | 0.79353375 | 0.6456205 | 0.719577125 | 0.64562054 | 0.79353355 | 28.04% | 0.711976 |
| Support Vector Machine | 0.77018375 | 0.605482 | 0.687832875 | 0.62048207 | 0.77018398 | 31.22% | 0.687275 |
| K-Nearest Neighbor | 0.80311758 | 0.688648 | 0.745882788 | 0.68864796 | 0.80311772 | 25.41% | 0.741491 |
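The derived columns of the table above appear to follow from its other columns. This reading is inferred from the numbers themselves, not from a stated formula, so it is an assumption: the accuracy mean matches the average of producer and user accuracy, the error rate matches one minus that mean, and the F-1 score matches the harmonic mean of precision and recall. The sketch below recomputes them for the Random Forest and K-Nearest Neighbor rows. The function names are hypothetical.

```python
def accuracy_mean(producer_acc, user_acc):
    """Average of producer and user accuracy (assumed definition)."""
    return (producer_acc + user_acc) / 2


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the Random Forest and K-Nearest Neighbor rows.
rows = {
    "Random Forest": {
        "pa": 0.79353375, "ua": 0.6456205,
        "precision": 0.64562054, "recall": 0.79353355,
    },
    "K-Nearest Neighbor": {
        "pa": 0.80311758, "ua": 0.688648,
        "precision": 0.68864796, "recall": 0.80311772,
    },
}

results = {}
for name, r in rows.items():
    mean = accuracy_mean(r["pa"], r["ua"])
    results[name] = {
        "accuracy_mean": mean,
        "error_pct": (1 - mean) * 100,  # error rate as a percentage
        "f1": f1_score(r["precision"], r["recall"]),
    }
    print(
        f"{name}: accuracy mean={mean:.6f}, "
        f"error rate={results[name]['error_pct']:.2f}%, "
        f"F1={results[name]['f1']:.6f}"
    )
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the F-1 score sits slightly below the arithmetic accuracy mean in both rows.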
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).