Dynamic Threshold Selection for the Classification of Large Water Bodies within Landsat-8 OLI Water Index Images

Surface water distribution extracted from remote sensing data has been used in water resource assessment, coastal management, and environmental change studies. Traditional manual methods for extracting water bodies cannot satisfy the requirements for mass processing of remote sensing data; therefore, accurate automated extraction of such water bodies has remained a challenge. The histogram bimodal method (HBM) is a frequently used objective tool for threshold selection in image segmentation. The threshold is determined by seeking twin peaks, and the valley values between them; however, automatically calculating the threshold is difficult because complex surfaces and image noise which lead to not perfect twin peaks (single or multiple peaks). We developed an operational automated water extraction method, the modified histogram bimodal method (MHBM). The MHBM defines the threshold range of water extraction through mass static data; therefore, it does not require the identification of twin histogram peaks. It then seeks the minimum values in the threshold range to achieve automated threshold. We calibrated the MHBM for many lakes in China using Landsat 8 Operational Land Imager (OLI) images, for which the relative error (RE) and squared correlation coefficient (R2) for threshold accuracy were found to be 2.1% and 0.96, respectively. The RE and root-mean-square error (RMSE) for the area accuracy of MHBM were 0.59% and 7.4 km2. The results show that the MHBM could easily be applied to mass time-series remote sensing data to calculate water thresholds within water index images and successfully extract the spatial distribution of large water bodies automatically.


Introduction
Surface water is a critical resource that is impacted by changes in land cover/use, climate, and the environment on a global scale. Changes in surface water can cause droughts, floods, and pollution. Monitoring surface water distribution using remote sensing is critical, and has the added advantages of low cost, high speed and frequency, wide cover, and low impact from surface conditions. Surface water distributions extracted using remote sensing data have been used in water resources assessments, coastal management, and studies of environmental change [1,2].
Water extraction using remote sensing techniques first began forty years ago [3]. Since then, numerous studies have developed indices and algorithms of water extraction. Water surfaces appear dark in near-infrared radiation (NIR) because of strong absorption by the water. In contrast, land appears white because NIR is strongly reflected by terrestrial vegetation and dry soil. Therefore, initially, single-band methods based on NIR were used for water extraction [3][4][5]. However, surface terrains can be complex and other surface features can behave in a manner similar to water, resulting in errors. Then, the normalized difference vegetation index (NDVI) was used to enhance water features by introducing red band (R) data to NIR techniques [6][7][8]. While NDVI can delineate more effectively between water and non-water objects, and especially vegetation, it only suppresses nonwater features, but does not eliminate them completely [9]. Therefore, other studies introduced the green (G) band instead red band, resulting in the formation of the normalized difference water index (NDWI) [9]. The NDWI aims to enhance water features using the high reflectance of water in the green band, and low reflectance in the NIR band. However, the NDWI cannot efficiently suppress signals from built-up areas; moreover, water features are still interspersed with noise from other features [10]. To rectify this problem, the NDWI was modified to form the modification of normalized difference water index (MNDWI) [10], which utilizes shortwave infrared radiation (SWIR) that has a lower reflectance from water than NIR.
Despite the improvements [11], all of these indices require a threshold to discriminate water from land, and subjective and static threshold values can lead to over-or underestimation of surface water area [12]. Threshold values usually vary with viewing angle, satellite attitude, illumination, atmospheric conditions, and environmental noise in the images obtained [13,14]; therefore, identifying an appropriate threshold is challenging and time consuming [2]. For this reason, automated water extraction methods have increasingly been considered. Feyisa et al. [2] introduced a multiple-band index (the automated water extraction index; AWEI) to increase the accuracy and robustness of water extraction. However, the suggested threshold value of 0.0 is only more robust than NDWI and MNDWI, it still changes in different images, which hinders automated water extraction.
Both supervised and unsupervised classification procedures have frequently been applied to identify and classify water features in remote sensing images [15]. Supervised classification includes systems such as maximum-likelihood classification [16], decision trees [17,18], artificial neural networks, and support vector machines and fuzzy clustering method (FCM) [19,20]. Unsupervised classification methods include isodata [21] and k-mean clustering [22]. However, while supervised methods require a priori knowledge about the number of classes and the spectral characteristic of each class in the retrieved images, unsupervised methods are not stable because surface features vary in different images, preventing the application of these methods over large areas [12]. Computer graphics technology has also been used to extract water distributions in some automated methods, including tasseled cap wetness, boundary tracing [23], principal components analysis [24,25], morphological segmentation [15,26], end member extraction [27], and others [28]. However, these methods have common complex weaknesses that lead to instabilities and difficulties in mass data processing.
Automated extraction of water bodies remains a challenge; however, satellite sensors are being launched increasingly currently, and there has been an explosive growth of image data. Remote sensing data is generally automatically processed; therefore, we developed an automated water extraction method (the modified histogram bimodal method; MHBM) for simple and fast water extraction. The MHBM was applied to Landsat 8 Operational Land Imager (OLI) images to evaluate its performance.

Modified histogram bimodal method (MHBM)
The histogram bimodal method (HBM) is often used for threshold selection in image segmentation of land and water bodies [29,30]. In this method, the valley values between the two peaks are used as the threshold for image segmentation ( Figure 1). However, it is difficult to automatically find two peaks using a computer because complex surfaces and image noise usually result in single or multi-peak retrieval and not perfect twin peaks, resulting in non-determinacy of the threshold. In view of these problems, we modified the HBM to automatically calculate the threshold through an automated water extraction method (the MHBM). In the MHBM, the threshold range (R) is defined from mass valley values obtained through visual interpretation, with the mean value of the valley defined as the initial threshold (Ti). Furthermore, we seek the minimum value in the threshold range and not twin peaks and valleys to avoid non-determinacy. To get the best results, we restricted the data in our histograms to those from the areas around water bodies and not from the full image. First, we extracted the rough distribution of a water body using Ti, and then the area was dilated for certain circles forming the statistical area of the histogram (Figure 1b). The dilated area was approximately two times that of the water area, allowing the formation of perfect twin peaks and a histogram of the water index (or single band reflectance) for the dilated area ( Figure 1c). A Rayleigh scattering correction was required to ensure stable threshold results. Using this approach, we were able to calculate the dynamic threshold automatically, which verified that the MHBM could be used for automated water extraction.

of 18
To achieve automated water extraction, we designed a workflow based on the MHBM ( Figure  2), which included Rayleigh scattering correction, threshold statistics, preliminary water extraction, water mask dilation, and accurate water distribution extraction.  To ensure a stable threshold, an accurate atmospheric correction of the remote sensing images was required; however, accurate atmospheric correction is challenging [31]. Reflectance values after Rayleigh scattering correction (ρrc) satisfy water extraction and are often used in water color remote sensing [32,33]. Therefore, we used ρrc to calculate water indices (i.e., NDWI, MNDWI, and AWEI; Table 1) replaced surface reflectance (ρo) though accurate atmospheric correction. The top of atmosphere reflectance (ρTOA) can be expressed as the linear sum of the contribution from Rayleigh scattering (ρr), aerosol scattering in the atmosphere (ρa), and diffuse transmittance to the sensor (ρo) using Eq. 1 [34]: where t is the diffuse transmittance of the atmosphere (considering only gas molecules). We simulated ρr using the Second Simulation of a Satellite Signal in the Solar Spectrum-Vector (6SV) to obtained reflectance after the Rayleigh scattering correction, which included aerosols scattering reflectance and surface reflectance: From correcting Rayleigh scattering and converting to reflectance, to delineation of water distribution, MHBM follows four steps: 1) Threshold statistics: Threshold statistics are necessary when calibrating the threshold range and initial threshold for data of a new sensor ( Figure 2). However, this step can be skipped when the threshold range and the initial threshold are already known for the sensor of interest. The optimal thresholds were defined as the minimum valley value occurring between the twin peaks of a given histogram, which was based on the image statistics derived from water areas with a 1.5 times dilation buffer. For adequate representativeness, threshold statistics were calculated using mass images that included many typical water body types. The minimum and maximum valley values were defined as the left and right edges of the threshold (Tl and Tr), respectively. The length between Tl and Tr was the threshold range (R), and the mean of the valley values was defined as the initial threshold (Ti) for preliminary water extraction. We obtained thresholds of NDWI, MNDWI, and AWEI for 18 water bodies within each of the 152 Landsat 8 OLI images. The 152 thresholds of each water index formed a threshold range ( Table 2). The statistical threshold results of this study could be used by other researchers for delineating other water bodies.

2) Preliminary water extraction:
We extracted water bodies using Ti to obtain a rough water distribution mask. The purpose of this step was to identify the general location and distribution of water bodies, and not to represent accurate shorelines.
3) Water area dilation: To obtain well-defined twin peaks in the histograms, we dilated the preliminary water masks by1.5 times to obtain work areas. The water index images were then masked to isolate the work areas. In the masked water index images (work areas), the number of water pixels and number of non-water pixels were equal, and the histogram peaks of water and non-water, along with the valley between the two peaks, were distinct.

4) Accurate water extraction:
The minimum values of threshold ranges in the histograms are easily processed and represent precise thresholds for accurate water extraction. The use of threshold ranges ensured stability and correctness of selected thresholds, even in non-ideal histograms. For instance, if several minima or a range of minima occurred within the threshold range, the medium position minima was selected as the precise threshold. For every water body within an image, an exclusive threshold (dynamic threshold: Td) for image segmentation was obtained using the above described method, and automated water extraction was accomplished successfully.

Accuracy assessment
We evaluated the MHBM accuracy of the automated dynamic threshold calculation using relative error (RE) and the squared correlation coefficient (R 2 ); furthermore, we also evaluated the area accuracy of the extracted area of water using the MHBM in comparison to the initial threshold and a static threshold (0.0) using RE and the root-mean-square error (RMSE): for threshold accuracy, X is the dynamic threshold of the MHBM (Td), Y is the optimal threshold, Z the interval of the threshold ranges (R) of NDWI, MNDWI, and AWEI, respectively, and N is the number of images and water bodies. The dynamic threshold, X, is derived through automatic calculation by the computer using the MHBM method. The optimal threshold, Y, is the minimum valley value between the two peaks (i.e., water and non-water peaks) of the histogram derived from statistics of preliminary water masks dilated by 1.5 times and defined by visual interpretation. For area accuracy, X is the water area using different thresholds (Td, Ti, and T0), Y is the water area using the optimal threshold, Z is same as Y, and N is the number of images and water bodies.

Application to Landsat 8 OLI data
Landsat satellites have been providing multispectral images of the Earth continuously since the early 1970s. A unique more than 40-year data record of the Earth's land surface now exists [34]. Landsat 8 is the latest satellite in this series. This unique retrospective portrait of the Earth's surface has been widely used for water management. However, automated water extraction from time-series OLI mass data is difficult, but needed. Therefore, Landsat 8 OLI images were used to test thresholds, accuracy, and the robustness of the MHBM. The study areas included several lakes, rivers, and reservoirs in different environmental conditions, including cities, plains, hills, mountains, and highlands ( Figure 3). The data covered ten provinces of China, including Beijing, Tianjin, Hebei, Henan, Jiangsu, Shanghai, Anhui, Jiangxi, Hunan, Yunnan, Qinghai, and Tibet. Water bodies included turbid rivers, clear lakes, ephemeral lakes, eutrophic lakes, clear reservoirs, and eutrophic reservoirs. The total numbers of water bodies and images were 18 and 152, respectively.

Threshold accuracy
The MHBM yielded the greatest accuracy when the dilation size was 150% (Figure 4). Most dynamic thresholds were equal to the optimal thresholds, and the fitting trends had high R 2 and slopes near 1.0. Furthermore, the RE was also low (under 3%) for all three water indices. These findings confirm that the MHBM is able to establish optimal thresholds for most situations.
The MNDWI had the lowest RE and highest R 2 . The threshold accuracy of the NDWI was slightly lower than that of the MNDWI, while that of the AWEI was slightly lower than that of the NDWI. However, even the AWEI attained high accuracy. These results show that the MHBM has good threshold accuracy when using any of the three water indices, but particularly when using MNDWI.  Table 3 presents the area accuracies of water extraction by the three different thresholds: dynamic threshold (Td) of the MHBM, static threshold 0.0 (T0) and initial threshold (Ti). For RMSE, the water extraction area accuracy by T0 was lowest, with RMSE up to 40 km 2 (Table 3). Water extraction area accuracy by Ti was higher than T0, with RMSE more than 20 km 2 ; however, the MHBM using Td had the highest area accuracy compared with Ti and T0, with a minimum RMSE of just 7 km 2 . The trend was similar for RE, with the MHBM using Td much lower than when using Ti and T0, having a mean RE value of ~0.6%. These results show that the MHBM using MNDWI and the dynamic threshold has the highest accuracy (i.e., RE of the threshold and area accuracies as low as 2.1% and 0.59%, respectively).

Comparisons with other classification methods
To verify the performance of the MHBM, we selected two automatic water extraction methods: MMFCM [20] and K-means [22]. MMFCM is the combination of a modified fuzzy clustering method (MFCM) algorithm and the modified normalized difference water index (MNDWI) [20]. K-means is a conventional classification method [22]. An additional ten images were used for independent accuracy assessment. The result of a comparison between the three water extraction methods indicated that the relative error of the MHBM was the lowest, followed by the MMFCM, and then Kmeans (Table 4). We selected several examples to analyze the performance of the MHBM, MMFCM, and K-means water extraction methods ( Figure 5). For example, Figure 5 shows three typical water bodies. Guanting Reservoir, is an area with shallow water and bottoms similar to the surrounding soil. The K-means method extracted portions of this shallow water as "no water," causing many errors and punctuated shorelines on the water map. The shorelines produced by the MHBM and MMFCM, however, were smooth, with the MHBM results being more accurate (Figure 5b and b'). All the three methods yielded accurate results for Miyun Reservoir, which possessed high quality of water ( Figure  5c and c'). At Yuqiao Reservoir, there were grass and algae present at the border of the water body. As a result, the K-means method again produced water distributions with complex spots and curves. The shoreline produced by the MMFCM has been a shift from land to water, while the MHBM produced the most accurate shoreline results (Figure 5d and d'). Compared with the two alternate classification methods, the MHBM showed greater accuracy and more stable performance throughout the delineation of different typical water body types. Moreover, the MHBM has the advantage of simplicity over many other automatic water extraction methods, because it does not require additional auxiliary data such as a DEM. The MHBM is thus easy to use and can be easily reproduced by other operators. Feng et al. [1] developed an automated method for mapping inland surface water bodies, which has been applied to the roughly 9000 Landsat scenes of the GLS 2000 data collection to produce a global, 30-m resolution inland surface water body dataset (GIW) [35] for circa-2000. This GIW dataset has been adopted by the Global Land Cover Facility (GLCF). In the present study, we modified the MHBM to adapt to Landsat 7 ETM+ images and applied the modified method to three ETM+ images (1999/08/11: path 122, row 32; 1999/07/01: path 123, row 32; and 2000/05/07: path 124, row 32). The results of the MHBM and GIW were then compared ( Figure 6). The relative error between MHBM and GIW was 5.03% which indicates that the two products have very similar shorelines (Figure 6 The total area of inland surface water bodies in the MHBM product is greater than that of the GIW (Figure 6(a'), 6(a") and (e')). Field investigation and spectral analysis indicate that these additional areas are water and water grasses mixture. Therefore a primary difference is that MHBM tends to distinguish water and water grasses mixture as water, while GIW tends to distinguish it as land. . The image acquisition dates of (a), (b), and (c) were 1999/07/01 (path 123, row 32); of (d) was 2000/05/07 (path 124, row 32); and of (e) was 1999/08/11 (path 122, row 32).

Water distribution
We extracted water bodies in all of the 18 study areas (Figure 3), which reflects the good performance of the MHBM. Despite differences in environmental conditions and water qualities, the shorelines were all accurate and clear.

Dynamic thresholds
The main indicator of the good performance of the MHBM using the dynamic threshold was that different thresholds for different water bodies were observed within single images. For example, the Guanting, Yuqiao, and Miyun Reservoirs yielded different thresholds, which reflected their differing surface environments, water qualities, atmospheric conditions, and image-forming conditions (Figure 7). The threshold values for the Guanting and Miyun Reservoirs were much lower than the initial threshold (i.e., 0.277), while for Yuqiao Reservoir, the values were similar. Moreover, the dynamic threshold was calculated automatically, removing the effects of human intervention and improving accuracy and objectivity in water extraction.

Dilation size
Dilation size determined the shapes of the histograms. In the histogram of the MNDWI, the left peak represented non-water and the right peak represented water bodies. Balance between the two peaks results in more accurate automatic calculations of thresholds. The water peaks for five dilation sizes were invariant (Figure 8(c)); however, the peaks for non-water gradually increased with increasing dilation size (50-250%.), until their heights at 100% and 150% dilation were close to the water peaks. Therefore, in complex surface environments, multiple surface features could cause nonwater peaks to be flatter than a single object, and the height of two peaks at 150% dilation may be much more in balance. We investigated the threshold accuracy at five dilation sizes ( Figure 9) and found that while RE initially decreased with increasing dilation (50-150%), it subsequently increased with increasing dilation (150-250%). In contrast, while R 2 increased as dilation increased from 50% to 150%, it decreased between 150% and 250% dilation. Therefore, 150% is the most appropriate dilation size for use with the MHBM.

Optimal water index
MNDWI had the smallest threshold range (R) and standard deviation (SD) compared with AWEI and NDWI (Table 2). Low R values provided a narrow threshold range that allowed for the precise identification of thresholds, while the small SD showed that thresholds were close to the initial threshold. Therefore, MNDWI is the optimal water index for automated water extraction using OLI images because the stable thresholds generated ensure low error in preliminary water extraction.
The distance between the water peak and non-water peak (Dw-nw) determines the possibility of distinguishing water from other objects ( Figure 10). The Dw-nw of MNDWI was the widest among the three water indices, showing that water pixels were most easily distinguished using MNDWI. Compared with NDWI and AWEI, the curve in the MNDWI histogram was flatter and the value between Tl and Tr was smallest; therefore, the accuracy of water extraction was high for any value within the threshold range. These results indicate that MNDWI is a highly accurate and stable index; however, for the sensors with absent SWIR bands, NDWI would also produce accurate results when applied to the MHBM. All water indices (including NDWI, AWEI, and even MNDWI) have inherent defects, such as classification errors caused by cloud shadows, topographic shadows, and building shadows. The MHBM implements an automatic threshold value to extract water bodies from common water indices. It does not attempt to modify the indices to avoid their inherent defects. Therefore, in this study, MNDWI fits MHBM; however, its inherent defects were still present within the results.

Initial threshold and optimal threshold
The initial threshold was derived from the statistics of finite samples. In exceptional cases, the initial threshold can be too large to extract valid water area, or so small that the land is extracted as water. Only optimal preliminary water masks can provide good twin peaks and valleys as precise as dynamic thresholds. Vector shorelines of water distribution could be used to generate preliminary water masks instead of using those extracted using the initial threshold. Furthermore, vector shorelines could provide the locations and approximate distributions of water bodies. However, the vector approach could only be used for extracting known water bodies. For extracting unknown water bodies, preliminary water masks must be extracted by the initial threshold approach. Additionally, the dynamic threshold may not be optimal at the limits of the threshold range; however, when it is limited within a specific threshold range, it can be used to extract water within an acceptable level of accuracy.

Water area
For the MHBM, drawing the histogram is an important step. The histogram requires a certain number of pixels to achieve well-defined twin peaks. We simulated the histograms of different sized areas of pixels (from 100 to 5000), based on the Gaussian function ( Figure 11). While areas of the 100, 300, and 500 pixels produced histogram curves that were hackly (uneven and jagged), areas of 1000 pixels resulted in smooth histogram curves. Furthermore, areas of 2000 and 5000 pixels produced very smooth curves and perfect twin peaks. A Landsat 8 OLI image pixel is 0.009 km 2 (30 × 30 m), and 1000 pixels is ~ 0.9 km 2 . Therefore, the MHBM is most effective on water bodies that have an area of greater than 1000 pixels, or 0.9 km 2 in Landsat 8 OLI images, and is invalid for small water bodies such as small ponds and narrow streams.

Conclusions
In this study, we developed a Modified Histogram Bimodal Method (MHBM) for automatic water extraction, focusing on the calculation dynamic threshold. Rayleigh scattering correction and water area dilation contributed to the generation of suitable twin peaks. The method of dynamic threshold calculation seeks minimum values as precise thresholds within threshold ranges instead of finding valleys between twin peaks. Through the above two modifications, MHBM could calculate the dynamic threshold automatically, eliminating the requirement of human experience and providing highly accurate automatic water extraction. MHBM requires some prior knowledge for different sensors. For instance, MNDWI is an optimal index for Landsat 8 OLI images and the most suitable dilation size is 150%; the threshold ranges of MNDWI, NDWI, and AWEI are 0.104-0.427, -0.184-0.228, and -0.295-0.083, respectively, and the initial thresholds of MNDWI, NDWI, and AWEI are 0.277, 0.02, and -0.099, respectively.
We tested the MHBM method based on MNDWI using 152 scenes of Landsat 8 OLI images in 18 water areas. For MHBM, the RE and R 2 for threshold accuracy were 2.1% and 0.962, and the RE and RMSE for area accuracy were 0.59% and 7.4 km 2 , respectively. Using dynamic thresholds in MHBM yielded the highest accuracy for both threshold and area when compared to a static threshold of 0.0, the initial threshold method, MMFCM, K-means, and GIW methods. Therefore, MHBM could easily be applied to mass time-series remote sensing data to calculate water thresholds within water index images and successfully extract the spatial distribution of large water bodies automatically.