Evaluating and Zoning Flood Susceptibility Using Logistic Regression and the Curve Number (SCS-CN) Hydrological Model (Case Study of Kalateh Qanbar Drainage Basin, Nishabur)

Spatial evaluation of flood-prone areas in drainage basins is one of the basic strategies of flood risk management. The present study investigates the efficiency of logistic regression and the SCS-CN hydrological model for predicting and zoning floods. In the first stage, 13 parameters were employed: runoff, hydrologic soil groups (HSGs), slope, lithology, drainage density (DD), land curvature, elevation, distance to waterways/rivers, topographic wetness index (TWI), stream power index (SPI), rainfall, land use, and NDVI. Using the SCS-CN model, the maximum potential retention (infiltration, S) and the runoff amount (Q) of the drainage basin were determined. The layers were weighted using the analytic hierarchy process (AHP). By applying these weights, flood zoning maps of the drainage basin were drawn for 5-, 15-, 25-, and 50-year return periods. The accuracy of the zoning map produced with the logistic regression model was assessed using the ROC curve and the area under the curve (AUC). For the prediction rate, the AUC was 0.81, indicating that the model has acceptable accuracy. In the logistic regression model, the most important factors affecting floods are the geological index, distance to waterways/rivers, and NDVI. In the SCS-CN model, they are slope, DD, rainfall, and land use. Over the 5- to 50-year return periods, 30 to 46% of the drainage basin area has moderate flood potential and 28 to 34% has high potential.

of floods and the weight of the criteria have the greatest influence on the model results and can provide a logical, objective tool for assessing flood susceptibility. Reisenbüchler et al. (2019) investigated the relationship between flooding and the morphology of the Salzach River. They concluded that the TELEMAC-MASCARET hydro-morphological model reproduces flood inundation and can provide reliable predictions. Mohtar et al. (2020), in a study in Kuala Lumpur, Malaysia, concluded that the 50-year average recurrence interval (ARI) precipitation model identifies flood-prone areas with high accuracy by providing comprehensive, color-coded maps of flood-prone areas. For integrated flood management, the present study seeks to predict and zone flood risk in the Kalateh Qanbar Drainage Basin of Khorasan Razavi province using two models: logistic regression and the SCS-CN hydrological model.

The study area
The Kalateh Qanbar Drainage Basin covers parts of Zebarkhan and Mian-Jolgeh counties at 53°58'58" E, 53°49'35" N and has an area of 2422.2 hectares. It is located southeast of Nishabur Township, Khorasan Razavi province, in northeastern Iran. Part of the basin also lies north of Torbat Heydariyeh city and Kadkan County. The study area borders the villages of Abdolabad and Homayi to the west, Ishaqabad and Janatabad to the north, and Islamabad to the northeast (Figure 1).

Materials and methods
To investigate the flood situation in the study area, the basin boundary was delineated from a digital elevation model (DEM) with a pixel size of 10 meters. The DEM was also the basis for the slope map, which was then integrated with the other layers. The lithological map was drawn from a 1:100,000 geological map. Since soil properties affect the generation of runoff, the hydrologic group of the soil must be determined to establish the CN. To apply the hydrologic soil groups in the SCS-CN model, a land-use map must also be prepared.
Landsat 8 satellite images from 2018 were used to prepare the land-use map. The topographic wetness index (TWI) map was prepared with Equation 1, and the stream power index (SPI) map with Equation 2. The rainfall layer was prepared from 20 years of data (1999-2019) from the following rain-gauging stations: Nishabur, Bakavol, Sanobar, Malek Abad, Bar, Farhadgerd, Dahanehshoor, Alang Asadi, Qadirabad, Ardak, Torgh Dam, Iraj Abad, Eishabad, Taghoon, Andarab Zarandeh, Beyut, Fadiyeh, Namagh, Hosseinabad Jangal, Rouhabad, and Dizbad Bala. Kriging was then used to interpolate the rainfall map because it gave the lowest RMSE and MAE. Equation 3 was used to prepare the vegetation density layer. The map of maximum potential retention (maximum infiltration) in the area was prepared using Equation 4; this layer shows the amount of rain that infiltrates the ground. In the next step, precipitation zones for 5-, 15-, 25-, and 50-year return periods were prepared from daily maximum precipitation data. For this layer, the records of 29 meteorological stations in the region over a common 20-year period were used. After tests of data homogeneity and adequacy, precipitation for the desired return periods was estimated using the Gumbel distribution. The results were then interpolated using the IDW method. Since the minimum error in the rain data was quadratic, Equation 5 was used to draw the runoff layer map.

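$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \qquad (1)$$
$$\mathrm{SPI} = A_s \tan\beta \qquad (2)$$
where $A_s$ is the specific catchment area of the basin and $\beta$ is the land slope in degrees.
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \qquad (3)$$
where NIR and Red are the near-infrared and red reflectances.
$$S = \frac{25400}{\mathrm{CN}} - 254 \qquad (4)$$
$$Q = \frac{(P - 0.2S)^2}{P + 0.8S} \quad \text{for } P > 0.2S, \qquad Q = 0 \text{ otherwise} \qquad (5)$$
where $P$ is precipitation in millimeters, $S$ is the maximum potential retention (infiltration) in millimeters, and $Q$ is runoff in millimeters.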
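Equations 1 to 3 are per-pixel arithmetic. The sketch below evaluates them for single cells in Python, as a GIS would for each cell of a raster. Several choices are illustrative assumptions, not the study's GIS workflow: the function names, the 0.1° minimum slope that keeps tan β non-zero on flat cells, and the D8-style conversion from a flow-accumulation count to $A_s$.

```python
import math


def specific_catchment_area(flow_accumulation_cells, cell_size_m=10.0):
    """A_s in m^2 per m of contour width from a D8 flow-accumulation count.

    Assumption: upslope area = (cells + 1) * cell_size^2 (the +1 counts the
    cell itself), divided by a contour width of one cell.
    """
    return (flow_accumulation_cells + 1) * cell_size_m


def twi(a_s, slope_deg, min_slope_deg=0.1):
    """Equation 1: TWI = ln(A_s / tan(beta))."""
    # Hypothetical floor so tan(beta) is never zero on flat cells.
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(a_s / math.tan(beta))


def spi(a_s, slope_deg):
    """Equation 2: SPI = A_s * tan(beta)."""
    return a_s * math.tan(math.radians(slope_deg))


def ndvi(nir, red):
    """Equation 3: NDVI = (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        return 0.0  # no signal in either band
    return (nir - red) / total
```

For a 10 m DEM, a cell receiving flow from 9 upslope cells has $A_s$ = 100 m²/m. On a 45° slope, that gives TWI ≈ ln(100) and SPI ≈ 100.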

Logistic regression model
Logistic regression is a multivariate statistical method that considers several physical parameters that may affect the probability of flooding (Shirzadi et al., 2012). Of the flood locations, 70% were used as training points. Flood occurrence enters the model as the dependent variable, and the factors affecting floods enter as independent variables (Hosmer et al., 2000). The coefficients of the logistic regression model are estimated by maximum likelihood and are then used to produce the final map in GIS. The regression model was run in SPSS 16 using the Enter method, in which all factors are entered into the model and none is removed. One advantage of logistic regression is that the data need not be normally distributed, and the influencing factors can be continuous or discrete. The purpose of logistic regression is to find an appropriate model of the relationship between the dependent variable and the factors affecting floods, yielding a coefficient for each variable (Lee et al., 2007). Logistic regression predicts the presence (1) or absence (0) of an event from the values of the predictor variables. Quantitatively, the dependence of the event on several variables is expressed by Equation 6:
$$Y = \frac{1}{1 + e^{-z}} \qquad (6)$$
where $Y$ is the probability of a flood. As $z$ varies from $-\infty$ to $+\infty$, the probability of occurrence varies from 0 to 1. $z$ is the linear combination of the effective factors, and $b_0$ is the intercept of the model:

$$z = \ln\left(\frac{Y}{1 - Y}\right) = b_0 + b_1x_1 + b_2x_2 + \dots + b_nx_n$$
Here $Y$ is the probability of flooding, $b_i$ ($i = 0, 1, \dots, n$) are the coefficients estimated from the sample data, $n$ is the number of independent variables, and $x_i$ ($i = 1, \dots, n$) are the independent variables. A positive coefficient indicates that flood probability increases with the factor, and a negative coefficient indicates the opposite effect. Because the relationship between the independent variables and the probability of occurrence is nonlinear, an iterative algorithm is needed to estimate the parameters. The model is then evaluated with the receiver operating characteristic (ROC) curve and the area under it (AUC). An AUC of 1 indicates perfect prediction.
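The following is a minimal sketch of Equations 6 and 7 with an iterative maximum-likelihood fit. SPSS uses its own solver; plain batch gradient ascent on the log-likelihood is swapped in here for brevity. The function names, learning rate, and iteration count are hypothetical choices. Each row of `X` holds the factor values $x_1 \dots x_n$ of one location, and its label is 1 for a flood point and 0 otherwise.

```python
import math


def sigmoid(z):
    """Equation 6: Y = 1 / (1 + exp(-z)), arranged to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(coefs, x):
    """Linear predictor z = b0 + b1*x1 + ... + bn*xn mapped to a probability."""
    z = coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))
    return sigmoid(z)


def fit_logistic(X, y, learning_rate=0.1, n_iter=5000):
    """Maximum-likelihood coefficients [b0, ..., bn] by batch gradient ascent.

    d(logL)/d(b_j) = sum_i (y_i - Y_i) * x_ij, with x_i0 = 1 for the intercept.
    """
    n_samples, n_features = len(X), len(X[0])
    coefs = [0.0] * (n_features + 1)
    for _ in range(n_iter):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            residual = yi - predict_proba(coefs, xi)
            grad[0] += residual
            for j, value in enumerate(xi):
                grad[j + 1] += residual * value
        # Step uphill on the mean log-likelihood.
        coefs = [b + learning_rate * g / n_samples for b, g in zip(coefs, grad)]
    return coefs
```

For example, `fit_logistic([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]], [0, 0, 1, 0, 1, 1])` returns a positive slope $b_1$, because floods in this toy data become more frequent as $x$ grows. This simple optimizer converges slowly when factors have very different ranges, so standardizing the inputs first is advisable.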

Determining the CN
The CN is a dimensionless parameter used in the SCS model to determine the initial abstraction (initial loss) as well as the basin lag time. To determine the CN of the basin, layers such as land use and HSGs are overlaid. After the HSGs of the basin soils were identified, the DEM, the land-use map, and the soil map of the basin were created as layers in the ArcGIS environment and overlaid. The CN for each soil and vegetation combination was then estimated for average antecedent moisture condition II (AMC II) (Rustaei et al., 2017). The area-weighted mean method was used to obtain the curve number of the sub-basins. For this purpose, the soil and land-use maps are overlaid and homogeneous units are identified. The CN of each unit is then multiplied by its area, and dividing the sum of these products by the total area of the basin gives the average CN.
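The area-weighted averaging described above can be sketched as follows. Each homogeneous unit from the soil/land-use overlay contributes its CN multiplied by its area. The `CN_TABLE` entries and land-use labels are hypothetical placeholders for illustration, not the AMC II tables used in the study.

```python
# Hypothetical AMC II curve numbers keyed by (land use, hydrologic soil group).
# Placeholder values for illustration only; not the study's tables.
CN_TABLE = {
    ("rangeland_poor", "C"): 86,
    ("rainfed_farming", "C"): 82,
    ("irrigated_farming", "B"): 75,
}


def area_weighted_cn(units, cn_table=CN_TABLE):
    """Average CN = sum(CN_i * A_i) / sum(A_i) over homogeneous units.

    units: iterable of (land_use, hsg, area) tuples from the soil/land-use
    overlay; any consistent area unit (e.g. hectares) works.
    """
    weighted_sum = 0.0
    total_area = 0.0
    for land_use, hsg, area in units:
        cn = cn_table[(land_use, hsg)]  # KeyError flags an unmapped combination
        weighted_sum += cn * area
        total_area += area
    if total_area <= 0:
        raise ValueError("total area must be positive")
    return weighted_sum / total_area
```

With 100 ha of poor rangeland on group C soils (CN 86) and 300 ha of irrigated farming on group B soils (CN 75), the average CN is (8600 + 22500) / 400 = 77.75.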

Slope
Land slope controls the pattern, amount, and velocity of water flow. On steep slopes, gravity gives water a high velocity (Saraskanrood et al., 2015: 237).
In the study area, the average slope is 30 degrees, and in some mountainous areas it exceeds 45 degrees. On slopes above 45°, water moves faster, reaches the basin outlet sooner, and accumulates more quickly, producing a sharper hydrograph peak (Figure 2).
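The slope map is derived from the 10 m DEM, but the study does not name the slope algorithm. The sketch below assumes Horn's 3×3 finite-difference method, which is widely used by GIS slope tools, and computes the slope in degrees at the centre cell of one elevation window. The function name is hypothetical.

```python
import math


def horn_slope_deg(window, cell_size_m=10.0):
    """Slope (degrees) at the centre of a 3x3 elevation window.

    Window rows run north to south:
        (a, b, c),
        (d, e, f),
        (g, h, i)
    """
    (a, b, c), (d, _e, f), (g, h, i) = window
    # Weighted east-west and north-south elevation gradients (rise over run).
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size_m)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size_m)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
```

A plane that rises 10 m per 10 m cell toward the east, `[[0, 10, 20]] * 3`, gives a 45° slope; the threshold discussed above is exceeded only where the gradient is steeper than 1:1.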

Drainage density
The drainage network plays a major role in flood occurrence. Flow in waterways is faster than overland flow. Therefore, the higher the drainage density, the faster runoff accumulates and the steeper the rising limb of the hydrograph. Examining the relationship between drainage density and flood occurrence, the class with a coefficient of 0.876 (the highest value) has the greatest effect on flood occurrence and the class with a coefficient of 0.103 (the lowest value) the least (Figure 3).

Vegetation density
In the study area, some central parts of the basin, as well as its western and southwestern parts, have dense vegetation, which increases water infiltration and reduces runoff generation. In the northern and eastern parts, vegetation is sparse. Consequently, infiltration there is low and the flood risk increases (Figure 4).

Lithology
In some cases, lithology is considered the most important factor controlling the flooding process. Differences in lithology, through their effect on the permeability and strength of rocks, affect the intensity and distribution of floods (Fani et al., 2017). In the study area, the siltstone unit is highly erodible and prone to sedimentation, so sheet and rill erosion forms are visible on it. Its surface carries fine-grained pink soils of considerable thickness. This unit salinizes surface water (floods) and groundwater, and the resulting soil is somewhat brackish. The old alluvial terrace unit has a high potential for erosion and sedimentation. However, because its surfaces are steeply sloping, floods lack the energy to reach and erode it, and no lateral erosion is seen on it. The young fluvial terrace unit is highly erodible because of the high density of new waterways in the study area (Figure 5).

Land curvature
Slope curvature describes the topographic shape. Positive curvature values indicate convex surfaces, negative values indicate concave surfaces, and zero indicates a flat surface (Lee, 2004). Flat areas have the greatest impact on floods, convex slopes the least, and concave slopes a moderate effect (Figure 6).
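The sign convention above (positive values convex, negative values concave) can be illustrated with a 3×3 total-curvature estimate. The sketch assumes the Zevenbergen-Thorne form documented for ArcGIS's Curvature tool, -2(D + E) × 100. The flat tolerance used to label cells is a hypothetical threshold, not a value from the study.

```python
def total_curvature(window, cell_size_m=10.0):
    """Total curvature at the centre of a 3x3 elevation window.

    Window rows run north to south:
        (z1, z2, z3),
        (z4, z5, z6),
        (z7, z8, z9)
    D and E are half the second derivatives along x and y.
    """
    (_z1, z2, _z3), (z4, z5, z6), (_z7, z8, _z9) = window
    area = cell_size_m ** 2
    d = ((z4 + z6) / 2.0 - z5) / area
    e = ((z2 + z8) / 2.0 - z5) / area
    return -2.0 * (d + e) * 100.0


def curvature_class(value, flat_tolerance=0.01):
    """Label a curvature value; flat_tolerance is a hypothetical threshold."""
    if value > flat_tolerance:
        return "convex"
    if value < -flat_tolerance:
        return "concave"
    return "flat"
```

A cell raised above its neighbours (a small peak) returns a positive value and is labelled convex. A pit returns a negative value, and a uniform surface returns zero.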

Elevation
The topographic features of an area play an essential role in controlling the mechanism of flood occurrence, because the energy and distribution of flood flow differ among areas of different elevation and geomorphological form (Green et al., 2014). In the study area, most floods occurred at altitudes of 1000-1500 m (Figure 7).

Distance to rivers
Waterways are the main pathways of floods. The probability of flooding is therefore high close to them, and the role of waterways in flooding decreases with distance (Wang et al., 2015). In the study area, most floods occurred within 0-500 m of waterways (Figure 8).

Topographic Wetness Index (TWI)
The TWI is a secondary topographic attribute that shows the spatial distribution of moisture conditions (Razavi Termeh et al., 2017). In the study area, the 294.8-395.9 class has the greatest impact on flood occurrence (Figure 9).

Stream Power Index (SPI)
The SPI indicates the erosive power of the stream, which is directly related to the slope and the contributing area of the watershed. Therefore, as surface flow velocity increases, the SPI increases as well (Pour Taghi et al., 2014). In the study area, the 327.4-395.99 class has the greatest effect on flood occurrence (Figure 10).

Rainfall
Rainfall can be considered the most important factor directly involved in the hydrological cycle.
Without rainfall, there is no flooding. In the study area, spring is the season of sudden rains, snowmelt, and severe river floods. Areas that receive more rainfall contribute more to flood occurrence. In the study area, rainfall above 220 mm has the greatest impact on flood occurrence (Figure 11).

Runoff
The SCS-CN equation was used to estimate surface runoff for 5-, 15-, 25-, and 50-year return periods. The CN parameter plays an important role in estimating runoff and floods. To determine the CN, the soil, land use, vegetation density, and rainfall of the basin were used (Figures 13, 14, 15, and 16).
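Equations 4 and 5 can be sketched per cell as follows. Given a cell's CN and the design rainfall for one return period, the code returns S and Q in millimetres. The function names and the `ia_ratio` parameter are hypothetical; the default of 0.2 reproduces the P + 0.8S denominator of Equation 5.

```python
def max_retention_mm(cn):
    """Equation 4 (metric form): S = 25400 / CN - 254, in millimetres."""
    if not 0 < cn <= 100:
        raise ValueError("CN must be in (0, 100]")
    return 25400.0 / cn - 254.0


def runoff_mm(p_mm, cn, ia_ratio=0.2):
    """Equation 5: Q = (P - Ia)^2 / (P - Ia + S) with Ia = ia_ratio * S.

    With ia_ratio = 0.2 the denominator equals P + 0.8 S.
    """
    s = max_retention_mm(cn)
    ia = ia_ratio * s  # initial abstraction
    if p_mm <= ia:
        return 0.0  # all rainfall is absorbed before runoff starts
    return (p_mm - ia) ** 2 / (p_mm - ia + s)
```

For CN = 80, S = 63.5 mm and the initial abstraction is 12.7 mm. A 50 mm design storm then yields Q = 37.3² / 100.8 ≈ 13.8 mm, while a 10 mm storm yields no runoff. Repeating the call with the design rainfall of each return period gives the 5- to 50-year runoff layers.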

Hydrologic soil groups (HSGs)
To prepare the map of hydrologic soil groups, resource assessment, land capability, and soil science layers were used. The CN map was then prepared from these three layers (Figure 17). Next, using Equation 4, the map of maximum potential retention (maximum infiltration) in the region was prepared; this layer shows the amount of rain that infiltrates the ground (Figure 18). Maps of precipitation zones for 5-, 15-, 25-, and 50-year return periods were prepared from daily maximum precipitation data. After tests of data homogeneity and adequacy, precipitation for the desired return periods was estimated using the Gumbel (Type I extreme value) distribution. Among the stations, Nishabur rain-gauging station has the highest rainfall and Fadieh rain-gauging station the lowest. Maximum precipitation maps were then prepared for the different return periods (Figures 19, 20, 21, and 22).
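The return-period step can be sketched in two parts. First, a station's annual maximum daily rainfall series is fitted to a Gumbel (EV1) distribution by the method of moments, using the frequency factor $K_T = -\frac{\sqrt{6}}{\pi}\left(\gamma + \ln\ln\frac{T}{T-1}\right)$. Second, the station estimates are spread over the basin with inverse distance weighting. The study does not state its fitting method or IDW power, so moments and a power of 2 are assumptions, and the function names are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649


def gumbel_frequency_factor(return_period_years):
    """Gumbel EV1 frequency factor K_T (method of moments)."""
    t = return_period_years
    if t <= 1:
        raise ValueError("return period must exceed 1 year")
    return -(math.sqrt(6) / math.pi) * (EULER_GAMMA + math.log(math.log(t / (t - 1))))


def gumbel_quantile(annual_maxima, return_period_years):
    """x_T = mean + K_T * s for one station's annual maximum series."""
    mean = statistics.mean(annual_maxima)
    sd = statistics.stdev(annual_maxima)  # sample standard deviation
    return mean + gumbel_frequency_factor(return_period_years) * sd


def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, value) tuples."""
    num = den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return value  # the target point coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den
```

For the 5-year return period, $K_T \approx 0.719$, so a station with a mean annual maximum of 20 mm and a standard deviation of 10 mm gives about 27.2 mm. Longer return periods give larger values.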

Development of the statistical model and determination of the weights of parameters affecting flood proneness
A comparative analysis was performed between the locations of past floods and the environmental parameters affecting flood occurrence, and the weight of each parameter was obtained accordingly. Table 1 shows the results of the logistic regression, which relate flood occurrence to the effective factors. All independent and dependent variables were entered into the logistic regression model using the Enter method; the data comprised 21 flood locations coded 1 and 30 non-flood locations coded 0. Negative logistic regression coefficients indicate that flood occurrence is negatively correlated with the corresponding independent variable; this is the case for the earth curvature factor. The greatest weight and effect on flood occurrence belongs to distance from the waterway, with a value of 4.12. A factor whose significance level is below 0.05 has a statistically significant effect on floods. In terms of significance level, the most effective factors are the geological index, distance to waterways/rivers, and NDVI. The difference in -2 log likelihood (-2LL) is an effective indicator of the model's improvement over the null model (Table 2). The lowest -2LL gives the best goodness of fit (GOF) to the data, and Table 2 reports its reduction up to the final iteration step. The Cox & Snell and Nagelkerke R² values are used to measure the usefulness of the model: the higher the R², the better the model. Equation 9 was used to prepare the flood susceptibility map (Figure 23).
Validation is an essential step in developing susceptibility maps and determining their quality. In this study, the ROC curve and AUC were used for evaluation; an AUC of 1 indicates perfect prediction. The validation (test) data sets (30% or 60%) were used for the prediction rate.
The results showed that the AUC for the prediction rate is 0.81 (Figure 24), indicating acceptable accuracy.
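The fit and validation statistics mentioned above (-2LL, Cox & Snell and Nagelkerke R², and the ROC AUC) can be computed from observed 0/1 labels and predicted flood probabilities, as in the minimal sketch below. The function names are hypothetical. The AUC is computed as the probability that a randomly chosen flood point scores higher than a randomly chosen non-flood point (the Mann-Whitney form), which equals the area under the ROC curve.

```python
import math


def log_likelihood(y, p, eps=1e-12):
    """Bernoulli log-likelihood of 0/1 labels y under probabilities p."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # avoid log(0)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total


def minus_2ll(y, p):
    """-2 log likelihood; lower values mean a better fit."""
    return -2.0 * log_likelihood(y, p)


def pseudo_r2(ll_null, ll_model, n):
    """Cox & Snell and Nagelkerke R^2 from null and fitted log-likelihoods."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)  # Cox & Snell upper bound
    return cox_snell, cox_snell / max_cox_snell


def roc_auc(y, scores):
    """AUC = P(flood score > non-flood score), ties counted as one half."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    if not pos or not neg:
        raise ValueError("need both flood and non-flood points")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The null-model log-likelihood uses the flood fraction as a constant probability, for example `log_likelihood(y, [sum(y) / len(y)] * len(y))`. Applying `roc_auc` to the held-out test points gives the prediction-rate AUC.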

The SCS-CN model
To weight the layers with the AHP method, each class was assigned a score from zero to one according to expert opinion. The layers were then combined to obtain the flood-prone zoning map (Equation 10). Table 3 shows the classes and the weight assigned to each factor.
$$\text{Flooding} = 0.151\,\text{Slope} + 0.054\,\text{Lithology} + 0.181\,\text{DD} + 0.001\,\text{Curvature} + 0.044\,\text{Elevation} + 0.058\,\text{Distance to river} + 0.069\,\text{TWI} + 0.078\,\text{SPI} + 0.199\,\text{Rainfall} + 0.191\,\text{Land use} + 0.083\,\text{NDVI} + 0.048\,\text{HSG} + 0.033\,\text{Runoff} \qquad (10)$$
Applying this equation to the layers yielded flood-prone maps of the basin for 5-, 15-, 25-, and 50-year return periods (Figures 25, 26, 27, and 28). According to the maps, flood proneness is highest in the northern and western parts of the basin, which lie at high altitudes. The layers used in flood zoning show that slopes in these areas exceed 45°. Land use there is mostly rainfed farming and poor pasture. These conditions favor flooding. In the eastern areas, lower slopes, denser vegetation, and low drainage density place them in the moderate flood-proneness class. Flood proneness is lowest at low altitudes and on gentle slopes, which are usually used for irrigated agriculture. The land slope in these areas is 5°, so little runoff is generated.
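Equation 10 is a weighted linear combination of per-class scores (0 to 1) for each layer. The sketch below applies the weights printed in Equation 10 to one cell's scores, then splits the resulting index values into five equal-interval susceptibility classes. The layer keys are hypothetical names, and the equal-interval classification is an assumption, since the study does not state how class breaks were chosen.

```python
# Weights as printed in Equation 10; each layer score is assumed to lie in [0, 1].
WEIGHTS = {
    "slope": 0.151, "lithology": 0.054, "drainage_density": 0.181,
    "curvature": 0.001, "elevation": 0.044, "distance_to_river": 0.058,
    "twi": 0.069, "spi": 0.078, "rainfall": 0.199, "land_use": 0.191,
    "ndvi": 0.083, "hsg": 0.048, "runoff": 0.033,
}


def flood_index(scores, weights=WEIGHTS):
    """Weighted sum of one cell's class scores (Equation 10)."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing layer scores: {sorted(missing)}")
    return sum(w * scores[name] for name, w in weights.items())


def classify_equal_interval(values, n_classes=5):
    """Map index values to classes 1..n_classes (1 = very low, n = very high)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes or 1.0  # avoid division by zero if all equal
    return [min(int((v - lo) / width), n_classes - 1) + 1 for v in values]
```

Running `flood_index` once per cell with that cell's class scores, and then `classify_equal_interval` over all cells, produces a five-class map for each return period.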

Conclusion
In the present study, flood risk was assessed using thirteen parameters, namely runoff, hydrologic soil groups, slope, lithology, drainage density (DD), land curvature, elevation, distance to waterways/rivers, topographic wetness index (TWI), stream power index (SPI), rainfall, land use, and NDVI. With the SCS-CN model, the maximum potential retention (infiltration, S) and runoff amount (Q) of the basin were determined. The layers were weighted by the AHP method. By applying these weights, flood zoning maps of the basin were drawn for 5-, 15-, 25-, and 50-year return periods. The ROC curve and AUC were used to assess the accuracy of the zoning map produced with the logistic regression model. For the prediction rate, the AUC was 0.81, indicating that the model has an acceptable level of accuracy. The northern and western parts of the drainage basin fall in the high and very high risk classes. The main reasons are high altitude, steep slopes, poor vegetation, high drainage density, and convex surfaces. In areas with dense vegetation, runoff is low and flood proneness decreases, so these areas have low to moderate flood occurrence. In these areas, CN and Q are low while S is high. The results indicate that the SCS-CN model increased the speed and accuracy of preparing the flood zoning map and that its predictions closely match observed conditions. The investigation of the parameters affecting flood occurrence showed that floods in the study area are caused by various environmental and human factors. Based on the flood hazard zoning maps produced with the two models, appropriate management measures can be taken to reduce flood damage and losses.