Uncertainty Analysis in the Optimization Process of Groundwater Exploitation Scheme Based on SVR Method: A Case Study of Hetao Plain

Yongkai An, Wenxi Lu*, Xueman Yan
Corresponding author: Wenxi Lu, Email: luwenxi@jlu.edu.cn
Key Laboratory of Groundwater Resources and Environment, Ministry of Education, Jilin University, Changchun 130021, PR China

Abstract: This paper introduces a surrogate model to reduce the huge computational load in the process of simulation-optimization and uncertainty analysis. First, a groundwater numerical simulation model was established, calibrated and verified in the northeast of Hetao Plain. Second, two surrogate models of the simulation model were established using the support vector regression (SVR) method: one (surrogate model A, SMA) describes the relationship between the pumping rate and the average groundwater table drawdown, and the other (surrogate model B, SMB) expresses the relationship between the hydrogeological parameter values and the average groundwater table drawdown. Third, an optimization model was established to search for an optimal groundwater exploitation scheme, using the maximum total pumping rate as the objective function and the limiting average groundwater table drawdown as the constraint condition; the SMA was invoked by the optimization model to obtain the optimal groundwater exploitation scheme. Finally, the SMB was invoked in the process of uncertainty analysis to assess the reliability of the optimal groundwater exploitation scheme. Results show that the relative error and root mean square error between the simulation model and the two surrogate models are both less than 5%, which is a high approximation accuracy. The SVR surrogate model developed in this study could not only considerably reduce the computational load, but also maintain high computational accuracy. The optimal total pumping rate is 7947 m³/d and the reliability of the optimal scheme is 40.21%. This can thus provide an effective method for identifying an optimal groundwater exploitation scheme and assessing the reliability of the scheme quickly and accurately.


Introduction
The simulation-optimization approach can solve groundwater optimization problems: the optimal decision-making scheme is obtained by optimizing the decision inputs under the given objectives and constraints [1][2][3]. Uncertainty analysis can assess the reliability of a groundwater exploitation scheme under uncertainty in the hydrogeological parameter values [4][5].
However, the simulation model is invoked repeatedly in the process of simulation-optimization and uncertainty analysis, which produces a huge computational load.
Therefore, it is of great significance to reduce the computational load so as to identify an optimal groundwater exploitation scheme and assess the reliability of groundwater exploitation scheme quickly and accurately.
In recent years, some scholars have proposed surrogate models of the simulation model, which have been used to optimize groundwater exploitation schemes and to identify groundwater pollution sources [6][7][8]. The results show that introducing a surrogate model can not only reduce the computational load but also maintain high computational accuracy. Frequently used surrogate models include the BP neural network model, the RBF neural network model, the kriging model, the SVR model and so on [9][10][11][12][13]. The SVR model has been shown to be suitable as a surrogate model because of its high computational accuracy [14][15]. To date, however, there have been no reports on uncertainty analysis in the optimization process of a groundwater exploitation scheme based on the SVR model.
In this paper, a groundwater numerical simulation model was established, calibrated and verified in the northeast of Hetao Plain. Then the SMA was established and verified using the SVR method to describe the relationship between the pumping rate and the average groundwater table drawdown. Afterwards, an optimization model was established to search for an optimal groundwater exploitation scheme, using the maximum total pumping rate as the objective function and the limiting average groundwater table drawdown as the constraint condition; the SMA was invoked by the optimization model to obtain the optimal groundwater exploitation scheme.
Finally, the SMB was established and verified, again using the SVR method, to express the relationship between the hydrogeological parameter values and the average groundwater table drawdown. The SMB was then invoked in the process of uncertainty analysis to assess the reliability of the optimal groundwater exploitation scheme.

Study Area
The study area is located in the northeastern Hetao Plain, China, which belongs to the temperate continental arid and semi-arid monsoon climate zone. It lies between 108°18′ and 108°44′ east longitude and between 41°11′ and 41°21′ north latitude. The average annual precipitation in this region is about 175 mm, and its temporal-spatial distribution is extremely uneven. Precipitation mainly occurs in July, August and September, which together account for more than 60% of the annual total. The potential average annual evaporation is about 2084 mm, which is extremely intense.
There are an unconfined aquifer and a confined aquifer in the study area. The target aquifer in this study is the unconfined aquifer, whose medium is mainly alluvial sand and gravel. The groundwater resource in the study area is very rich because of the large aquifer thickness and good groundwater runoff conditions. The major groundwater recharge source is precipitation, while the groundwater is also recharged by irrigation infiltration and seasonal river infiltration in localised areas. Under natural conditions, groundwater is discharged in the form of evaporation and runoff. However, in recent years, as a result of human demand for water resources, artificial exploitation has become the major form of discharge in localised areas. The distribution of groundwater observation and pumping wells and the groundwater flow direction are shown in Fig. 1. The observation wells were used to calibrate and verify the simulation model. The pumping wells were used only in the process of simulation-optimization.

Methods
(1) Support vector regression
The support vector machine is a machine learning method proposed by Vapnik based on VC dimension theory and the structural risk minimization principle of statistical learning theory, and it is regarded as a good alternative to artificial neural networks. In general, the support vector machine can solve both classification and regression problems, replaces traditional empirical risk minimization with structural risk minimization, and, owing to its rigorous theoretical and mathematical foundation, can solve practical problems that are small-sample, nonlinear and high-dimensional [16][17].
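In this study, the support vector regression (SVR) method was used to establish the surrogate model of the groundwater flow numerical simulation model. The basic principle of SVR is described as follows. Given a training dataset $\{(x_i, y_i)\}_{i=1}^{l}$, where $x_i \in R^m$ is the input variable and $y_i \in R$ is the corresponding output variable, the basic idea is to map the dataset $x_i$ to a high-dimensional space $F$, in which linear regression analysis is conducted [18][19]. That is:

$$f(x) = \omega \cdot \varphi(x) + b$$

where $\omega$ is the weight vector of the hyperplane and $b$ is the bias term.
Therefore, the approximation problem of the regression function $f(x)$ is equivalent to minimizing the following function:

$$R_{reg}[f] = \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{l} L_\varepsilon\big(y_i, f(x_i)\big)$$

where the first term is the complexity term of the model and $L_\varepsilon$ is the loss function. The loss function takes many forms; in view of the better sparsity of the $\varepsilon$-insensitive linear loss function [18], the loss function was selected as follows:

$$L_\varepsilon\big(y, f(x)\big) = \max\big(0,\ |y - f(x)| - \varepsilon\big)$$

The empirical risk function was:

$$R_{emp}[f] = \frac{1}{l}\sum_{i=1}^{l} L_\varepsilon\big(y_i, f(x_i)\big)$$

Minimizing the loss is equivalent to solving the following optimization problem:

$$
\begin{aligned}
\min\ & \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{l}\left(\xi_i + \xi_i^*\right) \\
\text{s.t.}\ & y_i - \omega \cdot \varphi(x_i) - b \le \varepsilon + \xi_i, \\
& \omega \cdot \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\
& \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, l
\end{aligned}
$$

where $C$ is the weighting parameter used for balancing the complexity term and the training error term of the model, $\xi_i^*$ and $\xi_i$ are the slack variables, and $\varepsilon$ is the insensitivity parameter of the loss function.
Finally, minimization of the Euclidean norm, the duality principle and the Lagrange multiplier method were used to seek the minimum $\omega$ [20]; the linear regression function $f(x)$ could then be described as follows:

$$f(x) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^*\right)\left(x_i \cdot x\right) + b$$

where $n$ is the number of support vectors, and $\alpha_i$ and $\alpha_i^*$ are the Lagrange multipliers.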
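The ε-insensitive loss, the empirical risk and the linear regression function described above can be illustrated with a small Python sketch. The function names and the numbers in the usage lines are illustrative assumptions, not part of the original study (which used MATLAB), and the coefficients `alpha_i - alpha_i*` are supplied by hand rather than obtained by training.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Epsilon-insensitive loss: zero inside the eps tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def empirical_risk(ys, preds, eps):
    """Mean epsilon-insensitive loss over a set of samples."""
    return sum(eps_insensitive_loss(y, p, eps) for y, p in zip(ys, preds)) / len(ys)


def linear_svr_predict(x, support_vectors, coeffs, b):
    """Evaluate f(x) = sum_i coeff_i * (x_i . x) + b.

    coeffs[i] stands for (alpha_i - alpha_i^*); the values are assumed to be
    given, since this sketch does not solve the optimization problem.
    """
    total = b
    for sv, c in zip(support_vectors, coeffs):
        total += c * sum(s * v for s, v in zip(sv, x))
    return total


# Illustrative values only (hypothetical, not taken from the paper).
risk = empirical_risk([1.0, 2.0], [1.05, 1.5], eps=0.1)
pred = linear_svr_predict([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, -0.25], b=0.1)
```

Errors smaller than ε contribute nothing to the risk, which is what makes the solution sparse: only samples on or outside the ε tube end up as support vectors.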
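For support vector regression of nonlinear problems, the basic idea is to map the data to a high-dimensional feature space via a nonlinear mapping and to conduct linear regression in this space. This process is realized through a kernel function, which gives the following problem:

$$
\begin{aligned}
\max\ & -\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\left(\alpha_i - \alpha_i^*\right)\left(\alpha_j - \alpha_j^*\right)K(x_i, x_j) - \varepsilon\sum_{i=1}^{l}\left(\alpha_i + \alpha_i^*\right) + \sum_{i=1}^{l} y_i\left(\alpha_i - \alpha_i^*\right) \\
\text{s.t.}\ & \sum_{i=1}^{l}\left(\alpha_i - \alpha_i^*\right) = 0, \quad 0 \le \alpha_i,\ \alpha_i^* \le C
\end{aligned}
$$

where $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$ is the kernel function and $l$ is the number of training samples. After the optimal solution of the above optimization problem is found, the regression estimation function of the nonlinear problem can be expressed as follows:

$$f(x) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^*\right)K(x_i, x) + b$$

Frequently used kernel functions include the polynomial kernel function, among others.

(2) Numerical Simulation of Groundwater Flow
The target aquifer located in the northeastern Hetao Plain is a pore aquifer. The top of the simulation area is the upper boundary of the unconfined aquifer, where water exchange mainly occurs through precipitation infiltration, irrigation leakage, evaporation, artificial exploitation, etc. The bottom of the simulation area is Middle Pleistocene muddy silty clay, which has low permeability and almost no water exchange. The lateral boundary is generalized as shown in Fig. 2.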

Fig.2 Lateral boundary types and parameter partitions of study area
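The groundwater flow system of the simulation area can be generalized as a nonhomogeneous, isotropic, two-dimensional unsteady flow system, which can be described as follows [22][23]:

$$
\begin{cases}
\dfrac{\partial}{\partial x}\left[k(H - Z_b)\dfrac{\partial H}{\partial x}\right] + \dfrac{\partial}{\partial y}\left[k(H - Z_b)\dfrac{\partial H}{\partial y}\right] + W = \mu\dfrac{\partial H}{\partial t}, & (x, y) \in D,\ t \ge 0 \\
H(x, y, t)\big|_{t=0} = H_0(x, y), & (x, y) \in D \\
H(x, y, t)\big|_{\Gamma_1} = H_1(x, y, t), & (x, y) \in \Gamma_1,\ t > 0 \\
k(H - Z_b)\dfrac{\partial H}{\partial n}\Big|_{\Gamma_2} = q(x, y, t), & (x, y) \in \Gamma_2,\ t > 0
\end{cases}
$$

where $H$ is the groundwater table elevation (m), $H_0$ is the initial groundwater table (m), $H_1$ is the groundwater table on the Dirichlet boundary $\Gamma_1$ (m), $\Gamma_2$ is the Neumann boundary, $q(x, y, t)$ is the recharge and discharge quantity of the aquifer per unit width (m·d⁻¹), $n$ is the direction of the outward normal on the boundary, and $D$ is the area for simulation computation.
The parameter partitions of the study area are shown in Fig. 2, in which the study area is divided into 9 subareas. The simulation model was solved using the finite difference method [24][25][26].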
(3) Monte Carlo
Monte Carlo simulation, also called stochastic statistical simulation, is a computational method based on probability and statistical theory that uses random or pseudo-random numbers to solve computational problems. Its basic idea is that the probability of a random event is estimated by the frequency of its occurrence, and this estimate is taken as the solution of the problem [27][28]. The steps of Monte Carlo simulation are as follows [29][30]:
① Determine the random variables by means of sensitivity analysis;
② Construct the probability distribution model of the random variables;
③ Draw random numbers for each input random variable;
④ Convert the random numbers into random samples;
⑤ Introduce the random samples into the simulation model;
⑥ Perform statistical analysis of the simulation results.
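The Monte Carlo steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `toy_model` is a hypothetical stand-in for the simulation model, the two random variables (`k`, `mu`) are assumed to be uniformly distributed, and the parameter ranges and threshold are invented for the example.

```python
import random


def toy_model(k, mu):
    """Hypothetical stand-in for the simulation model (not the real one)."""
    return 1.0 / (k * mu)


def monte_carlo(n_runs, threshold, seed=0):
    rng = random.Random(seed)
    # Steps 1-2: random variables and their (assumed uniform) distributions.
    bounds = {"k": (5.0, 15.0), "mu": (0.08, 0.25)}
    outputs = []
    for _ in range(n_runs):
        # Step 3: draw a random number in [0, 1) for each variable.
        u = {name: rng.random() for name in bounds}
        # Step 4: convert random numbers into samples (inverse uniform CDF).
        sample = {name: lo + u[name] * (hi - lo) for name, (lo, hi) in bounds.items()}
        # Step 5: run the model on the sample.
        outputs.append(toy_model(sample["k"], sample["mu"]))
    # Step 6: estimate the event probability by its frequency of occurrence.
    frequency = sum(1 for o in outputs if o <= threshold) / n_runs
    return outputs, frequency


outputs, frequency = monte_carlo(n_runs=2000, threshold=1.3, seed=1)
```

The frequency returned in step 6 is the Monte Carlo estimate of the probability that the model output stays below the threshold; its precision improves as `n_runs` grows.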

Numerical Simulation of Groundwater Flow
The calibration phase of the simulation was selected in the dry season and lasted 192 days. From Fig. 3 and Fig. 4 it can be seen that the equipotential lines of the actual and computed groundwater tables fit well.

Specific yield of the 9 subareas (Table 1): 0.12, 0.15, 0.20, 0.08, 0.18, 0.25, 0.20, 0.15, 0.14

Optimization Model
In the study area, five pumping wells were set and also used as observation wells in the process of simulation-optimization. The distribution of the five pumping wells is shown in Fig. 1. An optimization model was then established to search for an optimal groundwater exploitation scheme, using the maximum total pumping rate as the objective function and the limiting drawdown as the constraint condition. The optimization model was constructed as follows:

$$
\max Q = \sum_{i=1}^{n} q_i \qquad \text{s.t.} \quad \frac{1}{n}\sum_{i=1}^{n} s_i \le s_{\max}
$$

where $Q$ is the total pumping rate (m³·d⁻¹), $q_i$ is the pumping rate of the $i$th well (m³·d⁻¹), $s_i$ is the groundwater table drawdown of the $i$th well (m), $n$ is the number of pumping wells, and $s_{\max}$ is the limiting average groundwater table drawdown (m).
The optimal groundwater exploitation scheme (maximum total pumping rate) can be obtained by solving the optimization model. However, the simulation model is invoked repeatedly by the optimization model, which produces a huge computational load. Thus, a surrogate model of the simulation model is introduced to reduce the computational load [13,31].
In order to establish the surrogate model of the simulation model, the hydrogeological parameters were treated as deterministic variables, and the LHS method was used to obtain two sets of exploitation schemes, one of 30 groups and one of 10 groups. Each set was introduced into the numerical simulation model of groundwater flow to obtain the corresponding groundwater table drawdown datasets. The former set was used as training samples and the latter as validation samples.
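The Latin hypercube sampling (LHS) step can be sketched as below. This is a plain-Python illustration rather than the authors' implementation; the function name and the pumping-rate bounds (0 to 3000 m³·d⁻¹ per well) are assumptions made for the example.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Return n_samples points; each variable's range is split into n_samples
    equal strata and every stratum is sampled exactly once."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        # One uniform draw inside each stratum, then shuffle the strata so
        # that the pairing between variables is random.
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(strata)
        columns.append([lo + u * (hi - lo) for u in strata])
    return [list(row) for row in zip(*columns)]


# Hypothetical bounds for five pumping wells (m3/d); not values from the paper.
well_bounds = [(0.0, 3000.0)] * 5
training_schemes = latin_hypercube(30, well_bounds, seed=1)
validation_schemes = latin_hypercube(10, well_bounds, seed=2)
```

Compared with plain random sampling, stratifying each variable guarantees that even a small design of 30 schemes covers the full range of every pumping rate.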
A MATLAB (2017a) program was written according to the principle of the SVR method. The training samples were used to establish the SVR surrogate model (SMA) and the validation samples were used to verify the computational accuracy of the SMA. The fitting result and the error between the simulation model and the SMA are shown in Fig. 5 and Table 2, respectively. From Fig. 5 and Table 2, all validation samples are close to the 1:1 line, the relative error of each scheme is between 0.01% and 0.89% (less than 1.00%), and the mean relative error is 0.30%, which shows that the groundwater table drawdown computed by the SMA for each scheme is very close to that of the simulation model. The root mean square error of the 10 validation schemes is 0.55%, which shows that the SMA has good stability. These results demonstrate that the SMA can substitute for the groundwater numerical simulation model effectively.
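The validation metrics can be computed along the following lines. The paper does not state the exact formulas; this sketch assumes that the relative error is taken with respect to the simulation model output and that the root mean square error is computed on the relative errors, which would explain why it is reported in percent. The sample values are invented for illustration.

```python
import math


def relative_errors_percent(simulated, surrogate):
    """Absolute relative error of the surrogate with respect to the simulation model, in %."""
    return [abs(s_hat - s) / abs(s) * 100.0 for s, s_hat in zip(simulated, surrogate)]


def root_mean_square(values):
    """Root mean square of a list of values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical drawdowns (m) from the simulation model and a surrogate.
simulated = [1.0, 2.0, 4.0]
surrogate = [1.01, 1.98, 4.0]

errors = relative_errors_percent(simulated, surrogate)
mean_error = sum(errors) / len(errors)
rmse = root_mean_square(errors)
```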
Finally, in order to solve the optimization model, the SMA was loaded into the genetic algorithm and linked with the pumping rate. The optimal groundwater exploitation scheme obtained by invoking the SMA is given in Table 3.
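How a surrogate can be coupled with a genetic algorithm is sketched below. This is not the authors' MATLAB implementation: `toy_surrogate` is a hypothetical linear stand-in for the SMA, the response coefficients, bounds and drawdown limit are assumed values, and the algorithm is a deliberately simple variant (truncation selection, blend crossover, Gaussian mutation, penalty for constraint violation).

```python
import random

# Hypothetical response coefficients (m of drawdown per m3/d), one per well.
COEFFS = [1e-4, 2e-4, 2e-4, 1e-4, 1e-4]


def toy_surrogate(q):
    """Hypothetical stand-in for the SMA: average drawdown (m) for rates q."""
    return sum(c * qi for c, qi in zip(COEFFS, q))


def fitness(q, s_max, penalty=1e5):
    """Total pumping rate minus a penalty for exceeding the drawdown limit."""
    violation = max(0.0, toy_surrogate(q) - s_max)
    return sum(q) - penalty * violation


def genetic_search(s_max=1.3, q_max=3000.0, pop_size=40, generations=200, seed=0):
    rng = random.Random(seed)
    n = len(COEFFS)
    pop = [[rng.uniform(0.0, q_max) for _ in range(n)] for _ in range(pop_size)]
    best = [0.0] * n  # zero pumping always satisfies the constraint
    for generation in range(generations + 1):
        # Keep the best feasible scheme seen so far.
        for q in pop:
            if toy_surrogate(q) <= s_max and sum(q) > sum(best):
                best = list(q)
        if generation == generations:
            break
        # Truncation selection: the better half become parents.
        ranked = sorted(pop, key=lambda q: fitness(q, s_max), reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            # Blend crossover plus Gaussian mutation, clipped to the bounds.
            child = [w * x + (1.0 - w) * y + rng.gauss(0.0, 0.02 * q_max)
                     for x, y in zip(a, b)]
            children.append([min(q_max, max(0.0, g)) for g in child])
        pop = parents + children
    return best


best_scheme = genetic_search(seed=1)
```

Every fitness evaluation calls only the cheap surrogate, so thousands of candidate schemes can be screened without running the numerical simulation model.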
Table 3 The optimal exploitation scheme (columns: pumping well, pumping rate)
From Table 3, the maximum total pumping rate is 7947 m³·d⁻¹. The results of the above quantitative and qualitative analysis show that the optimal groundwater exploitation scheme is reasonable.

Uncertainty Analysis
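The major influence factors that affect the simulation-optimization results include the hydrogeological parameters, whose values are uncertain. The SMB was therefore established using the SVR method to express the relationship between the hydrogeological parameter values and the average groundwater table drawdown. From Fig. 6 and Table 5, all validation samples are close to the 1:1 line, the relative error of each scheme is less than 5%, and the root mean square error is also less than 5%. These results demonstrate that the SMB can substitute for the groundwater flow numerical simulation model effectively.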
Finally, the LHS method was used to obtain 10000 groups of hydrogeological parameters, which were introduced into the SMB to obtain groundwater table drawdown datasets. The statistical results of the 10000 groups of groundwater table drawdown datasets are shown in Table 6. The reliability of the optimal scheme is 40.21%, which also means that the risk of the optimal scheme is 59.79%. Thus, in order to enhance the reliability of the optimal scheme, the penalty number should be added into the constraint conditions of the optimization model. This approach is easy to implement and is not discussed further in this paper.
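The reliability and risk figures follow directly from the drawdown samples: reliability is the fraction of samples below the limiting drawdown, and risk is its complement. The sketch below assumes a limit of 1.3 m and uses a handful of invented drawdown values and bin edges in place of the 10000 SMB outputs.

```python
def reliability(drawdowns, limit):
    """Fraction of samples whose average drawdown is below the limit."""
    return sum(1 for s in drawdowns if s < limit) / len(drawdowns)


def bin_counts(values, edges):
    """Count values falling in each half-open interval [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Hypothetical drawdown samples (m); the study used 10000 SMB outputs instead.
samples = [1.10, 1.20, 1.25, 1.31, 1.35, 1.40, 1.42, 1.29, 1.33, 1.44]

rel = reliability(samples, limit=1.3)
risk = 1.0 - rel
histogram = bin_counts(samples, edges=[1.0, 1.15, 1.3, 1.45])
```

With the reported reliability of 40.21%, the same complement gives the stated risk of 59.79%.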

Conclusions
This paper introduces an SVR surrogate model to reduce the huge computational load in the process of simulation-optimization and uncertainty analysis. The SVR surrogate model developed in this study could not only considerably reduce the computational load, but also maintain high computational accuracy. This paper provides an effective method for identifying an optimal groundwater exploitation scheme and assessing the reliability of the scheme quickly and accurately.
The established numerical simulation model of groundwater flow can objectively and accurately describe the groundwater flow characteristics of the study area. Both SVR surrogate models can substitute for the groundwater flow numerical simulation model for simulation and prediction. The optimal total pumping rate is 7947 m³/d and the reliability of the optimal scheme is 40.21%. In order to enhance the reliability of the optimal scheme, the penalty number should be added into the constraint conditions of the optimization model.

Fig.1 Distribution of groundwater observation and pumping wells and groundwater flow direction of study area

In the groundwater flow equations, $Z_b$ is the elevation of the target aquifer floor (m), $k$ is the hydraulic conductivity (m·d⁻¹), $\mu$ is the specific yield (dimensionless), $W$ is the vertical recharge and discharge strength of the unconfined aquifer (m·d⁻¹), and $\Gamma_1$ and $\Gamma_2$ are the Dirichlet and Neumann boundaries, respectively.

The calibration phase ran from September 7, 2008 to March 18, 2009, taking into consideration that fewer sources and sinks are beneficial for identifying the hydrogeologic parameters. The verification phase was selected in the wet season, for 172 days from March 19, 2008 to September 7, 2008, on account that more sources and sinks are beneficial for verifying the effectiveness of the hydrogeologic parameters. The equipotential lines of the groundwater table at the end of the model calibration and verification stages are shown in Fig. 3 and Fig. 4, respectively.

Fig.3 The actual and computed equipotential lines of groundwater table at the end of the model calibration stage

Fig.5 The fitting result between the simulation model and SMA

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 24 April 2018 doi:10.20944/preprints201804.0317.v1
The fit between the actual groundwater table and the computed groundwater table is very good. This means that the actual measured groundwater table values are very close to the computed values, that the direction of the computed groundwater flow field is in accordance with the actual groundwater flow field, that the selected hydrogeological conceptual model generalization, partial differential equations and algorithm are reasonable and feasible, and that the established numerical simulation model of groundwater flow can objectively and accurately describe the groundwater flow characteristics of the study area. These results provide a good foundation for the establishment of a surrogate model. The parameter values of the study area after simulation model calibration and verification are shown in Table 1.

Table 2 The error between the simulation model and SMA
Compared with the other wells, the pumping rates of wells No. 2 and No. 3 are smaller. This is because wells No. 2 and No. 3 are located at the center of the pumping well group, where long-term exploitation is more likely to generate a large groundwater table drawdown.

Table 6 Statistical results of average groundwater table drawdown datasets
From Table 6, the average groundwater table drawdown in the study area is mainly in the range of 1.15-1.45 m, especially in the range of 1.3-1.45 m. The reliability of the optimal scheme, that is, the probability that the average groundwater table drawdown is less than 1.3 m, is 40.21%.