Hybrid Model of Singular Value Decomposition, ANFIS and Genetic Algorithm for Prediction of Sediment Transport in Sewers

The densimetric Froude number (Fr) represents, in dimensionless form, the minimum velocity required to prevent sediment deposition in pipes. Prediction of Fr is of utmost importance in numerous civil engineering applications. In this paper, a new hybrid method is used in which a Genetic Algorithm (GA) performs the optimum selection of the membership functions of an Adaptive Neuro-Fuzzy Inference System (ANFIS), and the Singular Value Decomposition (SVD) method computes the linear parameters of the ANFIS conclusion part (ANFIS-GA/SVD). Two target functions, training error (TE) and prediction error (PE), are considered, and the trade-off between them on the Pareto curve is selected as the optimal modeling point. First, different models are presented using the parameters affecting Fr prediction, classified in different groups; then Fr is predicted for all the models using three different sets of data and the ANFIS-GA/SVD technique. The results indicate that the best Fr prediction is obtained when the volumetric sediment concentration (CV), the ratio of median particle diameter to pipe diameter (d/D), the ratio of median particle diameter to hydraulic radius (d/R) and the overall sediment friction factor (λs) are used as input variables. A sensitivity analysis is also conducted to examine the effect of each dimensionless parameter on Fr prediction accuracy. Comparing the results of the suggested models with the existing regression-based equations shows that ANFIS-GA/SVD (R=0.986, MAPE=4.397, RMSE=0.206, SI=0.053, ρ=0.026, BIAS=-0.025) is more accurate than the rest of the models.

Keywords—ANFIS, bedload, Genetic algorithm (GA), sediment transport, sensitivity analysis, Singular Value Decomposition (SVD), machine learning


I. INTRODUCTION
Storm waters usually wash solid matter into sewers and transport it along paths determined by the flow velocity and the path slope. Sediments are deposited on the channel bed when the gradient of the pipe channel (or the flow velocity at a constant gradient) is less than that required for the flow to pass without deposition of solid matter. If the deposited solids are not washed away within a certain period of time, they consolidate, increasing the bed roughness and decreasing the cross-sectional area of the flow, which in turn reduces the transport capacity.
One of the simplest methods for determining the minimum velocity required to prevent solid matter deposition is to use a minimum velocity or shear stress criterion. The suggested values for velocity and shear stress have been comprehensively presented by Ebtehaj et al. [1] (see Introduction). These values often over- or underestimate the minimum velocity, since they do not consider the hydraulic conditions of the flow and the channel [2]. Therefore, many researchers have presented equations for determining the minimum velocity, based on analytical and experimental studies and considering different dimensionless parameters [3][4][5][6][7][8][9][10][11][12][13]. Because the physics of sediment transport in pipe channels is complex and insufficiently understood, the regression-based equations give good results mostly when the conditions are similar to those of the data used to derive them, and they perform differently when the conditions differ from those under study [1].
Data-mining methods have been widely used in recent years to solve different water engineering problems, since they are well suited to modeling problems whose physics is insufficiently understood and to making predictions about them [14][15][16][17][18][19]. Ab Ghani and Azamathulla [20] applied gene expression programming (GEP) to five different sets of data and presented an equation for sediment transport in smooth and rough bed channels. Guven and Kisi [21] used linear genetic programming (LGP) to estimate the daily suspended sediment in the Tongue River in Montana, USA. Their comparison indicated that LGP performs better than artificial neural networks (ANN) and GEP. Azamathulla et al. [22] used ANFIS to model sediment transport in a sewer pipe. They demonstrated that ANFIS leads to satisfactory results and can be a suitable replacement for the existing methods. Ebtehaj and Bonakdari [23] combined ANN with evolutionary algorithms, namely the multilayer perceptron imperialist competitive algorithm (MLP-ICA) and the multilayer perceptron genetic algorithm (MLP-GA), to predict sediment transport using the concept of self-cleaning in sewers. Their results indicated that, in comparison with the backpropagation algorithm, the evolutionary algorithms increase the prediction accuracy.
Ebtehaj and Bonakdari [24] used ANFIS to study sediment transport in a sewer. The authors advised applying GA for optimum selection of the ANFIS membership functions. This paper pursues the prediction of sediment transport without deposition in sewers by improving ANFIS network performance and using a wide range of data. For this purpose, an ANFIS network based on a hybrid of the genetic algorithm and SVD (ANFIS-GA/SVD) was used, in which GA performs the optimum selection of the Gaussian membership functions in the premise part and SVD computes the linear parameters of the concluding part. The parameters influencing sediment deposition in channel pipes were first made dimensionless and categorized into different groups, i.e., "flow resistance", "movement", "transport", "transport mode" and "sediment". Furthermore, various models are given to examine the effect of each of the dimensionless parameters. Following that, the Fr prediction results obtained with the proposed model are compared with the results of ANFIS and of regression-based equations.

A. The data used
To examine the performance of the presented models, 218 data points were used in this study. They were collected under various hydraulic conditions from three data sets: those of Ab Ghani [25], Ota and Nalluri [26] and Vongvisessomjai et al. [12]. Ab Ghani [25] conducted experiments to examine sediment behavior at the limit of deposition, using 20.5 m long pipes with three diameters of 154, 305 and 450 mm for the smooth-bed tests. In addition, the 305 mm pipe was used for the rough-bed tests. Ota and Nalluri [26] examined the effect of granulation on sediment transport. Their experiments were conducted on a 25 m long pipe with a diameter of 225 mm. The Manning's roughness coefficient was 0.01, and the gradient and roughness of the pipe were 0.00315 and 0.24 mm, respectively. Vongvisessomjai et al. [12] also conducted a series of experiments in order to present equations for the minimum velocity required to prevent sediment deposition at the limit of deposition. The length and diameter of the pipes used in these experiments were smaller than those in the experiments of Ab Ghani [25] and Ota and Nalluri [26]. They used 16 m long pipes of 150 and 100 mm in diameter, and measurements were taken at two cross-sections six meters apart, located 4.5 m from the beginning of the pipe and 5.5 m from the end of the pipe. The Manning's roughness coefficient was 0.0125. The basic statistical indices of all the data sets are given in Table 1.

B. ANFIS
ANFIS is a method that includes a set of IF-THEN rules of the Takagi-Sugeno-Kang (TSK) fuzzy type, used here for modeling and mapping between the inputs and outputs of the model.
The objective is to find a function f whose output is nearly equal to the actual value obtained from the experiments. Therefore, the mean squared difference between the predicted and actual values must be minimized in order to predict the output parameter y for an input vector X=(x1, x2, x3, …, xn), as expressed in Eq. (1). The TSK-type fuzzy system is then designed using m different observations of the "n-input, single-output" data pairs (Xi, yi).

The fuzzy rules used in ANFIS modeling take the general form of Eq. (2), in which each rule carries its own set of parameters. The complete collection of fuzzy sets in the xi space is expressed by Eq. (3). Fuzzy sets are represented by membership functions; Gaussian membership functions are used in this study. They are defined over the range [-αi, +β] (i=1, 2, …, n) and are given in Eq. (4), where σj and cj are the variances and adjustable centers of the antecedents, respectively. The number of parameters involved in the antecedents of the ANFIS model is nr, where n is the dimension of the input vector and r is the number of fuzzy sets in each antecedent.

The degree of each local fuzzy IF-THEN rule can be evaluated using the Mamdani algebraic product, as in Eq. (5), over the fuzzy sets associated with the l-th fuzzy rule. Singleton fuzzification, followed by aggregation of the individual contributions of the different rules, yields the fuzzy system of Eq. (6). If the system includes N fuzzy rules of the form of Eq. (2), Eq. (6) can be rewritten as Eq. (7), where D is the residual between f(X) and the actual value y, and pl(X) is defined in Eq. (8).

Eq. (7) can be written in matrix form for the m input-output pairs (Xi, yi) as Eq. (9), in which each group of (n+1) elements of the vector W corresponds to the conclusion part of one TSK IF-THEN rule. The firing-strength matrix P is obtained by partitioning the input space into a number of fuzzy sets. The governing normal equations are written as Eq. (10) in order to minimize D. Adjusting the coefficients in the conclusion part of the TSK rules so as to minimize the vector D leads to a better prediction of the data. Solving the normal equations directly is sensitive to rounding errors, especially when these equations are singular or nearly singular [27]. Therefore, a powerful numerical method, singular value decomposition (SVD), which copes with probable singularities in Eq. (9), is used to compute accurately the linear coefficients in the concluding part of the ANFIS model. Fig. 1 illustrates how the combination of GA and SVD was used in this study to optimize the ANFIS design for predicting sediment transport in a channel pipe. The GA and SVD are presented in the following sub-sections.
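The TSK inference chain described above (Gaussian membership degrees, algebraic-product firing strengths, normalization into pl(X), and a weighted sum of linear rule outputs) can be sketched in a few lines. This is only an illustrative sketch: the displayed equations are not reproduced in this text, so the exact Gaussian form, the array layout, the function names and all numerical values below are assumptions rather than the authors' implementation.

```python
import numpy as np

def gaussian_mf(x, c, sigma):
    # Assumed Gaussian form; the paper's Eq. (4) may be parameterized differently.
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def normalized_firing(X, centers, sigmas, rules):
    """X: (m, n) inputs; centers, sigmas: (n, r) membership parameters;
    rules: (N, n) integer index of the fuzzy set each rule uses per input.
    Returns the (m, N) matrix of normalized firing strengths p_l(X)."""
    n = X.shape[1]
    mu = gaussian_mf(X[:, :, None], centers[None], sigmas[None])  # (m, n, r)
    picked = mu[:, np.arange(n)[None, :], rules]                  # (m, N, n)
    firing = picked.prod(axis=2)                                  # algebraic product
    return firing / firing.sum(axis=1, keepdims=True)

def tsk_output(X, p, W):
    """W: (N, n+1) linear coefficients per rule, constant term last."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.sum(p * (X1 @ W.T), axis=1)

# Hypothetical toy problem: 2 inputs, 3 fuzzy sets per input, 3 rules.
X = np.array([[0.2, 0.5], [0.8, 0.1], [0.5, 0.9]])
centers = np.array([[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]])
sigmas = np.full((2, 3), 0.3)
rules = np.array([[0, 0], [1, 1], [2, 2]])
W = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.0], [1.0, 1.0, -0.5]])

p = normalized_firing(X, centers, sigmas, rules)
fr_hat = tsk_output(X, p, W)
```

Because the output is linear in W once p is fixed, the coefficients W can be fitted by linear least squares, which is the role the text assigns to SVD.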

C. Using GA in ANFIS design
The genetic algorithm is used in designing the ANFIS model to determine the nr real-valued parameters {cj, σj} of the antecedents, where n is the dimension of the input and r is the number of fuzzy sets in each antecedent. These parameters are encoded as sub-strings of binary numbers, to which strings of decimal numbers in the range {1, nr} are added to select the rules. The combined binary and decimal string therefore represents the premise part of the fuzzy system. The fitness of the ANFIS model for predicting sediment transport is evaluated through Eq. (11), where E is the target function of Eq. (1), which is minimized by the evolutionary process of the genetic algorithm. This process begins by randomly generating an initial population, and genetic operators such as "selection", "crossover" and "mutation" are then applied to improve the population until the optimum solution is obtained. In the present study, the roulette wheel selection method was used [23]. SVD is then used to compute the optimal linear coefficients of the conclusion part of the TSK rules corresponding to each chromosome, which encodes the premise part of the fuzzy system. A brief introduction to the SVD method, used to determine the optimum coefficients, is presented in the following sub-section.
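Roulette wheel selection, the selection operator named above, draws parents with probability proportional to fitness. The sketch below is a minimal illustration under an assumption: Eq. (11) is not reproduced in this text, so the fitness is taken here as 1/(1+E), which is only one common way to turn an error E into a quantity to maximize. The function name and the error values are hypothetical.

```python
import numpy as np

def roulette_wheel_select(errors, n_select, rng):
    """Pick n_select chromosome indices with probability proportional to fitness."""
    errors = np.asarray(errors, dtype=float)
    fitness = 1.0 / (1.0 + errors)      # assumed fitness form, not the paper's Eq. (11)
    probs = fitness / fitness.sum()     # slice sizes on the wheel
    return rng.choice(len(errors), size=n_select, p=probs)

rng = np.random.default_rng(0)
errors = [0.1, 1.0, 10.0]               # hypothetical E values of three chromosomes
parents = roulette_wheel_select(errors, 10_000, rng)
counts = np.bincount(parents, minlength=len(errors))
```

Chromosomes with lower error occupy larger slices of the wheel and are therefore selected more often, while weaker chromosomes still keep a non-zero chance of reproducing.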

D. Using SVD in ANFIS design
Singular value decomposition (SVD) is a method for solving linear least squares problems in which singularity may arise. SVD factorizes a matrix into a product of matrices that includes a diagonal matrix (Q) of singular values; Golub and Reinsch [28] presented one of the most popular techniques for computing it. To select the optimum W in Eq. (9), a modified inverse of the diagonal matrix Q is first formed, in which the reciprocals of zero or close-to-zero singular values are set to zero. The optimum W is then calculated from the resulting decomposition [29].
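The SVD-based least squares solution described above can be sketched with NumPy. The relative tolerance used to decide which singular values count as "close to zero", the function name and the toy matrix are assumptions made for illustration; the paper does not state a threshold.

```python
import numpy as np

def svd_solve(P, y, rel_tol=1e-10):
    """Least squares solution of P @ W ~= y via SVD, zeroing the reciprocals
    of singular values that are zero or close to zero."""
    U, q, Vt = np.linalg.svd(P, full_matrices=False)
    inv_q = np.zeros_like(q)
    keep = q > rel_tol * q.max()         # assumed threshold for "close to zero"
    inv_q[keep] = 1.0 / q[keep]
    return Vt.T @ (inv_q * (U.T @ y))

# Hypothetical rank-deficient system: the two columns of P are identical,
# so the normal equations would be singular.
P = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
W = svd_solve(P, y)
```

For this singular system the routine returns the minimum-norm solution instead of failing, which is the property that motivates using SVD rather than solving the normal equations directly.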

E. Overview of Regression-based equations
As proposed in [26], for sediment transport in clean pipes the regression-based equations can generally be divided into two groups, i.e., the semi-experimental equations and the dimensional analysis equations. The semi-experimental equations were obtained based on the forces acting on a sediment particle in the equilibrium state and using different experimental data. One of the best semi-experimental equations [12] is that of May et al. [7] (Eq. 14), which was estimated using seven different sets of data (presented in Ackers et al. [30]). In this equation, CV is the volumetric sediment concentration, D the pipe diameter, A the cross-sectional area of the flow, d the median diameter of particle size, V the flow velocity, Vt the velocity required for incipient motion of the sediment (Eq. 15) and y the flow depth.
The second group is the equations obtained from dimensional analysis, determined through considering the hydraulic parameters which influence sediment transport. The Azamathulla et al. [22] equation is one of the most recently presented ones (Eq. 16). Another newly presented equation is that of Ebtehaj et al. [1], as presented in Eq. (17).

III. PREDICTION OF FR USING GA/SVD BASED ON ANFIS
Many studies have been devoted to sediment transport in open channels, each considering different parameters such as gravitational acceleration (g), median diameter of particles (d), pipe diameter (D), hydraulic radius (R), flow depth (y), specific gravity of sediment (s=ρs/ρ), volumetric sediment concentration (CV), dimensionless particle number (Dgr), cross-sectional area of the flow (A) and overall sediment friction factor (λs), in order to predict the minimum velocity required in the pipe channel to prevent sediment deposition [7,12,13,20,26]. To examine the parameters with regard to the nature of each one, dimensionless parameters were defined and the factors influencing sediment transport were categorized into five groups, namely "transport", "transport mode", "sediment", "flow resistance" and "movement". The independent parameters of the "transport" (CV), "sediment" (Dgr, d/D, s), "transport mode" (d/R, D²/A, R/D) and "flow resistance" (λs) groups were used to calculate the dependent parameter of the "movement" group. According to [26], and because the "transport mode" and "sediment" groups each contain more than one parameter, six different models are presented as follows:

Model (6): Fr=f(CV, d/D, R/D, λs)
As can be seen in the above-mentioned models, the effects of the other four groups were considered simultaneously in order to predict Fr, which belongs to the "movement" group. Three different sets of data [12,25,26], comprising 218 data points, were used to predict the densimetric Froude number (Fr). Of all the data, 30% were selected randomly to test the models, and the remaining 70% were used to train the models suggested above. For each input, three different membership functions were used (presented in Fig. 2 for model 4 as an example), leading to the optimization of Gaussian functions numbering 3^4 = 81 for the first data and 3^3 = 27 for the rest of the data. The optimum values of the initial population, generation number, crossover probability and mutation probability were 200, 300, 0.7 and 0.07, respectively, obtained through trial and error during the evolutionary process.

The Pareto curve was applied in this study to obtain optimum results; a sample of the curve (for model 4) is shown in Fig. 3. The functions defined in the suggested ANFIS-GA/SVD model are the Training Error (TE) and the Prediction Error (PE). The TE point is the point with the minimum training error and the maximum prediction error, while the PE point is exactly the opposite (i.e., maximum training error and minimum prediction error). Optimum modeling occurs when the model gives comparatively similar results in the test and training modes and is flexible enough to give relatively good predictions for data that had no role in training the model. In Fig. 3, the trade-off point (Trd point) simultaneously satisfies both the test and train error minimization conditions.

The models are assessed using the statistical indices R, MAPE, RMSE, SI, BIAS and ρ, in which (Fr)Exp,i is the observed value and (Fr)Model,i is the value predicted by ANFIS-GA/SVD in this study or by the other existing models.
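The selection of a trade-off point between training error and prediction error on a Pareto curve can be sketched as follows. The text does not state how the Trd point is chosen on the front, so the rule used here (the non-dominated point closest to the ideal corner after min-max scaling) is an assumption, as are the function names and the TE/PE values.

```python
import numpy as np

def pareto_front(te, pe):
    """Indices of candidates not dominated in (TE, PE); both are minimized."""
    pts = np.column_stack([te, pe])
    front = []
    for i, pt in enumerate(pts):
        dominated = np.any(np.all(pts <= pt, axis=1) & np.any(pts < pt, axis=1))
        if not dominated:
            front.append(i)
    return np.array(front)

def trade_off_point(te, pe):
    """Assumed rule: front point nearest the ideal corner after min-max scaling."""
    front = pareto_front(te, pe)
    pts = np.column_stack([te, pe])[front]
    span = pts.max(axis=0) - pts.min(axis=0)
    span[span == 0] = 1.0                       # avoid division by zero
    scaled = (pts - pts.min(axis=0)) / span
    return front[np.argmin(np.linalg.norm(scaled, axis=1))]

# Hypothetical training and prediction errors of five candidate designs.
te = np.array([0.10, 0.12, 0.15, 0.20, 0.18])
pe = np.array([0.30, 0.22, 0.20, 0.19, 0.25])
front = pareto_front(te, pe)
trd = trade_off_point(te, pe)
```

In this toy set the first front member plays the role of the TE point (lowest training error) and the last the PE point (lowest prediction error); the selected design lies between them.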
Also, n is the total number of data, and the ρ index, a dimensionless index that depends on the SI and R indices, is defined by Gandomi and Roke [31].

Table 2 shows the performance of the six models presented in this study, each built with both the ANFIS and ANFIS-GA/SVD methods, in the test and train modes. The statistical indices indicate that models 1 and 4 perform well in predicting Fr in the train mode with both the ANFIS and ANFIS-GA/SVD methods. However, the results of the test mode indicate that model 1 does not give good results for the data not used in training, while model 4 is fairly flexible in predicting Fr, in that changing from the train to the test data does not significantly change the values of the indices.
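The statistical indices named in this section can be computed as in the sketch below. The defining equations are not reproduced in this text, so the forms used here are the commonly used ones and should be read as assumptions: in particular, SI is taken as RMSE divided by the mean observed value, BIAS as the mean of (model minus observed), and ρ as SI/(1+R). The function name and sample values are hypothetical.

```python
import numpy as np

def fr_metrics(fr_exp, fr_model):
    """Assumed standard definitions of R, MAPE, RMSE, SI, BIAS and rho."""
    e = np.asarray(fr_exp, dtype=float)
    m = np.asarray(fr_model, dtype=float)
    r = np.corrcoef(e, m)[0, 1]                  # correlation coefficient
    mape = 100.0 * np.mean(np.abs(e - m) / e)    # mean absolute percentage error
    rmse = np.sqrt(np.mean((e - m) ** 2))        # root mean square error
    si = rmse / np.mean(e)                       # scatter index
    bias = np.mean(m - e)                        # sign convention is an assumption
    rho = si / (1.0 + r)                         # depends on SI and R only
    return {"R": r, "MAPE": mape, "RMSE": rmse, "SI": si, "BIAS": bias, "rho": rho}

# Hypothetical observed and predicted Fr values with a constant offset of 0.1.
stats = fr_metrics([2.0, 4.0, 6.0], [2.1, 4.1, 6.1])
```

With these definitions, the values reported in the abstract are roughly self-consistent: SI/(1+R) = 0.053/1.986 is about 0.027, close to the reported ρ of 0.026.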
The relative error obtained when ANFIS-GA/SVD is used to predict Fr for the independent data of model 4 is approximately 4%, while the ANFIS model predicts Fr with a relative error of almost 9%, nearly double that of the hybrid model applied in this study. The other statistical indices lead to similar conclusions: the results of the proposed ANFIS-GA/SVD model in the test and train modes indicate that it is more capable than the ANFIS model of predicting the minimum Fr required to prevent sediment deposition in channel pipes.

Fig. 4 shows a qualitative examination of the predictions made by models 1 to 6 using the ANFIS and ANFIS-GA/SVD methods. In general, and at first glance, among the six models and with both methods, models 1 and 4 predict more accurately than the rest. Model 2, which differs from model 1 only in using the D²/A parameter instead of d/R as the independent parameter of the "transport mode" group, shows a greater error than model 1, in that ANFIS often overestimates the Fr prediction, which would lead to sediment deposition in the channel, while predictions are also underestimated with a high Fr, which would lead to the design being uneconomical. The ANFIS-GA/SVD method gives relatively better results, although it also sometimes overestimates Fr when using the independent parameters of model 2. The parameters of model 3 give almost the weakest prediction of Fr among all the models, and neither of the two methods gives good results with them.
Consequently, with the CV and λs parameters kept the same in all the models, using the Dgr parameter (dimensionless particle number, Dgr = d((s-1)g/ν²)^(1/3)) from the "sediment" group and the R/D parameter from the "transport mode" group gives the weakest results. Therefore, it is not advisable to use model 3 for predicting Fr in channel pipe design. Model 5 behaves similarly to model 3, with the difference that the predictions made for this model by the ANFIS and ANFIS-GA/SVD methods are more accurate than those of model 3. This increase in accuracy is more visible in the ANFIS-GA/SVD predictions than in the ANFIS predictions. In fact, this comparison indicates that using the d/D parameter is superior to using Dgr when the other parameters are the same.
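The dimensionless inputs compared in these models can be computed directly from the physical quantities. The sketch below uses the standard definition of the dimensionless particle number, Dgr = d((s-1)g/ν²)^(1/3); the function name and all numerical values are hypothetical and are not taken from the data sets used in the paper.

```python
def dimensionless_inputs(d, D, R, s, nu, g=9.81):
    """Return d/D, d/R and Dgr for SI inputs (m, m, m, -, m^2/s)."""
    d_over_D = d / D                                   # "sediment" group
    d_over_R = d / R                                   # "transport mode" group
    dgr = d * ((s - 1.0) * g / nu ** 2) ** (1.0 / 3.0) # dimensionless particle number
    return d_over_D, d_over_R, dgr

# Hypothetical case: 1 mm sand (s = 2.65) in a 305 mm pipe with R = 0.07 m,
# water with kinematic viscosity 1e-6 m^2/s.
d_over_D, d_over_R, dgr = dimensionless_inputs(d=0.001, D=0.305, R=0.07, s=2.65, nu=1e-6)
```

For these assumed values Dgr is about 25, while d/D and d/R are of the order of 0.003 and 0.014, which illustrates how differently scaled the candidate inputs are.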
Based on the discussion of Table 2 and Fig. 4, it can be concluded that using the d/D and d/R parameters from the "sediment" and "transport mode" groups, respectively, in addition to the CV and λs parameters from the "transport" and "flow resistance" groups, respectively, gives the best result compared with the rest of the models. Regarding selection of the superior method, the results presented in Table 2 indicate that ANFIS-GA/SVD is more accurate than ANFIS.

Following the selection of the best model and method (i.e., model 4 and ANFIS-GA/SVD), a sensitivity analysis was conducted using ANFIS-GA/SVD to examine the effect on Fr of each of the independent parameters in this model (i.e., CV, d/D, d/R, λs), in order to prevent sediment deposition on the channel bed and also to ensure the cost-effectiveness of the design. The results of the sensitivity analysis are presented in Table 3. The main point in this table is that omitting any one of the independent parameters from the "sediment", "transport", "transport mode" and "flow resistance" groups decreases the accuracy of the Fr prediction. Among these cases, omitting the λs parameter of the "flow resistance" group reduces the accuracy the least, and omitting the d/R parameter of the "transport mode" group reduces it the most: using Fr = f(CV, d/D, d/R) increases the relative error by approximately 1%, whereas using Fr = f(CV, d/D, λs) almost triples the MAPE compared with the case where all four input parameters are considered. Omitting CV or d/D also has a considerable and similar effect, significantly increasing the relative error.

Table 4 compares the results obtained from the model proposed in this study (ANFIS-GA/SVD) with the results obtained from the existing regression-based models.
Among the existing regression-based equations, the Ebtehaj et al. [1] equation is more accurate than the Azamathulla et al. [22] and May et al. [7] equations: according to this table, the values of all the indices are better for the Ebtehaj et al. [1] equation than for the other two. Also, Fig. 5, which shows the error distribution for the different models, indicates that the Ebtehaj et al. [1] equation predicts almost 80% of the data with a relative error of less than 15%. In addition, the maximum error of the Ebtehaj et al. [1] equation is 35%, compared with 150% and 140% for the May et al. [7] and Azamathulla et al. [22] equations, respectively, a difference that indicates the inadequacy of these models for predicting Fr when designing a pipeline. The mean relative error of Fr prediction by the ANFIS-GA/SVD model is approximately 2.5%, one fifth (1/5) of that of the best existing regression-based equation [1]. Also, the presented model predicts almost 95% of the data with a relative error of less than 5%, while the equation of Ebtehaj et al. [1] makes only 55% of its predictions with a relative error of less than 5%.

The sensitivity analysis carried out with model 4 to investigate the effect of the dimensionless parameters shows that omitting any one of the independent parameters decreases the performance of the model: not considering the parameter of the flow resistance group (λs) increases the mean relative error by about 1% (MAPE=5.13), while not using the parameter of the transport mode group (d/R) almost triples the mean relative error (MAPE=12.768). The comparison with the existing regression-based equations indicates that the proposed ANFIS-GA/SVD model performs better than the existing ones.
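The sensitivity figures quoted above can be checked with simple arithmetic. This sketch assumes that the four-input baseline is the MAPE of 4.397 reported in the abstract for ANFIS-GA/SVD; the variable names are illustrative.

```python
mape_full = 4.397        # four inputs (CV, d/D, d/R, λs); assumed baseline from the abstract
mape_no_lambda = 5.13    # λs omitted
mape_no_dR = 12.768      # d/R omitted

increase_no_lambda = mape_no_lambda - mape_full   # change in percentage points
ratio_no_dR = mape_no_dR / mape_full              # multiplicative change

print(f"Omitting λs raises MAPE by {increase_no_lambda:.3f} percentage points")
print(f"Omitting d/R multiplies MAPE by {ratio_no_dR:.2f}")
```

Under that assumption the increase for λs is a little under one percentage point and the factor for d/R is just under three, in line with the statements "by about 1%" and "almost triples".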