Hybridizing a neural network with multi-verse, black hole, and shuffled complex evolution optimizer algorithms for predicting dissolved oxygen

The importance of estimating dissolved oxygen (DO) calls for reliable predictive models. In this work, a multi-layer perceptron (MLP) network is trained by three metaheuristic algorithms, namely the multi-verse optimizer (MVO), black hole algorithm (BHA), and shuffled complex evolution (SCE), for predicting the DO using data from the Klamath River Station, Oregon, US. The records (DO, water temperature, pH, and specific conductance) belonging to the water years 2015-2018 (USGS) are used for pattern analysis. The results of this process showed that all three hybrid models could properly infer the DO behavior. However, the BHA and SCE accomplished this task with simpler configurations. Next, the generalization ability of the developed patterns is tested using the data of the 2019 water year. Referring to the calculated mean absolute errors of 1.0161, 1.1997, and 1.0122, as well as Pearson correlation coefficients of 0.8741, 0.8453, and 0.8775, for the MLP-MVO, MLP-BHA, and MLP-SCE, respectively, the MLPs trained by the MVO and SCE perform better than the one trained by the BHA. Therefore, these two hybrids (i.e., the MLP-MVO and MLP-SCE) can be satisfactorily used for future applications.


Introduction
Acquiring an appropriate forecast of water quality parameters like dissolved oxygen (DO) is an important task due to their effects on aquatic health maintenance and reservoir management [1]. Constraints such as the influence of various environmental factors on the DO concentration [2] have driven many scholars to replace conventional models with sophisticated artificial intelligence techniques [3-6]. As discussed by many scholars, intelligent techniques have a high capability to undertake non-linear and complicated calculations [7-14]. A large number of artificial intelligence-based practices have been studied, for example, in the subjects of environmental concerns [15-21] [23, 45-49], image classification and processing [50, 51], target tracking and computer vision [41, 52-57], structural health monitoring [58, 59], building and structural design analysis [58, 60-62], structural material (e.g., steel and concrete) behaviors [8, 21, 61, 63-67], soil-pile analysis and landslide assessment [12, 67-70], seismic analysis [70-72], measurement techniques [41, 59, 60, 73], or very complex problems such as signal processing [50, 52, 74, 75] as well as feature selection and extraction problems [23, 50, 74, 76-78]. Similar to deep learning-based applications [53, 73, 79-83], many decision-making applications address complicated engineering problems as well [60, 84-89]. A neural network is known as a series of complex algorithms that recognize underlying connections between a set of input and output data through a process that mimics the way the human brain operates [45, 46, 90-93]. In another sense, the artificial neural network (ANN) is a sophisticated nonlinear processor that has attracted massive attention for sensitive engineering modeling [94]. Different notions represent this model.
Most importantly, a multi-layer perceptron (MLP) [81, 95] is composed of a minimum of three layers, each of which contains some neurons for handling the computations. A more complicated ANN-based solution is known as deep learning [96, 97], which is part of a wider family of machine learning techniques based on ANNs with representation learning [79, 80, 82, 83, 98]. For instance, Chen, et al. [99], Hu, et al. [100], Wang, et al. [47], and Xia, et al. [101] employed extreme machine learning techniques in the field of medical sciences. As new advanced prediction techniques, hybrid searching algorithms have been widely developed to obtain more accurate prediction outputs; examples include Harris hawks optimization [67], enhanced grey wolf optimization [102], multiobjective large-scale optimization [40, 90, 103, 104], fruit fly optimization [105], multiswarm whale optimizer [13], ant colony optimization [106], as well as conventional and extreme machine learning-based solutions [107-111]. By applying support vector regression (SVR), Li, et al. [112] showed the efficiency of the maximal information coefficient technique used for feature selection in the estimation of the DO concentration. The results of the optimized dataset were much more reliable (28.65% in terms of root mean square error, RMSE) than those of the original input configuration. Csábrági, et al. [113] showed the appropriate efficiency of three conventional types of artificial neural networks (ANNs), namely the multilayer perceptron (MLP), radial basis function (RBF), and general regression neural network (GRNN), for this purpose. Similar efforts can be found in [114, 115]. Heddam [116] introduced a new ANN-based model, namely the evolving fuzzy neural network, as a capable approach for the DO simulation in the river ecosystem. The suitability of fuzzy-based models has been investigated in many studies [117].
The adaptive neuro-fuzzy inference system (ANFIS) is another potent data mining technique that has been discussed in many studies [118-120]. More attempts regarding the employment of machine learning tools can be found in [121-124]. Ouma, et al. [125] compared the performance of a feed-forward ANN with multiple linear regression (MLR) in simulating the DO in the Nyando River, Kenya. It was shown that the correlation of the ANN is considerably greater than that of the MLR (i.e., 0.8546 vs. 0.6199). Zhang and Wang [58] combined a recurrent neural network (RNN) with kernel principal component analysis (kPCA) to predict the hourly DO concentration. Their suggested model was found to be more accurate than traditional data mining techniques, including the feedforward ANN, SVR, and GRNN, by around 8, 17, and 12%, respectively. The highest accuracy (coefficient of determination $R^2$ = 0.908) was obtained for the DO in the upcoming one hour. Lu and Ma [126] combined a denoising method called "complete ensemble empirical mode decomposition with adaptive noise" with two popular machine learning models, namely random forest (RF) and extreme gradient boosting (XGBoost), to analyze various water quality parameters. It was shown that the RF-based ensemble is a more accurate approach for the simulation of DO, temperature, and specific conductance. They also demonstrated the viability of the proposed approaches by comparing them with some benchmark tools. Likewise, Ahmed [127] showed the superiority of the RF over the MLR for DO modeling. He also revealed that water temperature and pH play the most significant role in this process. Ay and Kişi [128] conducted a comparison among the MLP, RBF, ANFIS (sub-clustering), and ANFIS (grid partitioning). Respective $R^2$ values of 0.98, 0.96, 0.95, and 0.86 for one station (Number: 02156500) revealed that the outcomes of the MLP are better correlated with the observed DOs.
Synthesizing conventional approaches with auxiliary techniques has led to novel hybrid tools for various hydrological parameters [129-131]. Ravansalar, et al. [132] showed that linking the ANN with a discrete wavelet transform improves the accuracy (i.e., the Nash-Sutcliffe coefficient) from 0.740 to 0.998. A similar improvement was achieved for the SVR applied to estimate biochemical oxygen demand in the Karun River, Western Iran. Antanasijević, et al. [133] presented a combination of Ward neural networks and a local similarity index for predicting the DO in the Danube River. They reported better performance of the proposed model compared to multisite DO evaluative approaches presented in the literature. Metaheuristic search methods, like teaching-learning based optimization [134], have provided suitable approaches for intricate problems. Ahmed and Shah [118] suggested three optimized versions of ANFIS using differential evolution, genetic algorithm (GA), and ant colony optimization for predicting water quality parameters, including electrical conductivity, sodium absorption ratio, and total hardness. In similar research, Mahmoudi, et al. [135] coupled the SVR with the shuffled frog leaping algorithm (SFLA) for the same objective. Zhu, et al. [136] compared the efficiency of the fruit fly optimization algorithm (FOA) with the GA and particle swarm optimization (PSO) for optimizing a least-squares SVR for forecasting the trend of DO. Referring to the obtained mean absolute percentage errors of 0.35, 1.3, 2.03, and 1.33%, the proposed model (i.e., FOA-LSSVR) surpassed the benchmark techniques. In this work, three stochastic search techniques, namely the multi-verse optimizer (MVO), black hole algorithm (BHA), and shuffled complex evolution (SCE), are used to optimize an MLP neural network for predicting the DO using recent data collected from the Klamath River Station.
To the best of the authors' knowledge, only a few metaheuristic algorithms have so far been used for training the ANN in the field of DO modeling (e.g., the firefly algorithm [137] and PSO [138]). Therefore, the models suggested in this study are deemed innovative hybrids for this purpose.

Data
Intelligent models should first learn the pattern of the intended parameter to predict it. This learning process is carried out by analyzing the dependence of the target parameter on some independent factors. In this work, the DO is the target parameter, and water temperature (WT), pH, and specific conductance (SC) are the independent factors. This study uses the data belonging to a US Geological Survey (USGS) station, namely the Klamath River (station number: 11509370). As Figure 1 illustrates, this station is located in Klamath County, Oregon. Pattern recognition is carried out using the data between October 01, 2014, and September 30, 2018. After training, the models predict the DO for the subsequent year (i.e., from October 01, 2018, to September 30, 2019). Since the models have not seen this data, the accuracy of this process reflects their capability for predicting the DO in unseen conditions. Hereafter, these two groups are referred to as training data and testing data, respectively. Figure 2 depicts the DO versus WT, pH, and SC for the (a, c, and e) training and (b, d, and f) testing data. Based on the available data for the mentioned periods, the training and testing groups contain 1430 and 352 records, respectively. Moreover, the statistical description of these datasets is presented in Table 1.

Methodology
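The steps of this research are shown in Figure 3. After providing the appropriate dataset, the MLP is submitted to the MVO, BHA, and SCE algorithms for adjusting its parameters through metaheuristic schemes. During an iterative process, the MLP is optimized to present the best possible prediction of the DO. The quality of the results is finally evaluated using the Pearson correlation coefficient (RP) along with the mean absolute error (MAE) and RMSE. The RP measures the agreement, while the MAE and RMSE measure the difference, between the observed and predicted values of a target parameter. In the present work, given $DO_i^{pred}$ and $DO_i^{obs}$ as the predicted and observed DOs, the RP, MAE, and RMSE are expressed by the following equations:

$$R_P = \frac{\sum_{i=1}^{K} \left(DO_i^{pred} - \overline{DO}^{pred}\right)\left(DO_i^{obs} - \overline{DO}^{obs}\right)}{\sqrt{\sum_{i=1}^{K} \left(DO_i^{pred} - \overline{DO}^{pred}\right)^2} \sqrt{\sum_{i=1}^{K} \left(DO_i^{obs} - \overline{DO}^{obs}\right)^2}} \tag{1}$$

$$MAE = \frac{1}{K} \sum_{i=1}^{K} \left| DO_i^{obs} - DO_i^{pred} \right| \tag{2}$$

$$RMSE = \sqrt{\frac{1}{K} \sum_{i=1}^{K} \left( DO_i^{obs} - DO_i^{pred} \right)^2} \tag{3}$$

where K signifies the number of the compared pairs and an overbar denotes the mean over these pairs.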
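As an illustration of the three accuracy criteria named above, the following is a minimal sketch using only the Python standard library. The function names and the sample values are hypothetical and are not taken from the study; the formulas are the standard definitions of the Pearson correlation coefficient, MAE, and RMSE.

```python
import math


def mae(observed, predicted):
    """Mean absolute error over K paired values."""
    k = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / k


def rmse(observed, predicted):
    """Root mean square error over K paired values."""
    k = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / k)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    k = len(observed)
    mean_o = sum(observed) / k
    mean_p = sum(predicted) / k
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical DO values (mg/L) for illustration only, not data from the study.
obs = [8.0, 9.5, 7.2, 10.1]
pred = [8.4, 9.0, 7.0, 10.6]
print(mae(obs, pred), rmse(obs, pred), pearson_r(obs, pred))
```

A lower MAE or RMSE and an RP closer to 1 both indicate a closer match between the predicted and observed series.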

The MVO
As implied by its name, the MVO is derived from the multi-verse theory in physics [139]. According to this theory, there is more than one big bang event, each of which has initiated a separate universe. The algorithm was introduced by Mirjalili, et al. [140]. The main components of the MVO are wormholes, black holes, and white holes. The concepts of black and white holes run the exploration phase, while the wormhole concept is dedicated to the exploitation procedure. In the MVO, a parameter called "rate of inflation (ROI)" is defined for each universe. The objects are transferred from the universes with larger ROIs to those with lower values for improving the whole cosmos' average ROI. During an iteration, the universes are sorted with respect to their ROIs, and after a roulette wheel selection (RWS), one of them is deemed the white hole. In this relation, a set of universes can be defined as:

$$U = \begin{bmatrix} x_1^1 & x_1^2 & \cdots & x_1^g \\ x_2^1 & x_2^2 & \cdots & x_2^g \\ \vdots & \vdots & \ddots & \vdots \\ x_k^1 & x_k^2 & \cdots & x_k^g \end{bmatrix} \tag{4}$$

where g symbolizes the number of objects and k stands for the number of universes. The j-th object in the i-th solution is generated according to the below equation:

$$x_i^j = lb_j + \mathrm{rand}() \bmod \left( (ub_j - lb_j) + 1 \right), \quad i = 1, 2, \ldots, k, \quad j = 1, 2, \ldots, g \tag{5}$$

where $ub_j$ and $lb_j$ denote the upper and lower bounds, and the function $\mathrm{rand}()$ produces a discrete randomly distributed number.
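In each repetition, there are two options for $x_i^j$: (i) it is selected from earlier solutions using RWS (e.g., $x_i^j \in (x_1^j, x_2^j, \ldots, x_{i-1}^j)$), and (ii) it does not change. It can be written:

$$x_i^j = \begin{cases} x_m^j & r_1 < Norm(U_i) \\ x_i^j & r_1 \geq Norm(U_i) \end{cases} \tag{6}$$

In the above equation, $U_i$ stands for the i-th universe, $Norm(U_i)$ gives the corresponding normalized ROI, $x_m^j$ is the j-th object of the universe chosen by the RWS, and $r_1$ is a random value in [0, 1]. Equation 7 expresses the measures considered to deliver the variations of the whole universe. In this sense, the wormholes are supposed to enhance the ROI.

$$x_i^j = \begin{cases} \begin{cases} X_j + TDR \times \left( (ub_j - lb_j) \times r_4 + lb_j \right) & r_3 < 0.5 \\ X_j - TDR \times \left( (ub_j - lb_j) \times r_4 + lb_j \right) & r_3 \geq 0.5 \end{cases} & r_2 < WEP \\ x_i^j & r_2 \geq WEP \end{cases} \tag{7}$$

where $X_j$ signifies the j-th object of the best-fitted universe obtained so far, and $r_2$, $r_3$, and $r_4$ are random values in [0, 1]. Moreover, the two parameters WEP and TDR stand for the wormhole existence probability and traveling distance rate, respectively. Given $Iter$ as the running iteration and $Iter_{max}$ as the maximum number of iterations, these parameters can be calculated as follows:

$$WEP = a + Iter \times \left( \frac{b - a}{Iter_{max}} \right) \tag{8}$$

$$TDR = 1 - \frac{Iter^{1/q}}{Iter_{max}^{1/q}} \tag{9}$$

where q is the accuracy of exploitation, and a and b are constant pre-defined values [141, 142].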
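The wormhole mechanism and the two schedules (WEP and TDR) described above can be sketched as follows. This is an illustrative sketch only, not the implementation used in the study: the function names are hypothetical, the default values a = 0.2, b = 1.0, and q = 6 are commonly quoted settings assumed here for illustration, and the final clipping to the bounds is an added assumption.

```python
import random


def wep(iteration, max_iter, a=0.2, b=1.0):
    """Wormhole existence probability; grows linearly from a to b (assumed defaults)."""
    return a + iteration * (b - a) / max_iter


def tdr(iteration, max_iter, q=6.0):
    """Traveling distance rate; shrinks from 1 to 0 (q is an assumed default)."""
    return 1.0 - iteration ** (1.0 / q) / max_iter ** (1.0 / q)


def wormhole_update(universe, best, lb, ub, iteration, max_iter, rng=random):
    """Move objects of one universe around the best universe found so far."""
    w = wep(iteration, max_iter)
    t = tdr(iteration, max_iter)
    updated = []
    for j, x in enumerate(universe):
        if rng.random() < w:  # r2 < WEP: a wormhole exists for this object
            step = t * ((ub[j] - lb[j]) * rng.random() + lb[j])  # uses r4
            x = best[j] + step if rng.random() < 0.5 else best[j] - step  # r3
            x = min(max(x, lb[j]), ub[j])  # clipping is an added assumption
        updated.append(x)
    return updated


# Hypothetical two-object universe, for illustration only.
lower, upper = [-1.0, -1.0], [1.0, 1.0]
print(wormhole_update([0.3, -0.7], [0.1, 0.2], lower, upper, 10, 100,
                      random.Random(0)))
```

Because WEP rises while TDR falls, late iterations move almost every object, but only within a small neighbourhood of the best universe, which is how the wormholes serve exploitation.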

The BHA
Inspired by black hole phenomena in space, Hatamlou [143] proposed the BHA in 2013. Emerging after the collapse of massive stars, a black hole is distinguished by a huge gravitational power. The stars move toward this mass, which explains the pivotal strategy of the BHA for achieving an optimum response. A randomly generated constellation of stars represents the initial population. Based on the fitness of these stars, the most powerful one is deemed the black hole, which absorbs the surrounding ones. In this procedure, the positions change according to the below relationship:

$$x_i^{Iter+1} = x_i^{Iter} + rand \times \left( x_{BH} - x_i^{Iter} \right), \quad i = 1, 2, \ldots, Z \tag{10}$$

where rand is a random number in [0, 1], $x_{BH}$ is the black hole's position, Z is the total number of stars, and Iter symbolizes the iteration number. Once the fitness of a star surpasses that of the black hole, they exchange their positions. In this regard, Equation 11 calculates the radius of the event horizon for the black hole. Any star that comes closer to the black hole than this radius is absorbed and replaced by a newly generated random star.

$$R = \frac{f_{BH}}{\sum_{i=1}^{Z} f_i} \tag{11}$$

where $f_i$ is the fitness of the i-th star and $f_{BH}$ gives this value for the black hole [144].
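One iteration of the procedure described above (attraction toward the black hole, position exchange, and the event-horizon check) might look like the sketch below. It is an illustration under stated assumptions rather than the code used in the study: the function name is hypothetical, the problem is treated as a minimization so that the objective value plays the role of the fitness, and stars crossing the event horizon are re-drawn uniformly within the bounds.

```python
import math
import random


def bha_iteration(stars, objective, lb, ub, rng=random):
    """Run one BHA iteration for a minimization problem (assumed convention)."""
    stars = [list(s) for s in stars]  # work on a copy
    costs = [objective(s) for s in stars]
    bh_idx = min(range(len(stars)), key=costs.__getitem__)
    black_hole = list(stars[bh_idx])

    # Pull every other star toward the current black hole.
    for i, star in enumerate(stars):
        if i == bh_idx:
            continue
        stars[i] = [x + rng.random() * (b - x) for x, b in zip(star, black_hole)]
        costs[i] = objective(stars[i])

    # A star that became better than the black hole takes over its role.
    bh_idx = min(range(len(stars)), key=costs.__getitem__)
    black_hole = list(stars[bh_idx])

    # Event-horizon radius: black hole's value over the sum of all values.
    total = sum(costs)
    radius = costs[bh_idx] / total if total > 0 else 0.0
    for i in range(len(stars)):
        if i != bh_idx and math.dist(stars[i], black_hole) < radius:
            # Absorbed stars are replaced by fresh random ones (assumed uniform).
            stars[i] = [rng.uniform(l, u) for l, u in zip(lb, ub)]
    return stars, black_hole


# Hypothetical test function (sphere), not the MLP training objective.
def sphere(s):
    return sum(x * x for x in s)


rng = random.Random(0)
lb, ub = [-5.0, -5.0], [5.0, 5.0]
population = [[rng.uniform(l, u) for l, u in zip(lb, ub)] for _ in range(10)]
for _ in range(30):
    population, best = bha_iteration(population, sphere, lb, ub, rng)
print(best, sphere(best))
```

Since the black hole itself is never moved or replaced unless a better star appears, the best objective value cannot get worse from one iteration to the next.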

The SCE
Originally proposed by Duan, et al. [145], the SCE has been efficiently used for dealing with optimization problems with high dimensions. The SCE can be defined as a hybrid of complex shuffling and competitive evolution concepts along with the strengths of the controlled random search strategy. This algorithm (i.e., the SCE) benefits from a deterministic strategy to guide the search. Also, utilizing random elements has resulted in a flexible and robust algorithm.
The SCE is implemented in seven steps. Assuming NC as the number of complexes and NP as the number of points existing in one complex, the sample size of the algorithm is calculated as S = NC × NP. In this sense, NC ≥ 1 and NP ≥ 1 + the number of design variables. Next, the samples x1, x2, …, xS are created in the viable space (i.e., within the bounds). In the third and fourth steps, the samples are ranked by fitness and stored in an array D, which is then partitioned into NC complexes of NP points each. In the fifth step, each complex is evolved by the competitive complex evolution algorithm. Later, in a process named shuffling of the complexes, all complexes are placed back in the array D. This array is then sorted based on the fitness values. Lastly, the algorithm checks the stopping criteria to terminate the process [146].
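The sampling, partitioning, and shuffling steps listed above can be sketched as follows. This is a hedged illustration, not the study's code: the function names are hypothetical, the objective is assumed to be minimized, and points are dealt to the complexes in rank order (point 1 to complex 1, point 2 to complex 2, and so on), which is the usual SCE partitioning scheme. The competitive complex evolution step is deliberately left out.

```python
import random


def generate_sample(nc, n_points, lb, ub, rng=random):
    """Draw S = NC x NP random points inside the bounds."""
    if nc < 1 or n_points < 1 + len(lb):
        raise ValueError("need NC >= 1 and NP >= 1 + number of design variables")
    size = nc * n_points
    return [[rng.uniform(l, u) for l, u in zip(lb, ub)] for _ in range(size)]


def partition_into_complexes(points, objective, nc):
    """Rank the points (array D) and deal them into NC complexes."""
    ranked = sorted(points, key=objective)  # best point first
    return [ranked[k::nc] for k in range(nc)]


def shuffle_complexes(complexes, objective):
    """Put all complexes back into one array D and sort it again."""
    merged = [p for c in complexes for p in c]
    return sorted(merged, key=objective)


# Hypothetical two-variable test function, for illustration only.
def sphere(p):
    return sum(x * x for x in p)


rng = random.Random(1)
sample = generate_sample(4, 3, [-1.0, -1.0], [1.0, 1.0], rng)
complexes = partition_into_complexes(sample, sphere, 4)
print(len(sample), [len(c) for c in complexes])
```

Dealing the ranked points in this interleaved way gives every complex a mix of good and poor points, so no single complex starts with all of the best candidates.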

Results and discussion

Optimization and weight adjustment
As explained, the proposed hybrid models are designed in such a way that the MVO, BHA, and SCE algorithms are responsible for adjusting the weights and biases of the MLP. Each algorithm first suggests a stochastic response to re-build the MLP. In the next iterations, the algorithms improve this response to build a more accurate MLP. In this relation, the overall formulation of the MLP that is applied to the training data can be expressed as follows:

$$R_N = f\left( I_N \times W + b \right) \tag{12}$$

where f(x) is the activation function used by the neurons in a layer, $R_N$ and $I_N$ denote the response and the input of the neuron N, respectively, and W and b are the weight and bias assigned to it. The created hybrids are implemented with different population sizes (NPops) for achieving the best results. Figure 4 shows the values of the objective function obtained for the NPops of 10, 25, 50, 75, 100, 200, 300, 400, and 500. In this study, the objective function is reported by the RMSE criterion. Figure 4 indicates that, unlike the SCE, which gives higher-quality training with small NPops, the MVO performs better with the three largest NPops. The BHA, however, did not show any specific trend. Overall, the MVO, BHA, and SCE with the NPops of 300, 50, and 10, respectively, could adjust the MLP parameters with the lowest error.

Figure 6 shows a comparison between the observed DOs and those predicted by the MLP-MVO, MLP-BHA, and MLP-SCE for the whole five years. At a glance, all three models could properly capture the DO behavior. It indicates that the algorithms have designated appropriate weights for each input parameter (WT, pH, and SC). The results of the training and testing datasets are presented in detail in the following.

As stated previously, the quality of the testing results shows how successful a trained model can be in confronting new conditions. The data of the fifth year were considered as these conditions in this study. Figure 8 depicts the histogram of the testing errors. In these charts, µ stands for the mean error, and σ represents the standard error.
In this phase, the RMSEs of 1.3187, 1.4647, and 1.3085, along with the MAEs of 1.0161, 1.1997, and 1.0122, for the MLP-MVO, MLP-BHA, and MLP-SCE, respectively, implied the power of the used models for dealing with unseen data. It means that the weights (and biases) determined in the previous section have successfully mapped the relationship between the DO and WT, pH, and SC for the second phase. From the comparison point of view, unlike the training phase, the SCE-based hybrid outperformed the MLP-MVO. The MLP-BHA, however, again presented the poorest prediction of the DO.
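The weight-adjustment scheme discussed in this section treats the MLP's weights and biases as one candidate solution whose quality is scored by the RMSE. The sketch below shows how such an objective function could be built. It rests on assumptions that are not stated in the text: a single hidden layer, a tanh activation, a linear output neuron, and a flat parameter layout; the function names and the sample records are hypothetical.

```python
import math

N_INPUTS = 3  # WT, pH, and SC


def n_parameters(n_hidden):
    """Number of weights and biases for a 3-input, one-hidden-layer MLP."""
    return (N_INPUTS + 1) * n_hidden + (n_hidden + 1)


def mlp_predict(params, x, n_hidden):
    """Decode a flat parameter vector and compute one DO prediction."""
    pos = 0
    hidden = []
    for _ in range(n_hidden):
        weights = params[pos:pos + N_INPUTS]
        bias = params[pos + N_INPUTS]
        pos += N_INPUTS + 1
        # Response of a hidden neuron: activation of the weighted input plus bias.
        hidden.append(math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias))
    out_weights = params[pos:pos + n_hidden]
    out_bias = params[pos + n_hidden]
    return sum(w * h for w, h in zip(out_weights, hidden)) + out_bias


def rmse_objective(params, inputs, targets, n_hidden):
    """RMSE of the decoded MLP; this is the value an optimizer would minimize."""
    errors = [(t - mlp_predict(params, x, n_hidden)) ** 2
              for x, t in zip(inputs, targets)]
    return math.sqrt(sum(errors) / len(errors))


# Hypothetical records (WT, pH, SC) and DO targets, for illustration only.
records = [[12.0, 8.1, 150.0], [20.5, 8.9, 170.0]]
do_values = [9.8, 7.4]
candidate = [0.01] * n_parameters(6)
print(rmse_objective(candidate, records, do_values, 6))
```

Any of the three optimizers can then search over vectors of length `n_parameters(n_hidden)`, calling `rmse_objective` to score each universe, star, or sample point on the training data.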

Conclusions
This research pointed out the suitability of metaheuristic strategies for analyzing the relationship between the DO and three influential factors (WT, pH, and SC) through the principles of a multi-layer perceptron network. The used algorithms were the multi-verse optimizer, black hole algorithm, and shuffled complex evolution, which have shown high applicability for optimization objectives. A finding of this study was that while the MVO needs NPop = 300 to give a proper training of the MLP, the two other algorithms can do this with smaller populations (NPops of 50 and 10). According to the findings of the training phase, the MVO can achieve a more profound understanding of the mentioned relationship. The RMSE of this model was 1.3148, which was found to be smaller than those of the MLP-BHA (1.4426) and MLP-SCE (1.3304). However, different results were observed in the testing phase. The SCE-based model achieved the highest accuracy (the RPs were 0.8741, 0.8453, and 0.8775 for the MLP-MVO, MLP-BHA, and MLP-SCE, respectively). All in all, the authors believe that the tested models can serve as promising ways for predicting the DO.