Using enhanced sparrow search algorithm-deep extreme learning machine model to forecast endpoint phosphorus content of BOF

An effective technology for predicting the end-point phosphorus content of the basic oxygen furnace (BOF) can provide theoretical guidance for improving steel quality by controlling its hardness and toughness. Because the prediction accuracy of existing models is still somewhat inadequate, this study proposes a novel hybrid method, ESSA-DELM, which combines a multi-strategy enhanced sparrow search algorithm (ESSA) with a deep extreme learning machine (DELM) to predict the end-point phosphorus content more accurately. First, because the input weights and hidden biases of DELM are randomly selected, DELM inevitably has a set of non-optimal or unnecessary weights and biases; therefore, ESSA was used to optimize DELM. In ESSA, a Trigonometric substitution mechanism and Cauchy mutation were introduced into SSA to avoid being trapped in local optima and to improve its global exploration capacity. Finally, to evaluate the prediction efficiency of ESSA-DELM, the proposed model was tested on converter process data from the Baogang steel plant. ESSA-DELM outperformed other DELM-based hybrid prediction models and conventional models: the hit rates of end-point phosphorus content within ±0.003%, ±0.002%, and ±0.001% were 91.67%, 83.33%, and 63.55%, respectively. With its better prediction accuracy, the proposed ESSA-DELM model can guide field operations.

significant to accurately forecast and control the end-point phosphorus content of BOF [3].
Recently, several machine learning models have been used to predict the end-point phosphorus content of BOF, among which the BP neural network (BPNN) [4] is the most widely used. Li et al. [5] developed a BPNN trained with the Levenberg-Marquardt (L-M) algorithm to predict the phosphorus content at the end of converter steelmaking. He et al. [6] set up a principal component analysis (PCA)-BP model for end-point phosphorus content prediction. Zhu et al. [7] established a prediction model of end-point phosphorus content for BOF based on a monotone-constrained BPNN. These hybrid BPNN models improve prediction accuracy to some extent compared with the traditional BPNN, but their accuracy is still insufficient. The main reason is that BPNN requires a large number of network training parameters to be set, which leads to slow training and poor generalization ability.
The extreme learning machine (ELM) proposed by Huang et al. is a single-hidden-layer feedforward neural network (SLFN) [8,9]. Compared with BPNN, ELM offers fast solution speed, high accuracy, and simple parameter setting, and it has been applied in many fields, such as wind power prediction, bearing fault diagnosis, and pattern clustering [10][11][12]. Nevertheless, because its input weights and thresholds are randomly generated and it has only one hidden layer, ELM has poor robustness.
Building on the extreme learning machine-autoencoder (ELM-AE), Kasun et al. [13] proposed the deep extreme learning machine (DELM), also called the multi-layer extreme learning machine (ML-ELM). DELM does not need to be fine-tuned [14], requires less training time than deep learning, and shows excellent generalization performance comparable to that of deep learning. DELM is therefore employed in this study to construct a prediction model of the end-point phosphorus content of BOF.
Nevertheless, the input weights and hidden biases of DELM are randomly generated, so DELM inevitably acquires a set of non-optimal or unnecessary weights and biases [14]. In addition, DELM may overfit the training data. To tackle these problems and improve the prediction capability of DELM, this paper employs a recently proposed optimization algorithm. Many optimization algorithms have recently been developed for various applications, such as power load forecasting [15], life prediction of lithium batteries [16], brain tumor diagnosis [17], and polymer electrolyte membrane fuel cell (PEMFC) stacks [18].
The sparrow search algorithm (SSA), proposed by Xue [19] in 2020, is a new intelligent optimization method based on the foraging and anti-predation behavior of sparrows. Compared with existing optimization algorithms, SSA has better optimization ability and higher efficiency. However, like other algorithms, SSA still has shortcomings, such as loss of population diversity in later iterations, a tendency to get stuck in local optima, and search stagnation. To mitigate these issues and improve global optimization ability, various improvement mechanisms have been presented. Liu et al. [20] adopted adaptive weight parameters to balance the exploration and exploitation capabilities of SSA and used a Cauchy-Gaussian mechanism to help SSA escape stagnation. Yuan et al. [21] introduced a center-of-gravity inverse learning mechanism to initialize the population, added weight factors to the follower position update to enhance the global exploration capability of SSA, and introduced a mutation strategy into the follower positions to increase the likelihood of escaping local extrema. To further overcome the problems of the standard SSA, this paper puts forward an enhanced SSA (ESSA) based on a Trigonometric substitution (TS) strategy and Cauchy mutation.
Considering the above, this paper proposes a prediction model of the end-point phosphorus content of BOF based on a deep extreme learning machine optimized by the enhanced sparrow search algorithm (ESSA-DELM). The input weights and biases of DELM are optimized by ESSA, in which the Trigonometric substitution strategy and Cauchy mutation are adopted to remedy the inadequate global search of SSA. Finally, several DELM-based hybrid prediction models and conventional models were used to validate the performance of ESSA-DELM, and the results showed that ESSA-DELM significantly outperformed the other models. The main contributions of this study are: (a) TS and Cauchy mutation are applied to enhance the optimization capacity of SSA; (b) the proposed ESSA is used to optimize the randomly generated weights and biases of DELM; (c) the resulting ESSA-DELM is used to predict the end-point phosphorus content of BOF.
The rest of the study is organized as follows. Section 2 briefly describes the ELM model, the DELM model, and the optimization algorithms. Section 3 evaluates the Enhanced Sparrow Search Algorithm (ESSA) and validates the performance of the proposed ESSA-DELM prediction model of end-point phosphorus content. Section 4 summarizes the study and outlines the scope for future work.

Extreme learning machine (ELM)
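ELM is a type of single hidden layer feed-forward network (SLFN) [8]. As shown in Figure 1, given a training dataset $\{(x_i, y_i) \mid i = 1, 2, \ldots, N\}$ with $x_i \in \mathbb{R}^{g}$ and $y_i \in \mathbb{R}^{c}$, where N is the number of samples and g and c are the dimensions of the input vector x and the output vector y, respectively, the output of an ELM with l hidden neurons can be described as:

$$H\beta = Y$$

where H is the N × l hidden layer output matrix, whose i-th row $h(x_i)$ contains the outputs of the l hidden neurons for sample $x_i$, β is the l × c output weight matrix, and Y is the N × c target matrix. The output weight matrix β of the ELM model can be calculated using the generalized inverse of the hidden layer output matrix:

$$\beta = H^{+}Y$$

where $H^{+}$ denotes the Moore-Penrose generalized inverse of H.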
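The ELM training rule described above (a random, untrained hidden layer followed by output weights from the Moore-Penrose inverse) can be sketched in NumPy as follows. This is an illustration, not the authors' implementation: the sigmoid activation, the uniform range [-1, 1] for the random input weights a and biases b, and the function names `elm_fit` and `elm_predict` are assumptions.

```python
import numpy as np

def _hidden(X, a, b):
    # Hidden-layer output matrix H (N x l); the sigmoid activation is an assumption.
    return 1.0 / (1.0 + np.exp(-(X @ a + b)))

def elm_fit(X, Y, n_hidden, seed=0):
    """Basic ELM: random, untrained hidden layer; output weights beta = H^+ Y."""
    rng = np.random.default_rng(seed)
    g = X.shape[1]                                   # input dimension
    a = rng.uniform(-1.0, 1.0, size=(g, n_hidden))   # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)        # random hidden biases
    H = _hidden(X, a, b)
    beta = np.linalg.pinv(H) @ Y                     # Moore-Penrose least-squares solution
    return a, b, beta

def elm_predict(X, a, b, beta):
    return _hidden(X, a, b) @ beta

# Usage: fit y = sin(3x) on [-1, 1] with 50 hidden neurons.
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
Y = np.sin(3.0 * X)
a, b, beta = elm_fit(X, Y, n_hidden=50)
train_rmse = float(np.sqrt(np.mean((elm_predict(X, a, b, beta) - Y) ** 2)))
print(f"training RMSE: {train_rmse:.2e}")
```

Only β is learned, and in closed form, which is why ELM training is fast; the randomness of a and b is also the weakness that the later ESSA optimization targets.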

Extreme learning machine-autoencoder algorithm (ELM-AE)
ELM-AE [19], established by Kasun et al. [9], is a neural network that reproduces its input data, i.e., an autoencoder built on ELM; like ELM, it offers fast computation and a high accuracy rate. Similar to ELM, ELM-AE contains an input layer, a single hidden layer, and an output layer; the major difference is that the output layer of ELM-AE is identical to its input layer [22]. The outputs of the hidden layer of ELM-AE can be represented as:

$$h = g(a \cdot x + b), \quad a^{T}a = I, \quad b^{T}b = 1$$

where a and b are the orthogonal random input weights and biases of the hidden layer. The relationship between the outputs of the hidden layer and the outputs of the output layer can be expressed as:

$$h(x_i)\,\beta = x_i^{T}, \quad i = 1, 2, \ldots, N$$

where β represents the output weight of the output layer.
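A minimal NumPy sketch of a single ELM-AE is given below. As assumptions for illustration, it draws a with orthonormal columns through a QR decomposition (so that a^T a = I, which requires the hidden layer to be no wider than the input), normalizes b to unit length, uses a sigmoid for g, and solves β with the pseudoinverse so that h(x_i)β approximates x_i^T; the function name `elm_ae` is hypothetical.

```python
import numpy as np

def elm_ae(X, n_hidden, seed=0):
    """One ELM autoencoder: orthogonal random a, unit-norm b, beta solves H beta ~= X."""
    n_in = X.shape[1]
    if n_hidden > n_in:
        raise ValueError("this sketch covers only the compressed case a^T a = I")
    rng = np.random.default_rng(seed)
    a, _ = np.linalg.qr(rng.standard_normal((n_in, n_hidden)))  # orthonormal columns
    b = rng.standard_normal(n_hidden)
    b /= np.linalg.norm(b)                                      # b^T b = 1
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))                      # h = g(a.x + b)
    beta = np.linalg.pinv(H) @ X                                # h(x_i) beta ~= x_i^T
    return a, b, beta

# Usage: encode 8-dimensional inputs with a 4-node hidden layer.
X = np.random.default_rng(1).uniform(-1.0, 1.0, size=(100, 8))
a, b, beta = elm_ae(X, n_hidden=4)
print(np.allclose(a.T @ a, np.eye(4)), beta.shape)  # prints: True (4, 8)
```

The learned β (hidden width × input width) is what a DELM reuses to map one layer's representation to the next.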

Deep Extreme learning machine (DELM)
DELM uses ELM-AE to train the parameters of all hidden layers, and its hidden layer activation functions can be linear or nonlinear piecewise [14]. When the number of nodes in the $k$-th hidden layer equals the number of nodes in the $(k-1)$-th hidden layer, the activation function g(x) is chosen to be linear; otherwise, g(x) is nonlinear piecewise. The output of the $k$-th hidden layer can therefore be expressed as:

$$H_k = g\left((\beta_k)^{T} H_{k-1}\right)$$

where $H_k$ is the output matrix of the $k$-th hidden layer of DELM (when $k-1=0$, $H_0$ denotes the input layer), and $\beta_k$ is the output weight matrix of the ELM-AE used to train the $k$-th hidden layer.
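The layer-wise construction can be sketched by stacking ELM-AEs and finishing with an ordinary ELM output layer solved by the pseudoinverse. This is a hedged illustration, not the authors' code: samples are stored as rows, so the column-form rule $H_k = g((\beta_k)^T H_{k-1})$ is written `H_prev @ beta.T`; the sigmoid used for the nonlinear case, the orthogonal-weight helper, and all function names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def orthonormal(rows, cols, rng):
    # Orthonormal columns if rows >= cols, otherwise orthonormal rows.
    q, _ = np.linalg.qr(rng.standard_normal((max(rows, cols), min(rows, cols))))
    return q if rows >= cols else q.T

def layer_output(H_prev, beta):
    # Linear activation when the layer width is unchanged, sigmoid otherwise.
    Z = H_prev @ beta.T
    return Z if beta.shape[0] == H_prev.shape[1] else sigmoid(Z)

def delm_fit(X, Y, layer_sizes, seed=0):
    rng = np.random.default_rng(seed)
    H, betas = X, []                                    # H_0 is the input data
    for n_hidden in layer_sizes:
        a = orthonormal(H.shape[1], n_hidden, rng)      # ELM-AE random weights
        b = rng.standard_normal(n_hidden)
        b /= np.linalg.norm(b)
        beta = np.linalg.pinv(sigmoid(H @ a + b)) @ H   # ELM-AE output weights beta_k
        betas.append(beta)
        H = layer_output(H, beta)                       # H_k from H_{k-1}
    out = np.linalg.pinv(H) @ Y                         # final ELM output weights
    return betas, out

def delm_predict(X, betas, out):
    H = X
    for beta in betas:
        H = layer_output(H, beta)
    return H @ out

# Usage: 10 inputs, two hidden layers of 20 nodes each (the second one is linear).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 10))
Y = X[:, :1] ** 2 + 0.5 * X[:, 1:2]
betas, out = delm_fit(X, Y, layer_sizes=[20, 20])
pred = delm_predict(X, betas, out)
```

No back-propagation or fine-tuning is involved: every weight matrix comes from a closed-form least-squares solve, which matches the training-speed argument made for DELM in the introduction.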

Sparrow search algorithm (SSA)
During foraging, a sparrow colony primarily consists of discoverers and followers. The discoverers, which have better fitness values, provide the search area and direction for the colony, whereas the followers use the positions of the discoverers to obtain food [19]. When the colony perceives a threat and the alarm value exceeds the security threshold, the sparrows engage in anti-predation behavior. SSA can therefore be abstracted as a discoverer-follower-early warning model.
The positions of the discoverers are updated as follows:

$$X_{i,d}^{t+1} = \begin{cases} X_{i,d}^{t} \cdot \exp\left(\dfrac{-i}{\alpha \cdot T}\right), & R_2 < ST \\ X_{i,d}^{t} + Q \cdot L, & R_2 \geq ST \end{cases}$$

where t is the current iteration; T is the maximum number of iterations; $X_{i,d}^{t}$ is the current position of the i-th sparrow in the d-th dimension; α is a random number in (0, 1]; Q is a random number obeying the normal distribution; L is a 1 × d matrix whose elements are all 1; and $R_2$ and ST represent the alarm value and the security threshold, respectively. When $R_2 \geq ST$, some sparrows have detected a predator nearby, and all sparrows must migrate to safer areas as soon as possible [19].
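All sparrows in the population other than the discoverers are followers. The followers update their positions as follows:

$$X_{i,d}^{t+1} = \begin{cases} Q \cdot \exp\left(\dfrac{X_{worst}^{t} - X_{i,d}^{t}}{i^{2}}\right), & i > n/2 \\ X_{P}^{t+1} + \left|X_{i,d}^{t} - X_{P}^{t+1}\right| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases}$$

where $X_{worst}$ is the current global worst position; $X_P$ is the optimal position occupied by the discoverers; A is a 1 × d matrix whose elements are randomly 1 or -1, and $A^{+} = A^{T}(AA^{T})^{-1}$; and n is the population size. When i > n/2, the i-th follower, which has low fitness, is starving and needs to fly elsewhere for food.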
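Because predators may be present, 10% to 20% of the sparrows in the colony are responsible for scouting and early warning, and their positions are updated as follows:

$$X_{i,d}^{t+1} = \begin{cases} X_{best}^{t} + \beta \cdot \left|X_{i,d}^{t} - X_{best}^{t}\right|, & f_i > f_g \\ X_{i,d}^{t} + K \cdot \left(\dfrac{\left|X_{i,d}^{t} - X_{worst}^{t}\right|}{(f_i - f_w) + \varepsilon}\right), & f_i = f_g \end{cases}$$

where β is the step correction parameter, which obeys the standard normal distribution; $X_{best}$ is the current global best position; K ∈ [-1, 1] is a uniform random number; ε is a small constant that avoids a zero denominator; $f_i$ is the fitness of the current sparrow; and $f_g$ and $f_w$ are the current global best and worst fitness values, respectively.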
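The three update rules (discoverers, followers, scouts) can be combined into a minimal NumPy sketch that minimizes a test function. It follows the SSA rules described in this section, but the defaults (20% discoverers, 10% scouts, ST = 0.8), clipping to the search bounds, a scalar β, ε = 1e-50, and the use of the previous iteration's fitness for the scouts are assumptions; the function name `ssa_minimize` is hypothetical.

```python
import numpy as np

def ssa_minimize(f, dim, lb, ub, pop=30, iters=200, pd=0.2, sd=0.1, st=0.8, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (pop, dim))
    fit = np.array([f(x) for x in X])
    best_x, best_f = X[np.argmin(fit)].copy(), fit.min()
    n_disc, n_scout = max(1, int(pd * pop)), max(1, int(sd * pop))
    for _ in range(iters):
        order = np.argsort(fit)                     # ascending: best sparrow first
        X, fit = X[order], fit[order]
        f_g, f_w = fit[0], fit[-1]
        x_best, x_worst = X[0].copy(), X[-1].copy()
        # Discoverers: the n_disc fittest sparrows.
        r2 = rng.random()                           # alarm value R2
        for i in range(n_disc):
            if r2 < st:                             # no predator: wide search
                alpha = 1.0 - rng.random()          # alpha in (0, 1]
                X[i] = X[i] * np.exp(-(i + 1) / (alpha * iters))
            else:                                   # alarm: move by Q * L
                X[i] = X[i] + rng.standard_normal() * np.ones(dim)
        X[:n_disc] = np.clip(X[:n_disc], lb, ub)
        x_p = X[np.argmin([f(x) for x in X[:n_disc]])].copy()
        # Followers: every sparrow that is not a discoverer (i is 1-based in the formula).
        for i in range(n_disc, pop):
            if i + 1 > pop / 2:                     # starving follower flies elsewhere
                X[i] = rng.standard_normal() * np.exp((x_worst - X[i]) / (i + 1) ** 2)
            else:                                   # forage around X_P
                A = rng.choice([-1.0, 1.0], size=dim)
                # |X_i - X_P| A^+ L with A^+ = A^T (A A^T)^{-1} = A^T / dim
                X[i] = x_p + (np.abs(X[i] - x_p) @ A / dim) * np.ones(dim)
        # Scouts: a random subset of the colony reacts to danger.
        for i in rng.choice(pop, n_scout, replace=False):
            if fit[i] > f_g:                        # at the edge: move toward the best
                X[i] = x_best + rng.standard_normal() * np.abs(X[i] - x_best)
            else:                                   # at the center: move away from the worst
                K = rng.uniform(-1.0, 1.0)
                X[i] = X[i] + K * np.abs(X[i] - x_worst) / ((fit[i] - f_w) + 1e-50)
        X = np.clip(X, lb, ub)
        fit = np.array([f(x) for x in X])
        if fit.min() < best_f:
            best_x, best_f = X[np.argmin(fit)].copy(), fit.min()
    return best_x, best_f

def sphere(x):
    return float(np.sum(x ** 2))

best_x, best_f = ssa_minimize(sphere, dim=10, lb=-100.0, ub=100.0)
print(f"best fitness: {best_f:.3e}")
```

Sorting the colony at the start of every iteration is what makes the index i meaningful in the discoverer and follower formulas, since both depend on a sparrow's fitness rank.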

Enhanced Sparrow Search Algorithm
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 10 December 2021
To overcome the convergence stagnation of standard SSA and its tendency to become trapped in local optima, this paper presents an Enhanced Sparrow Search Algorithm (ESSA) that incorporates a Trigonometric substitution strategy (TS) and a Cauchy mutation strategy. On the one hand, TS is introduced to balance the exploitation and exploration abilities of SSA, and a step search factor and a position inertia factor are added to further strengthen the search ability of the algorithm. On the other hand, the sparrow individuals are perturbed by Cauchy mutation to enhance the global search capability of SSA. The flowchart of the proposed ESSA is shown in Figure 3.

Trigonometric substitution strategy (TS)
The TS strategy mainly uses the sine function [23] to update the current positions of the discoverers, so that the sparrow positions change continuously; this improves the exploitation and exploration abilities of SSA and thus its global search capability. To further balance global exploration and exploitation, R3 in the sine-based discoverer update is adapted nonlinearly [24], with two weight coefficients controlling the adaptation; as Eq. (11) shows, this inertia weight is negatively correlated with the number of iterations. In the later stage of the search, a smaller inertia weight confines the search for the optimum to a narrower area, thus accelerating convergence.
Because an individual's position update is affected by its current position throughout the search, a nonlinear position inertia factor w, which is positively correlated with the iteration number as shown in Eq. (12), is introduced to further improve the search ability of the sparrows. In the early stages, a smaller w reduces the influence of the current solution position on the individual position update and improves global search ability. In the later iterations, a larger w exploits the strong dependence of the individual position update on the current location information, which speeds up convergence. The resulting position update of the discoverers is given in Eq. (13).
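Eqs. (11) to (13) are not reproduced in this text, so the following Python sketch only mimics the trends described above and should not be read as the ESSA formulas. It assumes a cosine-shaped, nonlinearly decreasing step factor, a quadratic, increasing inertia factor w, and the sine branch of the sine cosine algorithm as the position move; all three forms and their ranges (step factor from 2 to 0, w from 0.4 to 0.9) are hypothetical choices.

```python
import math
import random

def step_factor(t, T, s_max=2.0, s_min=0.0):
    # Hypothetical nonlinear decrease from s_max (t = 0) to s_min (t = T).
    return s_min + (s_max - s_min) * 0.5 * (1.0 + math.cos(math.pi * t / T))

def inertia_factor(t, T, w_min=0.4, w_max=0.9):
    # Hypothetical nonlinear increase from w_min (t = 0) to w_max (t = T).
    return w_min + (w_max - w_min) * (t / T) ** 2

def sine_discoverer_update(x, x_best, t, T, rng=random):
    """SCA-style sine move weighted by the inertia factor (illustrative only)."""
    w, s = inertia_factor(t, T), step_factor(t, T)
    new_x = []
    for xd, bd in zip(x, x_best):
        r1 = rng.uniform(0.0, 2.0 * math.pi)   # sine phase
        r2 = rng.uniform(0.0, 2.0)             # random weight on the best position
        new_x.append(w * xd + s * math.sin(r1) * abs(r2 * bd - xd))
    return new_x

T = 100
print([round(step_factor(t, T), 3) for t in (0, 50, 100)])     # shrinking step
print([round(inertia_factor(t, T), 3) for t in (0, 50, 100)])  # growing inertia
```

Early on, the large step factor and small w let the sine term dominate (exploration); near the end, the step factor vanishes and the update keeps mostly w times the current position (exploitation).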

Cauchy mutation strategy
The Cauchy probability density function is given in Eq. (14):

$$f(x) = \frac{1}{\pi}\cdot\frac{t}{t^{2} + x^{2}}, \quad -\infty < x < \infty \tag{14}$$

where t > 0 is a scale parameter. Its distribution function is:

$$F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan\left(\frac{x}{t}\right) \tag{15}$$

Like the normal distribution, the Cauchy distribution is a continuous probability distribution, but it has a lower peak at the origin than the standard normal distribution and two flat, long tails that decay to 0 slowly, so it can produce large perturbations. Introducing Cauchy mutation into the sparrow position update therefore generates large perturbations that widen the range over which the algorithm searches for the best solution and help it escape local optima. Eq. (16) presents the position update of the followers.
where C is a Cauchy-distributed random number and gamrnd() is a gamma-distributed random number, whose jumpy random values further enhance the abruptness of the Cauchy mutation.
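The density and distribution function of the Cauchy distribution can be checked numerically with a short Python sketch, which also draws Cauchy random numbers by inverting the distribution function. Eq. (16) is not reproduced here, so `cauchy_mutate` uses the common mutation form x' = x + x·C as an assumed stand-in (without the gamrnd() term); it is not the ESSA follower update.

```python
import math
import random

def cauchy_pdf(x, t=1.0):
    # f(x) = (1/pi) * t / (t^2 + x^2)
    return t / (math.pi * (t * t + x * x))

def cauchy_cdf(x, t=1.0):
    # F(x) = 1/2 + (1/pi) * arctan(x / t)
    return 0.5 + math.atan(x / t) / math.pi

def cauchy_sample(t=1.0, rng=random):
    # Inverse-CDF sampling: solve F(x) = u for u ~ U(0, 1).
    return t * math.tan(math.pi * (rng.random() - 0.5))

def cauchy_mutate(x, rng=random):
    # Assumed generic Cauchy mutation x' = x + x * C (not the paper's Eq. (16)).
    return [xd + xd * cauchy_sample(rng=rng) for xd in x]

# Heavier tails than the standard normal: P(|X| > 3) for each distribution.
cauchy_tail = 1.0 - (cauchy_cdf(3.0) - cauchy_cdf(-3.0))
normal_tail = 1.0 - math.erf(3.0 / math.sqrt(2.0))
print(f"Cauchy: {cauchy_tail:.3f}, normal: {normal_tail:.4f}")
```

The tail probability beyond ±3 is roughly 0.205 for the standard Cauchy distribution versus about 0.0027 for the standard normal, which is the quantitative reason Cauchy mutation produces the occasional large jumps that help a sparrow leave a local optimum.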

The evaluation of ESSA
To evaluate the performance of ESSA, the proposed algorithm was compared with four traditional intelligent algorithms, namely the butterfly optimization algorithm (BOA) [25], sine cosine algorithm (SCA) [26], whale optimization algorithm (WOA) [27], and sparrow search algorithm (SSA), and with five advanced intelligent algorithms, namely the improved grey wolf optimizer (IGWO) [28], leader slime mould algorithm (LSMA) [29], leader Harris hawks optimization (LHHO) [30], adaptive opposition slime mould algorithm (AOSMA) [31], and hybrid butterfly optimization algorithm with particle swarm optimization (HPSOBOA) [32]. Table 1 lists the parameter settings of the 10 algorithms. In addition, 9 classic test functions, comprising 4 unimodal functions (F1-F4) and 5 multimodal functions (F5-F9), were employed to evaluate the algorithms (Table 2) [19,33]. To ensure a fair comparison, the initial population (uniformly randomly generated) had a size of 50, the dimension of the solution space was 30, and the maximum number of evaluations was 500. Each algorithm was run 30 times independently on each test function.

To illustrate the convergence of ESSA on different types of test functions more visually, the convergence curves of the compared algorithms on the nine test functions are presented in Figure 4. For easier observation, the vertical axis uses a base-10 logarithmic scale; when a curve disappears as the number of iterations increases, the algorithm has reached the theoretical optimal value of 0, whose logarithm is undefined. To further illustrate the effectiveness of ESSA, two criteria were used for comparison: the average value (Ave) and the standard deviation (Std). Table 3 lists the comparative results of ESSA and the other algorithms on the nine classic benchmark functions (F1-F9) in three different dimensions.
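The Ave/Std protocol (independent runs on each function, then the mean and standard deviation of the best fitness) can be sketched in Python as follows. Table 2 is not reproduced here, so the sphere and Rastrigin functions stand in as typical unimodal and multimodal examples, and a plain random search stands in for the optimizers being compared; both are assumptions, as is the use of the sample standard deviation, since the text does not say which variant was used.

```python
import math
import random
import statistics

def sphere(x):
    # Typical unimodal benchmark; global optimum 0 at the origin.
    return sum(v * v for v in x)

def rastrigin(x):
    # Typical multimodal benchmark; global optimum 0 at the origin.
    return sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) + 10.0 for v in x)

def random_search(f, dim, lb, ub, pop, iters, seed):
    """Stand-in optimizer: best of pop * iters uniform samples."""
    rng = random.Random(seed)
    best = math.inf
    for _ in range(pop * iters):
        best = min(best, f([rng.uniform(lb, ub) for _ in range(dim)]))
    return best

def ave_std(optimizer, f, runs=30, **kwargs):
    # Independent runs differ only in their random seed.
    results = [optimizer(f, seed=run, **kwargs) for run in range(runs)]
    return statistics.mean(results), statistics.stdev(results)

# Reduced budget for a quick demonstration (the text uses population 50, dimension 30, maximum 500).
for f in (sphere, rastrigin):
    ave, std = ave_std(random_search, f, runs=30, dim=30, lb=-5.12, ub=5.12, pop=10, iters=20)
    print(f"{f.__name__}: Ave={ave:.3e}, Std={std:.3e}")
```

Swapping `random_search` for any optimizer with the same keyword signature reproduces the comparison structure of Table 3; a Std of exactly 0 with an Ave of 0 means every run hit the theoretical optimum.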
For the unimodal test functions (F1-F4) and the multimodal test functions (F5-F9), the proposed ESSA reaches the theoretical optimum (0) with the minimum standard deviation (0) in all dimensions on every function except F8. This indicates that ESSA has better search accuracy and robustness than BOA, SCA, WOA, SSA, IGWO, LSMA, LHHO, and HPSOBOA. AOSMA matches ESSA in convergence accuracy and stability on the nine classic test functions; however, Figure 4 clearly shows that ESSA converges faster than AOSMA, demonstrating strong competitiveness with advanced algorithms.
Table 3 and Figure 4 indicate that ESSA has better convergence capability and robustness than the other nine algorithms. The primary reason is that the TS mechanism introduces sine-based position changes that balance the exploitation and exploration of SSA, and its two nonlinear weights further accelerate convergence. In addition, the Cauchy mutation strategy helps individuals near the current best fitness escape local optima, further improving the global optimization capability of SSA. Based on the above analysis, ESSA performs best, and it was therefore chosen for the following experiments.

Prediction model of end-point phosphorus content based on ESSA-DELM
To test the performance of ESSA-DELM, this study used converter production datasets from the Baogang steel plant. The reactions in the converter are very complex, and the end-point phosphorus content is affected by numerous factors. Therefore, the 10 variables listed in Table 4 were selected as model inputs based on previous research [7,34] and SPSS data analysis.

As the scatter plot in Figure 5 shows, the predictions of the proposed ESSA-DELM lie closer to the ideal line (y = x) between the actual and predicted values, with fewer points outside the error band from -0.003% to 0.003%. In addition, the prediction curve of ESSA-DELM follows the actual values more closely than those of the other seven models, as Figure 6 shows, and ESSA-DELM has the smallest prediction errors, as shown in Figure 7.

The prediction accuracy is also measured by the hit rate, i.e., the proportion of samples whose difference between the true and predicted values falls within a given error range. The hit rates of the eight models in the error ranges [-0.001%, 0.001%], [-0.002%, 0.002%], and [-0.003%, 0.003%] are shown in Table 5 and Figure 8.

To further verify the performance of the ESSA-DELM model, this study also adopted the mean absolute error (MAE), the root mean square error (RMSE), the mean absolute percentage error (MAPE), the coefficient of determination (R²), and the Nash-Sutcliffe efficiency (NSE) [35] as evaluation criteria. The results of the eight prediction models on these criteria, together with the optimal values of each index, are shown in Table 6. The closer NSE and R² are to 1, the better the model performance.
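The hit rate and the five error criteria can be computed as in the following Python sketch. The formulas are the standard definitions of these metrics, since the text does not reproduce them; in particular, R² is assumed here to be the squared Pearson correlation between measured and predicted values, while NSE is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sample values are invented for illustration only, not plant data.

```python
import math

def hit_rate(y_true, y_pred, tol):
    # Share of samples with |y_true - y_pred| <= tol; the 1e-12 guards against
    # floating-point round-off exactly at the band edge.
    hits = sum(1 for yt, yp in zip(y_true, y_pred) if abs(yt - yp) <= tol + 1e-12)
    return hits / len(y_true)

def regression_metrics(y_true, y_pred):
    n = len(y_true)
    err = [yp - yt for yt, yp in zip(y_true, y_pred)]
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum(e * e for e in err)
    ss_tot = sum((yt - mean_t) ** 2 for yt in y_true)
    cov = sum((yt - mean_t) * (yp - mean_p) for yt, yp in zip(y_true, y_pred))
    var_p = sum((yp - mean_p) ** 2 for yp in y_pred)
    return {
        "MAE": sum(abs(e) for e in err) / n,
        "RMSE": math.sqrt(ss_res / n),
        "MAPE": 100.0 * sum(abs(e / yt) for e, yt in zip(err, y_true)) / n,  # y_true must be nonzero
        "R2": cov * cov / (ss_tot * var_p),   # squared Pearson correlation (assumed definition)
        "NSE": 1.0 - ss_res / ss_tot,         # Nash-Sutcliffe efficiency
    }

# Toy end-point phosphorus values in wt% (invented for illustration).
y_true = [0.010, 0.015, 0.020, 0.025]
y_pred = [0.011, 0.014, 0.022, 0.024]
for tol in (0.001, 0.002, 0.003):
    print(f"hit rate within +/-{tol}%: {hit_rate(y_true, y_pred, tol):.2%}")
print(regression_metrics(y_true, y_pred))
```

Shifting every prediction by the same constant leaves this R² at 1 but lowers NSE, which is one reason the two criteria can report different values for the same model.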
With the introduction of the TS strategy and Cauchy mutation, ESSA-DELM shows the best predictive performance, with RMSE = 0.0015366, MAE = 0.001074, MAPE = 7.15%, R² = 0.61391, and NSE = 0.60034. ESSA-DELM outperforms BOA-DELM, SCA-DELM, and WOA-DELM on RMSE, MAE, and MAPE, and it outperforms SSA-DELM in terms of R². The proposed ESSA-DELM prediction model thus achieves the highest prediction accuracy among all the prediction models considered, which offers a good reference for industrial operation. However, R² and NSE are not particularly high, because the factors determining the end-point phosphorus content interact with one another. In conclusion, ESSA-DELM demonstrates promising performance in phosphorus content prediction.

Conclusion
This paper proposes the ESSA-DELM prediction model for the end-point phosphorus content of BOF. Because the DELM model has numerous randomly generated input weights and biases, its prediction precision can be degraded, so this research introduces an enhanced sparrow search algorithm to optimize them. ESSA is obtained by introducing Trigonometric substitution and Cauchy mutation to enhance the exploration and exploitation capacity of the original SSA. The superiority of ESSA is verified by comparing it with other classic and advanced intelligent algorithms.
To evaluate the prediction accuracy, BPNN, ELM, DELM, BOA-DELM, WOA-DELM, SCA-DELM, and SSA-DELM are adopted as comparative prediction models. The 10 parameters that influence the end-point phosphorus content are chosen as inputs, and the end-point phosphorus content is the output. Several performance evaluation criteria are applied to compare the predicted results with the actual values. The experimental results indicate that the hit rates of ESSA-DELM within the error ranges of ±0.003%, ±0.002%, and ±0.001% are 91.67%, 83.33%, and 63.55%, respectively, which are higher than those of the other 7 models. In addition, the performance metrics (RMSE, MAE, MAPE, R², NSE) of the ESSA-DELM model are superior to those of the other predictive models. The proposed ESSA-DELM model therefore achieves better prediction performance and can guide the control of the end-point phosphorus content of BOF. Because the NSE and R² values are still not high, converter data from different steel mills will be collected for further study in the future.