A Math Approach with Brief Cases towards Reducing Computational and Time Complexity in the Industrial Systems

The paper proposes a new principle for finding and removing elements of a mathematical model that are redundant with respect to parametric identification of the model. This reduces the computational and time complexity of applications built on the model, which is especially important for AI-based systems, systems based on IoT solutions, distributed systems, etc. In addition, the complexity reduction increases the accuracy of the implemented mathematical models. Although model order reduction methods are well known, they are strongly dependent on the problem area. The proposed reduction principle, by contrast, can be used in different areas, as demonstrated in this paper. For dynamic systems, the proposed reduction method also allows assessing the requirements on the parameters of simulator elements needed to ensure a specified accuracy of dynamic similarity. The efficiency of the principle is shown on ordinary differential equations and on a neural network model. The given examples demonstrate the efficient normalizing properties of the reduction principle for mathematical models in the form of neural networks.


Introduction
Identification of a mathematical model is an inverse problem and is therefore ill-posed. Identification errors can be caused by redundant model elements; this property can be exploited to detect unnecessary elements. This is especially important for real-time big data processing and text processing [1]. A dynamic object can be represented in the form of differential equations, so decreasing the complexity amounts to simplifying the model. Most existing methods do not pose the problem of accounting for the total error when assessing the quality indicators of the systems.
Other examples of complex systems are neural networks, in particular deep neural networks, auto-encoders, and predictive models. For such models, dimensionality reduction is widely used.
In this paper, we propose a novel reduction principle for mathematical models. A simplified form of this principle has long been used to reduce mathematical models in the form of ordinary differential equations. The novelty of the proposed principle lies in a simpler way of identifying and removing redundant parameters (elements) of the model. This increases the accuracy of identification of the mathematical model, as shown for models in the form of neural networks and differential equations.
The paper consists of the following sections. Section 2, Related works, presents methods of model complexity reduction. Section 3 describes the reduction method. Section 4 presents implementations of the reduction method in different domains. The last section concludes the paper.

Related works
Analysis of methods for simplifying mathematical models of dynamic systems shows two main approaches: constructing a simplified model based on the criterion of proximity of the quality indicators of the original and simplified models in the image space, and doing so in the state space [2].
The paper [3] presents uniform manifold approximation and projection (UMAP), a reduction method based on Riemannian geometry and algebraic topology. The main disadvantage of this method is its time complexity. In [4], the locality preserving projections algorithm is proposed; however, this algorithm is applicable only to linear dimensionality reduction.
The idea of removing redundant elements in machine learning algorithms is not new at all. For example, PCA (Principal Component Analysis) and the k-means algorithm are used for dimensionality reduction in [5]. This classical approach also allows finding outliers. The disadvantage of these algorithms is that they can process only discrete data.
Zuo Z. [6] proposed a multi-agent approach with identification based on Lyapunov-Krasovskii functionals in the time domain. As shown in the experimental results, the quality of the method depends on the domain.
Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders is applied in [7]. The proposed method simplifies the model by projection onto nonlinear manifolds. Similar results are obtained in [8].
Reduction is widely used in neural networks as well. The basic idea is to find a subset of useless weights in the network and set them to zero. Without an exhaustive search, it is hard to tell which weights are involved in the prediction; that is why this type of reduction depends on the model's complexity. For the reduction of unnecessary elements, the Arnold-Kolmogorov-Hecht-Nielsen theorem is used [9,13]. In this sense, the reduction of connections can be compared with the method of shutting down random neurons (dropout) during network training. In addition, if the network contains many zeros, it takes up less space in an archive and can be read faster on some architectures.
Group LASSO regularization is most often used to keep useless weights in networks close to zero. Unlike regular regularization, the layer weights or activations are not regularized directly [10]. However, there are difficulties in applying channel reduction to "branched" architectures and residual networks (ResNet): after trimming the extra neurons, the dimensions may not match when the branches are merged.
Using variational optimization [11], the discrete distribution of zeros and ones in the masking layer can be approximated with a continuous one, and the parameters of the latter can be optimized with the usual backpropagation algorithm. The main disadvantage is the dependence on empirical hyperparameters.
It is much more effective to remove not individual weights but whole neurons from fully connected layers or whole channels from convolutions. In this case, the effect of compressing the network and speeding up predictions is much more pronounced. For this purpose, other neural network structures can be used.
The papers [12,13] present neural-like structures based on geometric data transformations. The main advantages of this method are a non-iterative training process and high training performance, which creates conditions for solving large-dimension tasks. This approach reduces time complexity, but the number of model parameters stays the same.
The paper [14] proposes a GMDH-neuro-fuzzy system with a small number of hyperparameters but with huge time complexity.
The pruning technique can be used for reducing neural networks too. A variational Bayesian scheme for pruning convolutional neural networks at the channel level is proposed in [15]. Channel pruning without the need for a re-training stage allows reducing the computational complexity.
Thus, most reduction methods depend on the domain. In contrast, the proposed method takes the error rate into account and can be used for different models.
The reduction principle can also be used for software reliability modeling. Accurate software estimation models can greatly help software project managers: with them, managers can make informed decisions on resource management, administration, and project planning, and as a result complete the project on time and within the planned budget. The essence of the proposed method is to find a functional subset with less variability of results and higher accuracy than the initial functional set of the model [16]. In this case, the functional set includes the parameters that allow the model to be calibrated.

The Proposed Reduction Principle
Suppose that for a simulated object there is an exact mathematical model with known parameters (p1,…, pn), which are the numerical values of the model elements. The model parameters are calculated using an identification procedure based on some data.
The mathematical model and the identification procedure must satisfy the following two conditions.
1. If a parameter equals 0, the corresponding model element is absent. 2. The model parameters depend continuously on the identification data within some neighborhood of these data.
The parameter set (p1,…, pn) is now extended with additional parameters, giving (p1,…, pn, pn+1,…, pm). In this case, for the redundant parameters (pn+1,…, pm) the identification procedure will calculate values close to zero (condition 1, within the accuracy of calculations):

pj ≈ 0, j = n+1,…, m. (1)

However, this property alone cannot detect unnecessary parameters, because some of the necessary parameters may also be close to zero.
Then we introduce perturbations (disturbances) into the identification data, staying within the neighborhood of continuity (condition 2). The identification procedure then calculates parameters (p1',…, pn', pn+1',…, pm') that differ from (p1,…, pn, pn+1,…, pm). For each parameter, we compute the module of the relative deviation:

δi = |p'i − pi| / max(|pi|, |p'i|), i = 1,…, m. (2)

The absolute deviations (p'i − pi) tend to zero as the perturbations tend to zero, due to the continuous dependence of the parameters on the perturbations. The same holds for the relative deviations of the necessary parameters:

δi → 0, i = 1,…, n, as the perturbations tend to zero. (3)

On the contrary, for the unnecessary parameters the values of the relative deviation (2) are close to one due to (1):

δi ≈ 1, i = n+1,…, m. (4)

Criteria (3) and (4) are derived for the exact model. These criteria can be extended to arbitrary mathematical models.
In general, unnecessary parameters can be found by much larger relative deviations (2) compared to the necessary parameters. The consistent elimination of the unnecessary elements of the mathematical model improves the accuracy and stability of the identification problem. Numerous examples confirm this conclusion [17,18], including the examples shown in this paper.
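As an illustration, the criterion can be exercised on a toy identification problem. The sketch below is an illustrative assumption, not the paper's code: it fits a linear model by least squares with two deliberately redundant regressors, repeats the fit on weakly perturbed data, and compares the relative deviations; the normalization by max(|pi|, |p'i|) is one choice consistent with criteria (3) and (4).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy identification: y depends only on the first two regressors;
# regressors 2 and 3 are redundant model elements.
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 1e-8 * rng.normal(size=200)

def identify(y_data):
    # Least-squares parametric identification of the model.
    p, *_ = np.linalg.lstsq(X, y_data, rcond=None)
    return p

p = identify(y)                                      # unperturbed parameters
p_pert = identify(y + 1e-5 * rng.normal(size=200))   # weakly perturbed data

# Relative deviations (2): near zero for the necessary parameters,
# of order one for the redundant ones.
delta = np.abs(p_pert - p) / np.maximum(np.abs(p), np.abs(p_pert))
print(delta)
```

The first two deviations stay several orders of magnitude below the last two, which is exactly the gap the reduction principle looks for.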

The reduction algorithm
The following algorithm represents the main stages of the parameter reduction principle (Algorithm 1).

Algorithm 1. Parameter reduction
Input data: list of model parameters, the structure of the model
Output data: reduced list of model parameters
1. Identify the model, obtaining parameters pi.
2. Identify the weakly perturbed model, obtaining parameters p'i.
3. Calculate the modules of the relative deviations of the parameters by formula (2).
4. If no relative deviation is significantly larger than the mean, stop the reduction.
5. Remove the element with the largest δi and go to step 1.
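The steps above can be sketched as a loop. This is a minimal sketch, not the paper's implementation: the helper names `identify` and `perturb` and the stopping threshold `tol` are illustrative assumptions (the paper's actual stopping criterion is the compactness of the remaining deviation set).

```python
import numpy as np

def reduce_parameters(identify, data, perturb, n_params, tol=0.1):
    """Iterative parameter reduction, following the steps of Algorithm 1.

    identify(data, active) -> fitted parameters for the active index set
    perturb(data)          -> a weakly perturbed copy of the data
    """
    active = list(range(n_params))
    while len(active) > 1:
        p = identify(data, active)                # step 1: identification
        p_pert = identify(perturb(data), active)  # step 2: perturbed model
        # step 3: modules of the relative deviations (2)
        delta = np.abs(p_pert - p) / np.maximum(np.abs(p), np.abs(p_pert))
        if delta.max() < tol:                     # step 4: stop criterion
            break
        active.pop(int(delta.argmax()))           # step 5: drop largest delta
    return active
```

For a least-squares identification with two necessary and three redundant regressors, the loop removes the redundant indices one by one and stops once only the necessary parameters remain.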
Using the reduction principle, we can effectively extend the structure of the mathematical model by checking each new element and removing the unnecessary ones. A mechanical illustration of the principle: a mechanical structure in the form of a bridge truss is in a loaded state. A perturbation of this truss causes the beams to oscillate. Loaded (necessary) beams oscillate with smaller amplitudes than unloaded (unnecessary) ones.
Student's paired t-test for dependent samples is used to compare the mean values of the samples. In this approach, it is also used to test for a significant difference between the mean accuracies of a function subset. This statistical test determines the differences between average values at a given significance level, assuming that the dependent variable follows a normal distribution. It is used in this model to determine the best function subset.
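For instance, the paired t-statistic for comparing the accuracy of two models across the same evaluation folds can be computed as follows (the accuracy numbers are illustrative, not taken from the paper):

```python
import math

def paired_t_statistic(a, b):
    """Student's paired t-test statistic for two dependent samples."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # unbiased variance
    return mean / math.sqrt(var / n)

# Accuracy of the full model vs. the reduced model on the same folds
# (hypothetical values for illustration).
full = [0.81, 0.79, 0.83, 0.80, 0.82]
reduced = [0.84, 0.82, 0.85, 0.83, 0.86]
t = paired_t_statistic(reduced, full)
print(round(t, 3))
```

The resulting statistic is then compared against the critical value of the t-distribution with n − 1 degrees of freedom at the chosen significance level.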

Lorenz Attractor Test Recovery
The classical equations of the Lorenz attractor (5) are convenient for testing the reduction principle, since they allow an analytical transformation into an equivalent form (6) convenient for identification [17].
So, the problem of reconstructing the exact model (6) is as follows: having a discrete signal y1 = x1, we must calculate three derivatives of this signal, y1' = y2, y2' = y3, and y3', and solve the identification problem (7). The problem (7) has 50 polynomial coefficients in total, whereas the exact model (6) requires only 7 coefficients; the remaining coefficients are unnecessary.
The discrete signal y1 = x1 is calculated by numerical integration of equations (6) by the Runge-Kutta method with a step of 0.02 s, from 0 s up to 34 s. On the obtained set of points, an interpolation spline of the fifth degree is constructed, and its three derivatives are calculated analytically.
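A sketch of how such a signal can be generated (the standard Lorenz parameters σ = 10, ρ = 28, β = 8/3 and the initial point are assumptions; the paper integrates the transformed system (6), while this sketch uses the classical form (5)):

```python
import numpy as np

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # Classical Lorenz right-hand side; the parameter values are
    # the conventional ones, assumed here for illustration.
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, state, h):
    # One classical fourth-order Runge-Kutta step.
    k1 = f(state)
    k2 = f(state + 0.5 * h * k1)
    k3 = f(state + 0.5 * h * k2)
    k4 = f(state + h * k3)
    return state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

h, t_end = 0.02, 34.0              # step 0.02 s, horizon 34 s, as in the text
state = np.array([1.0, 1.0, 1.0])  # assumed initial point
signal = [state[0]]
for _ in range(int(round(t_end / h))):
    state = rk4_step(lorenz, state, h)
    signal.append(state[0])
# `signal` is the discrete y1 = x1 used for identification; a fifth-degree
# interpolation spline then supplies the three derivatives analytically.
```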
Next, element-wise reduction of the coefficient arrays aijk, bijk was applied according to the reduction principle. Perturbations with a relative value of 10⁻⁵ were added to the values y'3m. The relative deviations δi (2) were calculated, and the element with the largest δi was deleted.
The criterion for completing the reduction is a compact set of residual relative deviations. A sign of this is an equal number of relative deviations larger and smaller than the average over the remaining set.
In the reduction process, the range of the relative deviations max(δi) − min(δi) and the mean relative error of the model coefficients are calculated. After 43 reduction steps, the 7 coefficients of the exact model (6) remain, with a mean relative error of 0.0016. Figure 1 shows the change of the relative error with increasing reduction step number. The dependence of the size of the relative-deviation range on the reduction step is shown in Figure 2.

On the Lorenz model, the procedure of growing (induction of) the model was also tested. Relative deviations (2) for all 50 coefficients were calculated. Then, starting with the three coefficients with the smallest relative deviations, the coefficients with the smallest relative deviations among the remaining ones were successively added to the model. The induction termination criterion, the formation of a compact set of relative deviations, was met in 4 steps.
It is easy to see the advantages of the induction process compared with the reduction. First, the relative deviations do not need to be recalculated at each step; it suffices to calculate them once at the beginning of the induction. Second, the number of steps may be smaller.
So, on the test example of the Lorenz attractor reconstruction, the validity of the reduction principle was verified.

Neural Network Reduction Case
The reduction method has long been widely used to simplify mathematical models represented by ordinary differential equations. Here, we develop an application of the reduction principle to models based on neural networks.
There has been a lot of research on simplifying the structure of neural networks (NN) [9-13]. To simplify the models, the first and second derivatives of the objective function were used. In contrast, our proposed solution is simple and universal, and does not depend on the training method or the type of neural network.
We consider an example of reducing a neural network model that approximates the economic system of stock profits, bonds, and interest rates on deposits, based on time series of the macroeconomic indicators S_ec and stock market indicators S_fin.
The model inputs are 8 economic indicators:
• the consumer price index isc;
• the money supply mf;
• the household income dn;
• the public spending dv;
• the gross domestic product dgp;
• the average rate on deposits dep;
• the index of bonds bond;
• the trading system index of the first stock pfts.
The output data are the average rate of profit on deposits dep, the index of bonds bond, and the trading system index of the first stock pfts. Input and output data were acquired quarterly from 2002 to 2013 and are represented by 44 samples [2].
Using the acquired data, we train a three-layer recurrent neural network with 12 neurons in the hidden layer by the backpropagation method. The activation function is a sigmoid with parameter α = 0.5. There are 276 variable transfer parameters between neurons.
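The stated number of connections is consistent with a fully connected recurrent hidden layer without bias terms (an assumption for illustration; the paper does not spell out the wiring):

```python
n_in, n_hidden, n_out = 8, 12, 3

# input->hidden, recurrent hidden->hidden, and hidden->output weights,
# assuming full connectivity and no bias terms
n_params = n_in * n_hidden + n_hidden * n_hidden + n_hidden * n_out
print(n_params)  # 276 = 96 + 144 + 36
```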
The structure of the neural network is shown in Figure 4. As a result of the reduction of the neural network model structure, the number of neurons in the input and hidden layers typically decreases to the optimal number, at which the classifying ability of the model is maximal and at the same time not lower than the initial one (before reduction).
The identification method is backpropagation with the RMS (Root Mean Square) error of output data reproduction as the approximation criterion. The iterative network reduction sequentially removes connections according to the above algorithm, but without stopping the reduction.
The graph of the mean square error of approximation of one output variable depending on the iteration number is shown in Figure 5. We found that the smallest error value, 0.09, occurs at the 99-th iteration, when 35% of the connections have been removed. The reduction of the network decreased the error 16 times. The next step is to estimate the number of neurons in the hidden layers using well-known techniques. To determine the number of neurons in the hidden layer of a neural network, it is customary to use a corollary of the Arnold-Kolmogorov-Hecht-Nielsen theorem, according to which the maximum number of neurons in the hidden layer of the perceptron is bounded as [13]:

Nh ≤ 2·Nin + 1,

where Nh is the number of neurons in the hidden layer and Nin is the number of neurons in the input layer. Thus, this expression determines the upper limit of the number of neurons in the hidden layer of the perceptron neural network model. Consequently, such a number of neurons can lead to redundancy in the structure of the model and, as a consequence, to the ineffectiveness of its practical use.
In our case, the input architecture of the NN satisfies the Arnold-Kolmogorov-Hecht-Nielsen theorem. As a result, standard estimation without error analysis cannot optimize the NN.
Let us compare the results with those obtained by the Dropout technique. The main idea behind Dropout is, instead of training one NN, to train an ensemble of several NNs and then average the results. A standard implementation in Python 3 shows that the average number of disabled neurons is proportional to n·p, where n is the number of neurons in the initial NN and p is the Dropout coefficient (Figure 7). The main advantage of the proposed method is the ability to evaluate not only the number of removed connections but also the accuracy of the model. The Dropout technique is based on Rademacher complexity [19], so the complexity of the proposed method is much lower.
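The proportionality of disabled neurons to n·p can be checked with a minimal simulation (a standalone sketch, not tied to any framework):

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 1000, 0.3   # neurons in the layer, Dropout coefficient
trials = 2000

# Average number of neurons disabled by a Bernoulli(p) dropout mask.
disabled = [(rng.random(n) < p).sum() for _ in range(trials)]
avg = float(np.mean(disabled))
print(avg)  # close to n * p = 300
```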

Sun Influence on the Earth's Seismic Activities Model
A crucial task of geo-heliogenic interaction studies is to model the influence of the Sun on the Earth's earthquakes and on the intensity of near-surface infrasound.
We construct a dynamic model, the variables of which are the intensity of the solar wind s(k) during k-th day, the average daily earth seismic activity g(k), and the average daily intensity of the near-surface infrasound z(k) recorded during m days: s(k); g(k); z(k); k=1,…,m.
The variable s(k) will be the input signal of the model, and the variables g(k) and z(k) are output signals, and z(k) depends on g(k).
The model is chosen as a system of ordinary differential equations (16). The obtained model has a high accuracy of reproduction of the experimental values only due to the reduction. Thus, the relative mean square error of the approximation of s(k) by model (16), constructed for days 119-197 of 1999, is 2.11·10⁻⁴. This provides a practical tool for researching and forecasting the activity of the simulated geo-heliogenic variables.

Conclusions
In this paper, we propose a simple principle of mathematical model reduction. The principle works for mathematical models of any nature, as explored in the examples in this paper. It only requires the identification of the model parameters and the continuous dependence of the parameters on the identification conditions. We analyzed its application to neural network and differential equation models. The given examples demonstrate the efficient normalizing properties of the reduction principle for mathematical models in the form of neural networks. Due to the reduction, mathematical models in the form of ordinary differential equations can reproduce complex experimental dependencies and serve as a tool for studying and predicting the behavior of real systems. The efficiency of the proposed approach compared to existing methods is verified by the different examples explored in the paper. The Dropout technique for neural network reduction is based on Rademacher complexity, so the complexity of the proposed method is much lower.
The results of the developed method can be used in the modeling and analysis of complex systems, in particular for economic modeling [20] and energy systems [21].