A Preliminary Study on Dimension-Reduction Algorithm for Variational Methods in Three Dimensions

Three Dimensional Variational data assimilation or analysis (3DVAR) is one of most classical methods for providing the initial values for numerical models. In this method, the dimensions of the background error covariance and the observational error covariance matrices are large. Therefore, it is difficult to get the inverse of the covariance matrices and to reduce the orders of these matrices without information loss. With the use of the Sylvester Equation, on the basis of a new linear regression, a new cost function for 3DVAR was given. For the first-guess m×n field, there is an approximate 1−(m 2 +n 2 )/(mn×mn) reduction with m>1 & n>1 by using the cost function. The results of the numerical experiments show that the effect of this algorithm is no worse than that of the old cost function for 3DVAR.


Introduction
Variational data assimilation is widely used all over the world (Chen et al., 2015;Kalnay, 2003;Kristian et al., 2009;Warner, 2011) in order to provide the first guesses (initial values) for atmospheric or oceanic numerical predictions, which is an initial value problem.In these methods, the background error covariance and observational error covariance matrices have large dimensions (Kalnay, 2003;Warner, 2011).
Reducing the order of the covariance matrices is still an important problem.
It is well known that the structure or estimation of the covariance matrices in the cost function (Bannister, 2008a;Bannister, 2008b;Federico, 2013;Liu et al., 2010;Parrish et al., 1992) is the key to solving the variational analysis problem, but this type of matrix is nearly ill-conditioned in the large dimension case.In order to get a better estimation of these matrices, many algorithms (Bannister, 2008a;Bannister, 2008b;Cohn et al., 1996;Farrell et al., 2010) for these large matrices have been developed.Some of them are applied by many meteorological centers, such as the NMC (National Meteorological Center) method (Parrish et al., 1992).Naturally, reduction-order algorithms are also developed; proper orthogonal decomposition is one of these methods, and many works on reduction-order methods are based on it (Cao et al., 2007;Fang et al., 2014;Lawless et al., 2008).The others are based on statistical methods, such as empirical orthogonal function (EOF) decomposition (Frolov et al., 2009;Hoteit et al., 2006;Robert et al., 2005;Shen et al., 2014;Zhao et al., 2012) and the work of Kleist et al. (2008).By reducing the computational burden, more or less information of the covariance matrix is neglected.It is obvious that both keeping the covariance matrices' information and reducing the computational burden are important.Although those methods are effective, fewer of them can truly reduce the dimensions of the covariance matrices.In previous research (Chen et al., 2017;2019), a more generalized linear regression model was developed.By using this method, Chen et al. ( 2019) found that the Sylvester Equation (SE) can be applied in data assimilation and that the SE can reduce the computational burden with less information loss.In this paper, the theory of the SE and the use of the SE to solve the Three Dimensional Variational analysis (3DVAR) problem are presented in section 2, some numerical experiments are presented in section 3, and the conclusions and a discussion are provided in section 4.

The SE
Eq. ( 1) in the m×n unknown matrix X is called the SE as follows: where A and B are m×m and n×n square matrices, respectively, and C is m×n (Simoncini, 2013).Eq. ( 1) is a linear matrix equation and is widely used in controlling systems, numerical analysis and even image processing.
Eq. ( 1) can be rewritten as the following linear vector equation: where I n is the n×n identity matrix, ○ × is the Kronecker product notation, T represents the matrix transposition, and vec() stacks the columns of X into a column vector (Deif et al., 1995;Simoncini, 2013).

The more generalized linear regression model
, which can also be solved by least square method.But for more generalized linear regression model, this expression seems to be limited by its form.
For the expression for the matrix-form data Y m×p and X n×q , it is difficult to solving directly by using the least square method, but the most important principle is all the same that the error of the regression model must be smallest.
So, for the matrix-form data Y m×p and X n×q , the simplest regression model is How to estimate the coefficients matrixes?Taking the most important principle about the error into consideration, here noting the matrix Y m×p in the model as Ŷ m×p , as taking Y m×p as observations.The error of the model can be written as ∑(y ij −ŷ ij ) 2 , which is the sum of two matrixes' trace of the expression: With using matrix differential rule, we can get the expression of coefficients matrixes but nonlinear.By using the Sylvester Equation, it seems easier for solving: at first, giving the estimation of the coefficients matrixes A1 by the expression Ŷ1 m×max(p,q) =A1 m×n X n×max(p,q) +B1 m×max (p,q) , in which the elements in the part from min (p,q) to max(p,q) of matrix X or Y can be filled with random numbers.For example, p=min(p,q) and q=max(p,q), [Ŷ1 m×p ,R m×(q−p) ]=A1 m×n X n×q +B1 m×q , where matrix R m×(q−p) is filled with random numbers.
By the least square method and the Sylvester Equation, we can estimate A1 and B1.
Then by the form

3DVAR
In 3DVAR, the scale cost function J(x) is defined as follows: where B is the background error covariance matrix, R is the observational error covariance matrix, x is the unknown vector, x b is the background or the first guess vector, y is the observation vector and H[] is the observation operator and assumed to be linear.The cost function can be defined as the distance between the analysis and the background, weighted by the inverse of the background error covariance, plus the distance to the observations, weighted by the inverse of the observations error covariance (Kalnay, 2003).And, Eq. ( 1) can be written as: where Trace[] is the trace of the matrix in square brackets.Eq. ( 1) is minimized in order to obtain the analysis by solving the following equation: after solving Eq. ( 4), the following equation is used to obtain the analysis vector: where ), one form of the solution of equation ( 4).
Taking the one point situation into consideration, the cost function J(x) can be written as: taking the same views that the cost function is a quadratic function of the analysis increments (x−x b ), then the gradient of the cost function J with respect to (x−x b ) or equation ( 4) can be written as: then the analysis can be written as: which is minimum variance estimator and for the error (ε a , ε b , ε o (for y)) expression, For the vector form, we can also get the form by set (b −1 +r −1 ) −1 b −1 and (b −1 +r −1 ) −1 r −1 of each elements in the verctor x b and y as element of the weight diagonal matrixes D 1 and D 2 in x=D 1 x b +D 2 H −1 (y) respected to the elements in the vector, here H −1 (y) just means we can get the point information, and with any other practical meaning.The errors of the analysis (ε) can be considered as the sum of the  , 2017; 2019), the analysis X (x=vec(X), X is the analysis field and m×n) can be written as: where, E i (i=1, 2, 3, 4) is the regression coefficient matrices, H[] is the observation operator and assumed to be linear, H −1 is the generalized inverse to the H.Because X, X b and H −1 (Y) are the different forms of the same field, the regression coefficient matrices are the identity matrices weighted a constant number and the The errors of the analysis (ε) can be considered as the sum of the errors of background (ε b ) and observation (ε o ): The distance between the analysis and the truth field can be also written as where W 1 and W 2 are the expectations of ε T ε and εε T .
Taking the expectation of ε T ε as an example: on the basis of the assumptions above, the expectation can be written as: where a 1 and a 2 are constants related to the variances of the errors; in addition, a 1 +a 2 =1, and both a 1 and a 2 are greater than zero.So, it is natural to rewrite the cost function as the following equation: where where B 1 (with a dimension of m×m) and B 2 (with a dimension of n×n) are the background error covariance matrices, R 1 (with a dimension of m×m) and R 2 (with a dimension of n×n) are the observational error covariance matrices, Y is the observation and with a dimension of m×n, here, we just fill the observation matrix Y with all observation data along the diagonal of the matrix which ensure the matrix is a nonsingular matrix, and the others in the matrix just filled with zero.For the vertical levels, just put them together as [X 1 ;X 2 ;X 3 ;....;X n ], then use equation ( 6) to get the analysis.Therefore, the analysis can be conducted by minimizing Eq. ( 6).But, it should be noted that the cost function (Eqs.( 6) and ( 7)) is not exactly the same as the regular cost function (Eq.( 3)).Differentiating J 1 (X) in Eq. ( 7) with respect to X gives the following:

Discussion in the Ideal Case
Similar to other terms, we substitute these equations into the following equation: Then, Eq. ( 9) can be rewritten as follows: The other terms in Eq. ( 11) are as follows: and with G=Y-AX b D. The analysis can be written as the solution to Eq. ( 11) plus the background as follows: In the ideal case, A and D are the identity matrices.So the Eqs.(11-13) can be written as the SE as follows: . Compared with the following vector form: with g=y−H[x b ], the following matrix form is with vector form in Δ: For the case that errors made at different locations are uncorrelated and the variances (means) of the errors are same, Eq. ( 17) are similar to the Eq. ( 16) with a constant k: [Eq. ( 17)]=k[Eq.( 16)], as all error covariance matrices in Eq. ( 17) are the identity matrices multiplied by the estimation of the variance of the errors, which means Eq. ( 15) is equivalent to Eq. ( 17). Figure 1a shows the ideal field with a dimension of 100×100.Figure 1b shows the background that is generated by the ideal field plus a normal distribution error (the background x b is generated by the command xb=peaks(100)+normrnd(μ,σ,m,n) in MATLAB with μ=1, σ= 0.3 , and the observation is generated in the same way with μ=0 and σ=rand(100  2.5260, the root mean squared error (RMSE) is 0.6692, the mean absolute error (MAE1) is 0.5331, and the mean error (ME) is 0.0011.Figure 1c shows the observation.The MAE is 2.9656, the RMSE is 0.7176, the MAE1 is 0.5759 and the ME is -0.0015.Figure 1d shows the analysis that is obtained by Eq. ( 15).The MAE is 2.0902, the RMSE is 0.4909, the MAE1 is 0.3908, and the ME is −0.0001.The results

Numerical Experiments in the Ideal Case
show that the analysis field provided by Eq. ( 15) is acceptable in practice.In order to compare Eq. ( 15) and ( 16), the experiment (table 1 shows the comparison) with a dimension of 40×40 was established.Both results show that the effect of Eq. ( 15) is acceptable.

Numerical Experiments in the General Case
In the general case, the observational stations cannot be at each the computational grid point.Therefore, there are two ways to resolve this situation.The first way is to interpolate the background onto the observation stations.Contrary to the first way, the second way is to interpolate the observations onto the computational grid.
The second way seems easier with respect to Eq. ( 15).The resulting interpolation function is given as follows: where p(x 0 ,y 0 ) is the estimation of the position (x 0 ,y 0 ), f(x i ,y i ) is the observation of the position (x i ,y i ), d i is the distances that are arranged from small to large, and q is a constant.Using Eq. ( 18), we can get the statistical feature values of the observation at each computational grid point.
Figure 2a shows the observation stations that are generated by random numbers.
The position of each station is marked with an asterisk.The total number of sites is 900, and the total number of computational grid points is 2500.
In the numerical experiments, the observed error distribution of each station obeys a normal distribution.The mean is 0, and the variance of each point is a random number from 0 to 1.The error of the background field in each grid obeys a normal distribution.The mean value is 1, and the variance is a constant number that belongs to the interval (0,1).
The estimation of the mean of the observed error at each computational point is 0.
The mean values estimated distribution from Monte Carlo simulations is shown in The others are the same as in figure 1 and the comparison is shown in table 2.  (f) the analysis of Eq. ( 16).
By comparing the analysis field by Eqs. ( 15) and ( 16), the MAE, RMSE and MAE1 are smaller for the case with Eq. ( 15), which means that the errors are concentrated in a smaller interval: for this case in table 2. as an example, the maximum error is 1.1933, but another is 2.1347.The reason that the case in table 2. is better than both case in table 1 and another case in table 2. may be more information about each grid are taken into consideration than others.As the information of each grid are same, the cases in the table 1. are close to each other, but in the case with Eq.
(15), the errors are concentrated in a smaller interval, too.
The mean values by Eq.( 18) is close to 0 in the figure 2a

Summary and discussion
With the numerical results from section 3, the new cost function for 3DVAR within the SE significantly reduces the computational magnitude.The dimension of the covariance matrices B i and R i (i=1,2) in Eq. ( 6) is approximately 2×(m 2 +n 2 ), and the dimension of the background error covariance matrix B in Eq. ( 3) is approximately m 2 ×n 2 .The rate of both is 2×(m 2 +n 2 )/(m 2 ×n 2 ), which is a decreasing function of both m and n.For m=n=10, the rate is 0.04, which means that there is an approximately 96% reduction.Without the loss of generality, by setting m≤n, the rate takes the following form of an inequality:   Eq. ( 19) means that there is an approximate reduction of between 4/n 2 and 4/m 2 .
For a high-resolution limited area model or a global model with m, n>100, there is approximately a 99% reduction in and less information for the covariance matrices loss.

Preprints
(www.preprints.org)| NOT PEER-REVIEWED | Posted: 5 December 2019 case, the observational stations are located in each computational grid point; in another word, for the ideal case, in each grid, there are two values: the back ground value and the observation value.On the basis of matrix theory and a more generalized linear regression model, H[X] can be written as H[X]≡AXD, where A and D are the regression coefficient matrices (in the more generalized linear regression, if there is linear correlation between the two anomalies X and Y, the relation can be written as Y=AXD,Chen et al., 2017; 2019).
and peaks(100) returns a 100×100 matrix.The function normrnd(μ,σ,m,n) generates random numbers from the normal distribution with mean parameter μ and standard deviation parameter σ, where scalars m and n are the row and column dimensions of output.The function rand(100) returns a 100×100 matrix containing pseudorandom values drawn from the standard uniform distribution on the open interval (0,1).The function sqrt returns the square root of input).The maximum absolute error (MAE) is

Figure 1 .
Figure 1.The numerical experiment's results (the ideal case).(a) The ideal field, (b)

Preprints
figure 2b, and the comparison of the estimation of the variance is shown in figure2c.

Figure 2 .
Figure 2. The numerical experiment's results (the general case).(a) the estimation figures imply that the estimations of the statistical feature values are acceptable.Both

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 5 December 2019 errors
of background (ε b ) and observation (ε o ): , where a 1 +a 2 =1 and a 1 and a 2 are constants related to the variances of the errors with the assumption that the parameters of statistic characters in each grid and observation station are same..All of this is easy, but difficult for the matrix form.So, by using the more generalized linear regression model, just give the forms with the more generalized linear regression model.On the basis of matrix theory and a more generalized linear regression model (Chen et al.

Table 1 .
The numerical experiment's results (with a dimension of 40×40)

Table 2 .
The numerical experiment's results (with a dimension of 50×50)