Serial and Parallel Iterative Splitting Methods: Algorithms and Applications

The properties of iterative splitting methods with serial versions have been analyzed since recent years, see [1] and [3]. We extend the iterative splitting methods to a class of parallel versions, which allow to reduce the computational time and keep the beneﬁt of the higher accuracy with each iterative step. Parallel splitting methods are nowadays important to solve large problems, which can be splitted in subproblems and computed independently with the diﬀerent processors. We present a novel parallel iterative splitting method, which is based on the multi-splitting methods, see [2], [10] and [15]. Such a ﬂexibilisation with multisplitting methods allow to decompose large iterative splitting methods and recover the beneﬁt of their underlying waveform-relaxation (WR) methods. We discuss the convergence results of the parallel iterative splitting methods, while we could reformulate such an error to a summation of the individual WR methods. We discuss the numerical convergence of the serial and parallel iterative splitting methods and present diﬀerent numerical applications to validate the beneﬁt of the parallel versions.


Introduction
Iterative splitting methods are nowadays important solver methods to solve large systems of ordinary, partial or stochastic differential equations, see [1], [3] and [7].Iterative splitting methods are based on two solver ideas, while in the first part we separate the full operators into different sub-operators and reduce the computational time for such sub-computation, an additional benefit is the iterative part, which allows to solve a relaxation problem, like known in waveform-relaxation method or Picard's iterative method, see [6], [11] [12] and [13].Both parts reduce the computational time and the complexity as if we solve all parts (full operator and direct method) together, see [5].Such iterative splitting methods can be used to solve with less computational amount an approximate solution of the ordinary differential equations (ODEs) or semi-discretized partial differential equations (PDEs), see [5] and [13].The bottleneck of all the iterative methods are given by the large sizes of the iterative matrices, see [10] and [15], therefore we considered parallel versions of the iterative splitting method, see [2] and [4].We concentrate on solving linear evolution equations, such as the differential equation, where A ∈ IR m × IR m is the full operator with m is the finite dimension of the operator and A l are suboperators.Further c ∈ C 1 ([0, T ]; IR m ) is the solution and c 0 ∈ IR m is the initial condition.Based on the decomposition of the large scale differential equation with operator A into different smaller sub-differential equations with operators A l , where l = 1, . . ., L and L is the number of processors, we distribute the computational time to many processors and reduce the computational time, see [15].Further, we modify the synchronous parallel splitting method with chaotic (asynchronous) ideas, such that the computation and communication of the various processors can be done completely independently, see [14].The outline of this paper is as follows.The serial iterative splitting method is explained in Section 2. The parallel iterative splitting method is introduced in Section 3. We discuss the theoretical results in Section 4. The numerical examples are presented in Section 5.In Section 6, we discuss the theoretical and practical results.

Serial iterative splitting method
We consider a two-level iterative splitting method, which is discussed for two operators in [3] and for L-operators in [8].
Based on the differential equation ( 1), we have the following decomposition of the operator A: -A = L l=1 A l , while A l are the sub-operators for the iterative part (solver part), -and B l = A − A l are the sub-operators for the relaxation part (right hand side part).where we have l = 1, . . ., L.
The serial iterative splitting method is given in the following algorithm.Here, we apply a fixed-splitting discretization step-size τ , namely, on the time-interval [t n , t n+1 ].We solve the following sub-problems consecutively for i = 0, L, . . ., (m − 1)L: . . .
where we assume for the first initialisation c 0 (t) = 0.0, further c n is the known split approximation at the time-level t = t n .The split approximation at the time-level t = t n+1 is defined as c n+1 = c (m−1)L (t n+1 ).The stopping criterion is ||c (m−1)L − c (m−2)L || ≤ err and then we have the solution c(t n+1 ) = c (m−1)L (t n+1 ).The solutions are given as: . . ., where t ∈ [t n , t n+1 ].
The integrals can be solved by Trapezoidal-or Simpsons-rule.
We define the error-function as e i (t) = c(t) − c i (t) with e 0 (t) = c(t) − c 0 and the maximum-norm ||e i || = max t∈[0,T ] ||e i (t)|| ∞ and the maximum operator norm ||A l || = ||A l || ∞ .Theorem 2.1 We have bounded operators A l ∈ IR m × IR m .Then the iterations (2)- (5), which are applied with i = 0, L, . . ., (m − 1)L, with L are the number of operators, for the Cauchy-problem (1) is of order O(τ mL ).Proof 2.2 The proof is done for the 2-level method in the paper [8].We apply a recursive argument of the iterative scheme with the L operators and obtain: where C l is given with C l = O(τ ), see also [8].
In the following, we concentrate extend the two-level iterative splitting method to a parallel iterative splitting method, while we could transform the two-level method with multisplitting method.

Parallel iterative splitting method
In the following, we parallelise the serial iterative splitting approach with the idea of multisplittingapproaches.
We present in the following approaches: -Multi-splitting iterative approach, -Two operator iterative splitting approach.

Multi-splitting iterative approach
The problem is given as where A l is a non-singular and B l is the rest matrix.Further, we have the decomposition of the parallel computable vectors: where c i is the i-th iterative solution and c i,l are the parallel computable solutions in the i-th iterative step.E is the identity matrix and E l are diagonal matrices with positive entries.The multisplitting iterative approach is given as: where the initalisation is c 0 (t) = c(t n ), we have i = 1, . . ., I iterative steps.The stopping criterion is ||c i − c i−1 || ≤ err and then we have the solution c(t n+1 ) = c i (t n+1 ).
Benefit: -Parallel implementation (the method is designed for parallel contributions) -Good error balance between the different operators Drawback: -Balances in the decomposition of E l important to damp large errors

Parallel splitting with two operators: Classical version
We have to apply the following algorithm, which is applied with synchronisation: where c n is the known split approximation at the time-level t = t n .The split approximation at the time-level t = t n+1 is defined as c n+1 = c i (t n+1 ).We have the stopping criterion The solutions are given as: where t ∈ [t n , t n+1 ].
The integrals can be solved by higher order integration-rules, e.g., with the Trapezoidal-or Simpson'srule.

Parallel splitting with two operators: Modern version
We have the following algorithm, which is applied without synchronization: where c n is the known split approximation at the time-level t = t n .Processor l, l = 1, 2 runs the iterations independently until the stopping criterion ||c i,l − c i−1,l || ≤ err is reached .Then, the approximations are synchronized where processor l stops at iteration i l .
The split approximation at the time-level t = t n+1 is defined as c n+1 = c i (t n+1 ).

Theoretical Results
In the following, we deal with the m-dimensional initial value problem in the non-homogeneous form, see also the homogeneous form in Equation (1): where A = M + N and f is the right hand side.
Further, we deal in the following with the proof-ideas related to Waveform relaxation methods, see [9] and [14].
The initial value problem (20) is solved with the multisplitting Waveform-relaxation method, which is given as: where A is given in Equation (1).Further c 0 (t) = c 0 is the starting condition.
For the multisplitting approach, we have the following Definition 4.1.Definition 4.1 We have L ≥ 1 is the number of the splittings.Further, we have A, A l , B l , E l , which are real-valued m × m matrices.Such that we obtain the multisplitting triple (A l , B l , E l ) for l = 1, . . ., L: -A = A l + B l and B l = L k=1,k =l A k with l = 1, . . ., L, -The matrices E l are nonnegative diagonal matrices and satisfy: L l=1 E l = I , where I is the identity matrix.
-s l (i + 1) ≤ i + 1 indicates the iteration, where the l-th component is computed prior to i + 1.
-The multisplitting approach based on the Waveform-relaxation in the classical version is given in the following notation: -The multisplitting approach based on the Waveform-relaxation in the modern version is given in the following notation:

Convergence Analysis
The solution of the multisplitting Waveform-relaxation method ( 22) and (23), is given as following: We solve the individual equations (22) as: can be written as: where we have where k l (t) = exp(tA l ) B l for l = 1, . . ., L. Further, we apply the multisplitting notation (23) and obtain the summations: where k(t) = L l=1 E l k l (t) and we obtain the standard Waveform-relaxation method as: We assume, that the Lemma 4.2 is fulfilled, see also [9].Lemma 4.2 We assume that the following items are equivalent: -We assume c(t) is a solution of the initial value problem (20).-c(t) is a solution of each multisplitted equation c(t) = K l c(t) + φ l (t), c(0) = c 0 , l = 1, . . ., L.
-c(t) is the solution of the fixpoint equation c(t) = Kc(t) + φ l (t).
Based on the assumptions, we derive in the following the errors and the convergence results.
In the Theorem 4.3 we derive the error of the i-th approximation, see also [9].Theorem 4.3 There exists a constant C := L l=1 C l , which is given to estimate the kernel k of the multisplitting waveform-relaxation operator, such that we obtain ||k|| T = C. Then the error of the i-th approximation of the classical multisplitting WR method ( 22)-( 23) is given by Proof 4. 4 We have given We apply the Lemma 4.2 and follow with the iterative approach: where is K i u(t) is the i-times convolution and we have we apply the estimation of the Waveform-relaxation, see [11], and obtain: where we have The error estimate is than given as We apply Then we obtain the estimation We have the following new convergence Theorem 4.5 based on the extension of the classical convergence Theorem version 4.3.Theorem 4.5 There exists a constant C := L l=1 C l , which is given to estimate the kernel k of the multisplitting waveform-relaxation operator, such that we obtain ||k|| T = C. Then the convergence of the modern multisplittting WR method (24)-( 25) is given by where i min = min L l=1 s l (i), where s l (i) ≤ i are the retarded iterations of the l-th processor.Proof 4. 6 We start with the estimation of the i-th iteration: Then, we have the recursion: where we apply , based on the idea in [11], and we obtain: where i min = min L l=1 s l (i).Remark 4.7 For the parallel error, we have the order O(τ m ), if we assume that all processors have at least m iterative cycles, while for the serial error, we have the order O(τ mL ).Means in the serial version, we have to apply mL iterative steps in sum to obtain the result, while in the parallel version, we only apply m iterative steps, while L processors share the computation to solve the L sub-equations.Further, we can assume, that the sub-equations are faster to solve, while the sub-operators are much more smaller and simpler to handle.Such that we have t sub ≤ t f ull L , while t sub is the time to solve a sub-problem and t f ull the time to solve the full problem.Therefore, we have a benefit in the parallel distribution and we obtain faster the higher order O(τ mL ) as with the serial version.Remark 4.8 We can modify the parallel method with inner and outer iterative cycles.Means we deal with L inner cycles for each processor and apply m outer cycles for all the processors.For such methods, we also obtain the higher order O(τ mL ) but more faster than in the serial version.

Numerical examples
In the following, we deal with different numerical example to verify and test the theoretical results.We deal with: -Only time-dependent problem: We apply ordinary differential equation to verify the theoretical results.
-Linear time and spatial dependent problem: We apply a diffusion equation with different spatial dependent operators and test the application to partial differential equations.-Nonlinear time and spatial dependent problem: We apply a mixed diffusion-convection with Burgers equation to test and verify the application to nonlinear problems.

First Example: Matrix problem
For a first test example, consider the matrix equation, the exact solution is exp( We split the matrix as, -Two operator approach -Multiple operator approach where the E 1 and E 2 are given as: We include Tables 1 and 2 corresponding to Multi-splitting iterative approach classical and modern version with the above partitions and using different discretizations in [0, 1] of step h allowing a maximum of 10 iterations and a tolerance of 10 −3 .We can see in the results the relative and absolute errors for each component of the solution and the mean of iterations performed in order to reach the tolerance.Remark 5.1 We applied the multisplitting method with the classical (synchronous) and modern (chaotic) approach.We receive the same accuracy of the numerical results, means the methods are as same accurate.
We obtain some more benefits of the modern method, if we apply large time-steps, such that the solution of one sub-problem can be achieved faster and benefit the solution of the second sub-problem.For such small unbalances computations the modern approach is more efficient.We deal with the following diffusion problem: where we have the analytical solution u an (x, t) = exp(−3t) sin x sin y sin z, with x = (x, y, z In operator notation, we write as following: where ∂z 2 and we assume, that the zero-boundary conditions (Dirichlet boundary conditions) are embedded.
The problem is discretized by using a 4-D mesh in Ω×[0, T ].Denote by u i,j,k,t the approximated value of the solution at node (x i , y j , z k , t) for a given t.For the time-integration, we apply the integral formulation, see Equations ( 6)- (9).
For the spatial discretization, we test a second and fourth order scheme: , second order approach, (56) , 4th-order approach, (57) where we have the analogous operators for the y and z derivations.
In order to establish the convergence of the algorithms, we compute the solution u(∆x, h) obtained using spatial and temporal steps ∆x = ∆y = ∆z and h, respectively.We use different measures to estimate the convergence.On one hand, we can compare the outcome of the method u(∆x, h) with the exact solution u ana for every point of the mesh, which shows the convergence of the method.On the other hand, we can compare u(∆x, h) with the result obtained halving the time or space steps, h/2 or ∆x/2, at the points shared by the corresponding meshes.This allows to analyze how the results depend on these steps.Denote by e i,j,k (∆x, h) the difference between the results at a mesh point (x i , y j , z k , T ), obtained using two different time or space steps, and by δ i,j,k (∆x, h) the difference with the analytical solution at the same point.In the tables, we will denote the maximum errors by and and the mean errors by and where N is the number of spatial nodes at time T .
In the following, we discuss different decompositions of the multi-operator splitting approach: -Directional decomposition: We decompose into the different directions: Here, we have the benefit of decomposing the different directions.
The drawback is related to the unbalanced decomposition, while the matrices have different sparse entries.Therefore the exponential matrices of the operators are different in their sparse behaviour and the error can not be optimal reduced.
We can reduce the unbalanced problem, if we deal with the idea to use ∆t ≈ ∆x, see [8].Then, we obtain at least a second order scheme (related to the spatial discretization).
At least, we test also the strong relation of the time-and spatial scales with respect to the explicit CFL condition, means, we have: where ∆x = ∆y = ∆z.We apply the splitting algorithms with the directional decomposition in [0, 1].The splitting is iterated until a tolerance of 10 −8 or a maximum of 10 iterations are reached.The values shown in the tables correspond to maximum or mean values at T = 1.Tables 3 and 4 present the results obtained using different number of temporal steps and the second and fourth order schemes for the spatial discretization, respectively.In tables 5 and 6, we fix the number of temporal steps to 1280 and vary the number of spatial intervals.3 Results using the second order scheme for the spatial discretization, with 10 spatial subintervals and different number of temporal steps Table 3 shows that, using the second order scheme to approximate the spatial derivatives, the order of convergence of all the considered methods with respect to the temporal step h is 1, because the estimated errors are approximately proportional to h.The differences with respect to the analytical solution are almost constant, due to the spatial discretization error error.The serial iterative splitting method is unstable for h = 0.1, whereas the parallel splitting methods converge for all the cases shown in the table.When the methods converge, there is no difference in the final approximation among the considered methods, but the parallel methods require about double number of iterations to converge.5 Results using the second order scheme for the spatial discretization, with 320 temporal steps and different number of spatial subintervals In Table 5, the spatial step is reduced simultaneously in the three dimensions, in order to keep the increments equal, because the performance of the method is better in this case.The number of time steps is big enough to ensure the convergence in the case of smaller spatial subinterval and is used in all the cases to allow the comparison depending only on the spatial step.The error is halved when the step in each spatial dimension is halved, which increases the computational cost by a factor of 8, thus the behavior of the method is poor.The parallel methods obtain slightly less approximation to the analytical result and require more iterations than the serial iterative method.The modern parallel method needs more iterations per step to converge than the classical parallel method.
Table 6 compares the results obtained varying the number of spatial subintervals.The error is divided by about 20 when the step in each spatial dimension is halved, which indicates a convergence of order higher than linear with respect to the cube of the spatial step, improving the behavior of the method with respect to the order two scheme.There is no noticeable difference in the results among the different methods, although the parallel ones require double number of iterations than the sequential method to converge.
-Balanced decomposition: We decompose into the different directions:  6 Results using the fourth order scheme for the spatial discretization, with 320 temporal steps and different number of subintervals in each dimension Here, we have the benefit of equal load balances of the matrices, such that the exp-matrices have the same sparse structure.7 Results using the second order scheme for the spatial discretization, with 10 spatial subintervals and different number of temporal steps -Mixed decomposition: We decompose into the different directions: where = [0, 2/3], means for = 0, we have the directional decomposition, while for = 2/3 we have the balanced decomposition.
Here, we have the benefit, e.g., for small = 0.1, we have nearly a directional decomposition scheme.
Based on the = 0, we stabilise the scheme.Remark 5.2 Based on the balanced decompositon with = 2/3, we do not have problems with the splitting approaches and obtain optimal results.For the pure unbalanced decomposition, means = 0, we decompose into different directions.Here, we have to restrict us to the exact second order approach, which is ∆t ≈ ∆x.
Remark 5.3 We obtain the benefit of the classical and modern parallel iterative splitting method based on larger time-steps and more iterative steps.In such an optimal version, we are much faster than the serial version and also the result is more accurate.For very fine time-steps, we do not see an improvement in the accuracy, but we see a benefit in the computational time, means the parallel versions are more faster.

Third Example: Mixed convection-diffusion and Burgers equation
We deal with a partial differential equation, which is a 2D example of a mixed convection-diffusion and Burgers equation.For such equations, we can find analytical solution.The model problem is given as: u(x, y, 0) = u ana (x, y, 0), (x, y) ∈ Ω, (68) where Ω = [0, 1] × [0, 1], T = 1.25, and µ is the viscosity.The analytical solution is given as where we compute f (x, y, t) accordingly.As in the previous example, denote by u(∆, h) the numerical solution obtained using spatial subintervals of amplitude ∆ = ∆x = ∆y, time steps h and a tolerance of tol = 10 −6 , allowing a maximum of 40 iterations.On one hand, we will compare the numerical solution with the exact one u ana for every point of the mesh, which shows the convergence of the method.On the other hand, we will compare u(∆, h) with the result obtained halving the time steps, h/2, at the points shared by the corresponding meshes.Denote by e i,j,k (∆, h) the difference between the results obtained using two different time steps, h and h/2, at a common mesh point (x i , y j , t k ), and by δ i,j,k (∆, h) the difference with the analytical solution at the same point.In the tables, we will use the error estimates given by and and the mean errors by and where N is the total number of nodes (x i , y j , t k ).
In order to obtain the temporal cost of the parallel schemes, we measure the time consumed by processor l in each iteration m, in [t k , t k+1 ], tp k,l,m .In the classical parallel scheme the processors synchronize at each iteration, so the cost for this time interval is tp k = m max l=1,2 tp k,l,m , whereas, in the modern parallel scheme, the processors iterate independently in [t k , t k+1 ] performing m l , l = 1, 2 iterations until convergence, thus, the cost is tp k = max l=1,2 m tp k,l,m .The final cost is obtained adding the results of all time subintervals.The values have been obtained using etime, running Matlab 2015a on a desktop computer with an Intel Core i7 processor in a 64 bits operating system with Windows 7 Professional.
In the following, we discuss different decompositions of the multi-operator splitting approach: -Directional decomposition: We decompose into the different directions (x and y direction): Before studying the convergence of the method, we analyze the influence of parameter µ.Tables 9  and 8 compare the results obtained for different values of µ using the second and the fourth order discretization scheme, respectively, taking 10 subintervals in each spatial dimension and 640 time steps.As we can observe, the error generally increases with µ.Along this example, we will take µ = 0.5, because for this value, the methods require the minimum number of iterations, except in one case.9 Results of the directional decomposition using the fourth order scheme for the spatial discretization, with 10 spatial subintervals, 640 temporal intervals and different values of µ.
As we can observe, the mean error is approximately proportional to the inverse of the viscosity, µ.Instead, the number of iterations and the execution time is not affected by the viscosity changes, except for the last case of the modern parallel algorithm, where the maximum number of iterations is reached in each step, consuming also more execution time.In what follows, we will take the viscosity parameter µ = 0.5.
We will first analyze the influence on the convergence of the number of time steps, using second or fourth order approximations for the spatial derivatives.
Table 10 show that the estimated errors are roughly proportional to the square of the time step, although, in the end, the differences with the analytical solution δ decrease more slowly, due to the discretization error.The classical and modern parallel methods converge more slowly in the case nt = 160, but, in the other cases, they have better performance in terms of error estimates and temporal cost.The utilization of fourth order approximations instead of second order approximations for the discretization spatial derivatives also gives better results for more than 160 time intervals, at a slightly higher cost.10 Results of the directional decomposition using the fourth order scheme for the spatial discretization, with 10 spatial subintervals and different number of temporal steps Now we analyze the dependence on the number of spatial subintervals.Tables 11 and 12 display the errors obtained varying the number of space subintervals, considering 1280 time steps, µ = 0.5, tolerance 10 −6 and maximum number of iterations 40.11 Results of the directional decomposition using the second order scheme for the spatial discretization, with 1280 time subintervals and different number of spatial steps The best approximations are obtained for 10 subintervals in each spatial dimension.The error increment for more space intervals can be due to the fact that the condition number of the exponential matrix utilized to solve the differential system increases fast with the number of spatial intervals, making the solution less reliable.
The parallel splittings obtain better results in error and temporal cost than the serial method in almost all cases.The utilization of fourth order approximations for the spatial derivatives gives less error, mainly for small number of spatial intervals.
The temporal cost increases between linearly and quadratically with the number of spatial steps, being higher for the fourth order scheme.12 Results of the directional decomposition using the fourth order scheme for the spatial discretization, with 1280 time subintervals and different number of spatial steps -Convection and diffusion decomposition: Here, we decompose to an explicit part, which is the convection, and into an implicit part, which is the diffusion.
Tables 13 and 14 analyze the convergence of the different splitting methods for the convection and diffusion decomposition varying the time step.13 Results for the convection diffusion decomposition using the second order scheme for the spatial discretization with 10 spatial subintervals and different number of temporal steps Tables 13 and 14 shows that, for a fixed spatial step, the estimated errors are proportional to the temporal step, in contrast with the results for the directional decomposition where the errors are of second order with respect time.This behavior can be due to the fact that in the directional decomposition, the solution can be obtained decomposing the matrices in blocks corresponding to the variables in a row or column of the spatial mesh because the spatial derivatives are separated in the split, whereas in the case of the convection and diffusion decomposition, in each part of the split, they appear derivatives in both directions, preventing the solution in blocks.In the fourth order case, the modern parallel algorithm with 160 temporal steps diverges.14 Results for the convection diffusion decomposition using the fourth order scheme for the spatial discretization with 10 spatial subintervals and different number of temporal steps The dependence on the spatial step is analyzed in tables 15 and 16.The behavior is similar to the case of the directional decomposition, presenting an increment of the estimated error with the number of spatial subintervals for a fixed time step.In the second order scheme, the δ-errors decrease with the space step.In the fourth order scheme, the δ-errors are lower, but they slightly increase when the space step decreases.For this scheme, the modern parallel algorithm with 20 spatial steps diverges.The temporal cost is relatively high in the case of 20 subintervals, due tho the computational overhead for dealing with big matrices.15 Results for the convection diffusion decomposition using the second order scheme for the spatial discretization with 10 temporal steps and different number of spatial subintervals The use of order approximations to the space derivatives has little influence in the results of the algorithm.
-Balanced decomposition: We decompose into: where is an arbitrary parameter that can be tuned in order tho achieve the maximum efficiency.We first examine the influence of the parameter on the convergence of the different splitting schemes.Tables 17 and 18 show the influence of the parameter on the convergence of the methods.The method  16 Results for the convection diffusion decomposition using the fourth order scheme for the spatial discretization with 1280 temporal subintervals and different number of spatial steps has the same behavior for values of which are symmetric with respect to 0.5.The results are quite uniform for in the range [−0.1, 1.1] whereas for other parameter values, the method diverges.Only the classical parallel algorithm obtains the result for = 2, whereas the serial iterative method even fails for = 1.1 in the second order as well as in the fourth order schemes.17 Results for the balanced decomposition using the second order scheme for the spatial discretization with 640 temporal subintervals and different values of parameter We fix the parameter = 0.9 for the analysis of the method.Table 19 compare the results of the considered splitting algorithms for different number of temporal steps using the fourth order schemes, respectively.The fourth order scheme provides better approximations than the second order one.The parallel methods behave better than the serial method for more than 160 temporal steps.
Finally, we compare the temporal cost of the different combinations of algorithms for different number of time steps and 10 space subintervals.The results are shown in table 20.But for the case of 160 time steps, the classical parallel algorithm obtains the best results for the different analyzed decomposition methods.For 1280 time steps, the modern parallel algorithm also outperforms the serial iterative algorithm.
Here, we have the benefit, e.g., for small = 0.1, we have nearly a directional decomposition scheme.
Based on the = 0, we stabilize the scheme.
Remark 5.4 We also compared the computational time of the serial iterative, classical parallel iterative and modern parallel iterative method.We obtain a speedup for the parallel versions of a factor nearly L class ≤ L, if we apply L-processors.For the modern parallel version, we obtain with the same accuracy of the results a small reduction of the speedup with L modern < L class .19 Results for the balanced decomposition with = 0.9 using the fourth order scheme for the spatial discretization with 10 spatial subintervals and different number of temporal steps Remark 5.5 Based on balancing the decomposition of the operators with , we also obtain the benefit of the classical and modern parallel iterative splitting method with larger time-steps and more iterative steps to improve the accuracy.We can accelerate the speed of computations with the parallel versions and obtain speed ups with the modern parallel version.

Conclusion
The extensions of iterative splitting methods to parallel versions allow to reduce the computational time.We obtain the same accuracy as in the serial version.The improvements are obtained with larger time-steps and additional iterative steps, while we could reduce the computational time with the parallel methods.The benefit is of course the balance to multiple processors with additional memories.Further, we could apply the resources to improve with additional steps the accuracy of the approximations.We circumvent the problem of the memory of the algorithm, see [8], which we have if we obtain only a serial method.Based on the parallel distribution, we have additional iterative steps for each processor and we distribute such a memory to all processors.For large scale numerical experiments, we could present the benefit of the parallel resources.
In future, we consider more real-life problems and extend the parallel iterative methods to stochastic differential equations.
Results using the fourth order scheme for the spatial discretization, with 10 subintervals in each dimension and different number of temporal steps

Table 4
shows that, using the fourth order scheme, the order of convergence of with respect to the temporal step h is 2, being the estimated errors proportional to h 2 .
Results of the directional decomposition using the second order scheme for the spatial discretization, with 10 spatial subintervals, 640 temporal intervals and different values of µ.
Results for the balanced decomposition using the fourth order scheme for the spatial discretization with 640 temporal subintervals and different values of parameter