Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU Dmitry Belousov, Elena Akimova Institute of Mathematics and Mechanics of Ural Branch of RAS, Ural Federal University named the first President of Russia B.N.Yeltsin 1
Systems of linear algebraic equations (SLAE) with block-fivediagonal matrices arise when solving some mathematical modelling problems, in particular, geoelectrics and diffusion problems. Introduction After using a finite-difference approximation with improved accuracy, the problem is reduced to solving a SLAE with block-fivediagonal matrix [1]. 2
SLAE 0 0 0 1 0 2 0 1 0 1 1 1 2 1 3 1 , , С Y D Y E Y F BY C Y DY E Y F  − + =  − + − + =  We consider the system: (1) 2 1 1 2 2 1 1 1 1 1 1 1 1 , 2,..., 1; , , i i i i i i i i i i i N N N N N N N N N N N N N N N N AY BY CY DY EY F i N A Y B Y C Y D Y F A Y B Y C Y F − − + + − − + + − + + + +  − + − + = = −  − + − =  − + = 3
After a finite-difference approximation improved accuracy, the geoelectrics problems and diffusion problems can be reduced to solving SLAE with block- fivediagonal matrices with following DATA STRUCTURE Fig. 1. Block-fivediagonal matrix fivediagonal matrices with following structure which presented in Fig. 1. We are proposing the Parallel Conjugate Gradient Method with regularization (PCMR), conjugate gradient method with preconditioner (PCGM), and parallel square root method (PSRM) to solve such SLAEs. 4
Parallel Conjugate Gradient Method 0 0 0 0 , ,r b Ax p r= − = We have the systems of the following form: The iterative process for the method has following form: Ax b= ,Ax b A A Eα= = +We add parameters for regularization: (2) 2 1 12 21 1 1 2 2 2 , , , , , ( , ) , . k k k k k k k k k kk k k k k k k k k r b Ax p r r x x p r r Ap Ap p r p r p r α α α β β + + + + + = − = = + = = − = + = The condition for stopping the CGM iterative process with preconditioner is /k Ar b b ε− < (3) 5
Parallel Conjugate Gradient Method (PCGM) with preconditioner 1 1 ,C Ax C b− − = max max min min ( ) ( ), ( ) , ( ) ,cond A cond A cond A cond A λ λ λ λ << = = ɶ ɶ ɶ ɶ Ax b= (4) Initial System to be replaced by the system that can be converged by interative method essentially faster. We have the following condition to choose the preconditioner: 0 0 0 1 0 0 0 1 1 1 1 1 1 1 1 1 , , , ( , ) , , , ( , ) ( , ) , , . ( , ) k k k k k k k k k k kk k k k k k k k k k k k k r b Ax p C r z p r z x x p r r Ap Ap p r z z C r p z p r z α α α β β − + + + + + − + + + = − = = = + = = − = = + = The condition for stopping the CGM iterative process with preconditioner is / .k Az b b ε− < (5) min minλ λɶ 6
Parallel Conjugate Gradient Method (PCGM) Parallelization The parallelization of the CGM methods is based on splitting the matrix A by horizontal lines into m blocks, and the solution vector z and the vector of the right-hand part b of the SLAE are divided into m parts so that n=m x L, where n is the dimension of the system of equations, m is the number of processors, and L is the number of lines Fig. 2. Distribution of data among processors lines 7
Parallel Square Root Method (PSRM) = = T A S S= where S is an upper triangular matrix with positive elements on the main diagonal, is the transposed matrixT S This method is based on decomposition of the symmetric positive definite matrix A in the product: , .T S y b Sz y= = 1 1 11 1 1 1 ,, , 2,3,...., ; , 1, 2,...,1. n n nn i n i ki k i ik k k k i i i ii ii z y sy b s b s y y s z y i n z i n n s s − = = + ==     − −   = = = = − −    ∑ ∑ To solve systems, we use the recursion formulas: (6) 8
Parallel Square Root Method parallelization 1 - Processor 2 - processor … m – processor Fig. 2. Distribution of the i-th line among processors 9
Parallel matrix sweep algorithm 10 (7)
(8) (9) 11 (10) (11) (12)
12
Numerical experiments and results The parallel algorithms: conjugate gradient method with preconditioner (PCGM), conjugate gradient method with regularization (PCGR) and parallel square root method (PSRM), were implemented on: • Multi-core Intel processor using the technology of OpenMP, • and the Windows API development tools. The basic data and quasi-model solution of the problem were provided by the Department of Borehole Geophysics of the Institute of Petroleum Geology and Geophysics of the Siberian Branch of RAS (Novosibirsk). 13
Numerical experiments and results Fig 3. Quasy model solution 14
Method Computing system T,sec.(Windows API) T, sec. (OpenMP) PCGM Intel Core I5 (1 core ) 165 145 Intel Core I5 (2 core ) 142 127 Intel Core I5 (4 core ) 118 101 mTmT Computation times Table 1. Computation times of solving SLAE PCGR Intel Core I5 (1 core ) 1805 1804 Intel Core I5 (2 core ) 1490 1450 Intel Core I5 (4 core ) 1288 1230 PSRM Intel Core I5 (1 core ) 26 18 Intel Core I5 (2 core ) 18 13 Intel Core I5 (4 core ) 9 7 15
The numerical solution of the problem is compared with the quasi-model solution by means of calculating the relative error / ,M N M Y Y Yσ = − M Y is the quasi-model solution of the problem N Y is the numerical solution of the problem This problem was solved by the parallel conjugate gradient method with preconditioner, parallel matrix sweep algorithm, and parallel square rootpreconditioner, parallel matrix sweep algorithm, and parallel square root method. The numerical solution of the problem coincides with the quasi-model solution with accuracy 7 7 7 10 , 2 10 , 6 10 .PCGM PCGR PSRMσ σ σ− − − ≈ ≈ ⋅ ≈ ⋅ 11 6 5max max min min ( ) 1.3 10 , 1.4 10 , 1.1 10 0,cond A λ λ λ λ − = ≈ ⋅ = ⋅ = ⋅ > A priori we find the condition number of the original matrix A 16
Conclusion We have proposed the methods of solving the SLAE with block- fivediagonal matrices: 1. Parallel square root method with preconditioner are proposed. 2. Parallel square root method with regularization are proposed. 3. Parallel conjugate gradient method.3. Parallel conjugate gradient method. 4. Parallel matrix sweep algorithm. Algorithms , instead 4 are implemented on the Intel multi-core processor. Investigation of efficiency and optimization of parallel algorithms for solving the problem with quasi-model data are performed. 17
References 1. Dashevsky, J.A, Surodina, I.V., Epov, M.I.: Quasi-three-dimensional mathematical modelling of diagrams of axisymmetric direct current probes in anisotropic profiles. Siberian J. of Industrial Mathematics. Vol. 5, № 3 (11), pp. 76–91 (2002). 2. Gorbachev, I.I., Popov, V.V. & Akimova, E.N. Computer simulation of the diffusion interaction between carbonitride precipitates and austenitic matrix with allowance for the possibility of variation of their composition, The Physics of Metals and Metallography 2006, Volume 102, Issue 1, pp 18–28, http://link.springer.com/article/10.1134/S0031918X06070039 3. Faddeev, V.K., Faddeeva, V.N.: Computational methods of linear algebra. M.: Gos. Isdat.3. Faddeev, V.K., Faddeeva, V.N.: Computational methods of linear algebra. M.: Gos. Isdat. Fizmat. Lit. (1963). 4. Elena N. Akimova, Dmitry V. Belousov: Parallel algorithms for solving linear systems with block-tridiagonal matrices on multi-core CPU with GPU, Journal of Computational Science 2012 Volume 3, Issue 6, Pages 445–449, http://www.sciencedirect.com/science/article/pii/S1877750312000932 5. Akimova E.N. A parallel matrix sweep algorithm for solving linear system with block- fivediagonal matrices // AIP Conf. Proc.1648, 850028. 2015. Rhodes, Greece, 22–28 Sept. 2014. 5 p., http://scitation.aip.org/content/aip/proceeding/aipcp/10.1063/1.4913083 6. Voevodin, Vl.V.: Parallel programming technologies. URL: http://parallel.ru. 7. Samarskii, A.A., Nikolaev E.S.: Methods for solving the grid equations. M. Nauka. (1978). 8. Process and Thread Functions. https://msdn.microsoft.com/en- us/library/windows/desktop/ms684847(v=vs.85).aspx. 18
Thanks for Attention! Contact Dmitry Belousov Elena AkimovaDmitry Belousov Elena Akimova rtfdeamon@mail.ru aen15@yandex.ru T: +79068088664 Institute of Mathematics and Mechanics of UrB RAS 19

Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

  • 1.
    Parallel algorithms forsolving linear systems with block-fivediagonal matrices on multi-core CPU Dmitry Belousov, Elena Akimova Institute of Mathematics and Mechanics of Ural Branch of RAS, Ural Federal University named the first President of Russia B.N.Yeltsin 1
  • 2.
    Systems of linearalgebraic equations (SLAE) with block-fivediagonal matrices arise when solving some mathematical modelling problems, in particular, geoelectrics and diffusion problems. Introduction After using a finite-difference approximation with improved accuracy, the problem is reduced to solving a SLAE with block-fivediagonal matrix [1]. 2
  • 3.
    SLAE 0 0 01 0 2 0 1 0 1 1 1 2 1 3 1 , , С Y D Y E Y F BY C Y DY E Y F  − + =  − + − + =  We consider the system: (1) 2 1 1 2 2 1 1 1 1 1 1 1 1 , 2,..., 1; , , i i i i i i i i i i i N N N N N N N N N N N N N N N N AY BY CY DY EY F i N A Y B Y C Y D Y F A Y B Y C Y F − − + + − − + + − + + + +  − + − + = = −  − + − =  − + = 3
  • 4.
    After a finite-differenceapproximation improved accuracy, the geoelectrics problems and diffusion problems can be reduced to solving SLAE with block- fivediagonal matrices with following DATA STRUCTURE Fig. 1. Block-fivediagonal matrix fivediagonal matrices with following structure which presented in Fig. 1. We are proposing the Parallel Conjugate Gradient Method with regularization (PCMR), conjugate gradient method with preconditioner (PCGM), and parallel square root method (PSRM) to solve such SLAEs. 4
  • 5.
    Parallel Conjugate GradientMethod 0 0 0 0 , ,r b Ax p r= − = We have the systems of the following form: The iterative process for the method has following form: Ax b= ,Ax b A A Eα= = +We add parameters for regularization: (2) 2 1 12 21 1 1 2 2 2 , , , , , ( , ) , . k k k k k k k k k kk k k k k k k k k r b Ax p r r x x p r r Ap Ap p r p r p r α α α β β + + + + + = − = = + = = − = + = The condition for stopping the CGM iterative process with preconditioner is /k Ar b b ε− < (3) 5
  • 6.
    Parallel Conjugate GradientMethod (PCGM) with preconditioner 1 1 ,C Ax C b− − = max max min min ( ) ( ), ( ) , ( ) ,cond A cond A cond A cond A λ λ λ λ << = = ɶ ɶ ɶ ɶ Ax b= (4) Initial System to be replaced by the system that can be converged by interative method essentially faster. We have the following condition to choose the preconditioner: 0 0 0 1 0 0 0 1 1 1 1 1 1 1 1 1 , , , ( , ) , , , ( , ) ( , ) , , . ( , ) k k k k k k k k k k kk k k k k k k k k k k k k r b Ax p C r z p r z x x p r r Ap Ap p r z z C r p z p r z α α α β β − + + + + + − + + + = − = = = + = = − = = + = The condition for stopping the CGM iterative process with preconditioner is / .k Az b b ε− < (5) min minλ λɶ 6
  • 7.
    Parallel Conjugate GradientMethod (PCGM) Parallelization The parallelization of the CGM methods is based on splitting the matrix A by horizontal lines into m blocks, and the solution vector z and the vector of the right-hand part b of the SLAE are divided into m parts so that n=m x L, where n is the dimension of the system of equations, m is the number of processors, and L is the number of lines Fig. 2. Distribution of data among processors lines 7
  • 8.
    Parallel Square RootMethod (PSRM) = = T A S S= where S is an upper triangular matrix with positive elements on the main diagonal, is the transposed matrixT S This method is based on decomposition of the symmetric positive definite matrix A in the product: , .T S y b Sz y= = 1 1 11 1 1 1 ,, , 2,3,...., ; , 1, 2,...,1. n n nn i n i ki k i ik k k k i i i ii ii z y sy b s b s y y s z y i n z i n n s s − = = + ==     − −   = = = = − −    ∑ ∑ To solve systems, we use the recursion formulas: (6) 8
  • 9.
    Parallel Square RootMethod parallelization 1 - Processor 2 - processor … m – processor Fig. 2. Distribution of the i-th line among processors 9
  • 10.
    Parallel matrix sweepalgorithm 10 (7)
  • 11.
  • 12.
  • 13.
    Numerical experiments andresults The parallel algorithms: conjugate gradient method with preconditioner (PCGM), conjugate gradient method with regularization (PCGR) and parallel square root method (PSRM), were implemented on: • Multi-core Intel processor using the technology of OpenMP, • and the Windows API development tools. The basic data and quasi-model solution of the problem were provided by the Department of Borehole Geophysics of the Institute of Petroleum Geology and Geophysics of the Siberian Branch of RAS (Novosibirsk). 13
  • 14.
    Numerical experiments andresults Fig 3. Quasy model solution 14
  • 15.
    Method Computing systemT,sec.(Windows API) T, sec. (OpenMP) PCGM Intel Core I5 (1 core ) 165 145 Intel Core I5 (2 core ) 142 127 Intel Core I5 (4 core ) 118 101 mTmT Computation times Table 1. Computation times of solving SLAE PCGR Intel Core I5 (1 core ) 1805 1804 Intel Core I5 (2 core ) 1490 1450 Intel Core I5 (4 core ) 1288 1230 PSRM Intel Core I5 (1 core ) 26 18 Intel Core I5 (2 core ) 18 13 Intel Core I5 (4 core ) 9 7 15
  • 16.
    The numerical solutionof the problem is compared with the quasi-model solution by means of calculating the relative error / ,M N M Y Y Yσ = − M Y is the quasi-model solution of the problem N Y is the numerical solution of the problem This problem was solved by the parallel conjugate gradient method with preconditioner, parallel matrix sweep algorithm, and parallel square rootpreconditioner, parallel matrix sweep algorithm, and parallel square root method. The numerical solution of the problem coincides with the quasi-model solution with accuracy 7 7 7 10 , 2 10 , 6 10 .PCGM PCGR PSRMσ σ σ− − − ≈ ≈ ⋅ ≈ ⋅ 11 6 5max max min min ( ) 1.3 10 , 1.4 10 , 1.1 10 0,cond A λ λ λ λ − = ≈ ⋅ = ⋅ = ⋅ > A priori we find the condition number of the original matrix A 16
  • 17.
    Conclusion We have proposedthe methods of solving the SLAE with block- fivediagonal matrices: 1. Parallel square root method with preconditioner are proposed. 2. Parallel square root method with regularization are proposed. 3. Parallel conjugate gradient method.3. Parallel conjugate gradient method. 4. Parallel matrix sweep algorithm. Algorithms , instead 4 are implemented on the Intel multi-core processor. Investigation of efficiency and optimization of parallel algorithms for solving the problem with quasi-model data are performed. 17
  • 18.
    References 1. Dashevsky, J.A,Surodina, I.V., Epov, M.I.: Quasi-three-dimensional mathematical modelling of diagrams of axisymmetric direct current probes in anisotropic profiles. Siberian J. of Industrial Mathematics. Vol. 5, № 3 (11), pp. 76–91 (2002). 2. Gorbachev, I.I., Popov, V.V. & Akimova, E.N. Computer simulation of the diffusion interaction between carbonitride precipitates and austenitic matrix with allowance for the possibility of variation of their composition, The Physics of Metals and Metallography 2006, Volume 102, Issue 1, pp 18–28, http://link.springer.com/article/10.1134/S0031918X06070039 3. Faddeev, V.K., Faddeeva, V.N.: Computational methods of linear algebra. M.: Gos. Isdat.3. Faddeev, V.K., Faddeeva, V.N.: Computational methods of linear algebra. M.: Gos. Isdat. Fizmat. Lit. (1963). 4. Elena N. Akimova, Dmitry V. Belousov: Parallel algorithms for solving linear systems with block-tridiagonal matrices on multi-core CPU with GPU, Journal of Computational Science 2012 Volume 3, Issue 6, Pages 445–449, http://www.sciencedirect.com/science/article/pii/S1877750312000932 5. Akimova E.N. A parallel matrix sweep algorithm for solving linear system with block- fivediagonal matrices // AIP Conf. Proc.1648, 850028. 2015. Rhodes, Greece, 22–28 Sept. 2014. 5 p., http://scitation.aip.org/content/aip/proceeding/aipcp/10.1063/1.4913083 6. Voevodin, Vl.V.: Parallel programming technologies. URL: http://parallel.ru. 7. Samarskii, A.A., Nikolaev E.S.: Methods for solving the grid equations. M. Nauka. (1978). 8. Process and Thread Functions. https://msdn.microsoft.com/en- us/library/windows/desktop/ms684847(v=vs.85).aspx. 18
  • 19.
    Thanks for Attention! Contact DmitryBelousov Elena AkimovaDmitry Belousov Elena Akimova rtfdeamon@mail.ru aen15@yandex.ru T: +79068088664 Institute of Mathematics and Mechanics of UrB RAS 19