An MPI-CUDA implementation and optimization for parallel Sparse Equations and Least Squares (LSQR)

He Huang, Liqiang Wang, En Jui Lee, Po Chen

Research output: Conference article

22 citations (Scopus)

Abstract

LSQR (Sparse Equations and Least Squares) is a widely used Krylov subspace method for solving the large-scale linear systems that arise in seismic tomography. This paper presents a parallel MPI-CUDA implementation of the LSQR solver. At the CUDA level, our contributions include: (1) using CUBLAS and CUSPARSE to compute the major steps of LSQR; (2) optimizing memory copies between host memory and device memory; and (3) developing a CUDA kernel that performs transpose SpMV without transposing the matrix in memory or keeping an additional copy. At the MPI level, our contributions include: (1) decomposing both the matrix and the vector to increase parallelism; and (2) designing a static load-balancing strategy. In our experiments, the single-GPU code achieves up to 17.6× speedup (15.7 GFlops) in single precision and 15.2× speedup (12.0 GFlops) in double precision compared with the original serial CPU code. The MPI-GPU code achieves up to 3.7× speedup (268 GFlops) in single precision and 3.8× speedup (223 GFlops) in double precision on 135 MPI tasks compared with the corresponding MPI-CPU code. The MPI-GPU code scales in both strong and weak scaling tests. In addition, our parallel implementations outperform the LSQR subroutine in the PETSc library.
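To make the per-iteration work concrete, here is a minimal CPU sketch written from the standard Paige-Saunders LSQR recurrences (no damping); it is not the authors' code. It assumes the matrix is stored in CSR format, and the helper names `csr_matvec`, `csr_rmatvec`, `norm` and `lsqr` are hypothetical. Each iteration performs one SpMV (A v), one transpose SpMV (Aᵀ u), and a few norms, scalings and axpy updates; in a GPU port these would roughly correspond to the CUSPARSE and CUBLAS calls the abstract mentions. `csr_rmatvec` computes Aᵀ u by scattering from A's own row arrays, which is one way to avoid materializing Aᵀ; it is not necessarily the kernel described in the paper.

```python
import math
from typing import Callable, List, Sequence

Vec = List[float]


def csr_matvec(row_ptr: Sequence[int], col_idx: Sequence[int],
               val: Sequence[float], x: Sequence[float]) -> Vec:
    """y = A @ x for A stored in CSR: each row gathers entries of x."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[k] * x[col_idx[k]]
    return y


def csr_rmatvec(row_ptr: Sequence[int], col_idx: Sequence[int],
                val: Sequence[float], u: Sequence[float], n_cols: int) -> Vec:
    """y = A.T @ u computed from A's own CSR arrays (no transposed copy).

    Row i scatters val[k] * u[i] into y[col_idx[k]]. Different rows can hit
    the same column, so a one-thread-per-row GPU port would need atomic adds.
    """
    y = [0.0] * n_cols
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[col_idx[k]] += val[k] * u[i]
    return y


def norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(t * t for t in v))


def lsqr(matvec: Callable[[Vec], Vec], rmatvec: Callable[[Vec], Vec],
         b: Vec, n: int, max_iter: int = 100, tol: float = 1e-10) -> Vec:
    """Minimize ||A x - b||_2 with the undamped LSQR recurrences."""
    x = [0.0] * n
    beta = norm(b)
    if beta == 0.0:
        return x
    u = [t / beta for t in b]
    v = rmatvec(u)
    alpha = norm(v)
    if alpha == 0.0:
        return x
    v = [t / alpha for t in v]
    w = v[:]
    phibar, rhobar = beta, alpha
    for _ in range(max_iter):
        # Golub-Kahan bidiagonalization: one SpMV and one transpose SpMV.
        u = [a - alpha * t for a, t in zip(matvec(v), u)]
        beta = norm(u)
        if beta > 0.0:
            u = [t / beta for t in u]
            v = [a - beta * t for a, t in zip(rmatvec(u), v)]
            alpha = norm(v)
            if alpha > 0.0:
                v = [t / alpha for t in v]
        # Plane rotation updating the QR factorization of the bidiagonal matrix.
        rho = math.hypot(rhobar, beta)
        c, s = rhobar / rho, beta / rho
        theta = s * alpha
        rhobar = -c * alpha
        phi = c * phibar
        phibar = s * phibar
        # Solution and search-direction updates (vector axpy operations).
        x = [xi + (phi / rho) * wi for xi, wi in zip(x, w)]
        w = [vi - (theta / rho) * wi for vi, wi in zip(v, w)]
        # phibar * alpha * |c| estimates ||A^T r||; stop when it is negligible.
        if phibar * alpha * abs(c) <= tol:
            break
    return x


# Usage: A = [[1, 0], [0, 2], [1, 1]] in CSR, and b = A @ [1, 2].
row_ptr, col_idx, val = [0, 1, 2, 4], [0, 1, 0, 1], [1.0, 2.0, 1.0, 1.0]
x = lsqr(lambda v: csr_matvec(row_ptr, col_idx, val, v),
         lambda u: csr_rmatvec(row_ptr, col_idx, val, u, 2),
         [1.0, 4.0, 3.0], n=2)
print(x)  # close to [1.0, 2.0]
```

Because every iteration calls both the forward and the transpose product once, the speed of the transpose SpMV matters as much as that of the ordinary SpMV for overall solver time.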

Original language: English
Pages (from-to): 76-85
Number of pages: 10
Journal: Procedia Computer Science
Volume: 9
Publication status: Published - 1 Jan 2012
Event: 12th Annual International Conference on Computational Science, ICCS 2012 - Omaha, NB, United States
Duration: 4 Jun 2012 - 6 Jun 2012

All Science Journal Classification (ASJC) codes

  • Computer Science(all)
