TY - GEN
T1 - A model-driven partitioning and auto-tuning integrated framework for sparse matrix-vector multiplication on GPUs
AU - Guo, Ping
AU - Huang, He
AU - Chen, Qichang
AU - Wang, Liqiang
AU - Lee, En-Jui
AU - Chen, Po
PY - 2011/9/7
Y1 - 2011/9/7
AB - Sparse Matrix-Vector Multiplication (SpMV) is common in scientific computing. The Graphics Processing Unit (GPU) has recently emerged as a high-performance computing platform due to its massive processing capability. This paper presents an innovative performance-model-driven approach for partitioning a sparse matrix into appropriate storage formats and auto-tuning the configurations of CUDA kernels to improve the performance of SpMV on GPUs. This paper makes the following contributions: (1) it proposes an empirical CUDA performance model to predict the execution time of SpMV CUDA kernels; (2) it designs and implements a model-driven partitioning framework that predicts how to partition the target sparse matrix into one or more partitions and transforms each partition into an appropriate storage format, based on the fact that the storage format of a sparse matrix can significantly affect SpMV performance; (3) it integrates the model-driven partitioning with our previous auto-tuning framework to automatically adjust CUDA-specific parameters and optimize performance on specific GPUs. Compared to NVIDIA's existing implementations, our approach shows a substantial performance improvement: on average, 222%, 197%, and 33% for the CSR vector kernel, ELL kernel, and HYB kernel, respectively.
UR - http://www.scopus.com/inward/record.url?scp=80052311496&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052311496&partnerID=8YFLogxK
U2 - 10.1145/2016741.2016744
DO - 10.1145/2016741.2016744
M3 - Conference contribution
AN - SCOPUS:80052311496
SN - 9781450308885
T3 - Proceedings of the TeraGrid 2011 Conference: Extreme Digital Discovery, TG'11
BT - Proceedings of the TeraGrid 2011 Conference
T2 - TeraGrid 2011 Conference: Extreme Digital Discovery, TG'11
Y2 - 18 July 2011 through 21 July 2011
ER -