Adaptive block size for dense QR factorization in hybrid CPU-GPU systems via statistical modeling

Ray-Bing Chen, Yaohung M. Tsai, Weichung Wang

Research output: Article (peer-reviewed)

7 citations (Scopus)

Abstract

QR factorization is a computational kernel of scientific computing. How can the latest computers be used to accelerate this task? We investigate this topic by proposing a dense QR factorization algorithm with adaptive block sizes on a hybrid system that contains a central processing unit (CPU) and a graphics processing unit (GPU). To maximize the use of the CPU and GPU, we develop an adaptive scheme that chooses the block size at each iteration. The decision is based on statistical surrogate models of performance and an online monitor, which avoids unexpected occasional performance drops. We modify the highly optimized CPU-GPU-based QR factorization in MAGMA to implement the proposed schemes. Numerical results suggest that our approaches are efficient and can lead to near-optimal block sizes. The proposed algorithm can be extended to other one-sided factorizations, such as LU and Cholesky factorizations.
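The idea of choosing a block size at each iteration can be illustrated with a small NumPy sketch. This is not the authors' MAGMA implementation. The function names (`choose_block_size`, `adaptive_blocked_qr`), the toy cost model, the candidate block sizes, and the fallback rule used as an "online monitor" are all assumptions made for illustration, and the code runs on the CPU only.

```python
import time
import numpy as np

def choose_block_size(remaining, candidates, predict_cost):
    """Pick the candidate with the lowest predicted cost per column (hypothetical rule)."""
    feasible = [b for b in candidates if b <= remaining] or [remaining]
    return min(feasible, key=lambda b: predict_cost(b, remaining) / b)

def adaptive_blocked_qr(A, candidates=(2, 4, 8), predict_cost=None,
                        fallback=4, slowdown=10.0):
    """Blocked QR where each panel width is chosen by a surrogate cost model.

    Returns Q, R and the list of block sizes used, with Q @ R == A up to rounding.
    """
    R = np.array(A, dtype=float)  # working copy, overwritten by R
    m, n = R.shape
    if predict_cost is None:
        # Toy, uncalibrated surrogate (assumption): fixed overhead,
        # trailing-update work, and a panel term that grows with b**2.
        predict_cost = lambda b, rem: 1e-5 + 1e-7 * b * rem + 1e-6 * b * b
    Q = np.eye(m)
    steps = min(m, n)
    k, blocks, use_fallback = 0, [], False
    while k < steps:
        rem = steps - k
        if use_fallback:
            b = min(fallback, rem)  # monitor tripped: use a safe default width
        else:
            b = choose_block_size(rem, candidates, predict_cost)
        t0 = time.perf_counter()
        # Factor the current panel and apply its Q factor to the trailing columns.
        Qp, Rp = np.linalg.qr(R[k:, k:k + b], mode="complete")
        R[k:, k:k + b] = Rp
        R[k:, k + b:] = Qp.T @ R[k:, k + b:]
        Q[:, k:] = Q[:, k:] @ Qp
        elapsed = time.perf_counter() - t0
        # Simplified online monitor (assumption): if this step ran far slower
        # than predicted, distrust the model for the next step.
        use_fallback = elapsed > slowdown * predict_cost(b, rem)
        blocks.append(b)
        k += b
    return Q, R, blocks
```

In a real hybrid setting the surrogate would be fitted to measured timings on the target CPU-GPU machine. The sketch only shows where such a model and a monitor would plug into the factorization loop.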

Original language: English
Pages (from-to): 70-85
Number of pages: 16
Journal: Parallel Computing
Volume: 40
Issue number: 5-6
DOIs
Publication status: Published - 1 January 2014

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Computer Graphics and Computer-Aided Design
  • Artificial Intelligence
