Adaptive block size for dense QR factorization in hybrid CPU-GPU systems via statistical modeling

Ray Bing Chen, Yaohung M. Tsai, Weichung Wang

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)


QR factorization is a core computational kernel in scientific computing. How can the latest computer architectures be used to accelerate this task? We investigate this question by proposing a dense QR factorization algorithm with adaptive block sizes on a hybrid system that contains a central processing unit (CPU) and a graphics processing unit (GPU). To maximize the use of both the CPU and the GPU, we develop an adaptive scheme that chooses the block size at each iteration. The decision is based on statistical surrogate models of performance and on an online monitor that guards against unexpected, occasional performance drops. We modify the highly optimized CPU-GPU-based QR factorization in MAGMA to implement the proposed schemes. Numerical results suggest that our approaches are efficient and can lead to near-optimal block sizes. The proposed algorithm can be extended to other one-sided factorizations, such as LU and Cholesky factorizations.
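The per-iteration decision described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: `toy_rate` is a made-up stand-in for a fitted statistical surrogate, `measure` stands in for timing a real panel factorization and trailing update, and the rule of flagging a block size once its observed rate falls below `drop_tol` times the prediction is only one plausible way an online monitor might react. It is not the authors' scheme or any MAGMA interface.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical performance model: maps (remaining columns, block size)
# to a predicted rate, where higher is better.
RateFn = Callable[[int, int], float]


def toy_rate(remaining: int, block: int) -> float:
    """Toy stand-in for a fitted surrogate; peaks near block = sqrt(remaining)."""
    return remaining * block / (remaining + block * block)


def choose_block_size(remaining: int, candidates: Sequence[int],
                      surrogate: RateFn) -> int:
    """Return the feasible candidate with the highest predicted rate."""
    feasible = [b for b in candidates if b <= remaining]
    if not feasible:
        return remaining  # final panel is narrower than every candidate
    return max(feasible, key=lambda b: surrogate(remaining, b))


def adaptive_schedule(n: int, candidates: Sequence[int], surrogate: RateFn,
                      measure: RateFn, drop_tol: float = 0.5) -> List[int]:
    """Sweep n columns, picking a block size at every iteration."""
    schedule: List[int] = []
    flagged: Set[int] = set()  # block sizes the monitor has ruled out
    done = 0
    while done < n:
        remaining = n - done
        # Skip flagged sizes; fall back to all candidates if none are left.
        usable = [b for b in candidates if b not in flagged] or list(candidates)
        block = choose_block_size(remaining, usable, surrogate)
        predicted = surrogate(remaining, block)
        # In a real code this would time the panel factorization and update.
        observed = measure(remaining, block)
        if observed < drop_tol * predicted:
            flagged.add(block)  # assumed monitor rule: avoid this size later
        schedule.append(block)
        done += block
    return schedule
```

With this toy model, `adaptive_schedule(16384, (32, 64, 128), toy_rate, toy_rate)` starts with 128-column panels and moves to narrower ones as the trailing matrix shrinks. Passing a `measure` function that reports a much lower rate for one block size causes that size to be dropped after its first use.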

Original language: English
Pages (from-to): 70-85
Number of pages: 16
Journal: Parallel Computing
Issue number: 5-6
Publication status: Published - 2014 May

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Computer Graphics and Computer-Aided Design
  • Artificial Intelligence

