TY - JOUR
T1 - Hybrid OpenMP/AVX acceleration of a Split HLL Finite Volume Method for the Shallow Water and Euler Equations
AU - Liu, Ji Yueh
AU - Smith, Matthew R.
AU - Kuo, Fang An
AU - Wu, Jong Shin
N1 - Funding Information:
The authors acknowledge financial support from the Taiwan National Science Council (Grant Numbers NSC 99-2221-E-492-005-MY3 and NSC 102-2221-E-006-115). We are also grateful to Acer, Nvidia and Intel for various loaned and donated hardware components.
Publisher Copyright:
© 2014 Elsevier Ltd.
PY - 2015/3/1
Y1 - 2015/3/1
N2 - Presented is the application of the Split Harten, Lax and van Leer (SHLL) technique to parallel computation using a hybrid OpenMP/AVX parallelization paradigm for the Shallow Water Equations and Euler Equations. The key behind the ease of parallelization of the SHLL method for both governing equations is the mathematical/vector splitting in each coordinate direction - this splitting results in a high degree of locality, producing a scheme which is embarrassingly parallel and well suited to the vectorization capabilities offered by vector-computing architectures. Here we demonstrate this using the SIMD capabilities of modern CPUs, namely the Advanced Vector eXtensions (AVX). The main feature of AVX is the capacity to perform SIMD operations on 8 floating point variables in parallel - an increase from the 4 floating point variables possible using the previous Streaming SIMD Extensions (SSE). Furthermore, since modern CPUs employ a large number of cores, we further extend the performance by using AVX on each available CPU core using shared memory (OpenMP) parallelization. We present a direction-split higher order extension to the SHLL method and apply it to AVX through the use of intrinsic functions in the flux computation and state computation modules. High performance is obtained by ensuring that all flux computations are performed using only AVX intrinsic functions - no computations are performed in serial. Through this approach, a single workstation with 2× Xeon CPUs (16 physical cores) achieves a performance increase of over 117 times that of a single core alone in the flux evaluation kernel.
AB - Presented is the application of the Split Harten, Lax and van Leer (SHLL) technique to parallel computation using a hybrid OpenMP/AVX parallelization paradigm for the Shallow Water Equations and Euler Equations. The key behind the ease of parallelization of the SHLL method for both governing equations is the mathematical/vector splitting in each coordinate direction - this splitting results in a high degree of locality, producing a scheme which is embarrassingly parallel and well suited to the vectorization capabilities offered by vector-computing architectures. Here we demonstrate this using the SIMD capabilities of modern CPUs, namely the Advanced Vector eXtensions (AVX). The main feature of AVX is the capacity to perform SIMD operations on 8 floating point variables in parallel - an increase from the 4 floating point variables possible using the previous Streaming SIMD Extensions (SSE). Furthermore, since modern CPUs employ a large number of cores, we further extend the performance by using AVX on each available CPU core using shared memory (OpenMP) parallelization. We present a direction-split higher order extension to the SHLL method and apply it to AVX through the use of intrinsic functions in the flux computation and state computation modules. High performance is obtained by ensuring that all flux computations are performed using only AVX intrinsic functions - no computations are performed in serial. Through this approach, a single workstation with 2× Xeon CPUs (16 physical cores) achieves a performance increase of over 117 times that of a single core alone in the flux evaluation kernel.
UR - http://www.scopus.com/inward/record.url?scp=85027917120&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027917120&partnerID=8YFLogxK
U2 - 10.1016/j.compfluid.2014.11.011
DO - 10.1016/j.compfluid.2014.11.011
M3 - Article
AN - SCOPUS:85027917120
SN - 0045-7930
VL - 110
SP - 181
EP - 188
JO - Computers and Fluids
JF - Computers and Fluids
ER -