Hybrid OpenMP/AVX Acceleration of a Split Harten Lax and van Leer Method for the Euler Equations

  • 劉 季燁

Student thesis: Master's Thesis


This thesis presents the Split Harten, Lax and van Leer (SHLL) method applied to parallel computation using a hybrid OpenMP/AVX (Advanced Vector Extensions) parallelization paradigm. The governing equations in each coordinate direction of the SHLL method are vector split mathematically for the purpose of parallelization. This splitting results in a high degree of locality and has previously been applied successfully to parallel computation on Graphics Processing Units (GPUs). The same principles that allow high performance on GPU devices also permit high performance using AVX, as demonstrated in the present study. High performance was obtained by ensuring that all flux computations were performed using only AVX intrinsic functions, with no computations performed in serial. The major feature of AVX is its capacity to perform SIMD operations on 8 floating-point variables in parallel, an extension of the earlier Streaming SIMD Extensions (SSE), which operate on 4 floating-point variables. Through the use of intrinsic functions, these 8 parallel computation registers may be treated as 8 additional computing cores per physical core. Since modern CPUs employ a large number of physical cores, performance can be extended further by using all 8 AVX computational registers on each available CPU core through shared-memory OpenMP parallelization, effectively employing 8P cores, where P is the number of physical cores available. In addition to the development of this highly efficient parallel computing tool, the SHLL equations have been reformulated and are shown to possess two dissipation coefficients, as opposed to the single dissipation coefficient previously believed to be present. The ideal dissipation coefficients ?_1 and ?_2 of the SHLL scheme for one-dimensional problems, including the shock-tube and shock-acoustic wave interaction problems, were investigated through error analysis. Additionally, several two-dimensional problems, including the Euler four-shock interaction and Euler four-contact interaction problems, are shown and discussed with regard to the ideal dissipation coefficients. Careful manipulation of these coefficients leads to performance approaching 4th-order spatial accuracy, and the results are shown to improve upon previously published 3rd-order accurate techniques. The parallel performance for various problems in first and second order, using a varying number of cells, is presented for a single workstation with dual Xeon CPUs (16 physical cores), an Intel i7-3930K and an Intel i3-3220K. The best reported speedup (the computational performance compared to a single core) was over 326 times, obtained using dual Xeon E5-2670 CPUs when computing the flux evaluation kernel.
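The hybrid scheme described in the abstract (OpenMP threads across blocks of cells, 8 cells per AVX register) can be illustrated with a minimal sketch. It is not taken from the thesis. It assumes a forward split mass flux obtained by substituting the local wave-speed estimates S_L = u - a and S_R = u + a into the standard HLL flux, which gives F+ = 0.5 [ rho u (M + 1) + a (1 - M^2) rho ] with the local Mach number M clamped to [-1, 1]. The two-coefficient reformulation developed in the thesis is not reproduced. The function names, the constant GAMMA = 1.4, and the padded handling of leftover cells are all hypothetical choices, and the code assumes GCC or Clang on an x86 CPU.

```c
#include <assert.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

#define GAMMA 1.4f /* assumed ratio of specific heats; not stated in the abstract */

#if SKETCH_X86
/* Hypothetical kernel: forward split mass flux for 8 adjacent cells at once.
   Each cell uses only its own state, which is what makes the split local. */
static __attribute__((target("avx")))
void mass_flux_plus_avx8(const float *rho, const float *u, const float *p, float *f)
{
    const __m256 one  = _mm256_set1_ps(1.0f);
    const __m256 half = _mm256_set1_ps(0.5f);
    __m256 r  = _mm256_loadu_ps(rho);
    __m256 v  = _mm256_loadu_ps(u);
    __m256 pr = _mm256_loadu_ps(p);
    /* speed of sound a = sqrt(gamma * p / rho) */
    __m256 a = _mm256_sqrt_ps(_mm256_div_ps(_mm256_mul_ps(_mm256_set1_ps(GAMMA), pr), r));
    /* local Mach number, clamped to [-1, 1] so supersonic cells upwind fully */
    __m256 m = _mm256_div_ps(v, a);
    m = _mm256_max_ps(_mm256_min_ps(m, one), _mm256_set1_ps(-1.0f));
    __m256 conv = _mm256_mul_ps(_mm256_mul_ps(r, v), _mm256_add_ps(m, one));
    __m256 diss = _mm256_mul_ps(_mm256_mul_ps(a, _mm256_sub_ps(one, _mm256_mul_ps(m, m))), r);
    _mm256_storeu_ps(f, _mm256_mul_ps(half, _mm256_add_ps(conv, diss)));
}
#endif

/* Hypothetical driver: OpenMP distributes 8-cell blocks over the threads.
   Returns 0 on success, -1 if AVX is not available on this machine. */
int mass_flux_plus(const float *rho, const float *u, const float *p, float *f, int n)
{
    assert(n >= 0);
#if SKETCH_X86
    if (!__builtin_cpu_supports("avx"))
        return -1;
    const int nvec = n - n % 8;
#ifdef _OPENMP
    #pragma omp parallel for schedule(static)
#endif
    for (int i = 0; i < nvec; i += 8)
        mass_flux_plus_avx8(rho + i, u + i, p + i, f + i);
    if (nvec < n) {
        /* Leftover cells: pad to a full register with a harmless state
           (rho = 1, u = 0, p = 1) so they also go through the AVX kernel. */
        float r8[8], u8[8], p8[8], f8[8];
        const size_t bytes = (size_t)(n - nvec) * sizeof(float);
        for (int k = 0; k < 8; ++k) { r8[k] = 1.0f; u8[k] = 0.0f; p8[k] = 1.0f; }
        memcpy(r8, rho + nvec, bytes);
        memcpy(u8, u + nvec, bytes);
        memcpy(p8, p + nvec, bytes);
        mass_flux_plus_avx8(r8, u8, p8, f8);
        memcpy(f + nvec, f8, bytes);
    }
    return 0;
#else
    (void)rho; (void)u; (void)p; (void)f;
    return -1;
#endif
}
```

Built with OpenMP enabled (for example `-fopenmp` on GCC), the loop over 8-cell blocks is shared among threads; without it the same code runs on a single thread. Unaligned loads and stores are used for simplicity, and a tuned implementation would likely align the arrays and fuse all flux components into one kernel.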
Date of Award: 2014 Jun 25
Original language: English
Supervisor: Matt-Hew Smith
