Hybrid OpenMP/AVX acceleration of a higher order quiet direct simulation method for the Euler equations

Matthew R. Smith, Ji Yueh Liu, Fang An Kuo, Jong Shin Wu

Research output: Contribution to journal › Conference article

2 Citations (Scopus)

Abstract

We present the Quiet Direct Simulation (QDS) method applied to parallel computation using a hybrid OpenMP/AVX parallelization paradigm. Owing to the high locality of the QDS scheme, the method has already been applied successfully to parallel computation on Graphics Processing Units (GPUs); we show here that the same principles that allow high performance on GPU devices also permit high performance with Advanced Vector Extensions (AVX). Furthermore, since modern CPUs employ a large number of cores, performance can be extended further by using AVX on each available CPU core through shared-memory (OpenMP) parallelization. We present a simple direction-split, higher-order extension to the QDS method and then implement it with AVX intrinsic functions in the flux computation and state computation modules. High performance is obtained by ensuring that all flux computations are performed using only AVX intrinsic functions; no computations are performed in serial. With this approach, a single workstation with two Xeon CPUs (16 physical cores) achieves a performance increase of over 177 times that of a single core alone. By examining the generated assembly code, we also demonstrate that built-in compiler optimization does not fully exploit AVX parallelization.
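The hybrid pattern described in the abstract (OpenMP threads across cores, AVX intrinsics within each thread) can be illustrated with a minimal sketch. This is not the authors' QDS flux routine: for brevity it evaluates the standard analytic 1D Euler fluxes (rho*u, rho*u^2 + p, u*(E + p)) rather than the QDS particle fluxes. The function name `euler_flux_avx`, the structure-of-arrays layout, and the use of single precision are all assumptions made for illustration. The `target("avx")` attribute is a GCC/Clang extension used here so the file builds without `-mavx`; OpenMP threading is enabled only when compiling with `-fopenmp`.

```c
#include <immintrin.h>
#include <stddef.h>

/* GCC/Clang extension: enable AVX for this function only (assumption:
   a GCC-compatible compiler on x86-64). */
#if defined(__GNUC__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

/* Hypothetical kernel: analytic 1D Euler fluxes for n cells stored as
   separate arrays (structure-of-arrays). E is total energy per unit volume.
   Not the QDS flux; it only demonstrates the OpenMP + AVX structure. */
AVX_TARGET
void euler_flux_avx(const float *rho, const float *u, const float *p,
                    const float *E, float *f_mass, float *f_mom,
                    float *f_energy, size_t n)
{
    const long nvec = (long)(n - n % 8);   /* cells covered by full 8-wide vectors */

    /* Each OpenMP thread takes a contiguous chunk of 8-cell blocks. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < nvec; i += 8) {
        __m256 r  = _mm256_loadu_ps(rho + i);
        __m256 v  = _mm256_loadu_ps(u + i);
        __m256 pr = _mm256_loadu_ps(p + i);
        __m256 e  = _mm256_loadu_ps(E + i);

        __m256 m = _mm256_mul_ps(r, v);                        /* rho*u       */
        _mm256_storeu_ps(f_mass + i, m);
        _mm256_storeu_ps(f_mom + i,
                         _mm256_add_ps(_mm256_mul_ps(m, v), pr)); /* rho*u^2+p */
        _mm256_storeu_ps(f_energy + i,
                         _mm256_mul_ps(v, _mm256_add_ps(e, pr))); /* u*(E+p)   */
    }

    /* Scalar remainder for n not divisible by 8. Padding the arrays to a
       multiple of 8 would remove this serial tail entirely. */
    for (size_t i = (size_t)nvec; i < n; ++i) {
        float m = rho[i] * u[i];
        f_mass[i]   = m;
        f_mom[i]    = m * u[i] + p[i];
        f_energy[i] = u[i] * (E[i] + p[i]);
    }
}
```

A caller would allocate the seven arrays with the same length `n` and invoke `euler_flux_avx` once per time step. Unaligned loads and stores are used so the sketch makes no alignment assumption; with 32-byte-aligned arrays the aligned variants `_mm256_load_ps` and `_mm256_store_ps` could be substituted.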

Original language: English
Pages (from-to): 152-157
Number of pages: 6
Journal: Procedia Engineering
Volume: 61
DOI: 10.1016/j.proeng.2013.07.108
Publication status: Published - 2013 Jan 1
Event: 25th International Conference on Parallel Computational Fluid Dynamics, ParCFD 2013 - Changsha, China
Duration: 2013 May 20 to 2013 May 24

Fingerprint

Euler equations
Program processors
Fluxes
Computer workstations
Data storage equipment

All Science Journal Classification (ASJC) codes

  • Engineering (all)

Cite this

Smith, Matthew R. ; Liu, Ji Yueh ; Kuo, Fang An ; Wu, Jong Shin. / Hybrid OpenMP/AVX acceleration of a higher order quiet direct simulation method for the Euler equations. In: Procedia Engineering. 2013 ; Vol. 61. pp. 152-157.
@article{8c39f94a2bb84678b748415f036e5797,
title = "Hybrid OpenMP/AVX acceleration of a higher order quiet direct simulation method for the Euler equations",
author = "Smith, {Matthew R.} and Liu, {Ji Yueh} and Kuo, {Fang An} and Wu, {Jong Shin}",
year = "2013",
month = "1",
day = "1",
doi = "10.1016/j.proeng.2013.07.108",
language = "English",
volume = "61",
pages = "152--157",
journal = "Procedia Engineering",
issn = "1877-7058",
publisher = "Elsevier BV",

}

TY - JOUR

T1 - Hybrid OpenMP/AVX acceleration of a higher order quiet direct simulation method for the Euler equations

AU - Smith, Matthew R.

AU - Liu, Ji Yueh

AU - Kuo, Fang An

AU - Wu, Jong Shin

PY - 2013/1/1

Y1 - 2013/1/1

UR - http://www.scopus.com/inward/record.url?scp=84891707934&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84891707934&partnerID=8YFLogxK

U2 - 10.1016/j.proeng.2013.07.108

DO - 10.1016/j.proeng.2013.07.108

M3 - Conference article

AN - SCOPUS:84891707934

VL - 61

SP - 152

EP - 157

JO - Procedia Engineering

JF - Procedia Engineering

SN - 1877-7058

ER -