A Halide-based Synergistic Computing Framework for Heterogeneous Systems

Shih Wei Liao, Shao Yun Kuang, Chia Lung Kao, ChiaHeng Tu

Research output: Contribution to journal, Article

2 Citations (Scopus)

Abstract

New programming models have been developed to embrace contemporary heterogeneous machines, each of which may contain several types of processors, e.g., CPUs, GPUs, FPGAs, and ASICs. Unlike conventional approaches, which use a separate programming scheme for each processor of the machine, e.g., OpenMP for the CPU and CUDA for the GPU, the new models tend to offer a unified programming model that abstracts the details of the heterogeneous computing engines. One such programming model is Halide, which is designed for high-performance image processing. Halide programmers can map data and computation to either the CPUs or the GPUs through high-level C++ functions, which the Halide compiler converts to various code targets, including x86, ARM, CUDA, and OpenCL. Nevertheless, writing a Halide program for cooperative computation on both the CPU and the GPU becomes complex. In this work, we propose a synergistic computing framework that extends Halide to improve program execution performance. Several key issues are tackled, including data coherence, workload partitioning, job dispatching, and communication/synchronization, so that Halide programmers can take advantage of the heterogeneous computing engines through two newly developed C++ classes: one for static workload partitioning/dispatching and the other for its dynamic counterpart. Furthermore, optimizations are developed to improve performance by generating adequate CPU code and eliminating extra memory copies. We characterize and discuss the performance of two image processing programs and our framework on heterogeneous platforms, i.e., an Android Nexus 7 smartphone and x86-based computers. Our results show that a significant performance gain can be achieved when the CPU and GPU execute a program synergistically with the proposed framework.
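
The abstract distinguishes static from dynamic workload partitioning/dispatching but does not show either interface. The following is a minimal sketch of the general idea only, not the authors' C++ classes and not Halide API: it assumes an image is processed row by row, and it models the two processors as two host threads. The names `RowKernel`, `run_static`, `run_dynamic`, `cpu_share`, and `chunk` are hypothetical and chosen for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one processor's kernel: processes rows [begin, end).
// In a real setup these might be the same pipeline compiled for two targets;
// here both are ordinary host callables.
using RowKernel = std::function<void(std::size_t begin, std::size_t end)>;

// Static partitioning: split the rows once, up front, by a fixed ratio.
// cpu_share is the fraction of rows given to the first worker (assumed knob).
// Returns the number of rows assigned to the first worker.
inline std::size_t run_static(std::size_t rows, double cpu_share,
                              const RowKernel& cpu, const RowKernel& gpu) {
    assert(cpu_share >= 0.0 && cpu_share <= 1.0);
    const std::size_t split = static_cast<std::size_t>(rows * cpu_share);
    std::thread first([&] { cpu(0, split); });     // rows [0, split)
    std::thread second([&] { gpu(split, rows); }); // rows [split, rows)
    first.join();
    second.join();
    return split;
}

// Dynamic dispatching: both workers repeatedly claim the next chunk of rows
// from a shared atomic counter until no rows remain, so a faster worker
// naturally ends up with more chunks.
// Returns the number of rows each worker processed: {first, second}.
inline std::vector<std::size_t> run_dynamic(std::size_t rows, std::size_t chunk,
                                            const RowKernel& cpu,
                                            const RowKernel& gpu) {
    assert(chunk > 0);
    std::atomic<std::size_t> next{0};
    std::vector<std::size_t> done(2, 0);
    auto worker = [&](const RowKernel& kernel, std::size_t& count) {
        for (;;) {
            const std::size_t begin = next.fetch_add(chunk);
            if (begin >= rows) break;
            const std::size_t end = std::min(begin + chunk, rows);
            kernel(begin, end);
            count += end - begin;
        }
    };
    std::thread first([&] { worker(cpu, done[0]); });
    std::thread second([&] { worker(gpu, done[1]); });
    first.join();
    second.join();
    return done;
}
```

In this sketch, `run_static(rows, 0.25, cpu, gpu)` would hand the first quarter of the rows to one worker and the rest to the other, while `run_dynamic(rows, 8, cpu, gpu)` lets the two workers compete for 8-row chunks. Data coherence and memory copies between devices, which the abstract names as key issues, are deliberately outside the scope of this illustration.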

Original language: English
Pages (from-to): 219-233
Number of pages: 15
Journal: Journal of Signal Processing Systems
Volume: 91
Issue number: 3-4
DOI: 10.1007/s11265-017-1283-1
Publication status: Published - 2019 Mar 1

Fingerprint

Heterogeneous Systems
Program processors
Programming Model
Heterogeneous Computing
Dispatching
Computing
C++
Workload
Partitioning
Image Processing
Engine
OpenMP
Compiler
Image processing
Field Programmable Gate Array
Synchronization
Programming
Engines
High Performance

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Theoretical Computer Science
  • Signal Processing
  • Information Systems
  • Modelling and Simulation
  • Hardware and Architecture

Cite this

Liao, Shih Wei; Kuang, Shao Yun; Kao, Chia Lung; Tu, ChiaHeng. A Halide-based Synergistic Computing Framework for Heterogeneous Systems. In: Journal of Signal Processing Systems. 2019; Vol. 91, No. 3-4. pp. 219-233.
@article{cc8dba840fd146f2a9d339163a2d492c,
title = "A Halide-based Synergistic Computing Framework for Heterogeneous Systems",
author = "Liao, {Shih Wei} and Kuang, {Shao Yun} and Kao, {Chia Lung} and Tu, ChiaHeng",
year = "2019",
month = "3",
day = "1",
doi = "10.1007/s11265-017-1283-1",
language = "English",
volume = "91",
pages = "219--233",
journal = "Journal of Signal Processing Systems",
issn = "1939-8018",
publisher = "Springer New York",
number = "3-4",

}

TY - JOUR

T1 - A Halide-based Synergistic Computing Framework for Heterogeneous Systems

AU - Liao, Shih Wei

AU - Kuang, Shao Yun

AU - Kao, Chia Lung

AU - Tu, ChiaHeng

PY - 2019/3/1

Y1 - 2019/3/1

UR - http://www.scopus.com/inward/record.url?scp=85029574848&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85029574848&partnerID=8YFLogxK

U2 - 10.1007/s11265-017-1283-1

DO - 10.1007/s11265-017-1283-1

M3 - Article

VL - 91

SP - 219

EP - 233

JO - Journal of Signal Processing Systems

JF - Journal of Signal Processing Systems

SN - 1939-8018

IS - 3-4

ER -