TY - JOUR
T1 - A Halide-based Synergistic Computing Framework for Heterogeneous Systems
AU - Liao, Shih Wei
AU - Kuang, Shao Yun
AU - Kao, Chia Lung
AU - Tu, Chia Heng
N1 - Publisher Copyright: © 2017, Springer Science+Business Media, LLC.
PY - 2019/3/1
Y1 - 2019/3/1
AB - New programming models have been developed to embrace contemporary heterogeneous machines, each of which may contain several types of processors, e.g., CPUs, GPUs, FPGAs, and ASICs. Unlike conventional models, which use separate programming schemes for the different processors of a machine, e.g., OpenMP for the CPU and CUDA for the GPU, the new models tend to offer a unified programming model that abstracts the details of the heterogeneous computing engines. One such programming model is Halide, which is designed for high-performance image processing. Halide programmers can map data and computation to either CPUs or GPUs through high-level C++ functions, which the Halide compiler converts to various code targets, including x86, ARM, CUDA, and OpenCL. Nevertheless, writing a Halide program for cooperative computation on both the CPU and the GPU is complex. In this work, we propose a synergistic computing framework that extends Halide to improve program execution performance. Several key issues are tackled, including data coherence, workload partitioning, job dispatching, and communication/synchronization, so that Halide programmers can take advantage of the heterogeneous computing engines through two C++ classes that we developed: one for static workload partitioning/dispatching and the other for its dynamic counterpart. Furthermore, optimizations are developed to improve performance by generating adequate CPU code and eliminating extra memory copies. We characterize and discuss the performance of two image processing programs and our framework on heterogeneous platforms, i.e., an Android Nexus 7 smartphone and x86-based computers. Our results show that a significant performance gain can be achieved when the CPU and GPU execute a program synergistically with the proposed framework.
UR - http://www.scopus.com/inward/record.url?scp=85029574848&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85029574848&partnerID=8YFLogxK
U2 - 10.1007/s11265-017-1283-1
DO - 10.1007/s11265-017-1283-1
M3 - Article
AN - SCOPUS:85029574848
SN - 1939-8018
VL - 91
SP - 219
EP - 233
JO - Journal of Signal Processing Systems
JF - Journal of Signal Processing Systems
IS - 3-4
ER -