Optimization of stride prefetching mechanism and dependent warp scheduling on GPGPU

Tsung Han Tsou, Dun Jie Chen, Sheng Yang Hung, Yu Hsiang Wang, Chung Ho Chen

Research output: Conference contribution

Abstract

In this paper, we propose a data prefetching scheme, History-Awoken Stride (HAS) prefetching, optimized with a warp scheduler, Prefetched-Then-Executed (PTE), and evaluate its performance on a platform that we developed. Our platform is a single instruction, multiple thread (SIMT) GPGPU environment that supports the OpenCL 1.2 runtime and the TensorFlow framework through CUDA-on-CL technology. The enormous number of threads executing on a GPU places critical demands on memory performance. HAS exploits a history table of related memory accesses within a warp, across warps of the same workgroup, and across workgroups, and uses address strides and warp status to monitor the prefetching progress of the executed warp. PTE issues warps precisely according to the prefetching status reported by HAS. Experimental results for LeNet-5 inference and 11 PolyBench test programs on CAS-GPU show that our mechanism can achieve an average IPC improvement of 10.4% and a 7.8% reduction in data cache miss rate. The prefetch accuracy can reach 67.7%, and the proportion of prefetch requests that arrive at the appropriate time reaches 48.2%.
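
For readers unfamiliar with stride prefetching, the general idea that HAS builds on can be sketched as follows. This is a minimal illustration of a textbook per-instruction stride table, not the HAS design itself: the abstract does not specify the table layout, and the names `StrideEntry`, `StridePrefetcher`, `access`, and `degree` are hypothetical. The sketch also ignores the warp and workgroup history and the PTE scheduling that the paper adds.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    last_addr: int           # most recent address seen for this load instruction
    stride: int = 0          # last observed address delta
    confident: bool = False  # True once the same non-zero stride repeats


class StridePrefetcher:
    """Generic per-PC stride table (illustrative assumption, not HAS)."""

    def __init__(self, degree=1):
        self.degree = degree  # how many strides ahead to prefetch
        self.table = {}       # maps load PC -> StrideEntry

    def access(self, pc, addr):
        """Record one load and return the addresses to prefetch (may be empty)."""
        entry = self.table.get(pc)
        if entry is None:
            # First time this load is seen: nothing to compare against yet.
            self.table[pc] = StrideEntry(last_addr=addr)
            return []
        stride = addr - entry.last_addr
        # Only trust a stride that matches the previously observed one.
        entry.confident = stride != 0 and stride == entry.stride
        entry.stride = stride
        entry.last_addr = addr
        if not entry.confident:
            return []
        return [addr + stride * i for i in range(1, self.degree + 1)]
```

In this sketch, a load that touches addresses 1000, 1128, and 1256 triggers prefetches only on the third access, once the 128-byte stride has been seen twice. A break in the pattern silences the prefetcher until a stride repeats again.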

Original language: English
Title of host publication: 2020 IEEE International Symposium on Circuits and Systems, ISCAS 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728133201
Publication status: Published - 2020
Event: 52nd IEEE International Symposium on Circuits and Systems, ISCAS 2020 - Virtual, Online
Duration: 10 Oct 2020 to 21 Oct 2020

Publication series

Name: Proceedings - IEEE International Symposium on Circuits and Systems
Volume: 2020-October
ISSN (Print): 0271-4310

Conference

Conference: 52nd IEEE International Symposium on Circuits and Systems, ISCAS 2020
City: Virtual, Online
Period: 2020-10-10 to 2020-10-21

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering
