DOcyclical: A Latency-Resistant Cyclic Multi-Threading Approach for Automatic Program Parallelization

Hairong Yu, Guohui Li, Jianjun Li, Lihchyun Shu

Research output: Contribution to journal (Article)

Abstract

Chip multiprocessors have been proposed for many years and have become the prevalent architecture for high-performance general-purpose processors. The search for automatic parallelization techniques that can take full advantage of processor resources is still an active research area. The cyclic multi-threading (CMT) approach, a popular parallelization paradigm, is applicable to many applications and delivers good performance scalability. Despite this, its performance can be quite sensitive to fluctuations in communication latencies without substantive operations that prefetch synchronization signals. To address this problem, we propose a novel CMT technique called DOcyclical that employs a priority-based scheme to greatly reduce the frequency of cross-core loop-carried dependences, and hence removes a considerable amount of communication latency from the critical paths of loop executions. Furthermore, the priority-based scheme keeps all processors busy most of the time while keeping the processor load balanced. To demonstrate the capabilities of DOcyclical, we have evaluated it using the SPEC CPU2006 and StreamIt benchmarks on three real platforms. Experimental results show that our method is much less sensitive to fluctuations in communication latencies than traditional cyclic multi-threading techniques. In addition, DOcyclical outperforms other well-known parallelization methods, including decoupled software pipelining (DSWP), PS-DSWP and HELIX, in terms of speedup by 21-50, 16-27 and 15-25%, respectively, on the three platforms.
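To make the phrase "frequency of cross-core loop-carried dependences" concrete, the following is a minimal sketch, not the authors' priority-based scheme. It assumes a loop in which iteration i depends only on iteration i - 1, and counts how often that dependence crosses cores under two iteration-to-core assignments. The function names and the `chunked` assignment are hypothetical, introduced only for illustration.

```python
def cross_core_dependences(assignment):
    """Count iterations whose predecessor ran on a different core.

    assignment[i] is the core that executes iteration i. Iteration i is
    assumed to depend only on iteration i - 1 (a distance-1 dependence).
    """
    return sum(
        1
        for i in range(1, len(assignment))
        if assignment[i] != assignment[i - 1]
    )


def round_robin(n_iters, n_cores):
    # Conventional cyclic assignment: iteration i runs on core i mod n_cores.
    return [i % n_cores for i in range(n_iters)]


def chunked(n_iters, n_cores, chunk):
    # Hypothetical alternative (an assumption, not taken from the paper):
    # hand out runs of `chunk` consecutive iterations to each core in turn.
    return [(i // chunk) % n_cores for i in range(n_iters)]


if __name__ == "__main__":
    n_iters, n_cores = 16, 4
    rr = cross_core_dependences(round_robin(n_iters, n_cores))
    ch = cross_core_dependences(chunked(n_iters, n_cores, chunk=4))
    print(rr, ch)  # prints "15 3"
```

Under round-robin assignment every dependence crosses cores, so each one adds communication latency to the critical path. Any assignment that keeps consecutive iterations on the same core lowers that count, which is the general effect the abstract attributes to DOcyclical.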

Original language: English
Pages (from-to): 1155-1173
Number of pages: 19
Journal: Computer Journal
Volume: 59
Issue number: 8
DOI: 10.1093/comjnl/bxv125
Publication status: Published - 2016 Aug 1

Fingerprint

Communication
Pipelines
Scalability
Synchronization

All Science Journal Classification (ASJC) codes

  • Computer Science(all)

Cite this

@article{8cad0e8fbd5a45b8b30657ebfe6df97d,
title = "DOcyclical: A Latency-Resistant Cyclic Multi-Threading Approach for Automatic Program Parallelization",
abstract = "Chip multiprocessors have been proposed for many years and have become the prevalent architecture for high-performance general-purpose processors. The search for automatic parallelization techniques that can take full advantage of processor resources is still an active research area. The cyclic multi-threading (CMT) approach, a popular parallelization paradigm, is applicable to many applications and delivers good performance scalability. Despite this, its performance can be quite sensitive to fluctuations in communication latencies without substantive operations that prefetch synchronization signals. To address this problem, we propose a novel CMT technique called DOcyclical that employs a priority-based scheme to greatly reduce the frequency of cross-core loop-carried dependences, and hence removes a considerable amount of communication latency from the critical paths of loop executions. Furthermore, the priority-based scheme keeps all processors busy most of the time while keeping the processor load balanced. To demonstrate the capabilities of DOcyclical, we have evaluated it using the SPEC CPU2006 and StreamIt benchmarks on three real platforms. Experimental results show that our method is much less sensitive to fluctuations in communication latencies than traditional cyclic multi-threading techniques. In addition, DOcyclical outperforms other well-known parallelization methods, including decoupled software pipelining (DSWP), PS-DSWP and HELIX, in terms of speedup by 21-50, 16-27 and 15-25{\%}, respectively, on the three platforms.",
author = "Hairong Yu and Guohui Li and Jianjun Li and Lihchyun Shu",
year = "2016",
month = "8",
day = "1",
doi = "10.1093/comjnl/bxv125",
language = "English",
volume = "59",
pages = "1155--1173",
journal = "Computer Journal",
issn = "0010-4620",
publisher = "Oxford University Press",
number = "8"
}

DOcyclical: A Latency-Resistant Cyclic Multi-Threading Approach for Automatic Program Parallelization. / Yu, Hairong; Li, Guohui; Li, Jianjun; Shu, Lihchyun.

In: Computer Journal, Vol. 59, No. 8, 01.08.2016, p. 1155-1173.

TY - JOUR

T1 - DOcyclical: A Latency-Resistant Cyclic Multi-Threading Approach for Automatic Program Parallelization

AU - Yu, Hairong

AU - Li, Guohui

AU - Li, Jianjun

AU - Shu, Lihchyun

PY - 2016/8/1

Y1 - 2016/8/1

AB - Chip multiprocessors have been proposed for many years and have become the prevalent architecture for high-performance general-purpose processors. The search for automatic parallelization techniques that can take full advantage of processor resources is still an active research area. The cyclic multi-threading (CMT) approach, a popular parallelization paradigm, is applicable to many applications and delivers good performance scalability. Despite this, its performance can be quite sensitive to fluctuations in communication latencies without substantive operations that prefetch synchronization signals. To address this problem, we propose a novel CMT technique called DOcyclical that employs a priority-based scheme to greatly reduce the frequency of cross-core loop-carried dependences, and hence removes a considerable amount of communication latency from the critical paths of loop executions. Furthermore, the priority-based scheme keeps all processors busy most of the time while keeping the processor load balanced. To demonstrate the capabilities of DOcyclical, we have evaluated it using the SPEC CPU2006 and StreamIt benchmarks on three real platforms. Experimental results show that our method is much less sensitive to fluctuations in communication latencies than traditional cyclic multi-threading techniques. In addition, DOcyclical outperforms other well-known parallelization methods, including decoupled software pipelining (DSWP), PS-DSWP and HELIX, in terms of speedup by 21-50, 16-27 and 15-25%, respectively, on the three platforms.

UR - http://www.scopus.com/inward/record.url?scp=84992135512&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84992135512&partnerID=8YFLogxK

U2 - 10.1093/comjnl/bxv125

DO - 10.1093/comjnl/bxv125

M3 - Article

AN - SCOPUS:84992135512

VL - 59

SP - 1155

EP - 1173

JO - Computer Journal

JF - Computer Journal

SN - 0010-4620

IS - 8

ER -