TY - JOUR
T1 - (Formula presented.)
T2 - a data dependence and stride reference patterns profiling infrastructure
AU - Yu, Hairong
AU - Li, Guohui
AU - Shu, Lih Chyun
N1 - Publisher Copyright: © 2016, Springer Science+Business Media New York.
PY - 2016/2/1
Y1 - 2016/2/1
AB - With multi-core processors now widespread in modern computer systems, developing software tools that make the best use of available computing resources has never been more urgent. This is because a considerable amount of spurious dependence and cache misses lurking in general-purpose applications seriously restricts the extraction of potential parallelism on today's prevalent multi-core machines. Existing tools are limited in their ability to detect data dependence thoroughly and provide prefetched objects at the same time. Further, some of the tools are unable to profile large-scale applications. To address these problems, we propose a novel profiler, called (Formula presented.), that performs both data dependence and stride reference profiling. Data dependence profiling employs a hash-based scheme to detect actual data dependence while filtering out useless dependence via timestamps. Stride reference profiling employs value profiling to capture the stride pattern of each dynamic load and select the profitable loads as prefetched objects for compilers. To demonstrate the effectiveness of (Formula presented.), we evaluated it using several SPEC CPU2006, MPI2007 and OMP2012 benchmarks on an Intel i7-4700 machine. Experimental results show that (Formula presented.) produces accurate profiling results, including expected data dependence and prefetched objects, which in turn contribute to more opportunities for extracting parallelism.
UR - http://www.scopus.com/inward/record.url?scp=84954419672&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84954419672&partnerID=8YFLogxK
U2 - 10.1007/s11227-015-1612-8
DO - 10.1007/s11227-015-1612-8
M3 - Article
AN - SCOPUS:84954419672
SN - 0920-8542
VL - 72
SP - 770
EP - 788
JO - Journal of Supercomputing
JF - Journal of Supercomputing
IS - 2
ER -