TY - GEN
T1 - Mapping visual signal processing onto multi-core platform via algorithm/architecture co-exploration
AU - Chen, Chun Fu
AU - Lee, Gwo Giun Chris
AU - Yu, Zheng Han
AU - Huang, Chun His
PY - 2014/12/15
Y1 - 2014/12/15
N2 - Mapping an algorithm onto a multi-core platform for high performance requires investigating both the degree of parallelism and data communication, because a multi-core platform processes multiple tasks concurrently and transfers large amounts of data between storage and processors. This paper proposes a method that relieves the increased data transfer rate incurred by parallel processing by analyzing the dependency matrix of the data flow graph. The proposed method is not biased toward any particular multi-core platform, since it considers only an intrinsic characteristic of the algorithm, i.e., its data flow graph. The dependency matrix, which conveys the causality of data transfer, is used to quantify the data transfer rate and the corresponding storage requirement; as a consequence, a feasible mapping with a smaller data transfer rate and an acceptable storage requirement is identified. Furthermore, in conjunction with quantification of the degree of parallelism, this paper presents a comprehensive exploration of the design space for mapping algorithms onto multi-core platforms through the dependency matrix. The IBM Cell Broadband Engine is selected as the target multi-core platform. Experimental results show that with six cores the mapping achieves an average speedup of 5.75x over the single-core case; in addition, integrating the proposed data transfer analysis saves about 46% of data transfer cycles and raises the overall speedup to 7.51x on average relative to the single-core scenario without data reuse.
AB - Mapping an algorithm onto a multi-core platform for high performance requires investigating both the degree of parallelism and data communication, because a multi-core platform processes multiple tasks concurrently and transfers large amounts of data between storage and processors. This paper proposes a method that relieves the increased data transfer rate incurred by parallel processing by analyzing the dependency matrix of the data flow graph. The proposed method is not biased toward any particular multi-core platform, since it considers only an intrinsic characteristic of the algorithm, i.e., its data flow graph. The dependency matrix, which conveys the causality of data transfer, is used to quantify the data transfer rate and the corresponding storage requirement; as a consequence, a feasible mapping with a smaller data transfer rate and an acceptable storage requirement is identified. Furthermore, in conjunction with quantification of the degree of parallelism, this paper presents a comprehensive exploration of the design space for mapping algorithms onto multi-core platforms through the dependency matrix. The IBM Cell Broadband Engine is selected as the target multi-core platform. Experimental results show that with six cores the mapping achieves an average speedup of 5.75x over the single-core case; in addition, integrating the proposed data transfer analysis saves about 46% of data transfer cycles and raises the overall speedup to 7.51x on average relative to the single-core scenario without data reuse.
UR - http://www.scopus.com/inward/record.url?scp=84920270394&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84920270394&partnerID=8YFLogxK
U2 - 10.1109/SiPS.2014.6986094
DO - 10.1109/SiPS.2014.6986094
M3 - Conference contribution
AN - SCOPUS:84920270394
T3 - IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation
BT - IEEE Workshop on Signal Processing Systems, SiPS
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 IEEE Workshop on Signal Processing Systems, SiPS 2014
Y2 - 20 October 2014 through 22 October 2014
ER -