Quantifying intrinsic parallelism using linear algebra for algorithm/architecture coexploration

Gwo-Giun Lee, He Yuan Lin, Chun Fu Chen, Tsung Yuan Huang

Research output: Contribution to journalArticle

10 Citations (Scopus)

Abstract

Degree of parallelism (DoP) is an essential complexity metric that characterizes the number of independent operation sets (IOSs) that can be concurrently executed within an algorithm. This paper presents a generic framework to identify IOSs and to quantify the DoP based on rank theorem in linear algebra. This framework is applied to extract algorithmic parallelisms at various granularities, namely, multigrain parallelism. Our parallelism is intrinsic and platform independent and can provide insights into architectural information, thus facilitating mapping onto generic platforms and early back annotation for modifying algorithms. It plays a significant role in the concurrent optimization of both algorithms and architectures, referred to as Algorithm/Architecture Coexploration (AAC), by trading off between the DoP and the number of operations (NoO). This paper reports three case studies for AAC. The case study on an IDCT reveals that our framework accurately quantifies the parallelism for mapping the algorithm onto generic platforms, including FPGA and multicore systems. The IDCT parallelized by our technique surpasses a conventional spectral parallelization. By exploiting fine-grain parallelism, this paper presents a better porting of a discrete wavelet transform (DWT) onto single instruction multiple data (SIMD) machines compared with a commercial compiler. A high-quality deinterlacer is implemented on a low-cost multicore platform for real-time high-definition applications by analyzing the multigrain parallelism. These case studies reveal the effectiveness of our parallel analysis framework which is applicable to generic systems. Compared with traditional graph traversal techniques, our linear algebraic approach impressively features low complexity and is practical for complicated algorithms.

Original languageEnglish
Article number6025347
Pages (from-to)944-957
Number of pages14
JournalIEEE Transactions on Parallel and Distributed Systems
Volume23
Issue number5
DOIs
Publication statusPublished - 2012 Mar 20

Fingerprint

Linear algebra
Discrete wavelet transforms
Field programmable gate arrays (FPGA)
Costs

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics

Cite this

@article{55c5c490bd1349c4bac9dc719a6461a1,
title = "Quantifying intrinsic parallelism using linear algebra for algorithm/architecture coexploration",
abstract = "Degree of parallelism (DoP) is an essential complexity metric that characterizes the number of independent operation sets (IOSs) that can be concurrently executed within an algorithm. This paper presents a generic framework to identify IOSs and to quantify the DoP based on rank theorem in linear algebra. This framework is applied to extract algorithmic parallelisms at various granularities, namely, multigrain parallelism. Our parallelism is intrinsic and platform independent and can provide insights into architectural information, thus facilitating mapping onto generic platforms and early back annotation for modifying algorithms. It plays a significant role in the concurrent optimization of both algorithms and architectures, referred to as Algorithm/Architecture Coexploration (AAC), by trading off between the DoP and the number of operations (NoO). This paper reports three case studies for AAC. The case study on an IDCT reveals that our framework accurately quantifies the parallelism for mapping the algorithm onto generic platforms, including FPGA and multicore systems. The IDCT parallelized by our technique surpasses a conventional spectral parallelization. By exploiting fine-grain parallelism, this paper presents a better porting of a discrete wavelet transform (DWT) onto single instruction multiple data (SIMD) machines compared with a commercial compiler. A high-quality deinterlacer is implemented on a low-cost multicore platform for real-time high-definition applications by analyzing the multigrain parallelism. These case studies reveal the effectiveness of our parallel analysis framework which is applicable to generic systems. Compared with traditional graph traversal techniques, our linear algebraic approach impressively features low complexity and is practical for complicated algorithms.",
author = "Gwo-Giun Lee and Lin, {He Yuan} and Chen, {Chun Fu} and Huang, {Tsung Yuan}",
year = "2012",
month = "3",
day = "20",
doi = "10.1109/TPDS.2011.230",
language = "English",
volume = "23",
pages = "944--957",
journal = "IEEE Transactions on Parallel and Distributed Systems",
issn = "1045-9219",
publisher = "IEEE Computer Society",
number = "5",

}

Quantifying intrinsic parallelism using linear algebra for algorithm/architecture coexploration. / Lee, Gwo-Giun; Lin, He Yuan; Chen, Chun Fu; Huang, Tsung Yuan.

In: IEEE Transactions on Parallel and Distributed Systems, Vol. 23, No. 5, 6025347, 20.03.2012, p. 944-957.

Research output: Contribution to journalArticle

TY - JOUR

T1 - Quantifying intrinsic parallelism using linear algebra for algorithm/architecture coexploration

AU - Lee, Gwo-Giun

AU - Lin, He Yuan

AU - Chen, Chun Fu

AU - Huang, Tsung Yuan

PY - 2012/3/20

Y1 - 2012/3/20

N2 - Degree of parallelism (DoP) is an essential complexity metric that characterizes the number of independent operation sets (IOSs) that can be concurrently executed within an algorithm. This paper presents a generic framework to identify IOSs and to quantify the DoP based on rank theorem in linear algebra. This framework is applied to extract algorithmic parallelisms at various granularities, namely, multigrain parallelism. Our parallelism is intrinsic and platform independent and can provide insights into architectural information, thus facilitating mapping onto generic platforms and early back annotation for modifying algorithms. It plays a significant role in the concurrent optimization of both algorithms and architectures, referred to as Algorithm/Architecture Coexploration (AAC), by trading off between the DoP and the number of operations (NoO). This paper reports three case studies for AAC. The case study on an IDCT reveals that our framework accurately quantifies the parallelism for mapping the algorithm onto generic platforms, including FPGA and multicore systems. The IDCT parallelized by our technique surpasses a conventional spectral parallelization. By exploiting fine-grain parallelism, this paper presents a better porting of a discrete wavelet transform (DWT) onto single instruction multiple data (SIMD) machines compared with a commercial compiler. A high-quality deinterlacer is implemented on a low-cost multicore platform for real-time high-definition applications by analyzing the multigrain parallelism. These case studies reveal the effectiveness of our parallel analysis framework which is applicable to generic systems. Compared with traditional graph traversal techniques, our linear algebraic approach impressively features low complexity and is practical for complicated algorithms.

AB - Degree of parallelism (DoP) is an essential complexity metric that characterizes the number of independent operation sets (IOSs) that can be concurrently executed within an algorithm. This paper presents a generic framework to identify IOSs and to quantify the DoP based on rank theorem in linear algebra. This framework is applied to extract algorithmic parallelisms at various granularities, namely, multigrain parallelism. Our parallelism is intrinsic and platform independent and can provide insights into architectural information, thus facilitating mapping onto generic platforms and early back annotation for modifying algorithms. It plays a significant role in the concurrent optimization of both algorithms and architectures, referred to as Algorithm/Architecture Coexploration (AAC), by trading off between the DoP and the number of operations (NoO). This paper reports three case studies for AAC. The case study on an IDCT reveals that our framework accurately quantifies the parallelism for mapping the algorithm onto generic platforms, including FPGA and multicore systems. The IDCT parallelized by our technique surpasses a conventional spectral parallelization. By exploiting fine-grain parallelism, this paper presents a better porting of a discrete wavelet transform (DWT) onto single instruction multiple data (SIMD) machines compared with a commercial compiler. A high-quality deinterlacer is implemented on a low-cost multicore platform for real-time high-definition applications by analyzing the multigrain parallelism. These case studies reveal the effectiveness of our parallel analysis framework which is applicable to generic systems. Compared with traditional graph traversal techniques, our linear algebraic approach impressively features low complexity and is practical for complicated algorithms.

UR - http://www.scopus.com/inward/record.url?scp=84862820208&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84862820208&partnerID=8YFLogxK

U2 - 10.1109/TPDS.2011.230

DO - 10.1109/TPDS.2011.230

M3 - Article

VL - 23

SP - 944

EP - 957

JO - IEEE Transactions on Parallel and Distributed Systems

JF - IEEE Transactions on Parallel and Distributed Systems

SN - 1045-9219

IS - 5

M1 - 6025347

ER -