TY - GEN
T1 - A Programmable Systolic-Array AI Accelerator System with High-Performance Model Quantization and Heart Disease Classification Algorithm Design
AU - Wang, Kuan Cheng
AU - Ku, Ming Yueh
AU - Lee, Shuenn Yuh
AU - Chen, Ju Yi
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - This work introduces a heart disease classification system. The system includes electrocardiography (ECG) arrhythmia classification and phonocardiography (PCG) heart-valve disease classification algorithms, achieving 97.4% and 99.1% accuracy, respectively. Additionally, the paper presents a procedure for quantizing a lightweight convolutional neural network (CNN) model to 8-bit fixed-point representation with a 0.1% accuracy loss. Furthermore, this study proposes a programmable artificial intelligence (AI) accelerator with an application-specific instruction set processor (ASIP) and a systolic array architecture to achieve high-performance computing. Moreover, we introduce a matrix mapping unit (MMU) and a pipeline state register (PSR) to facilitate switching between CNN and matrix multiplication, reducing timing overhead by over 50%. The chip is implemented on Xilinx's PYNQ-Z2 and consumes 106 mW, with a classification latency of 6.8 ms / 21 ms (arrhythmia / valve diseases).
UR - https://www.scopus.com/pages/publications/105010638708
UR - https://www.scopus.com/pages/publications/105010638708#tab=citedBy
U2 - 10.1109/ISCAS56072.2025.11043735
DO - 10.1109/ISCAS56072.2025.11043735
M3 - Conference contribution
AN - SCOPUS:105010638708
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
BT - ISCAS 2025 - IEEE International Symposium on Circuits and Systems, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE International Symposium on Circuits and Systems, ISCAS 2025
Y2 - 25 May 2025 through 28 May 2025
ER -