TY - JOUR
T1 - An Efficient Implementation of Convolutional Neural Network With CLIP-Q Quantization on FPGA
AU - Cheng, Wei
AU - Lin, Ing Chao
AU - Shih, Yun Yang
N1 - Publisher Copyright:
© 2004-2012 IEEE.
PY - 2022/10/1
Y1 - 2022/10/1
N2 - Convolutional neural networks (CNNs) have achieved tremendous success in the computer vision domain recently. The pursue for better model accuracy drives the model size and the storage requirements of CNNs as well as the computational complexity. Therefore, Compression Learning by InParallel Pruning-Quantization (CLIP-Q) was proposed to reduce a vast amount of weight storage requirements by using a few quantized segments to represent all weights in a CNN layer. Among various quantization strategies, CLIP-Q is suitable for hardware accelerators because it reduces model size significantly while maintaining the full-precision model accuracy. However, the current CLIP-Q approach did not consider the hardware characteristics and it is not straightforward when mapped to a CNN hardware accelerator. In this work, we propose a software-hardware codesign platform that includes a modified version of CLIP-Q algorithm and a hardware accelerator, which consists of 5×5 reconfigurable convolutional arrays with input and output channel parallelization. Additionally, the proposed CNN accelerator maintains the same accuracy of a full-precision CNN in Cifar-10 and Cifar-100 datasets.
AB - Convolutional neural networks (CNNs) have achieved tremendous success in the computer vision domain recently. The pursue for better model accuracy drives the model size and the storage requirements of CNNs as well as the computational complexity. Therefore, Compression Learning by InParallel Pruning-Quantization (CLIP-Q) was proposed to reduce a vast amount of weight storage requirements by using a few quantized segments to represent all weights in a CNN layer. Among various quantization strategies, CLIP-Q is suitable for hardware accelerators because it reduces model size significantly while maintaining the full-precision model accuracy. However, the current CLIP-Q approach did not consider the hardware characteristics and it is not straightforward when mapped to a CNN hardware accelerator. In this work, we propose a software-hardware codesign platform that includes a modified version of CLIP-Q algorithm and a hardware accelerator, which consists of 5×5 reconfigurable convolutional arrays with input and output channel parallelization. Additionally, the proposed CNN accelerator maintains the same accuracy of a full-precision CNN in Cifar-10 and Cifar-100 datasets.
UR - http://www.scopus.com/inward/record.url?scp=85135739668&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85135739668&partnerID=8YFLogxK
U2 - 10.1109/TCSI.2022.3193031
DO - 10.1109/TCSI.2022.3193031
M3 - Article
AN - SCOPUS:85135739668
SN - 1549-8328
VL - 69
SP - 4093
EP - 4102
JO - IEEE Transactions on Circuits and Systems I: Regular Papers
JF - IEEE Transactions on Circuits and Systems I: Regular Papers
IS - 10
ER -