TY - JOUR
T1 - A Novel, Efficient Implementation of a Local Binary Convolutional Neural Network
AU - Lin, Ing-Chao
AU - Tang, Chi-Huan
AU - Ni, Chi-Ting
AU - Hu, Xing
AU - Shen, Yu-Tong
AU - Chen, Pei-Yin
AU - Xie, Yuan
N1 - Publisher Copyright:
© 2004-2012 IEEE.
PY - 2021/4
Y1 - 2021/4
N2 - To reduce the computational complexity of convolutional neural networks (CNNs), the local binary convolutional neural network (LBCNN) has been proposed. In the LBCNN, a convolutional layer is divided into two sublayers: Sublayer 1 is a sparse ternary-weighted convolutional layer, and Sublayer 2 is a 1 × 1 convolutional layer. With these two sublayers, the LBCNN has lower computational complexity and uses less memory than a CNN. In this brief, we propose a platform that includes a weight preprocessor and a layer accelerator for the LBCNN. The proposed weight preprocessor takes advantage of the sparsity in the LBCNN and encodes the weights offline. The layer accelerator effectively uses the encoded data to reduce the computational complexity and memory accesses of an inference. Compared to the state-of-the-art design, the experimental results show that the number of clock cycles is reduced by 76.32% and memory usage is reduced by 39.41%. The synthesis results show that the clock period is reduced by 4.76%, the cell area by 46.48%, and the power consumption by 40.87%. The inference accuracy is the same as that of the state-of-the-art design.
AB - To reduce the computational complexity of convolutional neural networks (CNNs), the local binary convolutional neural network (LBCNN) has been proposed. In the LBCNN, a convolutional layer is divided into two sublayers: Sublayer 1 is a sparse ternary-weighted convolutional layer, and Sublayer 2 is a 1 × 1 convolutional layer. With these two sublayers, the LBCNN has lower computational complexity and uses less memory than a CNN. In this brief, we propose a platform that includes a weight preprocessor and a layer accelerator for the LBCNN. The proposed weight preprocessor takes advantage of the sparsity in the LBCNN and encodes the weights offline. The layer accelerator effectively uses the encoded data to reduce the computational complexity and memory accesses of an inference. Compared to the state-of-the-art design, the experimental results show that the number of clock cycles is reduced by 76.32% and memory usage is reduced by 39.41%. The synthesis results show that the clock period is reduced by 4.76%, the cell area by 46.48%, and the power consumption by 40.87%. The inference accuracy is the same as that of the state-of-the-art design.
UR - http://www.scopus.com/inward/record.url?scp=85096866695&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096866695&partnerID=8YFLogxK
U2 - 10.1109/TCSII.2020.3036012
DO - 10.1109/TCSII.2020.3036012
M3 - Article
AN - SCOPUS:85096866695
SN - 1549-7747
VL - 68
SP - 1413
EP - 1417
JO - IEEE Transactions on Circuits and Systems II: Express Briefs
JF - IEEE Transactions on Circuits and Systems II: Express Briefs
IS - 4
M1 - 9249011
ER -