TY - GEN
T1 - Deep Convolutional Neural Network on iOS mobile devices (Invited Paper)
AU - Chen, Chun-Fu
AU - Lee, Gwo Giun
AU - Sritapan, Vincent
AU - Lin, Ching-Yung
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/12/9
Y1 - 2016/12/9
N2 - Deep Convolutional Neural Networks (CNNs) have drawn significant attention in the computer vision community by giving machines greater intelligence in understanding visual signals; however, their computational complexity has also increased significantly. To achieve ubiquitous machine intelligence, deep CNNs must be ported onto local devices rather than cloud-based solutions because of latency considerations. Hence, in this paper, we propose a method to explore the design space for porting deep CNNs onto iOS mobile devices, aiming to maximize data reusability, which alleviates the high bandwidth burden in the convolution layers of a CNN. Effective data reuse also enables all computing threads to run in parallel without data-loading latency. In addition, deep CNNs are usually over-parameterized and contain many redundant convolution kernels. Based on Algorithm/Architecture Co-Exploration, we introduce a method for pruning redundant kernels in a deep CNN with negligible performance degradation on the validation dataset (0.06% loss), reducing operations by 29% and storage size by 34% for a 16-layer CNN. Using an iPhone 6s and an iPad Pro as case studies, we ported 8-layer and 16-layer CNNs onto the target devices. The data reusability strategy improves computation speed by up to 1.3×, and redundant kernel removal increases it to 1.43×. As a result, we achieve high computation efficiency and thereby enhance the capability of machine intelligence on local mobile devices.
UR - http://www.scopus.com/inward/record.url?scp=85013192807&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85013192807&partnerID=8YFLogxK
U2 - 10.1109/SiPS.2016.31
DO - 10.1109/SiPS.2016.31
M3 - Conference contribution
AN - SCOPUS:85013192807
T3 - IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation
SP - 130
EP - 135
BT - Proceedings - IEEE International Workshop on Signal Processing Systems, SiPS 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE International Workshop on Signal Processing Systems, SiPS 2016
Y2 - 26 October 2016 through 28 October 2016
ER -