Deep convolutional neural networks (CNNs) have drawn significant attention in the computer vision community by giving machines greater intelligence in understanding visual signals; however, their computational complexity has also increased significantly. To achieve ubiquitous machine intelligence, deep CNNs need to run on local devices rather than in cloud-based solutions, owing to latency requirements. Hence, in this paper, we propose a method to explore the design space for porting deep CNNs onto iOS mobile devices, aiming to maximize data reusability, which alleviates the high memory-bandwidth burden of the convolution layers. Furthermore, effective data reuse also makes it possible to parallelize all computing threads without data-loading latency. On the other hand, deep CNNs are usually over-parameterized, with many unused convolution kernels. Based on Algorithm/Architecture Co-Exploration, we introduce a method for pruning redundant kernels in deep CNNs with negligible performance degradation on the validation dataset (0.06% loss). On a 16-layer CNN, this reduces the number of operations by 29% and the storage size by 34%. We used an iPhone 6s and an iPad Pro as case studies, porting 8-layer and 16-layer CNNs onto the target devices. The data-reuse strategy improves computation speed by up to 1.3×, and redundant kernel removal increases computation speed to 1.43×. As a result, we achieve high computational efficiency and thereby enhance the capability of machine intelligence on local mobile devices.
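The abstract does not state which criterion identifies a kernel as redundant. As an illustration only, the following minimal sketch swaps in a common substitute, L1-norm magnitude pruning: kernels whose absolute weights sum to less than a threshold are dropped, and the surviving indices are returned so that the next layer's input channels could be trimmed to match. The flat-list kernel representation, the threshold, and the names `l1_norm`, `prune_kernels`, and `conv_macs` are all hypothetical and are not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical representation: each output kernel of a conv layer is a flat
# list of its weights (in_channels * k * k values).
Kernel = List[float]


def l1_norm(kernel: Kernel) -> float:
    """Sum of absolute weights; small values suggest a near-unused kernel."""
    return sum(abs(w) for w in kernel)


def prune_kernels(kernels: List[Kernel], threshold: float) -> Tuple[List[Kernel], List[int]]:
    """Keep only kernels whose L1 norm is at least `threshold`.

    This is magnitude-based pruning, an assumed stand-in for the paper's
    unspecified redundancy criterion. The original indices of the kept
    kernels are returned so the following layer's input channels can be
    pruned consistently.
    """
    kept_idx = [i for i, k in enumerate(kernels) if l1_norm(k) >= threshold]
    return [kernels[i] for i in kept_idx], kept_idx


def conv_macs(out_channels: int, weights_per_kernel: int, out_h: int, out_w: int) -> int:
    """Multiply-accumulate count of one convolution layer."""
    return out_channels * weights_per_kernel * out_h * out_w


if __name__ == "__main__":
    # Toy layer with four kernels; two of them are close to zero.
    kernels = [
        [0.5, -0.2, 0.1],
        [0.001, -0.002, 0.0],
        [0.3, 0.3, -0.4],
        [0.0, 0.0, 0.001],
    ]
    pruned, kept = prune_kernels(kernels, threshold=0.05)
    before = conv_macs(len(kernels), 3, 8, 8)
    after = conv_macs(len(pruned), 3, 8, 8)
    print(kept, before, after)
```

Removing output kernels shrinks both the weight storage and the operation count of the layer in proportion to the number of kernels removed. In this toy layer, half of the kernels are pruned, so the operation count falls from 768 to 384 MACs. The 29% and 34% figures reported above come from the authors' own method on a 16-layer CNN, not from this sketch.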