Hosting a deep learning model in the cloud is not always the best solution, for instance in IoT applications or autonomous systems where low latency or enhanced security is required. Deep learning on the edge alleviates these issues and provides the benefits of local computation. In this paper, we present the development of an open-ISA (instruction set architecture) general-purpose GPU aimed at edge computation. Our GPU, CASLab GPU, uses the license-free, royalty-free HSAIL ISA specification and supports the OpenCL 1.2/2.0 APIs for heterogeneous computing. CASLab GPU also supports the TensorFlow framework through CUDA-on-CL technology. The CASLab GPU IP has a configurable SIMT core design that can be tailored directly to the computing needs of on-device learning and inference. The GPU is developed using an electronic system-level (ESL) design methodology, which incorporates GPU micro-architecture exploration, power modelling of the GPU, and co-simulation of the GPU software stack.