We present an embedded robot arm grasp detection system. The system consists of two subsystems: an embedded vision subsystem and an embedded robot arm control subsystem. In the former, a template matching algorithm runs on an Nvidia Jetson TX2 developer kit to detect objects; the robot arm is then controlled to grasp and place them. Although embedded systems offer advantages in cost, weight, size, and power consumption, their slow processing speed is a significant drawback. To address this problem, we propose methods that reduce the number of calculations required for the similarity measurement. Tests on 40 templates with 200 test images show that the average execution time is up to 10x faster than that of the original algorithm. For medium-sized templates, $(100 \sim 200) \times (100 \sim 200)$ pixels, the average execution time is 0.176 s. In addition, the angle of each object is determined at a fine angular interval of 1 degree.
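To make concrete what reducing the number of similarity calculations can mean, the following is a minimal sketch of one generic technique: template matching with the sum of absolute differences (SAD) and early termination. It is not the method proposed here, which this abstract does not detail. The function name `match_template_sad`, the choice of SAD as the similarity measure, and the list-of-lists image representation are all assumptions made for illustration.

```python
def match_template_sad(image, template):
    """Find the best match of `template` in `image` by exhaustive search.

    Both arguments are 2-D lists of grayscale values (an assumed
    representation, chosen to keep the sketch dependency-free).
    Returns (row, col, sad) for the position with the lowest SAD score.
    """
    image_h, image_w = len(image), len(image[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best = (0, 0, float("inf"))

    for r in range(image_h - tmpl_h + 1):
        for c in range(image_w - tmpl_w + 1):
            sad = 0
            for y in range(tmpl_h):
                image_row = image[r + y]
                tmpl_row = template[y]
                for x in range(tmpl_w):
                    sad += abs(image_row[c + x] - tmpl_row[x])
                # Early termination: SAD only grows as rows are added, so a
                # candidate that already matches or exceeds the best score
                # cannot win and its remaining rows are skipped.
                if sad >= best[2]:
                    break
            else:
                # All rows were summed without being cut off: new best match.
                best = (r, c, sad)
    return best


# Usage: a 2x2 template embedded in an otherwise empty 5x6 image.
image = [[0] * 6 for _ in range(5)]
template = [[9, 8], [7, 6]]
for y in range(2):
    for x in range(2):
        image[2 + y][3 + x] = template[y][x]

row, col, score = match_template_sad(image, template)
print(row, col, score)
```

Early termination leaves the result of the exhaustive search unchanged and only skips work on candidates that cannot win. How much it saves depends on how early a good match is found.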