In this paper, two hardware-oriented fast motion estimation algorithms and their implementations into a 2-D systolic array for variable block size motion estimation architecture are presented. Two hardware oriented algorithms are proposed to increase the coding speed and reduce the computation complexity of the fast motion estimation (FME) algorithm. The results show that the proposed FME algorithm can speed up 71% coding time of the original standard with slightly PSNR loss and bit rate increase. Therefore, the hardware architecture designs for the proposed algorithms with considerations of both motion vector cost and the sum of absolute difference (SAD) distortion are implemented. The chip, which is realized in CMOS TSMC 0.13μm 1P8M technology, can be operated at 200MHz with gate count 191k including the memory modules.