In the H.264/AVC reference software, the decoding process writes the 4x4 block data produced by the intra prediction and motion compensation components to external memory. The data processed by IDCT&IQ is likewise transferred to external memory. When the deblocking filter is applied, the data is loaded from external memory into the deblocking filter, and the filtered data is stored back to external memory again. The decoding process in the reference software therefore wastes a great deal of time on unnecessary memory accesses. In this paper, we use the 4x4 block as the operation unit throughout the decoding process to reduce memory accesses. The processing order of the deblocking filter is changed to follow the decoding order of the 4x4 blocks. Simulation results show that the proposed method improves the average decoding MIPS per frame by a factor of about 3.6.
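The difference between the two data flows can be illustrated with a toy count of external-memory block transfers. This is only a sketch under simplifying assumptions that are not taken from the paper: every stage moves exactly one 4x4 block per transfer, the block-based flow keeps all intermediate and neighboring data in local buffers, and the function names are hypothetical. The resulting ratio is a property of this simplified model, not the measured speedup reported above.

```python
BLOCK = 4  # operation unit: one 4x4 block


def blocks_in_frame(width, height):
    """Number of 4x4 blocks in a frame of the given luma size."""
    return (width // BLOCK) * (height // BLOCK)


def reference_flow_transfers(width, height):
    """Toy model of the reference flow: every stage round-trips
    through external memory (assumed one transfer per block per step)."""
    transfers = 0
    for _ in range(blocks_in_frame(width, height)):
        transfers += 1  # prediction / motion compensation output stored
        transfers += 1  # IDCT&IQ output stored
        transfers += 1  # block loaded into the deblocking filter
        transfers += 1  # filtered block stored back
    return transfers


def block_based_flow_transfers(width, height):
    """Toy model of a 4x4-block-based flow: intermediate data is assumed
    to stay in local buffers, so only the final block is stored."""
    transfers = 0
    for _ in range(blocks_in_frame(width, height)):
        transfers += 1  # filtered block stored once
    return transfers


if __name__ == "__main__":
    w, h = 176, 144  # QCIF luma resolution, chosen only as an example
    print(blocks_in_frame(w, h))
    print(reference_flow_transfers(w, h))
    print(block_based_flow_transfers(w, h))
```

In a real decoder the deblocking filter also needs pixels from neighboring blocks, so the local buffering assumed here would have to hold those edges as well; the sketch ignores that cost.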