In this paper, we analyze the computational complexity of an H.264/AVC baseline profile decoder using the SimpleScalar/ARM simulator. The simulation results show that memory references, content activity checks, and edge filtering are very time consuming on an embedded system. To reduce memory references and improve overall system performance, we propose a new, efficient VLSI architecture that accelerates the deblocking filter. The proposed architecture, called "Adaptive Edge Filtering Operation (AEFO)," can be embedded in a platform-based architecture as a co-processor. An embedded system using AEFO is 1.66 times faster than the software implementation. Moreover, the total numbers of memory references for loads and stores are reduced by 34% and 36%, respectively.