In this paper, we design a novel architecture for computing all transforms required in an H.264/AVC High Profile decoder. The flexible architecture computes the 8-point and 4-point integer transforms as well as the 4-point and 2-point Hadamard transforms in a single design, which substantially reduces the chip area of the implementation. With a throughput of 8 pixels/cycle, the proposed design completes the computation for one macroblock in 95 clock cycles when the 8×8 inverse transform is involved, or in 54 clock cycles when it is not. Simulation results show that the implementation requires 18.5k gates and reaches a maximum clock frequency of 125 MHz. The architecture meets the real-time requirement for all existing frame sizes in 4:2:0 format. For example, when operated at 106 MHz, it achieves 4096×2304 at 30 frames/sec.
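The real-time figure quoted above can be sanity-checked with a short calculation. The sketch below is illustrative only and not part of the proposed design: it assumes 16×16 macroblocks, that every macroblock takes the stated per-macroblock cycle count, and that no other pipeline overhead applies. The function name `required_clock_hz` and its parameters are hypothetical.

```python
def required_clock_hz(width, height, fps, cycles_per_mb, mb_size=16):
    """Estimate the clock rate needed to transform every macroblock in real time.

    Assumption (not from the paper): each macroblock costs exactly
    `cycles_per_mb` cycles and there is no additional overhead.
    """
    # Ceiling division so that partial macroblocks at the frame edge are counted.
    mbs_wide = -(-width // mb_size)
    mbs_high = -(-height // mb_size)
    mbs_per_second = mbs_wide * mbs_high * fps
    return mbs_per_second * cycles_per_mb


# Worst case: every macroblock uses the 8x8 inverse transform (95 cycles).
worst = required_clock_hz(4096, 2304, 30, cycles_per_mb=95)
# Case without the 8x8 inverse transform (54 cycles).
light = required_clock_hz(4096, 2304, 30, cycles_per_mb=54)

print(f"with 8x8 transform:    {worst / 1e6:.2f} MHz")
print(f"without 8x8 transform: {light / 1e6:.2f} MHz")
```

Under these assumptions, a 4096×2304 frame holds 256 × 144 = 36,864 macroblocks, so 30 frames/sec at 95 cycles per macroblock needs about 105.06 MHz. That is consistent with the 106 MHz operating point and sits below the reported 125 MHz maximum.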