A system-on-chip (SOC) commonly contains hundreds or even thousands of heterogeneous embedded memories. Many of these memories have wide data words, which incurs a high routing penalty for the built-in self-test (BIST) circuits. Previous BIST schemes address this problem with a serial interface, e.g., one based on the IEEE 1500 architecture or on novel scan approaches, to reduce the routing area overhead. However, serial approaches are very slow and do not allow at-speed test and diagnosis. In this paper, we propose a hybrid BIST architecture that reduces the routing penalty while still allowing at-speed test and diagnosis of the memory cores. Its test time is close to that of a typical parallel BIST method. Experimental results show that the proposed BIST effectively reduces the area overhead.
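
To make the trade-off between routing and test time concrete, the following is a rough back-of-the-envelope sketch, not the proposed architecture. It assumes a March C- test (10 operations per word), a parallel interface that routes one data wire per bit, and a serial interface that routes a single data wire and shifts one bit per cycle. The names `Memory`, `parallel_cost`, and `serial_cost` are hypothetical, and control and address wiring are ignored.

```python
from dataclasses import dataclass

# March C- applies 10 read/write operations to every address.
MARCH_C_MINUS_OPS_PER_WORD = 10


@dataclass(frozen=True)
class Memory:
    words: int  # number of addressable words
    width: int  # data word width in bits


def parallel_cost(mem: Memory) -> tuple[int, int]:
    """Return (data_wires, cycles) for a parallel BIST interface.

    Assumption: one routed wire per data bit, one cycle per operation.
    """
    return mem.width, MARCH_C_MINUS_OPS_PER_WORD * mem.words


def serial_cost(mem: Memory) -> tuple[int, int]:
    """Return (data_wires, cycles) for a serial BIST interface.

    Assumption: a single data wire, with each operation taking
    `width` cycles to shift the word in or out bit by bit.
    """
    return 1, MARCH_C_MINUS_OPS_PER_WORD * mem.words * mem.width


# Example: a 1K x 64 memory (illustrative sizes, not taken from the paper).
mem = Memory(words=1024, width=64)
print("parallel (wires, cycles):", parallel_cost(mem))
print("serial   (wires, cycles):", serial_cost(mem))
```

Under these simplifying assumptions, the serial interface saves data wires in proportion to the word width but lengthens the test by roughly the same factor, which is the gap a hybrid scheme aims to close.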