Among stereo matching algorithms, semi-global matching (SGM) is an efficient, high-accuracy method. However, its heavy demand for memory access and its high computational complexity make real-time, efficient processing on hardware difficult. Based on the spatial redundancy found in the matching cost, we propose several effective techniques that reduce the on-chip and off-chip memory requirements while greatly lowering the computational complexity. Experimental results show that the proposed SGM algorithm reduces the computational complexity by 71%–74% and produces disparity maps of almost the same quality as the original 8-path SGM. The proposed 3-path fully pipelined architecture is implemented on the Xilinx VCU-106 with a throughput of 1920 × 1080 at 54 fps. We also synthesize and lay out the design with a TSMC 40 nm standard-cell library, obtaining an area of 8.1 mm² and a throughput of 1920 × 1080 at 192 fps. The proposed design reaches up to 50,960 million disparity estimations per second (MDE/s), which outperforms conventional ASIC implementations.
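To make the path-count reduction and the MDE/s figure concrete, here is a minimal pure-Python sketch of textbook SGM cost aggregation restricted to three paths, followed by winner-take-all disparity selection and the MDE/s arithmetic. It is an illustration under stated assumptions, not the proposed design: the three directions (left, top, and top-left predecessors, all of which a single raster scan can evaluate) are our assumption because the abstract does not name the paths. The penalties `P1`/`P2` and all function names are hypothetical. The spatial-redundancy and memory-reduction techniques are not modeled. A disparity range of 128 is inferred from the quoted numbers (1920 × 1080 × 192 × 128 / 10⁶ ≈ 50,960) rather than stated in the abstract.

```python
# Sketch of standard SGM aggregation over THREE paths (assumed directions),
# plus winner-take-all and the MDE/s throughput formula. Not the paper's design.

P1, P2 = 10, 120  # hypothetical smoothness penalties (small / large disparity jumps)

# (dy, dx) offset of the predecessor pixel for each path. ASSUMPTION: left, top and
# top-left, so every predecessor is already computed in raster-scan order.
DIRS = [(0, -1), (-1, 0), (-1, -1)]


def aggregate_3path(cost, p1=P1, p2=P2):
    """cost[y][x][d] is the matching cost; returns S[y][x][d] summed over 3 paths."""
    h, w, dmax = len(cost), len(cost[0]), len(cost[0][0])
    total = [[[0] * dmax for _ in range(w)] for _ in range(h)]
    paths = [[[None] * w for _ in range(h)] for _ in DIRS]  # L_r per direction
    for y in range(h):
        for x in range(w):
            c = cost[y][x]
            for k, (dy, dx) in enumerate(DIRS):
                py, px = y + dy, x + dx
                if 0 <= py < h and 0 <= px < w:
                    prev = paths[k][py][px]
                    prev_min = min(prev)
                    cur = []
                    for d in range(dmax):
                        best = min(prev[d], prev_min + p2)  # same d, or any jump
                        if d > 0:
                            best = min(best, prev[d - 1] + p1)  # +-1 disparity step
                        if d < dmax - 1:
                            best = min(best, prev[d + 1] + p1)
                        # Subtracting prev_min keeps path costs bounded.
                        cur.append(c[d] + best - prev_min)
                else:
                    cur = list(c)  # the path starts at the image border
                paths[k][y][x] = cur
                for d in range(dmax):
                    total[y][x][d] += cur[d]
    return total


def winner_take_all(total):
    """Pick the disparity with the lowest aggregated cost at each pixel."""
    return [[min(range(len(s)), key=s.__getitem__) for s in row] for row in total]


def mde_per_second(width, height, fps, disparity_range):
    """Million disparity estimations per second."""
    return width * height * fps * disparity_range / 1e6
```

For example, if the raw cost at one pixel slightly favors a wrong disparity, the three aggregated paths pull that pixel back to the disparity its neighbors agree on. Restricting aggregation to raster-order directions is a common way to keep a hardware pipeline single-pass, at some cost in accuracy compared with 8 paths.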