This paper presents an algorithm that is capable of generating 3-D video from monocular video through a hierarchical approach that characterizes the low-level features and high-level semantics of video content to generate depth map. Color and texture features are used to locally indicate video’s characteristics followed by segmentation of the video into several regions. Subsequently, semantics, including shape and motion semantics, of segmented regions are delineated from a higher perspective to refine segmentation result via measuring the interrelations among segmented regions. Subsequently, according to refined segmentation map and semantics of segmented regions, the proposed method generates a depth map with high stability using both spatial and temporal information. The stable depth map minimized visual quality degradation, such as flicker and blurring when viewing the 3-D video. The experimental results show the capability of the proposed algorithm in generating a high quality and stable depth map. In addition, subjective viewing evaluation shows that the proposed algorithm surpasses those of the commercial products for converting 2-D video to 3-D video, including TriDef 3D and CyberLink Power DVD.
All Science Journal Classification (ASJC) codes