We perform video scene detection with the aid of web-based context, targeting travel videos captured by amateur photographers during their journeys. We discover correlations between personal videos and predefined travel schedules, which are used to retrieve related data from general-purpose image/video search engines. Because scene boundaries are clearly defined in travel schedules, we segment videos into scenes by checking the discovered cross-media correlation. To make the different modalities comparable, keyframes extracted from videos and images retrieved from the web are both represented as visual word histograms, and correlation determination is then cast as an approximate sequence matching problem. We weight visual words according to statistics of the retrieved data and evaluate similarity between images under this weighting scheme. To systematically determine scene boundaries after finding the cross-media correlation, we introduce an energy minimization framework that jointly considers visual, temporal, and context information. Experimental results verify the effectiveness of the proposed approach and show that exploiting cross-media correlation and web-based context is promising for media analysis.
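The abstract does not spell out the weighting scheme or the matching algorithm, so the following is only a minimal sketch under stated assumptions: an idf-style prioritization of visual words (words rare among the retrieved web images get higher weight), weighted cosine similarity between histograms, and a DTW-style dynamic program as one concrete form of approximate sequence matching between the keyframe sequence and the schedule-ordered web images. All function names here are illustrative, not the paper's.

```python
import numpy as np

def idf_weights(web_histograms):
    """Assumed idf-style weighting: visual words appearing in few
    retrieved web images get higher weight (the paper's exact
    statistics-based scheme is not specified in the abstract)."""
    web = np.asarray(web_histograms, dtype=float)   # (n_images, vocab_size)
    df = (web > 0).sum(axis=0)                      # document frequency per word
    return np.log((1 + len(web)) / (1 + df)) + 1.0  # smoothed, always positive

def weighted_similarity(h1, h2, w):
    """Weighted cosine similarity between two visual word histograms."""
    a = np.asarray(h1, dtype=float) * w
    b = np.asarray(h2, dtype=float) * w
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def align_sequences(keyframes, web_images, w):
    """DTW-style approximate matching of the keyframe sequence against
    web images ordered by the travel schedule; returns the minimum
    cumulative alignment cost, where cost = 1 - similarity."""
    n, m = len(keyframes), len(web_images)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 1.0 - weighted_similarity(keyframes[i - 1], web_images[j - 1], w)
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

In this sketch, the alignment path implicitly associates each keyframe with a schedule position; scene boundaries would then fall where the matched schedule entry changes, before being refined by the energy minimization step.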