We propose an approach to video scene detection designed specifically for travel videos captured by amateur photographers during their journeys. The approach discovers the correlation between a travel video and its corresponding text-based travel schedule. Because scene boundaries are clearly defined in schedules, we segment a video into scenes by checking this cross-media correlation. To make the two modalities comparable, we extract keywords for the visited scenic spots from the text-based schedule and use them to retrieve related photos from image search engines. Sequences of keyframes and retrieved photos are represented as visual word histograms, which turns correlation determination into an approximate sequence matching problem. The experimental results verify the effectiveness of the proposed idea and suggest that exploiting cross-media correlation is a promising research direction in media analysis.
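To make the matching step concrete, the following is a minimal sketch, not the authors' method. The abstract does not specify the similarity measure or the matching algorithm, so the choices here are assumptions. Histograms are compared by histogram intersection. Each scheduled spot is represented by the mean histogram of its retrieved photos. Keyframes are aligned to spots with a DTW-style monotonic dynamic program, which preserves the schedule order. The function names (`segment_by_schedule`, `histogram_intersection`) are hypothetical.

```python
def normalize(h):
    """L1-normalize a histogram so that its bins sum to 1."""
    total = sum(h)
    return [x / total for x in h] if total > 0 else [float(x) for x in h]


def mean_hist(hists):
    """Bin-wise mean of several histograms of equal length."""
    return [sum(h[k] for h in hists) / len(hists) for k in range(len(hists[0]))]


def histogram_intersection(a, b):
    """Similarity of two L1-normalized histograms, in [0, 1] (assumed measure)."""
    return sum(min(x, y) for x, y in zip(a, b))


def segment_by_schedule(keyframe_hists, spot_photo_hists):
    """Assign keyframes to scheduled spots in order (hypothetical helper).

    keyframe_hists: n visual-word histograms, one per keyframe, in video order.
    spot_photo_hists: m lists; entry j holds histograms of the photos
        retrieved for the j-th spot in the schedule.
    Returns (labels, boundaries): labels[i] is the spot index of keyframe i,
    and boundaries lists the keyframe indices where a new scene starts.
    """
    K = [normalize(h) for h in keyframe_hists]
    # Assumption: a spot is represented by the mean of its photos' histograms.
    S = [normalize(mean_hist([normalize(p) for p in photos])) for photos in spot_photo_hists]
    n, m = len(K), len(S)
    if m > n:
        raise ValueError("need at least one keyframe per scheduled spot")
    sim = [[histogram_intersection(k, s) for s in S] for k in K]

    NEG = float("-inf")
    # dp[i][j]: best total similarity with keyframe i assigned to spot j.
    dp = [[NEG] * m for _ in range(n)]
    advanced = [[False] * m for _ in range(n)]  # True if spot j began at keyframe i
    dp[0][0] = sim[0][0]
    for i in range(1, n):
        for j in range(m):
            stay = dp[i - 1][j]
            advance = dp[i - 1][j - 1] if j > 0 else NEG
            if advance > stay:
                dp[i][j], advanced[i][j] = advance + sim[i][j], True
            else:
                dp[i][j] = stay + sim[i][j]

    # Backtrack from the last keyframe, which must be assigned to the last spot.
    labels, j = [0] * n, m - 1
    for i in range(n - 1, -1, -1):
        labels[i] = j
        if advanced[i][j]:
            j -= 1
    boundaries = [i for i in range(1, n) if labels[i] != labels[i - 1]]
    return labels, boundaries
```

For example, consider four keyframes over a three-word vocabulary, `[[5, 0, 0], [4, 1, 0], [0, 0, 5], [0, 1, 4]]`, and two spots with photos `[[3, 0, 0], [2, 1, 0]]` and `[[0, 0, 3]]`. The sketch labels the keyframes `[0, 0, 1, 1]` and reports a scene boundary at keyframe 2. The monotonic constraint encodes the assumption that the video visits spots in the order the schedule lists them.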