Music can enhance our emotional reactions to videos and images, while videos and images can enrich our emotional response to music. Cross-modal retrieval technology can be used to recommend appropriate music for a given video, and vice versa. However, the heterogeneity gap caused by the inconsistent distributions of different data modalities complicates learning a common representation space across modalities. Accordingly, we propose an emotion-aware music-video cross-modal generative adversarial network (EMVGAN) model that builds an affective common embedding space to bridge the heterogeneity gap between data modalities. The evaluation results revealed that the proposed EMVGAN model can learn affective common representations with convincing performance, outperforming existing models. Furthermore, the satisfactory performance of the proposed network encouraged us to undertake the music-video bidirectional retrieval task. Subjective evaluations by 40 recruited participants indicated a similar consistency and emotional relationship between the retrieved music videos and the official music videos.
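The abstract does not spell out EMVGAN's architecture or losses, but the core idea it names, adversarially aligning two modalities in one shared embedding space, can be sketched. Below is a minimal, hypothetical PyTorch illustration, not the thesis's actual model: the `Encoder` and `ModalityDiscriminator` modules, layer sizes, feature dimensions, and loss weighting are all assumptions made for the example.

```python
# Hypothetical sketch of cross-modal adversarial embedding learning,
# NOT the thesis's actual EMVGAN: two encoders map pre-extracted
# music/video features into a shared space, and a discriminator tries
# to tell the modalities apart; training the encoders to fool it
# aligns the two feature distributions (bridging the heterogeneity gap).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Projects one modality's features into the common embedding space."""
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x):
        # L2-normalize so paired similarity below is cosine similarity.
        return nn.functional.normalize(self.net(x), dim=-1)

class ModalityDiscriminator(nn.Module):
    """Predicts whether an embedding came from music (1) or video (0)."""
    def __init__(self, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, z):
        return self.net(z)

# Toy feature dimensions; real audio/visual descriptors would differ.
music_enc, video_enc = Encoder(in_dim=40), Encoder(in_dim=512)
disc = ModalityDiscriminator()
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(
    list(music_enc.parameters()) + list(video_enc.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

music, video = torch.randn(8, 40), torch.randn(8, 512)  # one paired batch
for _ in range(3):  # a few illustrative training steps
    # Discriminator step: learn to distinguish the two modalities.
    zm, zv = music_enc(music), video_enc(video)
    d_loss = bce(disc(zm.detach()), torch.ones(8, 1)) + \
             bce(disc(zv.detach()), torch.zeros(8, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Encoder step: fool the discriminator (flipped labels) and pull
    # paired music/video embeddings together via cosine alignment.
    zm, zv = music_enc(music), video_enc(video)
    adv = bce(disc(zm), torch.zeros(8, 1)) + bce(disc(zv), torch.ones(8, 1))
    align = (1 - (zm * zv).sum(dim=-1)).mean()
    g_loss = adv + align
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In this sketch the alignment term plays the role of the emotion-aware pairing objective; the actual EMVGAN presumably conditions this on emotion labels, which the abstract does not detail.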
| Date of Award     | 2020                    |
| ----------------- | ----------------------- |
| Original language | English                 |
| Supervisor        | Wei-Ta Chu (Supervisor) |
EMVGAN: Emotion-Aware Music-Video Common Representation Learning via Generative Adversarial Networks
雨芝, 蔡. (Author). 2020
Student thesis: Doctoral Thesis