This work presents a music timbre transfer model that transfers the style of a music clip while preserving its semantic content. In contrast to existing music timbre transfer models, our model achieves many-to-many timbre transfer between different instruments. The proposed method is based on an autoencoder framework comprising two pretrained encoders and one decoder trained in an unsupervised manner. To learn more representative features for the encoders, we produced a parallel dataset, called MI-Para, synthesized from MIDI files and digital audio workstations. We evaluated content preservation and the success of style transfer both objectively and subjectively: F0 consistency and hit rate served as objective measures of the transferred output, and for the subjective evaluation we recruited 78 subjects to score the transferred audio in a listening test. Our model outperforms the architecture proposed in prior work on many-to-many voice conversion, and these evaluations validate the effectiveness of the proposed framework. Beyond the performance measurements, we also demonstrated that, based on a state-of-the-art triplet network, the content encoder can learn meaningful content representations from the collected parallel dataset, which has no manually labeled annotations of music content. However, producing such a parallel dataset still takes a great deal of effort. To scale up the application scenarios of the proposed method, we therefore demonstrated that our model can also achieve many-to-many style transfer when trained in a semi-supervised manner with a smaller parallel dataset.
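The abstract gives no implementation details, but the described design, two encoders (content and timbre) feeding one decoder, with the content encoder trained by a triplet loss on parallel clips, can be sketched roughly as below. This is a minimal PyTorch sketch under stated assumptions, not the thesis implementation: the module names (`ContentEncoder`, `TimbreEncoder`, `Decoder`), the GRU backbones, the mel-spectrogram shapes, and the loss weighting are all illustrative choices.

```python
# Minimal sketch (not the thesis implementation): an autoencoder-style
# timbre transfer model with two encoders and one decoder, plus a triplet
# loss for the content encoder. All module names, layer choices, and the
# mel-spectrogram input shapes are assumptions for illustration.
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    """Encodes a spectrogram into a timbre-invariant content code."""
    def __init__(self, n_mels=80, dim=128):
        super().__init__()
        self.net = nn.GRU(n_mels, dim, batch_first=True)

    def forward(self, spec):              # spec: (batch, time, n_mels)
        out, _ = self.net(spec)
        return out                         # (batch, time, dim)

class TimbreEncoder(nn.Module):
    """Encodes a spectrogram into a single instrument (style) embedding."""
    def __init__(self, n_mels=80, dim=64):
        super().__init__()
        self.net = nn.GRU(n_mels, dim, batch_first=True)

    def forward(self, spec):
        out, _ = self.net(spec)
        return out.mean(dim=1)             # (batch, dim), time-averaged

class Decoder(nn.Module):
    """Reconstructs a spectrogram from content plus broadcast timbre codes."""
    def __init__(self, n_mels=80, c_dim=128, s_dim=64):
        super().__init__()
        self.net = nn.GRU(c_dim + s_dim, n_mels, batch_first=True)

    def forward(self, content, style):
        style = style.unsqueeze(1).expand(-1, content.size(1), -1)
        out, _ = self.net(torch.cat([content, style], dim=-1))
        return out

# Triplet loss on parallel clips in the style of MI-Para: the anchor and
# positive share musical content but differ in instrument; the negative
# has different content. No manual content labels are required.
triplet = nn.TripletMarginLoss(margin=1.0)
recon = nn.L1Loss()

content_enc, timbre_enc, dec = ContentEncoder(), TimbreEncoder(), Decoder()

anchor   = torch.randn(4, 100, 80)  # e.g. a phrase rendered on piano
positive = torch.randn(4, 100, 80)  # the same phrase rendered on guitar
negative = torch.randn(4, 100, 80)  # a different phrase

loss_content = triplet(content_enc(anchor).mean(dim=1),
                       content_enc(positive).mean(dim=1),
                       content_enc(negative).mean(dim=1))

# Transfer: combine the anchor's content with the positive's timbre.
out = dec(content_enc(anchor), timbre_enc(positive))
loss_recon = recon(out, positive)    # parallel target, supervised case
loss = loss_recon + loss_content
loss.backward()
```

In the semi-supervised setting the abstract describes, the parallel reconstruction term would presumably apply only to the subset of clips with counterparts in the smaller parallel dataset, with the remaining data trained by self-reconstruction alone.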
Date of Award | 2020
---|---
Original language | English
Supervisor | Wei-Ta Chu (Supervisor)
Semi-supervised Many-to-many Music Timbre Transfer
瑜真, 張. (Author). 2020
Student thesis: Doctoral Thesis