Expressive Speech Synthesis Based on Emotion Control Vector Generation Using MDS-based Space Transformation

  • 黃 喻豐

Student thesis: Master's Thesis


In human-machine interaction, speech-related techniques provide users with a convenient way to interact with the computer. Speech synthesis based on the hidden Markov model (HMM) has been developed and can synthesize stable and smooth speech. In recent years, demand for synthetic speech with more variability and expressivity has been increasing. In control vector-based expressive speech synthesis, the emotion/style control vector defined in the categorical (CAT) emotion space is difficult for the user to define precisely in order to synthesize speech with the desired emotion/style. This thesis applies the arousal-valence (AV) space to the multiple regression hidden semi-Markov model (MRHSMM)-based synthesis framework for expressive speech synthesis. In this study, the user can designate a specific emotion by defining the AV values in the AV space. The multidimensional scaling (MDS) method is adopted to project the AV emotion space and the CAT emotion space onto their corresponding orthogonal coordinate systems. A transformation approach is then proposed to transform the AV values into the emotion control vector in the CAT emotion space for MRHSMM-based expressive speech synthesis. In the synthesis phase, given the input text and the desired emotion expressed as the transformed emotion control vector, speech with the desired emotion is generated from the MRHSMMs. In the experiments, participants were invited to perform subjective tests on the emotion parameters and the quality of the synthesized speech. The experimental results show that the proposed method helps the user to easily and precisely specify the desired emotion for expressive speech synthesis.
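The projection and transformation steps described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the form of the transformation, so an affine least-squares map is assumed here, and the emotion names, AV anchor values, and one-hot CAT control vectors are all hypothetical. Classical (Torgerson) MDS is used as one common MDS variant.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance between every pair of rows in `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(dist, n_dims):
    """Classical MDS: embed points in `n_dims` orthogonal coordinates
    from a pairwise distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]         # indices of the largest eigenvalues
    top_vals = np.clip(eigvals[order], 0.0, None)      # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical (arousal, valence) anchors for four emotion categories.
emotions = ["neutral", "happy", "angry", "sad"]
av_anchors = np.array([[0.0, 0.0], [0.6, 0.8], [0.8, -0.7], [-0.6, -0.6]])

# Assumption: each category's CAT control vector is one-hot.
cat_vectors = np.eye(len(emotions))

# Project each emotion space onto its own orthogonal coordinate system.
av_coords = classical_mds(pairwise_distances(av_anchors), n_dims=2)
cat_coords = classical_mds(pairwise_distances(cat_vectors), n_dims=3)

# Assumed transformation: affine least-squares map from AV to CAT coordinates.
design = np.hstack([av_coords, np.ones((len(emotions), 1))])
weights, *_ = np.linalg.lstsq(design, cat_coords, rcond=None)
predicted_cat_coords = design @ weights
```

With four anchors and only three affine parameters per output dimension, the fit is approximate rather than exact. Mapping a new, user-chosen AV point would additionally require an out-of-sample projection into the AV coordinate system, which this sketch omits.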
Date of Award: 2016 Sept 2
Original language: English
Supervisor: Chung-Hsien Wu (Supervisor)
