TY - GEN
T1 - Hierarchical prosodic pattern selection based on Fujisaki model for natural Mandarin speech synthesis
AU - Huang, Yi Chin
AU - Wu, Chung Hsien
AU - Weng, Sz Ting
PY - 2012
Y1 - 2012
N2 - In this paper, a novel hierarchical prosodic unit selection method is proposed based on pitch contour pattern retrieval, in order to obtain a natural pitch contour for the personalized synthetic voice. In this framework, a hierarchical prosodic unit based on the Fujisaki model is used to take the local pitch contour variation and the global intonation of the utterance into account. Furthermore, novel ways of integrating the pitch contour patterns of prosodic units into the prosodic model are devised in order to improve the mechanism for selecting the appropriate pitch contour. A novel prosodic unit selection method is proposed based on sentence retrieval, which uses not only the traditional linguistic cues as selection criteria, but also the shape of the pitch contour. Also, the codewords of pitch patterns in the training corpus and synthesized corpus were constructed by the proposed method and were used to map the relation between the training codewords and the synthesized corpus. Finally, a language model of pitch patterns is adopted to find the proper pitch pattern sequence for the input text. The evaluation results demonstrate that the proposed prosodic model substantially improves the naturalness of the intonation of the synthesized speech compared to that of the model-based method.
AB - In this paper, a novel hierarchical prosodic unit selection method is proposed based on pitch contour pattern retrieval, in order to obtain a natural pitch contour for the personalized synthetic voice. In this framework, a hierarchical prosodic unit based on the Fujisaki model is used to take the local pitch contour variation and the global intonation of the utterance into account. Furthermore, novel ways of integrating the pitch contour patterns of prosodic units into the prosodic model are devised in order to improve the mechanism for selecting the appropriate pitch contour. A novel prosodic unit selection method is proposed based on sentence retrieval, which uses not only the traditional linguistic cues as selection criteria, but also the shape of the pitch contour. Also, the codewords of pitch patterns in the training corpus and synthesized corpus were constructed by the proposed method and were used to map the relation between the training codewords and the synthesized corpus. Finally, a language model of pitch patterns is adopted to find the proper pitch pattern sequence for the input text. The evaluation results demonstrate that the proposed prosodic model substantially improves the naturalness of the intonation of the synthesized speech compared to that of the model-based method.
UR - http://www.scopus.com/inward/record.url?scp=84874454139&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84874454139&partnerID=8YFLogxK
U2 - 10.1109/ISCSLP.2012.6423536
DO - 10.1109/ISCSLP.2012.6423536
M3 - Conference contribution
AN - SCOPUS:84874454139
SN - 9781467325059
T3 - 2012 8th International Symposium on Chinese Spoken Language Processing, ISCSLP 2012
SP - 79
EP - 83
BT - 2012 8th International Symposium on Chinese Spoken Language Processing, ISCSLP 2012
T2 - 2012 8th International Symposium on Chinese Spoken Language Processing, ISCSLP 2012
Y2 - 5 December 2012 through 8 December 2012
ER -