Synthesis of spontaneous speech with syllable contraction using state-based context-dependent voice transformation

Chung Hsien Wu, Yi Chin Huang, Chung Han Lee, Jun Cheng Guo

Research output: Article (peer-reviewed)

4 citations (Scopus)

Abstract

Pronunciation variation is common in spontaneous speech and is an integral aspect of spontaneous expression. This study describes a voice transformation-based approach to generating spontaneous speech with syllable contractions for Hidden Markov Model (HMM)-based speech synthesis. A multi-dimensional linear regression model is adopted as the context-dependent, state-based transformation function that converts the feature sequence of read speech into that of spontaneous speech with syllable contraction. Because the training data are limited, the obtained transformation functions are categorized using a decision tree based on linguistic and articulatory features, so that suitable transformation functions can be selected more accurately and efficiently. Furthermore, to cope with the small size of the parallel corpus, the trained transformation functions are cross-validated to ensure that correct transformation functions are obtained and to prevent over-fitting. Consequently, pronunciation variations of syllable contraction, for both trained and unseen syllable-contracted words, are generated from the transformation function retrieved from the decision tree using linguistic and articulatory features. Objective and subjective tests were used to evaluate the performance of the proposed approach. The evaluation results demonstrate that the proposed transformation function substantially improves the apparent spontaneity of the synthesized speech compared with conventional methods.
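As a rough illustration of the two ideas the abstract combines (a multi-dimensional linear regression used as a transformation function, and cross-validation to guard against over-fitting on a small parallel corpus), the following sketch fits an affine mapping between paired feature vectors and reports its held-out error. It is not the authors' implementation: the function names, the small ridge term, the fold count, and the synthetic data are all assumptions made for this example, and the decision-tree clustering of transformation functions is not shown.

```python
import numpy as np

def fit_affine(X, Y, ridge=1e-6):
    """Fit Y ~ [X, 1] @ W by regularized least squares.

    X: (n, d_in) source features (e.g. read speech, hypothetical).
    Y: (n, d_out) target features (e.g. spontaneous speech, hypothetical).
    The ridge term is an assumption added for numerical stability.
    """
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    G = Xa.T @ Xa + ridge * np.eye(Xa.shape[1])
    return np.linalg.solve(G, Xa.T @ Y)            # shape (d_in + 1, d_out)

def apply_affine(W, X):
    """Apply a fitted affine transformation to source features."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xa @ W

def cross_validated_rmse(X, Y, k=5, seed=0):
    """Average held-out RMSE over k folds of the paired data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(X.shape[0]), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        W = fit_affine(X[train], Y[train])
        resid = apply_affine(W, X[test]) - Y[test]
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))

# Synthetic stand-in for a small parallel corpus (purely illustrative).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 4))
Y_demo = X_demo @ rng.normal(size=(4, 4)) + rng.normal(size=4)
Y_demo += 0.01 * rng.normal(size=Y_demo.shape)
print("held-out RMSE:", cross_validated_rmse(X_demo, Y_demo))
```

In a setting like the one described, a held-out error that is much larger than the training error would suggest that a transformation function has over-fitted its few training examples.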

Original language: English
Pages (from-to): 585-595
Number of pages: 11
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 22
Issue number: 3
DOIs
Publication status: Published - March 2014

All Science Journal Classification (ASJC) codes

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering
