Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation

Yi Chin Huang, Chung Hsien Wu, Ming Ge Shie

Research output: Conference article (peer-reviewed)

1 Citation (Scopus)

Abstract

This paper proposes an automatic approach to generating speech that is fluent at the prosodic word level, based on a small-sized speech database of the target speaker consisting of read and fluent speech. First, an auto-segmentation algorithm is employed to automatically segment and label the database of the target speaker. A pre-trained average voice model is then adapted to the voice model of the target speaker using the auto-segmented data. For synthesizing fluent speech, a prosodic model is proposed to smooth the prosodic word-level parameters and improve fluency within a prosodic word. Finally, a postfilter method based on the modulation spectrum is adopted to alleviate the over-smoothing problem of the synthesized speech and thus improve speaker similarity. Experimental results showed that the proposed method effectively improves the speech fluency and speaker similarity of the synthesized speech for a target speaker compared to the MLLR-based model adaptation method.

Original language: English
Pages (from-to): 294-298
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
2015-January
Publication status: Published - 2015
Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
Duration: 6 Sep 2015 to 10 Sep 2015

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
