Personalized Spontaneous Speech Synthesis Using a Small-Sized Unsegmented Semispontaneous Speech

Yi Chin Huang, Chung Hsien Wu, Yan You Chen, Ming Ge Shie, Jhing Fa Wang

Research output: Article, peer-reviewed

3 Citations (Scopus)

Abstract

A systematic approach is proposed for synthesizing personalized spontaneous speech using a small-sized unsegmented speech corpus of the target speaker. First, an automatic segmentation algorithm is employed to segment and label a collected semispontaneous speech corpus of the target speaker. Then, a pretrained average voice model is adapted to the voice model of the target speaker by using the segmented data. A postfilter based on the modulation spectrum is adopted to further improve the speaker similarity of the synthesized speech and to alleviate its over-smoothing problem. For generating spontaneous speech, a smoothing method applied at the prosodic word level is proposed to improve speech fluency. In an objective evaluation of spontaneous speech segmentation, the segmentation accuracy of the proposed method is superior to that of Viterbi-based forced alignment. The results of a subjective listening test also show that the proposed method can improve the spontaneity and speaker similarity of the synthesized speech compared to the maximum likelihood linear regression (MLLR) based speaker adaptation method.
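The abstract does not give the postfilter's equations, so the following is only a minimal sketch of one common way a modulation-spectrum postfilter can be formulated, not the authors' implementation. It assumes the modulation spectrum is the log power spectrum of a single parameter trajectory, and that the postfilter normalizes it with statistics of generated speech and rescales it with statistics of natural speech while keeping the phase. The function names, the emphasis coefficient `k`, and the statistics arrays are all hypothetical.

```python
import numpy as np


def log_modulation_spectrum(traj, n_fft):
    """Return (log power spectrum, phase) of a 1-D parameter trajectory."""
    spec = np.fft.rfft(traj, n=n_fft)
    log_ms = np.log(np.abs(spec) ** 2 + 1e-12)  # small floor avoids log(0)
    return log_ms, np.angle(spec)


def ms_postfilter(traj, nat_mean, nat_std, gen_mean, gen_std, n_fft, k=1.0):
    """Shift a generated trajectory's modulation spectrum toward natural statistics.

    Assumptions (not from the source): statistics are per-modulation-frequency
    arrays of length n_fft // 2 + 1, n_fft >= len(traj), and k in [0, 1]
    controls how strongly the postfilter is applied.
    """
    log_ms, phase = log_modulation_spectrum(traj, n_fft)
    # Normalize with generated-speech statistics, rescale with natural ones.
    converted = (log_ms - gen_mean) / gen_std * nat_std + nat_mean
    log_ms_new = (1.0 - k) * log_ms + k * converted
    # Rebuild the complex spectrum from the new magnitude and the original phase.
    magnitude = np.exp(0.5 * log_ms_new)
    spec_new = magnitude * np.exp(1j * phase)
    return np.fft.irfft(spec_new, n=n_fft)[: len(traj)]
```

With `k=0` the trajectory is returned essentially unchanged, and with `k=1` its modulation spectrum is fully mapped to the natural-speech statistics. In practice such a filter would be applied to each dimension of the generated parameter sequence separately.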

Original language: English
Pages (from-to): 1048-1060
Number of pages: 13
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Volume: 25
Issue number: 5
Publication status: Published - May 2017

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering
