Improving Mandarin Prosody Generation Using Alternative Smoothing Techniques

Yi Chin Huang, Chung Hsien Wu, Si Ting Weng

Research output: Article (peer-reviewed)

3 citations (Scopus)

Abstract

Prosody plays a vital role in conveying both communicative meaning and specific speaking styles in speech communication. In recent years, the hidden Markov model (HMM)-based speech synthesis system (HTS) has been developed successfully and can synthesize stable, smooth speech. However, the prosody of the synthesized speech suffers from over-smoothing, so a better prosodic model is required to improve its natural variability. This study uses a hybrid method to alleviate the problem by combining statistical and template-based unit selection methods. First, a two-level clustering approach is proposed to obtain representative prosodic patterns (denoted as codewords) of the hierarchical prosodic structure, which is modeled by a modified Fujisaki model. The prosodic codewords are then used to represent the prosody of each sentence in a parallel corpus consisting of the real speech corpus and its synthesized counterpart obtained from the HTS. The synthesized speech utterance is then used as the query for retrieving the prosodic codewords of the utterances in the synthesized corpus. The retrieved synthesized prosodic codewords are mapped to the prosodic codewords of the real speech using linear mapping rules obtained from the parallel corpus. Prosodic codeword language models for prosodic words and prosodic phrases, respectively, are employed to choose the optimal codeword sequence of the real speech. Finally, the most likely sequence of prosodic codewords is obtained using a continuity measure based on non-uniform rational B-splines (NURBS), and is used to synthesize speech with natural prosody. Subjective and objective tests demonstrate that the proposed prosodic model substantially improves the naturalness of the intonation of the synthesized speech compared with the HMM-based method.
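
The final selection step described in the abstract, choosing a codeword sequence that is both likely under a codeword language model and smooth across unit boundaries, could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_codewords`, the bigram table, the back-off floor, and the weight are all hypothetical, and the absolute F0 jump at each boundary is only a crude stand-in for the NURBS-based continuity measure.

```python
import math


def select_codewords(candidates, bigram_logp, boundary_f0, continuity_weight=1.0):
    """Viterbi search over candidate codewords, one candidate list per prosodic unit.

    candidates:   list of lists of codeword ids (hypothetical data layout)
    bigram_logp:  dict mapping (prev_id, cur_id) -> log-probability
    boundary_f0:  dict mapping codeword id -> (start_f0, end_f0), e.g. in log-Hz
    """
    floor = math.log(1e-6)  # assumed back-off score for unseen bigrams

    def step_score(prev, cur):
        lm = bigram_logp.get((prev, cur), floor)
        # Stand-in continuity cost: F0 jump between the end of the previous
        # codeword and the start of the current one (NOT the NURBS measure).
        jump = abs(boundary_f0[prev][1] - boundary_f0[cur][0])
        return lm - continuity_weight * jump

    # best[c] = (score of the best path ending in codeword c, that path)
    best = {c: (0.0, [c]) for c in candidates[0]}
    for unit in candidates[1:]:
        new_best = {}
        for cur in unit:
            score, path = max(
                ((s + step_score(prev, cur), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[cur] = (score, path + [cur])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy usage: two prosodic units with two candidate codewords each.
toy_candidates = [[0, 1], [2, 3]]
toy_f0 = {0: (5.0, 5.2), 1: (5.0, 5.6), 2: (5.2, 5.0), 3: (5.9, 5.5)}
uniform_lm = {(p, c): math.log(0.5) for p in (0, 1) for c in (2, 3)}
smooth_path = select_codewords(toy_candidates, uniform_lm, toy_f0)
```

With a uniform language model the search is driven entirely by the boundary penalty; setting `continuity_weight=0.0` makes it follow the language model alone. In the paper's setting the candidates would come from the mapped real-speech codewords, which this toy does not model.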

Original language: English
Pages (from-to): 1897-1907
Number of pages: 11
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Volume: 24
Issue number: 11
Publication status: Published - Nov 2016

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering
