Polyglot speech synthesis based on cross-lingual frame selection using auditory and articulatory features

Chia Ping Chen, Yi Chin Huang, Chung Hsien Wu, Kuan De Lee

Research output: Article, peer-reviewed

10 citations (Scopus)

Abstract

In this paper, an approach for polyglot speech synthesis based on cross-lingual frame selection is proposed. This method requires only mono-lingual speech data of different speakers in different languages for building a polyglot synthesis system, thus reducing the burden of data collection. Essentially, a set of artificial utterances in the second language for a target speaker is constructed based on the proposed cross-lingual frame-selection process, and this data set is used to adapt a synthesis model in the second language to the speaker. In the cross-lingual frame-selection process, we propose to use auditory and articulatory features to improve the quality of the synthesized polyglot speech. For evaluation, a Mandarin-English polyglot system is implemented where the target speaker only speaks Mandarin. The results show that decent performance regarding voice identity and speech quality can be achieved with the proposed method.
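The frame-selection idea described above can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in a plain nearest-neighbor search, and it assumes that each frame is already represented by two feature vectors (auditory and articulatory) compared with a weighted Euclidean distance. The names `frame_distance`, `select_frames`, `w_aud`, and `w_art`, the dictionary layout, and the toy feature values are all hypothetical and chosen only for illustration.

```python
import math


def frame_distance(a, b, w_aud=0.5, w_art=0.5):
    """Weighted sum of Euclidean distances over the two feature types.

    Assumption: each frame is a dict with an 'aud' (auditory) and an
    'art' (articulatory) feature list. The weights are hypothetical.
    """
    d_aud = math.dist(a["aud"], b["aud"])
    d_art = math.dist(a["art"], b["art"])
    return w_aud * d_aud + w_art * d_art


def select_frames(source_frames, target_frames, w_aud=0.5, w_art=0.5):
    """For each second-language source frame, return the index of the
    closest frame in the target speaker's mono-lingual data."""
    selected = []
    for s in source_frames:
        best = min(
            range(len(target_frames)),
            key=lambda i: frame_distance(s, target_frames[i], w_aud, w_art),
        )
        selected.append(best)
    return selected


# Toy data: two English frames, three Mandarin frames (values invented).
english_frames = [
    {"aud": [0.0, 1.0], "art": [1.0, 0.0]},
    {"aud": [2.0, 2.0], "art": [0.0, 1.0]},
]
mandarin_frames = [
    {"aud": [2.1, 1.9], "art": [0.0, 0.9]},
    {"aud": [0.1, 1.1], "art": [0.9, 0.0]},
    {"aud": [5.0, 5.0], "art": [0.5, 0.5]},
]

indices = select_frames(english_frames, mandarin_frames)
```

In this sketch, concatenating the target-speaker frames at the selected indices would give one artificial second-language utterance. The abstract's next step, adapting a second-language synthesis model with such utterances, is outside the scope of the example.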

Original language: English
Article number: 2339738
Pages (from - to): 1558-1570
Number of pages: 13
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 22
Issue number: 10
Publication status: Published - Oct 1, 2014

All Science Journal Classification (ASJC) codes

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering
