Cross-lingual frame selection method for polyglot speech synthesis

Chia Ping Chen, Yi Chin Huang, Chung-Hsien Wu, Kuan De Lee

Research output: Conference contribution

7 Citations (Scopus)

Abstract

A novel approach is proposed for creating a polyglot speech synthesis system without the need to collect speech data from a bilingual (or multilingual) speaker, which is often expensive or even infeasible. Given a target speaker with data in the first language (Mandarin in this study), the basic idea is to construct artificial utterances in the second language (English) by selecting speech sample frames of the given speaker in the first language. As the speaker need not be polyglot, this method is generally applicable to any speaker and any languages. In the search for the optimal frame sequence, the candidate set is constrained by a decision tree for phone segments in the speech data of both languages, and the cost function depends on context-dependent articulatory and auditory features. Evaluation results show that good performance in terms of similarity (speaker identity) and naturalness (speech quality) can be achieved with the proposed method.
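
The abstract does not state which search algorithm or which exact cost terms are used, so the following is only a minimal sketch of a frame-sequence search of this general kind, written as a generic Viterbi-style dynamic program. Everything named here is a hypothetical stand-in: `select_frames`, `sq_dist`, and `join_weight` are illustrative names, the per-position `candidates` lists merely play the role of the decision-tree constraint, and squared Euclidean distance replaces the context-dependent articulatory and auditory cost described above.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (assumed stand-in for the paper's cost)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_frames(
    targets: List[Vector],
    candidates: List[List[int]],
    frames: List[Vector],
    join_weight: float = 1.0,
) -> List[int]:
    """Return the minimum-cost sequence of frame indices.

    targets[t]    : feature vector wanted at position t (second language)
    candidates[t] : non-empty list of indices into `frames` allowed at t
                    (hypothetical stand-in for the decision-tree constraint)
    frames        : feature vectors of the speaker's first-language frames
    join_weight   : weight of the continuity cost between adjacent frames
    """
    if not targets:
        return []
    # best[t][k]: lowest total cost of a path ending in candidates[t][k]
    best = [[sq_dist(frames[c], targets[0]) for c in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for t in range(1, len(targets)):
        row, ptr = [], []
        for c in candidates[t]:
            target_cost = sq_dist(frames[c], targets[t])
            # Cost of reaching c from each predecessor, including the join cost
            costs = [
                best[t - 1][j] + join_weight * sq_dist(frames[p], frames[c])
                for j, p in enumerate(candidates[t - 1])
            ]
            j_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[j_min] + target_cost)
            ptr.append(j_min)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest final state
    k = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for t in range(len(targets) - 1, -1, -1):
        path.append(candidates[t][k])
        k = back[t][k]
    return path[::-1]
```

For example, with `frames = [[0.0], [1.0], [2.0], [10.0]]`, `targets = [[0.0], [1.0], [2.0]]`, and `candidates = [[0, 3], [1, 3], [2, 3]]`, the default weight selects the frames closest to the targets, while a very large `join_weight` favours repeating a single frame to avoid any discontinuity.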

Original language: English
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Pages: 4521-4524
Number of pages: 4
Publication status: Published - 23 Oct 2012
Event: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Kyoto, Japan
Duration: 25 Mar 2012 to 30 Mar 2012

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
Country/Territory: Japan
City: Kyoto
Period: 2012-03-25 to 2012-03-30

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
