Cross-lingual frame selection method for polyglot speech synthesis

Chia Ping Chen, Yi Chin Huang, Chung-Hsien Wu, Kuan De Lee

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

7 Citations (Scopus)

Abstract

A novel approach is proposed for creating a polyglot speech synthesis system without the need to collect speech data from a bilingual (or multilingual) speaker, which is often expensive or even infeasible. Given a target speaker with data in the first language (Mandarin in this study), the basic idea is to construct artificial utterances in the second language (English) by selecting speech sample frames of the given speaker in the first language. As the speaker need not be polyglot, this method is generally applicable to any speaker and any languages. In the search for the optimal frame sequence, the candidate set is constrained by a decision tree for phone segments in the speech data of both languages, and the cost function depends on context-dependent articulatory and auditory features. Evaluation results show that good performance regarding similarity (speaker identity) and naturalness (speech quality) can be achieved with the proposed method.
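The frame sequence search mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the search algorithm or the exact form of the cost function, so everything below is an assumption: a Viterbi-style dynamic program over per-position candidate frames, with a target cost plus a weighted join cost between consecutive frames. The names `select_frames`, `sq_dist`, and `join_weight` are hypothetical, the squared Euclidean distance merely stands in for the context-dependent articulatory and auditory costs, and the `candidates` lists are assumed to have already been narrowed by something like the decision tree (which is not implemented here).

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # hypothetical: one feature vector per speech frame


def sq_dist(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance; a placeholder for the paper's feature costs."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_frames(
    targets: List[Frame],
    candidates: List[List[Frame]],
    target_cost: Callable[[Frame, Frame], float] = sq_dist,
    join_cost: Callable[[Frame, Frame], float] = sq_dist,
    join_weight: float = 1.0,
) -> List[int]:
    """Return, for each target position, the index of the chosen candidate frame."""
    if len(targets) != len(candidates) or any(not c for c in candidates):
        raise ValueError("need one non-empty candidate list per target frame")
    if not targets:
        return []

    # best[t][j]: lowest total cost of any path ending in candidate j at position t
    best = [[target_cost(targets[0], c) for c in candidates[0]]]
    # back[t][j]: index of the predecessor candidate on that lowest-cost path
    back = [[-1] * len(candidates[0])]

    for t in range(1, len(targets)):
        row, ptr = [], []
        for cand in candidates[t]:
            # Cost of reaching this candidate from each predecessor candidate.
            costs = [
                best[t - 1][i] + join_weight * join_cost(prev, cand)
                for i, prev in enumerate(candidates[t - 1])
            ]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[i_min] + target_cost(targets[t], cand))
            ptr.append(i_min)
        best.append(row)
        back.append(ptr)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for t in range(len(targets) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path
```

In this sketch, `targets` would hold the desired second-language feature trajectory and each entry of `candidates` would hold first-language frames from the target speaker. Setting `join_weight` to 0 reduces the search to an independent nearest-frame choice at each position, which is one way to see what the join term contributes.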

Original language: English
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Pages: 4521-4524
Number of pages: 4
Publication status: Published - 2012 Oct 23
Event: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Kyoto, Japan
Duration: 2012 Mar 25 to 2012 Mar 30

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
Country: Japan
City: Kyoto
Period: 12-03-25 to 12-03-30

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering


Cite this

Chen, C. P., Huang, Y. C., Wu, C-H., & Lee, K. D. (2012). Cross-lingual frame selection method for polyglot speech synthesis. In 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings (pp. 4521-4524). [6288923] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). https://doi.org/10.1109/ICASSP.2012.6288923