Candidate Expansion and Prosody Adjustment for Natural Speech Synthesis Using a Small Corpus

Yan You Chen, Chung Hsien Wu, Yi Chin Huang, Shih Lun Lin, Jhing Fa Wang

Research output: Contribution to journal > Article > peer-review

Abstract

This study proposes a hybrid approach to natural-sounding speech synthesis based on candidate expansion, unit selection, and prosody adjustment using a small corpus. The proposed method is oriented toward tonal languages, in particular Mandarin. In conventional speech synthesis studies, the quality of synthesized speech depends heavily on the size of the speech corpus. However, preparing a large labeled corpus is highly time-consuming and labor-intensive. In this work, candidate expansion is proposed to retrieve potential candidates that are unlikely to be retrieved using linguistic features alone. The optimal unit sequence is then obtained from the expanded candidates by using the proposed unit selection mechanism at the phoneme and prosodic word levels. Finally, a prosodic word-level prosody adjustment is proposed to improve the continuity and smoothness of the prosody of the synthesized speech. To evaluate the proposed method, the Tsing-Hua corpus of speech synthesis was adopted. The results of an objective evaluation demonstrate the effectiveness of candidate expansion and the improvement in the continuity and smoothness of the prosody of the synthesized speech. The results of a subjective evaluation also show that the proposed system can synthesize speech with improved quality and naturalness, in particular for a small or resource-limited corpus.

Original language: English
Pages (from-to): 1052-1065
Number of pages: 14
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Volume: 24
Issue number: 6
Publication status: Published - 2016 Jun

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering
