Candidate Expansion and Prosody Adjustment for Natural Speech Synthesis Using a Small Corpus

Yan You Chen, Chung-Hsien Wu, Yi Chin Huang, Shih Lun Lin, Jhing Fa Wang

Research output: Contribution to journal (Article)

Abstract

This study proposes a hybrid approach to natural-sounding speech synthesis based on candidate expansion, unit selection, and prosody adjustment using a small corpus. The proposed method is aimed primarily at tonal languages, in particular Mandarin. In conventional speech synthesis studies, the quality of synthesized speech depends heavily on the size of the speech corpus. However, preparing a large labeled corpus is highly time-consuming and labor-intensive. In this work, candidate expansion is proposed to retrieve potential candidates that are unlikely to be retrieved using only linguistic features. The optimal unit sequence is then obtained from the expanded candidates by using the proposed unit selection mechanism at the phoneme and prosodic word levels. Finally, a prosodic word-level prosody adjustment is proposed to improve the continuity and smoothness of the prosody of the synthesized speech. To evaluate the proposed method, the Tsing-Hua corpus of speech synthesis was adopted. The results of an objective evaluation demonstrate the effectiveness of candidate expansion and the improved continuity and smoothness of the prosody of the synthesized speech. The results of a subjective evaluation also show that the proposed system can synthesize speech with improved quality and naturalness, in particular for a small or resource-limited corpus.

Original language: English
Pages (from-to): 1052-1065
Number of pages: 14
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 24
Issue number: 6
DOI: 10.1109/TASLP.2016.2537982
Publication status: Published - 2016 Jun 1

Fingerprint

Prosody
Speech Synthesis
Adjustment
candidacy
adjusting
expansion
synthesis
Unit
Smoothness
Subjective Evaluation
continuity
Hybrid Approach
Linguistics
phonemes
Corpus
Speech
labor
evaluation

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering

Cite this

@article{e31081b5a6d44f8f86e2ccbfabc9f98c,
title = "Candidate Expansion and Prosody Adjustment for Natural Speech Synthesis Using a Small Corpus",
abstract = "This study proposes a hybrid approach to natural-sounding speech synthesis based on candidate expansion, unit selection, and prosody adjustment using a small corpus. The proposed method is more specific to tonal language, in particular Mandarin. In conventional speech synthesis studies, the quality of synthesized speech depends heavily on the size of the speech corpus. However, it is highly time-consuming and labor-intensive to prepare a large labeled corpus. In this work, candidate expansion is proposed to retrieve potential candidates that are unlikely to be retrieved using only linguistic features. The optimal unit sequence is then obtained from the expanded candidates by using the proposed unit selection mechanism at the phoneme and prosodic word levels. Finally, a prosodic word-level prosody adjustment is proposed to improve the continuity and smoothness of the prosody of the synthesized speech. To evaluate the proposed method, the Tsing-Hua corpus of speech synthesis was adopted. The results of an objective evaluation demonstrate the effectiveness of candidate expansion and the improvement of the continuity and smoothness of the prosody of the synthesized speech. The results of a subjective evaluation also show the proposed system could synthesize the speech with improved quality and naturalness, in particular for a small-sized or resource-limited corpus.",
author = "Chen, {Yan You} and Wu, Chung-Hsien and Huang, {Yi Chin} and Lin, {Shih Lun} and Wang, {Jhing Fa}",
year = "2016",
month = "6",
day = "1",
doi = "10.1109/TASLP.2016.2537982",
language = "English",
volume = "24",
pages = "1052--1065",
journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",
issn = "2329-9290",
publisher = "IEEE",
number = "6",
}

Candidate Expansion and Prosody Adjustment for Natural Speech Synthesis Using a Small Corpus. / Chen, Yan You; Wu, Chung-Hsien; Huang, Yi Chin; Lin, Shih Lun; Wang, Jhing Fa.

In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 24, No. 6, 01.06.2016, p. 1052-1065.

TY - JOUR

T1 - Candidate Expansion and Prosody Adjustment for Natural Speech Synthesis Using a Small Corpus

AU - Chen, Yan You

AU - Wu, Chung-Hsien

AU - Huang, Yi Chin

AU - Lin, Shih Lun

AU - Wang, Jhing Fa

PY - 2016/6/1

Y1 - 2016/6/1

N2 - This study proposes a hybrid approach to natural-sounding speech synthesis based on candidate expansion, unit selection, and prosody adjustment using a small corpus. The proposed method is more specific to tonal language, in particular Mandarin. In conventional speech synthesis studies, the quality of synthesized speech depends heavily on the size of the speech corpus. However, it is highly time-consuming and labor-intensive to prepare a large labeled corpus. In this work, candidate expansion is proposed to retrieve potential candidates that are unlikely to be retrieved using only linguistic features. The optimal unit sequence is then obtained from the expanded candidates by using the proposed unit selection mechanism at the phoneme and prosodic word levels. Finally, a prosodic word-level prosody adjustment is proposed to improve the continuity and smoothness of the prosody of the synthesized speech. To evaluate the proposed method, the Tsing-Hua corpus of speech synthesis was adopted. The results of an objective evaluation demonstrate the effectiveness of candidate expansion and the improvement of the continuity and smoothness of the prosody of the synthesized speech. The results of a subjective evaluation also show the proposed system could synthesize the speech with improved quality and naturalness, in particular for a small-sized or resource-limited corpus.

AB - This study proposes a hybrid approach to natural-sounding speech synthesis based on candidate expansion, unit selection, and prosody adjustment using a small corpus. The proposed method is more specific to tonal language, in particular Mandarin. In conventional speech synthesis studies, the quality of synthesized speech depends heavily on the size of the speech corpus. However, it is highly time-consuming and labor-intensive to prepare a large labeled corpus. In this work, candidate expansion is proposed to retrieve potential candidates that are unlikely to be retrieved using only linguistic features. The optimal unit sequence is then obtained from the expanded candidates by using the proposed unit selection mechanism at the phoneme and prosodic word levels. Finally, a prosodic word-level prosody adjustment is proposed to improve the continuity and smoothness of the prosody of the synthesized speech. To evaluate the proposed method, the Tsing-Hua corpus of speech synthesis was adopted. The results of an objective evaluation demonstrate the effectiveness of candidate expansion and the improvement of the continuity and smoothness of the prosody of the synthesized speech. The results of a subjective evaluation also show the proposed system could synthesize the speech with improved quality and naturalness, in particular for a small-sized or resource-limited corpus.

UR - http://www.scopus.com/inward/record.url?scp=84969800129&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84969800129&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2016.2537982

DO - 10.1109/TASLP.2016.2537982

M3 - Article

VL - 24

SP - 1052

EP - 1065

JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing

JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing

SN - 2329-9290

IS - 6

ER -