Candidate Expansion and Prosody Adjustment for Natural Speech Synthesis Using a Small Corpus

Yan You Chen, Chung Hsien Wu, Yi Chin Huang, Shih Lun Lin, Jhing Fa Wang

Research output: Article

Abstract

This study proposes a hybrid approach to natural-sounding speech synthesis based on candidate expansion, unit selection, and prosody adjustment using a small corpus. The proposed method is more specific to tonal language, in particular Mandarin. In conventional speech synthesis studies, the quality of synthesized speech depends heavily on the size of the speech corpus. However, it is highly time-consuming and labor-intensive to prepare a large labeled corpus. In this work, candidate expansion is proposed to retrieve potential candidates that are unlikely to be retrieved using only linguistic features. The optimal unit sequence is then obtained from the expanded candidates by using the proposed unit selection mechanism at the phoneme and prosodic word levels. Finally, a prosodic word-level prosody adjustment is proposed to improve the continuity and smoothness of the prosody of the synthesized speech. To evaluate the proposed method, the Tsing-Hua corpus of speech synthesis was adopted. The results of an objective evaluation demonstrate the effectiveness of candidate expansion and the improvement of the continuity and smoothness of the prosody of the synthesized speech. The results of a subjective evaluation also show the proposed system could synthesize the speech with improved quality and naturalness, in particular for a small-sized or resource-limited corpus.
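The unit selection step summarized above is conventionally a dynamic-programming search. Each target position has a list of candidate units, and the search minimizes the sum of per-unit target costs and pairwise concatenation costs. The sketch below shows only that generic, single-level search and is not the authors' method. The record does not give their phoneme-level and prosodic-word-level cost functions, their candidate expansion criteria, or their prosody adjustment. The `Unit` fields, `select_units`, and the example cost functions are all hypothetical. Candidate expansion would simply enlarge each `candidates[i]` list before this search runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """Hypothetical corpus unit: a label plus a few prosodic features."""
    label: str       # e.g. a tonal syllable such as "ma1"
    f0_start: float  # F0 at unit onset (Hz)
    f0_end: float    # F0 at unit offset (Hz)
    duration: float  # seconds


def select_units(
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[int, Unit], float],
    concat_cost: Callable[[Unit, Unit], float],
) -> Tuple[List[Unit], float]:
    """Return the unit sequence minimizing total target + concatenation cost.

    candidates[i] is the (possibly expanded) candidate list for target position i.
    Viterbi-style search: O(N * K^2) for N positions and K candidates per position.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # cost[i][k]: best cumulative cost of any path ending in candidates[i][k]
    cost: List[List[float]] = [[target_cost(0, u) for u in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]

    for i in range(1, len(candidates)):
        prev = candidates[i - 1]
        row: List[float] = []
        ptr: List[int] = []
        for u in candidates[i]:
            # Pick the predecessor that makes the cheapest join into u.
            best_j = min(
                range(len(prev)),
                key=lambda j: cost[i - 1][j] + concat_cost(prev[j], u),
            )
            row.append(cost[i - 1][best_j] + concat_cost(prev[best_j], u)
                       + target_cost(i, u))
            ptr.append(best_j)
        cost.append(row)
        back.append(ptr)

    # Trace back from the cheapest candidate at the final position.
    k = min(range(len(cost[-1])), key=lambda j: cost[-1][j])
    total = cost[-1][k]
    path: List[Unit] = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][k])
        k = back[i][k]
    path.reverse()
    return path, total


if __name__ == "__main__":
    targets = ["ma1", "ma3"]
    cands = [
        [Unit("ma1", 200, 210, 0.20), Unit("ma1", 180, 150, 0.20)],
        [Unit("ma3", 150, 120, 0.25), Unit("ma3", 215, 190, 0.25)],
    ]
    # Hypothetical costs: label mismatch penalty, and F0 jump at each join.
    tc = lambda i, u: 0.0 if u.label == targets[i] else 10.0
    cc = lambda a, b: abs(a.f0_end - b.f0_start) / 10.0
    path, total = select_units(cands, tc, cc)
    print([(u.label, u.f0_start, u.f0_end) for u in path], total)
```

In this toy run, the search prefers the second "ma1" candidate because its offset F0 (150 Hz) joins the first "ma3" candidate with no F0 jump. This makes the concatenation cost zero. Penalizing F0 discontinuities at joins is one simple way a selection step can favor prosodic continuity.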

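Original language: English
Pages (from-to): 1052-1065
Number of pages: 14
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 24
Issue number: 6
DOI: 10.1109/TASLP.2016.2537982
Publication status: Published - June 2016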

Fingerprint

Prosody
Speech Synthesis
Speech synthesis
Adjustment
candidacy
adjusting
expansion
synthesis
Unit
Smoothness
Subjective Evaluation
continuity
Hybrid Approach
Linguistics
phonemes
linguistics
Corpus
Speech
labor
evaluation

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering

Cite this

Chen, Yan You; Wu, Chung Hsien; Huang, Yi Chin; Lin, Shih Lun; Wang, Jhing Fa. / Candidate Expansion and Prosody Adjustment for Natural Speech Synthesis Using a Small Corpus. In: IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2016; Vol. 24, No. 6. pp. 1052-1065.
@article{e31081b5a6d44f8f86e2ccbfabc9f98c,
title = "Candidate Expansion and Prosody Adjustment for Natural Speech Synthesis Using a Small Corpus",
abstract = "This study proposes a hybrid approach to natural-sounding speech synthesis based on candidate expansion, unit selection, and prosody adjustment using a small corpus. The proposed method is more specific to tonal language, in particular Mandarin. In conventional speech synthesis studies, the quality of synthesized speech depends heavily on the size of the speech corpus. However, it is highly time-consuming and labor-intensive to prepare a large labeled corpus. In this work, candidate expansion is proposed to retrieve potential candidates that are unlikely to be retrieved using only linguistic features. The optimal unit sequence is then obtained from the expanded candidates by using the proposed unit selection mechanism at the phoneme and prosodic word levels. Finally, a prosodic word-level prosody adjustment is proposed to improve the continuity and smoothness of the prosody of the synthesized speech. To evaluate the proposed method, the Tsing-Hua corpus of speech synthesis was adopted. The results of an objective evaluation demonstrate the effectiveness of candidate expansion and the improvement of the continuity and smoothness of the prosody of the synthesized speech. The results of a subjective evaluation also show the proposed system could synthesize the speech with improved quality and naturalness, in particular for a small-sized or resource-limited corpus.",
author = "Chen, {Yan You} and Wu, {Chung Hsien} and Huang, {Yi Chin} and Lin, {Shih Lun} and Wang, {Jhing Fa}",
year = "2016",
month = "6",
doi = "10.1109/TASLP.2016.2537982",
language = "English",
volume = "24",
pages = "1052--1065",
journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",
issn = "2329-9290",
publisher = "IEEE Advancing Technology for Humanity",
number = "6",

}

Candidate Expansion and Prosody Adjustment for Natural Speech Synthesis Using a Small Corpus. / Chen, Yan You; Wu, Chung Hsien; Huang, Yi Chin; Lin, Shih Lun; Wang, Jhing Fa.

In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 24, No. 6, 06.2016, p. 1052-1065.

Research output: Article

TY - JOUR

T1 - Candidate Expansion and Prosody Adjustment for Natural Speech Synthesis Using a Small Corpus

AU - Chen, Yan You

AU - Wu, Chung Hsien

AU - Huang, Yi Chin

AU - Lin, Shih Lun

AU - Wang, Jhing Fa

PY - 2016/6

Y1 - 2016/6

N2 - This study proposes a hybrid approach to natural-sounding speech synthesis based on candidate expansion, unit selection, and prosody adjustment using a small corpus. The proposed method is more specific to tonal language, in particular Mandarin. In conventional speech synthesis studies, the quality of synthesized speech depends heavily on the size of the speech corpus. However, it is highly time-consuming and labor-intensive to prepare a large labeled corpus. In this work, candidate expansion is proposed to retrieve potential candidates that are unlikely to be retrieved using only linguistic features. The optimal unit sequence is then obtained from the expanded candidates by using the proposed unit selection mechanism at the phoneme and prosodic word levels. Finally, a prosodic word-level prosody adjustment is proposed to improve the continuity and smoothness of the prosody of the synthesized speech. To evaluate the proposed method, the Tsing-Hua corpus of speech synthesis was adopted. The results of an objective evaluation demonstrate the effectiveness of candidate expansion and the improvement of the continuity and smoothness of the prosody of the synthesized speech. The results of a subjective evaluation also show the proposed system could synthesize the speech with improved quality and naturalness, in particular for a small-sized or resource-limited corpus.

AB - This study proposes a hybrid approach to natural-sounding speech synthesis based on candidate expansion, unit selection, and prosody adjustment using a small corpus. The proposed method is more specific to tonal language, in particular Mandarin. In conventional speech synthesis studies, the quality of synthesized speech depends heavily on the size of the speech corpus. However, it is highly time-consuming and labor-intensive to prepare a large labeled corpus. In this work, candidate expansion is proposed to retrieve potential candidates that are unlikely to be retrieved using only linguistic features. The optimal unit sequence is then obtained from the expanded candidates by using the proposed unit selection mechanism at the phoneme and prosodic word levels. Finally, a prosodic word-level prosody adjustment is proposed to improve the continuity and smoothness of the prosody of the synthesized speech. To evaluate the proposed method, the Tsing-Hua corpus of speech synthesis was adopted. The results of an objective evaluation demonstrate the effectiveness of candidate expansion and the improvement of the continuity and smoothness of the prosody of the synthesized speech. The results of a subjective evaluation also show the proposed system could synthesize the speech with improved quality and naturalness, in particular for a small-sized or resource-limited corpus.

UR - http://www.scopus.com/inward/record.url?scp=84969800129&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84969800129&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2016.2537982

DO - 10.1109/TASLP.2016.2537982

M3 - Article

AN - SCOPUS:84969800129

VL - 24

SP - 1052

EP - 1065

JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing

JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing

SN - 2329-9290

IS - 6

ER -