TY - JOUR
T1 - Candidate Expansion and Prosody Adjustment for Natural Speech Synthesis Using a Small Corpus
AU - Chen, Yan You
AU - Wu, Chung Hsien
AU - Huang, Yi Chin
AU - Lin, Shih Lun
AU - Wang, Jhing Fa
N1 - Funding Information: This work was supported in part by the Ministry of Science and Technology under Grant MOST 102-2221-E-006-094-MY3 and in part by the Headquarters of University Advancement at the National Cheng Kung University, which is sponsored by the Ministry of Education, Taiwan.
PY - 2016/6
Y1 - 2016/6
N2 - This study proposes a hybrid approach to natural-sounding speech synthesis from a small corpus, based on candidate expansion, unit selection, and prosody adjustment. The proposed method is tailored to tonal languages, Mandarin in particular. In conventional speech synthesis studies, the quality of synthesized speech depends heavily on the size of the speech corpus. However, preparing a large labeled corpus is highly time-consuming and labor-intensive. In this work, candidate expansion is proposed to retrieve potential candidates that are unlikely to be retrieved using linguistic features alone. The optimal unit sequence is then obtained from the expanded candidates by using the proposed unit selection mechanism at the phoneme and prosodic word levels. Finally, a prosodic word-level prosody adjustment is proposed to improve the continuity and smoothness of the prosody of the synthesized speech. The proposed method was evaluated on the Tsing-Hua corpus of speech synthesis. The results of an objective evaluation demonstrate the effectiveness of candidate expansion and the improved continuity and smoothness of the prosody of the synthesized speech. The results of a subjective evaluation also show that the proposed system can synthesize speech with improved quality and naturalness, particularly when the corpus is small or resource-limited.
AB - This study proposes a hybrid approach to natural-sounding speech synthesis from a small corpus, based on candidate expansion, unit selection, and prosody adjustment. The proposed method is tailored to tonal languages, Mandarin in particular. In conventional speech synthesis studies, the quality of synthesized speech depends heavily on the size of the speech corpus. However, preparing a large labeled corpus is highly time-consuming and labor-intensive. In this work, candidate expansion is proposed to retrieve potential candidates that are unlikely to be retrieved using linguistic features alone. The optimal unit sequence is then obtained from the expanded candidates by using the proposed unit selection mechanism at the phoneme and prosodic word levels. Finally, a prosodic word-level prosody adjustment is proposed to improve the continuity and smoothness of the prosody of the synthesized speech. The proposed method was evaluated on the Tsing-Hua corpus of speech synthesis. The results of an objective evaluation demonstrate the effectiveness of candidate expansion and the improved continuity and smoothness of the prosody of the synthesized speech. The results of a subjective evaluation also show that the proposed system can synthesize speech with improved quality and naturalness, particularly when the corpus is small or resource-limited.
UR - http://www.scopus.com/inward/record.url?scp=84969800129&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84969800129&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2016.2537982
DO - 10.1109/TASLP.2016.2537982
M3 - Article
AN - SCOPUS:84969800129
VL - 24
SP - 1052
EP - 1065
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
SN - 2329-9290
IS - 6
ER -