Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation

Yi Chin Huang, Chung Hsien Wu, Ming Ge Shie

Research output: Conference article

1 Citation (Scopus)

Abstract

This paper proposes an automatic approach to generating speech with fluency at the prosodic word level based on a small-sized speech database of the target speaker, consisting of read and fluent speech. First, an auto-segmentation algorithm is employed to automatically segment and label the database of the target speaker. A pre-trained average voice model is adapted to the voice model of the target speaker by using the auto-segmented data. For synthesizing fluent speech, a prosodic model is proposed to smooth the prosodic word-level parameters to improve the fluency in a prosodic word. Finally, a postfilter method based on the modulation spectrum is adopted to alleviate over-smoothing problem of the synthesized speech and thus improve the speaker similarity. Experimental results showed that the proposed method can effectively improve the speech fluency and speaker likeliness of the synthesized speech for a target speaker compared to the MLLR-based model adaptation method.
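
The abstract does not spell out the postfilter, so the following is only a minimal sketch of one common formulation of modulation-spectrum postfiltering, not the authors' implementation. It treats the modulation spectrum as the log power spectrum of a single parameter trajectory taken along time, shifts it from synthetic-speech statistics toward natural-speech statistics, and keeps the original phase. The function names, the statistics arguments (`mu_nat`, `sd_nat`, `mu_syn`, `sd_syn`), and the emphasis weight `k` are all assumptions made for illustration.

```python
import cmath
import math

LOG_FLOOR = 1e-12  # avoids log(0) for empty modulation-frequency bins


def dft(signal):
    """Naive O(n^2) discrete Fourier transform (fine for short trajectories)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        sum(spectrum[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)).real / n
        for t in range(n)
    ]


def ms_postfilter(trajectory, mu_nat, sd_nat, mu_syn, sd_syn, k=1.0):
    """Hypothetical postfilter for one parameter trajectory.

    mu_*/sd_* are per-bin mean and standard deviation of the log modulation
    spectrum for natural (nat) and synthesized (syn) speech. k in [0, 1]
    controls how strongly the spectrum is moved toward the natural statistics.
    """
    filtered = []
    for f, coeff in enumerate(dft(trajectory)):
        log_power = math.log(max(abs(coeff) ** 2, LOG_FLOOR))
        # Normalise with synthetic statistics, then rescale with natural ones.
        target = (sd_nat[f] / sd_syn[f]) * (log_power - mu_syn[f]) + mu_nat[f]
        new_log_power = (1.0 - k) * log_power + k * target
        # Convert log power back to magnitude and reuse the original phase.
        magnitude = math.exp(new_log_power / 2.0)
        filtered.append(cmath.rect(magnitude, cmath.phase(coeff)))
    return inverse_dft(filtered)


if __name__ == "__main__":
    # Toy trajectory standing in for an over-smoothed synthesized parameter.
    smooth = [0.0, 1.0, 0.5, -0.5, -1.0, 0.2, 0.3, 0.1]
    n = len(smooth)
    mu_syn, sd = [0.0] * n, [1.0] * n
    # Pretend natural speech has 4x the power in every non-DC bin.
    mu_nat = [0.0] + [math.log(4.0)] * (n - 1)
    print(ms_postfilter(smooth, mu_nat, sd, mu_syn, sd, k=1.0))
```

In this sketch the statistics must be symmetric across bins `f` and `n - f` for the output to stay real, which holds when they are estimated from real-valued trajectories. Setting `k=0` leaves the trajectory unchanged, so `k` can be read as the strength of the correction for over-smoothing.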

Original language: English
Pages (from-to): 294-298
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2015-January
Publication status: Published - 1 Jan 2015
Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
Duration: 6 Sep 2015 to 10 Sep 2015

Fingerprint

Speech Synthesis
Target
Speech
Spontaneous Speech
Model
Smoothing
Modulation
Segmentation
Labels
Experimental Results
Prosodic Word
Fluency

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

@article{023a1fbe86ca4542af3a6b268807c6ad,
title = "Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation",
abstract = "This paper proposes an automatic approach to generating speech with fluency at the prosodic word level based on a small-sized speech database of the target speaker, consisting of read and fluent speech. First, an auto-segmentation algorithm is employed to automatically segment and label the database of the target speaker. A pre-trained average voice model is adapted to the voice model of the target speaker by using the auto-segmented data. For synthesizing fluent speech, a prosodic model is proposed to smooth the prosodic word-level parameters to improve the fluency in a prosodic word. Finally, a postfilter method based on the modulation spectrum is adopted to alleviate over-smoothing problem of the synthesized speech and thus improve the speaker similarity. Experimental results showed that the proposed method can effectively improve the speech fluency and speaker likeliness of the synthesized speech for a target speaker compared to the MLLR-based model adaptation method.",
author = "Huang, {Yi Chin} and Wu, {Chung Hsien} and Shie, {Ming Ge}",
year = "2015",
month = "1",
day = "1",
language = "English",
volume = "2015-January",
pages = "294--298",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY - JOUR

T1 - Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation

AU - Huang, Yi Chin

AU - Wu, Chung Hsien

AU - Shie, Ming Ge

PY - 2015/1/1

Y1 - 2015/1/1

N2 - This paper proposes an automatic approach to generating speech with fluency at the prosodic word level based on a small-sized speech database of the target speaker, consisting of read and fluent speech. First, an auto-segmentation algorithm is employed to automatically segment and label the database of the target speaker. A pre-trained average voice model is adapted to the voice model of the target speaker by using the auto-segmented data. For synthesizing fluent speech, a prosodic model is proposed to smooth the prosodic word-level parameters to improve the fluency in a prosodic word. Finally, a postfilter method based on the modulation spectrum is adopted to alleviate over-smoothing problem of the synthesized speech and thus improve the speaker similarity. Experimental results showed that the proposed method can effectively improve the speech fluency and speaker likeliness of the synthesized speech for a target speaker compared to the MLLR-based model adaptation method.

UR - http://www.scopus.com/inward/record.url?scp=84959173118&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959173118&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84959173118

VL - 2015-January

SP - 294

EP - 298

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -