Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation

Yi Chin Huang, Chung Hsien Wu, Ming Ge Shie

Research output: Contribution to journal › Conference article

1 Citation (Scopus)

Abstract

This paper proposes an automatic approach to generating speech with fluency at the prosodic word level based on a small-sized speech database of the target speaker, consisting of read and fluent speech. First, an auto-segmentation algorithm is employed to automatically segment and label the database of the target speaker. A pre-trained average voice model is adapted to the voice model of the target speaker by using the auto-segmented data. For synthesizing fluent speech, a prosodic model is proposed to smooth the prosodic word-level parameters to improve the fluency in a prosodic word. Finally, a postfilter method based on the modulation spectrum is adopted to alleviate the over-smoothing problem of the synthesized speech and thus improve the speaker similarity. Experimental results showed that the proposed method can effectively improve the speech fluency and speaker likeliness of the synthesized speech for a target speaker compared to the MLLR-based model adaptation method.
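The abstract does not spell out the postfilter itself, so the following is only a minimal sketch of one common formulation of modulation-spectrum postfiltering, not the authors' implementation. It treats the modulation spectrum as the log power of the DFT of a parameter trajectory over time, maps it from synthetic-speech statistics toward natural-speech statistics, keeps the original phase, and transforms back. The function name `ms_postfilter`, the statistics arguments, and the emphasis weight `k` are all assumptions introduced for illustration.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(spec):
    """Naive inverse DFT; returns only the real part of each sample."""
    n = len(spec)
    return [sum(spec[f] * cmath.exp(2j * math.pi * f * t / n)
                for f in range(n)).real / n
            for t in range(n)]


def ms_postfilter(traj, mu_nat, sd_nat, mu_syn, sd_syn, k=1.0, floor=1e-10):
    """Hypothetical modulation-spectrum postfilter for one trajectory.

    traj            : parameter values over time (e.g. one cepstral dimension)
    mu_nat, sd_nat  : per-bin mean / std of the log modulation spectrum of
                      natural speech (assumed to be estimated beforehand)
    mu_syn, sd_syn  : the same statistics for synthesized speech
    k               : emphasis weight; 0 leaves the trajectory unchanged
    """
    filtered = []
    for c, mn, sn, ms, ss in zip(dft(traj), mu_nat, sd_nat, mu_syn, sd_syn):
        s = math.log(max(abs(c) ** 2, floor))      # log modulation power
        s_nat = (sn / ss) * (s - ms) + mn          # map synthetic -> natural
        s_new = (1.0 - k) * s + k * s_nat          # interpolate with weight k
        # Rebuild the bin from the new magnitude and the original phase.
        filtered.append(cmath.rect(math.exp(0.5 * s_new), cmath.phase(c)))
    return idft(filtered)


if __name__ == "__main__":
    traj = [0.0, 1.0, 0.5, 2.0, 1.5, 0.3, 0.9, 1.2]
    n = len(traj)
    # Toy statistics: natural speech has more modulation power in every bin.
    restored = ms_postfilter(traj, [math.log(4.0)] * n, [1.0] * n,
                             [0.0] * n, [1.0] * n, k=1.0)
    print([round(v, 3) for v in restored])
```

In practice the statistics would be estimated per parameter dimension from fixed-length segments of natural and synthesized speech, and an FFT routine would replace the naive transform used here for self-containment.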

Original language: English
Pages (from-to): 294-298
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2015-January
Publication status: Published - 2015 Jan 1
Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
Duration: 2015 Sep 6 to 2015 Sep 10

Fingerprint

Speech Synthesis
Target
Speech
Spontaneous Speech
Model
Smoothing
Modulation
Segmentation
Labels
Experimental Results
Prosodic Word
Fluency

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

@article{023a1fbe86ca4542af3a6b268807c6ad,
title = "Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation",
abstract = "This paper proposes an automatic approach to generating speech with fluency at the prosodic word level based on a small-sized speech database of the target speaker, consisting of read and fluent speech. First, an auto-segmentation algorithm is employed to automatically segment and label the database of the target speaker. A pre-trained average voice model is adapted to the voice model of the target speaker by using the auto-segmented data. For synthesizing fluent speech, a prosodic model is proposed to smooth the prosodic word-level parameters to improve the fluency in a prosodic word. Finally, a postfilter method based on the modulation spectrum is adopted to alleviate the over-smoothing problem of the synthesized speech and thus improve the speaker similarity. Experimental results showed that the proposed method can effectively improve the speech fluency and speaker likeliness of the synthesized speech for a target speaker compared to the MLLR-based model adaptation method.",
author = "Huang, {Yi Chin} and Wu, {Chung Hsien} and Shie, {Ming Ge}",
year = "2015",
month = "1",
day = "1",
language = "English",
volume = "2015-January",
pages = "294--298",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation. / Huang, Yi Chin; Wu, Chung Hsien; Shie, Ming Ge.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2015-January, 01.01.2015, p. 294-298.

TY - JOUR

T1 - Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation

AU - Huang, Yi Chin

AU - Wu, Chung Hsien

AU - Shie, Ming Ge

PY - 2015/1/1

Y1 - 2015/1/1

N2 - This paper proposes an automatic approach to generating speech with fluency at the prosodic word level based on a small-sized speech database of the target speaker, consisting of read and fluent speech. First, an auto-segmentation algorithm is employed to automatically segment and label the database of the target speaker. A pre-trained average voice model is adapted to the voice model of the target speaker by using the auto-segmented data. For synthesizing fluent speech, a prosodic model is proposed to smooth the prosodic word-level parameters to improve the fluency in a prosodic word. Finally, a postfilter method based on the modulation spectrum is adopted to alleviate the over-smoothing problem of the synthesized speech and thus improve the speaker similarity. Experimental results showed that the proposed method can effectively improve the speech fluency and speaker likeliness of the synthesized speech for a target speaker compared to the MLLR-based model adaptation method.

UR - http://www.scopus.com/inward/record.url?scp=84959173118&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959173118&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84959173118

VL - 2015-January

SP - 294

EP - 298

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -