Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation

Yi Chin Huang, Chung-Hsien Wu, Ming Ge Shie

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

This paper proposes an automatic approach to generating speech with fluency at the prosodic word level based on a small-sized speech database of the target speaker, consisting of read and fluent speech. First, an auto-segmentation algorithm is employed to automatically segment and label the database of the target speaker. A pre-trained average voice model is adapted to the voice model of the target speaker by using the auto-segmented data. For synthesizing fluent speech, a prosodic model is proposed to smooth the prosodic word-level parameters to improve the fluency in a prosodic word. Finally, a postfilter method based on the modulation spectrum is adopted to alleviate the over-smoothing problem of the synthesized speech and thus improve the speaker similarity. Experimental results showed that the proposed method can effectively improve the speech fluency and speaker likeliness of the synthesized speech for a target speaker compared to the MLLR-based model adaptation method.

Original language: English
Pages (from-to): 294-298
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2015-January
Publication status: Published - 2015

Fingerprint

Speech Synthesis
Target
Speech
Spontaneous Speech
Model
Smoothing
Modulation
Segmentation
Labels
Experimental Results
Prosodic Word
Fluency

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

@article{023a1fbe86ca4542af3a6b268807c6ad,
title = "Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation",
abstract = "This paper proposes an automatic approach to generating speech with fluency at the prosodic word level based on a small-sized speech database of the target speaker, consisting of read and fluent speech. First, an auto-segmentation algorithm is employed to automatically segment and label the database of the target speaker. A pre-trained average voice model is adapted to the voice model of the target speaker by using the auto-segmented data. For synthesizing fluent speech, a prosodic model is proposed to smooth the prosodic word-level parameters to improve the fluency in a prosodic word. Finally, a postfilter method based on the modulation spectrum is adopted to alleviate the over-smoothing problem of the synthesized speech and thus improve the speaker similarity. Experimental results showed that the proposed method can effectively improve the speech fluency and speaker likeliness of the synthesized speech for a target speaker compared to the MLLR-based model adaptation method.",
author = "Huang, {Yi Chin} and Wu, Chung-Hsien and Shie, {Ming Ge}",
year = "2015",
language = "English",
volume = "2015-January",
pages = "294--298",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY - JOUR

T1 - Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation

AU - Huang, Yi Chin

AU - Wu, Chung-Hsien

AU - Shie, Ming Ge

PY - 2015

Y1 - 2015

N2 - This paper proposes an automatic approach to generating speech with fluency at the prosodic word level based on a small-sized speech database of the target speaker, consisting of read and fluent speech. First, an auto-segmentation algorithm is employed to automatically segment and label the database of the target speaker. A pre-trained average voice model is adapted to the voice model of the target speaker by using the auto-segmented data. For synthesizing fluent speech, a prosodic model is proposed to smooth the prosodic word-level parameters to improve the fluency in a prosodic word. Finally, a postfilter method based on the modulation spectrum is adopted to alleviate the over-smoothing problem of the synthesized speech and thus improve the speaker similarity. Experimental results showed that the proposed method can effectively improve the speech fluency and speaker likeliness of the synthesized speech for a target speaker compared to the MLLR-based model adaptation method.

AB - This paper proposes an automatic approach to generating speech with fluency at the prosodic word level based on a small-sized speech database of the target speaker, consisting of read and fluent speech. First, an auto-segmentation algorithm is employed to automatically segment and label the database of the target speaker. A pre-trained average voice model is adapted to the voice model of the target speaker by using the auto-segmented data. For synthesizing fluent speech, a prosodic model is proposed to smooth the prosodic word-level parameters to improve the fluency in a prosodic word. Finally, a postfilter method based on the modulation spectrum is adopted to alleviate the over-smoothing problem of the synthesized speech and thus improve the speaker similarity. Experimental results showed that the proposed method can effectively improve the speech fluency and speaker likeliness of the synthesized speech for a target speaker compared to the MLLR-based model adaptation method.

UR - http://www.scopus.com/inward/record.url?scp=84959173118&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959173118&partnerID=8YFLogxK

M3 - Article

VL - 2015-January

SP - 294

EP - 298

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -