Pronunciation variation generation for spontaneous speech synthesis using state-based voice transformation

Chung Han Lee, Chung Hsien Wu, Jun Cheng Guo

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

16 Citations (Scopus)

Abstract

This study presents an approach to hidden Markov model (HMM)-based spontaneous speech synthesis that models pronunciation variation to improve spontaneity. Pronunciation variation is common in spontaneous speech and plays an important role in conveying spontaneity. A state-based transformation function is adopted to model the relation between read speech and the corresponding spontaneous speech with pronunciation variations; the function is then used to generate state-based pronunciation variations. Because training data are scarce, articulatory features are used to cluster the transformation functions with classification and regression trees (CARTs), so that an unseen pronunciation variation with the same articulatory features can be generated from the transformation function in the same cluster. Objective and subjective tests are conducted to evaluate the proposed approach. The experimental results show that the proposed transformation function significantly improves the spontaneity of synthesized speech.
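
The idea of learning a per-state mapping from read speech to spontaneous speech, and sharing it across states with the same articulatory features, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the transformation is taken to be a per-dimension affine map fitted by least squares, the state means are toy two-dimensional vectors, and a plain lookup keyed by a hypothetical (manner, place) tuple stands in for the leaf of a CART.

```python
from collections import defaultdict


def fit_diagonal_affine(pairs):
    """Fit y[d] ~= a[d] * x[d] + b[d] independently per dimension (least squares).

    pairs: list of (read_mean, spontaneous_mean) vectors of equal length.
    The affine form is an illustrative assumption, not the paper's function.
    """
    n = len(pairs)
    dim = len(pairs[0][0])
    a, b = [], []
    for d in range(dim):
        xs = [x[d] for x, _ in pairs]
        ys = [y[d] for _, y in pairs]
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slope = cov / var if var > 0 else 1.0  # fall back to identity slope
        a.append(slope)
        b.append(my - slope * mx)
    return a, b


def apply_transform(transform, read_mean):
    """Map a read-speech state mean to a predicted spontaneous-speech mean."""
    a, b = transform
    return [ai * xi + bi for ai, xi, bi in zip(a, read_mean, b)]


def train_clustered_transforms(samples):
    """Pool training pairs by articulatory key and fit one transform per pool.

    samples: list of (articulatory_key, read_mean, spontaneous_mean).
    Grouping by an exact key is a simplified stand-in for CART clustering.
    """
    clusters = defaultdict(list)
    for key, x, y in samples:
        clusters[key].append((x, y))
    return {key: fit_diagonal_affine(pairs) for key, pairs in clusters.items()}


# Toy training data (hypothetical values).
samples = [
    (("plosive", "alveolar"), [1.0, 2.0], [2.5, 1.0]),
    (("plosive", "alveolar"), [3.0, 4.0], [6.5, 2.0]),
    (("fricative", "alveolar"), [1.0, 1.0], [1.0, 1.0]),
    (("fricative", "alveolar"), [2.0, 3.0], [2.0, 3.0]),
]
transforms = train_clustered_transforms(samples)

# A state not seen in training reuses the transform of its articulatory cluster.
predicted = apply_transform(transforms[("plosive", "alveolar")], [2.0, 3.0])
print(predicted)
```

A real system would operate on full spectral parameter vectors for each HMM state and would grow a tree over many articulatory questions instead of matching a single key exactly; the sketch only shows how pooling by articulatory features lets an unseen variation borrow a transformation.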

Original language: English
Title of host publication: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4826-4829
Number of pages: 4
ISBN (Print): 9781424442966
Publication status: Published - 2010
Event: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Dallas, TX, United States
Duration: 2010 Mar 14 to 2010 Mar 19

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010
Country/Territory: United States
City: Dallas, TX
Period: 10-03-14 to 10-03-19

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
