Personalized natural speech synthesis based on retrieval of pitch patterns using hierarchical Fujisaki model

Yi Chin Huang, Chung-Hsien Wu, Shih Lun Lin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

In recent years, speech synthesis based on hidden Markov models (HMMs) has been developed; it can synthesize stable and intelligible speech with flexibility and a small footprint. However, the synthesized prosodic features are still incapable of conveying personalized and natural characteristics. Previous prosody models, mainly constructed from clustered prosodic features, are unable to characterize personalized prosodic information, because the linguistic cues of an input sentence are the same regardless of the speaker. This paper proposes an approach that retrieves personalized pitch patterns from a real speech corpus of the target speaker and incorporates them into an HMM-based speech synthesizer to generate a personalized, natural pitch contour. A modified Fujisaki model is adopted to depict the hierarchical pitch patterns, modeling both the local pitch contour variation and the global intonation of the utterances in the corpus. Codeword sequences of the utterances in the training corpus and in the synthesized corpus are constructed and used to obtain the relationship between the pitch patterns of real and synthesized speech. Finally, a pitch-pattern language model is constructed to obtain the optimal pitch pattern sequence for an input sentence. Experimental results from subjective and objective evaluations demonstrate that the proposed approach can substantially outperform conventional statistical synthesis methods in terms of naturalness and speaker similarity.
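The abstract relies on the Fujisaki model to represent pitch hierarchically. For reference, the standard (unmodified) Fujisaki formulation writes ln F0(t) as ln Fb plus a sum of phrase-command responses (global intonation) and a sum of accent-command responses (local pitch variation). The minimal sketch below implements only that standard formulation; the paper's own modifications are not described in this record. The helper names, the constants α = 3.0, β = 20.0, γ = 0.9 (commonly quoted defaults) and the command timings in the usage lines are illustrative assumptions.

```python
import math

def phrase_response(t, alpha=3.0):
    # Phrase control mechanism: Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0.
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0.0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    # Accent control mechanism: Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0.
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(times, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    # Hypothetical helper: returns F0 in Hz at each time in `times` (seconds).
    # phrases: list of (T0, Ap) phrase commands (global intonation).
    # accents: list of (T1, T2, Aa) accent commands (local pitch variation).
    f0 = []
    for t in times:
        log_f0 = math.log(fb)  # baseline frequency Fb on the log scale
        for t0, ap in phrases:
            log_f0 += ap * phrase_response(t - t0, alpha)
        for t1, t2, aa in accents:
            # Accent command modeled as the difference of two step responses.
            log_f0 += aa * (accent_response(t - t1, beta, gamma) - accent_response(t - t2, beta, gamma))
        f0.append(math.exp(log_f0))
    return f0

# Illustrative usage (made-up commands): one phrase command, two accent commands, 2 s at 10 ms steps.
times = [i * 0.01 for i in range(201)]
contour = fujisaki_f0(times, fb=100.0, phrases=[(0.0, 0.5)],
                      accents=[(0.3, 0.6, 0.4), (1.0, 1.3, 0.3)])
```

Working in the log-F0 domain is what makes the phrase and accent components simply additive, which is what lets a hierarchical model separate global intonation from local variation.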
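The abstract says that codeword sequences are built for the training and synthesized corpora, but this record does not state which features are quantized or how. As a minimal sketch under assumed conditions, suppose each syllable's pitch pattern is summarized as a small vector of Fujisaki parameters and quantized with plain k-means (Lloyd's algorithm). The function names, the feature vectors and the codebook size below are hypothetical.

```python
def nearest_codeword(vec, codebook):
    # Index of the codebook entry with the smallest squared Euclidean distance to vec.
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])))

def train_codebook(vectors, k, iterations=20):
    # Hypothetical helper: plain k-means, initialised with the first k vectors for determinism.
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest_codeword(v, codebook)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster ends up empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

# Made-up per-syllable vectors, e.g. (accent amplitude, accent duration in s).
vectors = [[0.1, 0.2], [0.5, 0.3], [0.12, 0.22], [0.52, 0.28], [0.11, 0.18]]
codebook = train_codebook(vectors, k=2)
# An utterance becomes a sequence of codeword indices, one per syllable.
sequence = [nearest_codeword(v, codebook) for v in vectors]
```

Once both real and synthesized utterances are expressed as index sequences, their pitch patterns can be compared symbol by symbol, which is one plausible way to relate the two corpora.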
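The final step chooses an optimal pitch pattern sequence with a pitch-pattern language model. The record does not specify the model order or the scoring, so the sketch below assumes a bigram model over codewords combined with a per-syllable mapping probability P(real codeword | synthesized pattern), decoded with the Viterbi algorithm. The function name, the "<s>" start symbol, the probability floor for unseen events and all toy probabilities are hypothetical.

```python
import math

def viterbi_pitch_patterns(candidates, emission, bigram, floor=1e-6):
    # candidates: per syllable, the candidate real-speech codeword ids.
    # emission[k][c]: assumed mapping probability of codeword c at syllable k.
    # bigram[(prev, cur)]: assumed P(cur | prev); "<s>" marks sentence start.
    best = {
        c: (math.log(bigram.get(("<s>", c), floor)) + math.log(emission[0].get(c, floor)), [c])
        for c in candidates[0]
    }
    for k in range(1, len(candidates)):
        new_best = {}
        for c in candidates[k]:
            e = math.log(emission[k].get(c, floor))
            # Best predecessor: accumulated log score plus log bigram transition probability.
            score, path = max(
                ((s + math.log(bigram.get((p, c), floor)), pth) for p, (s, pth) in best.items()),
                key=lambda item: item[0],
            )
            new_best[c] = (score + e, path + [c])
        best = new_best
    score, path = max(best.values(), key=lambda item: item[0])
    return path, score

# Toy example with made-up probabilities for a three-syllable sentence.
candidates = [["A", "B"], ["A", "C"], ["B", "C"]]
emission = [{"A": 0.6, "B": 0.4}, {"A": 0.5, "C": 0.5}, {"B": 0.3, "C": 0.7}]
bigram = {
    ("<s>", "A"): 0.5, ("<s>", "B"): 0.5,
    ("A", "A"): 0.1, ("A", "B"): 0.1, ("A", "C"): 0.8,
    ("B", "A"): 0.9, ("B", "C"): 0.1,
    ("C", "B"): 0.3, ("C", "C"): 0.7,
}
best_path, best_score = viterbi_pitch_patterns(candidates, emission, bigram)
```

In this toy case the decoder returns ["A", "C", "C"] with probability 0.5 × 0.6 × 0.8 × 0.5 × 0.7 × 0.7 = 0.0588. Scoring in log space avoids underflow on long sentences.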

Original language: English
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages: 7844-7848
Number of pages: 5
DOI: https://doi.org/10.1109/ICASSP.2013.6639191
Publication status: Published - 2013 Oct 18
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: 2013 May 26 to 2013 May 31

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149


All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering


Cite this

    Huang, Y. C., Wu, C-H., & Lin, S. L. (2013). Personalized natural speech synthesis based on retrieval of pitch patterns using hierarchical Fujisaki model. In 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings (pp. 7844-7848). [6639191] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). https://doi.org/10.1109/ICASSP.2013.6639191