Automatic generation of synthesis units and prosodic information for Chinese concatenative synthesis

Chung Hsien Wu, Jau Hung Chen

Research output: Contribution to journal › Article › peer-review

44 Citations (Scopus)

Abstract

This paper proposes approaches to generating synthesis units and prosodic information for Mandarin Chinese text-to-speech (TTS) conversion. Monosyllables are adopted as the basic synthesis units. A set of synthesis units is selected from a large continuous speech database using two cost functions, which minimize the inter- and intra-syllable distortion. The speech database is also used to build a word-prosody-based template tree organized by four linguistic features: tone combination, word length, part-of-speech (POS) of the word, and word position in a phrase. For each possible combination of these features, the template tree stores the prosodic features of a word: pitch contour, average energy, and syllable duration. Two modules, one for sentence intonation and one for template selection, are proposed to generate the target prosodic templates. Experimental results showed that the synthesized prosodic features matched their original counterparts quite well. Subjective evaluation experiments also confirmed the satisfactory performance of these approaches.
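As an illustration only, the word-prosody template tree described in the abstract can be pictured as a tree whose levels correspond to the four linguistic features and whose leaves hold the three prosodic features. The sketch below is not the authors' implementation: the class names (ProsodyTemplate, TemplateTree), the order of the levels, the string encoding of the features, and all numeric values are assumptions made for the example. The abstract does not specify any of them, nor how unseen feature combinations are handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ProsodyTemplate:
    """Prosodic features stored for one word (field names are hypothetical)."""
    pitch_contour: List[float]        # assumed: sampled F0 values
    avg_energy: float                 # assumed: one average value per word
    syllable_durations: List[float]   # assumed: one duration per syllable


@dataclass
class _Node:
    children: Dict[str, "_Node"] = field(default_factory=dict)
    template: Optional[ProsodyTemplate] = None


class TemplateTree:
    """Tree indexed by (tone combination, word length, POS, position in phrase).

    The level order is an assumption for this sketch.
    """

    def __init__(self) -> None:
        self.root = _Node()

    def insert(self, features: Tuple[str, str, str, str],
               template: ProsodyTemplate) -> None:
        # Walk down one level per linguistic feature, creating nodes as needed.
        node = self.root
        for value in features:
            node = node.children.setdefault(value, _Node())
        node.template = template

    def lookup(self, features: Tuple[str, str, str, str]) -> Optional[ProsodyTemplate]:
        # Return the stored template, or None if this combination was never seen.
        node = self.root
        for value in features:
            node = node.children.get(value)
            if node is None:
                return None
        return node.template


# Placeholder values for illustration; they are not data from the paper.
tree = TemplateTree()
key = ("3-4", "2", "noun", "phrase-initial")
tree.insert(key, ProsodyTemplate([210.0, 180.0, 230.0, 160.0], 62.5, [0.21, 0.24]))
found = tree.lookup(key)
missing = tree.lookup(("1-1", "2", "verb", "phrase-final"))
```

Here `found` is the template stored for a two-syllable noun with tones 3 and 4 at the start of a phrase, and `missing` is None because that combination was never inserted. A real system would need some strategy for such unseen combinations; the template selection module mentioned in the abstract presumably addresses this, but its details are not given here.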

Original language: English
Pages (from-to): 219-237
Number of pages: 19
Journal: Speech Communication
Volume: 35
Issue number: 3-4
Publication status: Published - 2001 Oct 1

All Science Journal Classification (ASJC) codes

  • Software
  • Modelling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
