Variable-length unit selection in TTS using structural syntactic cost

Chung Hsien Wu, Chi Chun Hsia, Jiun Fu Chen, Jhing Fa Wang

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)

Abstract

This paper presents a variable-length unit selection scheme that uses a syntactic cost to select text-to-speech (TTS) synthesis units. The syntactic structure of a sentence is derived from a probabilistic context-free grammar (PCFG) and represented as a syntactic vector. The syntactic difference between target and candidate units (words or phrases) is estimated by the cosine measure, with the inside probability of the PCFG acting as a weight. Latent semantic analysis (LSA) is applied to reduce the dimensionality of the syntactic vectors. A dynamic programming algorithm is used to obtain the concatenated unit sequence with minimum cost. A speech database rich in syntactic properties is designed and collected as the unit inventory. Several experiments with statistical testing are conducted to assess the quality of the synthetic speech as perceived by human subjects. The proposed method outperforms a synthesizer that does not consider syntactic properties. The structural syntax estimates the substitution cost better than the acoustic features alone.
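The two computational steps named in the abstract, a weighted cosine measure between syntactic vectors and a dynamic programming search for the minimum-cost unit sequence, can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the way the weights enter the cosine, the fixed one-candidate-list-per-position segmentation, and the `join_cost` callback are all assumptions, since the abstract does not give the exact formulation. The PCFG parsing and LSA reduction steps are omitted.

```python
import math


def weighted_cosine_cost(target, candidate, weights):
    """Return 1 - weighted cosine similarity of two syntactic vectors.

    `weights` stands in for the PCFG inside probabilities (assumed
    non-negative); how the paper applies them is an assumption here.
    """
    dot = sum(w * t * c for w, t, c in zip(weights, target, candidate))
    norm_t = math.sqrt(sum(w * t * t for w, t in zip(weights, target)))
    norm_c = math.sqrt(sum(w * c * c for w, c in zip(weights, candidate)))
    if norm_t == 0.0 or norm_c == 0.0:
        return 1.0  # treat an all-zero vector as maximally dissimilar
    return 1.0 - dot / (norm_t * norm_c)


def select_units(target_costs, join_cost):
    """Viterbi-style search for the cheapest candidate sequence.

    target_costs[i][k] is the cost of candidate k at position i
    (for example, a syntactic cost from weighted_cosine_cost).
    join_cost(i, j, k) is a hypothetical concatenation cost between
    candidate j at position i - 1 and candidate k at position i.
    Returns (total_cost, list of chosen candidate indices).
    """
    best = list(target_costs[0])
    back = []
    for i in range(1, len(target_costs)):
        new_best, pointers = [], []
        for k, cost_k in enumerate(target_costs[i]):
            # Cheapest predecessor for candidate k at position i.
            j_min = min(range(len(best)),
                        key=lambda j: best[j] + join_cost(i, j, k))
            new_best.append(best[j_min] + join_cost(i, j_min, k) + cost_k)
            pointers.append(j_min)
        best = new_best
        back.append(pointers)
    # Trace back from the cheapest final candidate.
    last = min(range(len(best)), key=best.__getitem__)
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path
```

In a variable-length setting, the candidates at each position would span different numbers of words, so the search would run over a lattice rather than the fixed positions assumed in this sketch.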

Original language: English
Article number: 4156186
Pages (from-to): 1227-1235
Number of pages: 9
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 15
Issue number: 4
Publication status: Published - 2007 May 1

All Science Journal Classification (ASJC) codes

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering

