Idiolect extraction and generation for personalized speaking style modeling

Chung Hsien Wu, Chung Han Lee, Chung Hau Liang

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


A person's speaking style, consisting of such attributes as voice, choice of vocabulary, and the physical motions employed, not only expresses the speaker's identity but also emphasizes the content of an utterance. Speech that combines these aspects of speaking style is more vivid and expressive to listeners. Recent research on speaking style modeling has paid more attention to speech signal processing; this study instead focuses on text processing for idiolect extraction and generation to model a specific person's speaking style for text-to-speech (TTS) conversion. In the first stage, a statistical method automatically detects candidate idiolects in a personalized, transcribed speech corpus. Based on the categorization of the detected candidates, superfluous idiolects are extracted using a fluency measure, while the remaining candidates are regarded as nonsuperfluous idiolects. In idiolect generation, the input text is converted into a target text with a particular speaker's speaking style via the insertion of superfluous idiolects or the synonym substitution of nonsuperfluous idiolects. To evaluate the performance of the proposed methods, experiments were conducted on a Chinese corpus collected and transcribed from the speech files of three Taiwanese politicians. The results show that the proposed method can effectively convert a source text into a target text with a personalized speaking style.
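The generation stage described above can be sketched in a few lines. The following is a minimal, hypothetical illustration only: the idiolect list, the synonym map, the insertion probability, and the `personalize` function are all assumptions for demonstration, not the paper's actual statistical models or corpus data.

```python
import random

# Toy data standing in for a learned speaker profile (illustrative assumptions,
# not the paper's corpus of three Taiwanese politicians).
SUPERFLUOUS_IDIOLECTS = ["you know", "basically"]          # filler-like habits
SYNONYM_MAP = {"think": "reckon", "important": "crucial"}  # speaker-preferred words

def personalize(text, insert_prob=0.3, rng=None):
    """Convert a source text toward a target speaking style by
    (1) substituting speaker-preferred synonyms (nonsuperfluous idiolects) and
    (2) probabilistically inserting superfluous idiolects between words."""
    rng = rng or random.Random(0)  # fixed seed for a repeatable demo
    words = []
    for w in text.split():
        words.append(SYNONYM_MAP.get(w, w))  # nonsuperfluous substitution
        if rng.random() < insert_prob:       # superfluous insertion
            words.append(rng.choice(SUPERFLUOUS_IDIOLECTS))
    return " ".join(words)

print(personalize("I think this is important"))
```

In the paper, the insertion points and substitutions are chosen by trained models rather than at random; this sketch only shows the shape of the two conversion operations.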

Original language: English
Article number: 4740148
Pages (from-to): 127-137
Number of pages: 11
Journal: IEEE Transactions on Audio, Speech and Language Processing
Issue number: 1
Publication status: Published - 2009 Jan 1

All Science Journal Classification (ASJC) codes

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering


