Abstract
A person's speaking style, consisting of such attributes as voice, choice of vocabulary, and the physical motions employed, not only expresses the speaker's identity but also emphasizes the content of an utterance. Speech combining these aspects of speaking style becomes more vivid and expressive to listeners. Recent research on speaking style modeling has paid more attention to speech signal processing. This approach focuses on text processing for idiolect extraction and generation to model a specific person's speaking style for the application of text-to-speech (TTS) conversion. The first stage of this study adopts a statistical method to automatically detect the candidate idiolects from a personalized, transcribed speech corpus. Based on the categorization of the detected candidate idiolects, superfluous idiolects are extracted using the fluency measure while the remaining candidates are regarded as the nonsuperfluous idiolects. In idiolect generation, the input text is converted into a target text with a particular speaker's speaking style via the insertion of superfluous idiolect or synonym substitution of nonsuperfluous idiolect. To evaluate the performance of the proposed methods experiments were conducted on a Chinese corpus collected and transcribed from the speech files of three Taiwanese politicians. The results show that the proposed method can effectively convert a source text into a target text with a personalized speaking style.
Original language | English |
---|---|
Article number | 4740148 |
Pages (from-to) | 127-137 |
Number of pages | 11 |
Journal | IEEE Transactions on Audio, Speech and Language Processing |
Volume | 17 |
Issue number | 1 |
DOIs | |
Publication status | Published - 2009 Jan |
All Science Journal Classification (ASJC) codes
- Acoustics and Ultrasonics
- Electrical and Electronic Engineering