Duration-embedded Bi-HMM for expressive voice conversion

Chi Chun Hsia, Chung-Hsien Wu, Te Hsien Liu

Research output: Paper

1 citation (Scopus)

Abstract

This paper presents a duration-embedded Bi-HMM framework for expressive voice conversion. First, Ward's minimum variance clustering method is used to cluster all the conversion units (sub-syllables) in order to reduce the number of conversion models as well as the size of the required training database. The duration-embedded Bi-HMM trained with the EM algorithm is built for each sub-syllable class to convert the neutral speech into emotional speech considering the duration information. Finally, the prosodic cues are included in the modification of the spectrum-converted speech. The STRAIGHT algorithm is adopted for high-quality speech analysis and synthesis. Target emotions including happiness, sadness and anger are used. Objective and perceptual evaluations were conducted to compare the performance of the proposed approach with previous methods. The results show that the proposed method exhibits encouraging potential in expressive voice conversion.
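The first step the abstract describes, Ward's minimum variance clustering of the conversion units (sub-syllables), can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy feature vectors and the target cluster count are assumptions, and real sub-syllable units would be represented by acoustic feature vectors.

```python
# Sketch of Ward's minimum-variance agglomerative clustering.
# At each step, merge the pair of clusters whose union least
# increases the total within-cluster sum of squares.

def ward_cluster(points, k):
    """Agglomeratively merge `points` into `k` clusters using
    Ward's minimum-variance criterion."""
    clusters = [[p] for p in points]

    def centroid(c):
        dim = len(c[0])
        return [sum(p[i] for p in c) / len(c) for i in range(dim)]

    def merge_cost(a, b):
        # Ward's increase in within-cluster variance for merging a and b:
        # (|a||b| / (|a|+|b|)) * squared distance between centroids.
        ca, cb = centroid(a), centroid(b)
        d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * d2

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy 2-D "sub-syllable" features forming three well-separated groups.
features = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1),
            (5.1, 5.0), (9.9, 0.2), (10.0, 0.0)]
groups = ward_cluster(features, 3)
```

In the paper's setting, clustering the sub-syllable units this way reduces both the number of conversion models to train and the size of the required parallel training database; a Bi-HMM is then built per cluster rather than per unit.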

Original language: English
Pages: 1921-1924
Number of pages: 4
Publication status: Published - 1 Dec 2005
Event: 9th European Conference on Speech Communication and Technology - Lisbon, Portugal
Duration: 4 Sep 2005 - 8 Sep 2005


Fingerprint

Speech synthesis
Speech analysis

All Science Journal Classification (ASJC) codes

  • Engineering (all)

Cite this

Hsia, C. C., Wu, C-H., & Liu, T. H. (2005). Duration-embedded Bi-HMM for expressive voice conversion. 1921-1924. Paper presented at the 9th European Conference on Speech Communication and Technology, Lisbon, Portugal.
Scopus record: http://www.scopus.com/inward/record.url?scp=33745194137&partnerID=8YFLogxK (SCOPUS:33745194137)