Generation of emotion control vector using MDS-based space transformation for expressive speech synthesis

Yan You Chen, Chung Hsien Wu, Yu Fong Huang

Research output: Conference article

2 Citations (Scopus)

Abstract

In control-vector-based expressive speech synthesis, the emotion/style control vector is defined in the categorical (CAT) emotion space, which makes it difficult for the user to specify precisely the vector that will synthesize speech with the desired emotion or style. This paper applies the arousal-valence (AV) space to the multiple regression hidden semi-Markov model (MRHSMM)-based synthesis framework for expressive speech synthesis. In this study, the user designates a specific emotion by choosing AV values in the AV space. The multidimensional scaling (MDS) method is adopted to project the AV emotion space and the CAT emotion space onto their corresponding orthogonal coordinate systems. A transformation approach is then proposed to map the AV values to the emotion control vector in the CAT emotion space for MRHSMM-based expressive speech synthesis. In the synthesis phase, given the input text and the desired emotion, speech with that emotion is generated from the MRHSMMs using the transformed emotion control vector. Experimental results show that the proposed method helps the user easily and precisely specify the desired emotion for expressive speech synthesis.
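The abstract gives only a high-level outline of the pipeline. As a concrete illustration, the following Python sketch shows one plausible way to project a categorical emotion space with MDS and map a user-specified arousal-valence point to a control vector. The emotion labels, AV anchor coordinates, dissimilarity matrix, least-squares affine fit, and inverse-distance weighting are all illustrative assumptions for this sketch, not the transformation actually proposed in the paper.

```python
# Minimal sketch (not the authors' implementation): mapping a user-specified
# arousal-valence (AV) point to an emotion control vector via MDS.
# All numeric values and emotion labels below are hypothetical placeholders.
import numpy as np
from sklearn.manifold import MDS

emotions = ["neutral", "happy", "angry", "sad"]

# Hypothetical AV-space anchors for each categorical emotion (arousal, valence).
av_anchors = np.array([
    [0.0,  0.0],   # neutral
    [0.6,  0.7],   # happy
    [0.8, -0.6],   # angry
    [-0.5, -0.7],  # sad
])

# Hypothetical pairwise dissimilarities between the categorical emotions
# (symmetric, zero diagonal), e.g. as might come from perceptual ratings.
dissim = np.array([
    [0.0, 0.5, 0.6, 0.4],
    [0.5, 0.0, 0.7, 0.9],
    [0.6, 0.7, 0.0, 0.8],
    [0.4, 0.9, 0.8, 0.0],
])

# Project the categorical (CAT) emotion space onto a 2-D orthogonal
# coordinate system with multidimensional scaling.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
cat_coords = mds.fit_transform(dissim)                     # shape: (4, 2)

# Estimate an affine transform from AV coordinates to the MDS (CAT)
# coordinates by least squares, using the anchors as corresponding points.
av_aug = np.hstack([av_anchors, np.ones((len(emotions), 1))])   # add bias term
affine, *_ = np.linalg.lstsq(av_aug, cat_coords, rcond=None)    # shape: (3, 2)

def emotion_control_vector(arousal, valence):
    """Map a user-specified AV point to interpolation weights over the
    categorical emotions (one plausible form of a control vector)."""
    point = np.array([arousal, valence, 1.0]) @ affine          # AV -> CAT space
    dists = np.linalg.norm(cat_coords - point, axis=1)
    weights = 1.0 / (dists + 1e-6)                               # inverse-distance weighting
    return weights / weights.sum()

# Example: a moderately aroused, positive-valence target emotion.
print(dict(zip(emotions, np.round(emotion_control_vector(0.5, 0.6), 3))))
```

Running the example prints interpolation weights over the categorical emotions for a moderately aroused, positive-valence target; a weight vector of this form is the kind of emotion control vector an MRHSMM-based synthesizer could consume.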

Original language: English
Pages (from-to): 3176-3180
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 08-12-September-2016
DOIs: 10.21437/Interspeech.2016-815
Publication status: Published - 1 Jan 2016
Event: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States
Duration: 8 Sep 2016 - 12 Sep 2016

Fingerprint

  • Speech Synthesis
  • Vector Control
  • Scaling
  • Categorical
  • Semi-Markov Model
  • Multiple Regression
  • Emotion
  • Expressive
  • Multidimensional Scaling
  • Synthesis
  • Model-based

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

@article{7edb1a7b8fbb4c56b19c1db5e97f581f,
title = "Generation of emotion control vector using MDS-based space transformation for expressive speech synthesis",
author = "Chen, {Yan You} and Wu, {Chung Hsien} and Huang, {Yu Fong}",
year = "2016",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2016-815",
language = "English",
volume = "08-12-September-2016",
pages = "3176--3180",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",
}
