Generation of emotion control vector using MDS-based space transformation for expressive speech synthesis

Yan You Chen, Chung-Hsien Wu, Yu Fong Huang

Research output: Contribution to journalConference article

2 Citations (Scopus)

Abstract

In control vector-based expressive speech synthesis, the emotion/style control vector defined in the categorical (CAT) emotion space is uneasy to be precisely defined by the user to synthesize the speech with the desired emotion/style. This paper applies the arousal-valence (AV) space to the multiple regression hidden semi-Markov model (MRHSMM)-based synthesis framework for expressive speech synthesis. In this study, the user can designate a specific emotion by defining the AV values in the AV space. The multidimensional scaling (MDS) method is adopted to project the AV emotion space and the categorical (CAT) emotion space onto their corresponding orthogonal coordinate systems. A transformation approach is thus proposed to transform the AV values to the emotion control vector in CAT emotion space for MRHSMM-based expressive speech synthesis. In the synthesis phase given the input text and desired emotion, with the transformed emotion control vector, the speech with the desired emotion is generated from the MRHSMMs. Experimental result shows the proposed method is helpful for the user to easily and precisely determine the desired emotion for expressive speech synthesis.

Original languageEnglish
Pages (from-to)3176-3180
Number of pages5
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume08-12-September-2016
DOIs
Publication statusPublished - 2016 Jan 1
Event17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States
Duration: 2016 Sep 82016 Sep 16

Fingerprint

Speech Synthesis
Speech synthesis
Vector Control
Scaling
Categorical
Semi-Markov Model
Multiple Regression
Emotion
Expressive
Multidimensional Scaling
Synthesis
Model-based

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

@article{7edb1a7b8fbb4c56b19c1db5e97f581f,
title = "Generation of emotion control vector using MDS-based space transformation for expressive speech synthesis",
abstract = "In control vector-based expressive speech synthesis, the emotion/style control vector defined in the categorical (CAT) emotion space is uneasy to be precisely defined by the user to synthesize the speech with the desired emotion/style. This paper applies the arousal-valence (AV) space to the multiple regression hidden semi-Markov model (MRHSMM)-based synthesis framework for expressive speech synthesis. In this study, the user can designate a specific emotion by defining the AV values in the AV space. The multidimensional scaling (MDS) method is adopted to project the AV emotion space and the categorical (CAT) emotion space onto their corresponding orthogonal coordinate systems. A transformation approach is thus proposed to transform the AV values to the emotion control vector in CAT emotion space for MRHSMM-based expressive speech synthesis. In the synthesis phase given the input text and desired emotion, with the transformed emotion control vector, the speech with the desired emotion is generated from the MRHSMMs. Experimental result shows the proposed method is helpful for the user to easily and precisely determine the desired emotion for expressive speech synthesis.",
author = "Chen, {Yan You} and Chung-Hsien Wu and Huang, {Yu Fong}",
year = "2016",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2016-815",
language = "English",
volume = "08-12-September-2016",
pages = "3176--3180",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

Generation of emotion control vector using MDS-based space transformation for expressive speech synthesis. / Chen, Yan You; Wu, Chung-Hsien; Huang, Yu Fong.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 08-12-September-2016, 01.01.2016, p. 3176-3180.

Research output: Contribution to journalConference article

TY - JOUR

T1 - Generation of emotion control vector using MDS-based space transformation for expressive speech synthesis

AU - Chen, Yan You

AU - Wu, Chung-Hsien

AU - Huang, Yu Fong

PY - 2016/1/1

Y1 - 2016/1/1

N2 - In control vector-based expressive speech synthesis, the emotion/style control vector defined in the categorical (CAT) emotion space is uneasy to be precisely defined by the user to synthesize the speech with the desired emotion/style. This paper applies the arousal-valence (AV) space to the multiple regression hidden semi-Markov model (MRHSMM)-based synthesis framework for expressive speech synthesis. In this study, the user can designate a specific emotion by defining the AV values in the AV space. The multidimensional scaling (MDS) method is adopted to project the AV emotion space and the categorical (CAT) emotion space onto their corresponding orthogonal coordinate systems. A transformation approach is thus proposed to transform the AV values to the emotion control vector in CAT emotion space for MRHSMM-based expressive speech synthesis. In the synthesis phase given the input text and desired emotion, with the transformed emotion control vector, the speech with the desired emotion is generated from the MRHSMMs. Experimental result shows the proposed method is helpful for the user to easily and precisely determine the desired emotion for expressive speech synthesis.

AB - In control vector-based expressive speech synthesis, the emotion/style control vector defined in the categorical (CAT) emotion space is uneasy to be precisely defined by the user to synthesize the speech with the desired emotion/style. This paper applies the arousal-valence (AV) space to the multiple regression hidden semi-Markov model (MRHSMM)-based synthesis framework for expressive speech synthesis. In this study, the user can designate a specific emotion by defining the AV values in the AV space. The multidimensional scaling (MDS) method is adopted to project the AV emotion space and the categorical (CAT) emotion space onto their corresponding orthogonal coordinate systems. A transformation approach is thus proposed to transform the AV values to the emotion control vector in CAT emotion space for MRHSMM-based expressive speech synthesis. In the synthesis phase given the input text and desired emotion, with the transformed emotion control vector, the speech with the desired emotion is generated from the MRHSMMs. Experimental result shows the proposed method is helpful for the user to easily and precisely determine the desired emotion for expressive speech synthesis.

UR - http://www.scopus.com/inward/record.url?scp=84994246619&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84994246619&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2016-815

DO - 10.21437/Interspeech.2016-815

M3 - Conference article

VL - 08-12-September-2016

SP - 3176

EP - 3180

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -