Emotion perception and recognition from speech

Chung-Hsien Wu, Jui Feng Yeh, Ze Jing Chuang

Research output: Chapter in Book/Report/Conference proceeding › Chapter

18 Citations (Scopus)

Abstract

With the increasing role of speech interfaces in human-computer interaction applications, automatically recognizing emotions from human speech becomes more and more important. This chapter begins by introducing the correlations between basic speech features, such as pitch, intensity, formants, and MFCCs, and emotions. Several recognition methods are then described to illustrate the performance of previously proposed models, including support vector machines (SVM), K-nearest neighbors (KNN), and neural networks, among others. To give a more practical description of an emotion recognition procedure, a new approach to emotion recognition is provided as a case study. In this case study, the Intonation Groups (IGs) of the input speech signals are first defined and extracted for feature extraction. Under the assumption of a linear mapping between the feature spaces of different emotional states, a feature compensation approach is proposed to characterize the feature space with better discriminability among emotional states. The compensation vector for each emotional state is estimated using the Minimum Classification Error (MCE) algorithm. The IG-based feature vectors, compensated by these compensation vectors, are used to train a Gaussian Mixture Model (GMM) for each emotional state. The emotional state whose GMM yields the maximal likelihood ratio is selected as the recognized emotion.
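
To make the case-study pipeline more concrete, the sketch below shows one way the final classification stage could look: one Gaussian Mixture Model per emotional state, trained on IG-based feature vectors shifted by a per-emotion compensation vector, with the recognized emotion chosen by the highest-scoring model. This is only an illustration under stated assumptions: the emotion label set, function names, and data layout are invented, the compensation vectors are assumed to be already available (the chapter estimates them with the MCE algorithm), and a plain log-likelihood comparison stands in for the chapter's likelihood-ratio decision.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    EMOTIONS = ["neutral", "happy", "angry", "sad"]  # illustrative label set, not the chapter's

    def train_emotion_gmms(features_by_emotion, compensation_by_emotion, n_components=8):
        """Train one GMM per emotional state on compensated IG-based feature vectors.

        features_by_emotion:     emotion -> (N_e, D) array of IG-based feature vectors
        compensation_by_emotion: emotion -> (D,) compensation vector (assumed given here;
                                 the chapter estimates these with the MCE algorithm)
        """
        gmms = {}
        for emo in EMOTIONS:
            # Linear feature compensation: shift each feature vector by the
            # compensation vector associated with this emotional state.
            x = features_by_emotion[emo] + compensation_by_emotion[emo]
            gmms[emo] = GaussianMixture(n_components=n_components,
                                        covariance_type="diag").fit(x)
        return gmms

    def recognize_emotion(gmms, compensation_by_emotion, ig_features):
        """Return the emotional state whose GMM scores the utterance's
        (compensated) IG-based feature vectors highest."""
        scores = {
            emo: gmms[emo].score(ig_features + compensation_by_emotion[emo])
            for emo in EMOTIONS
        }
        return max(scores, key=scores.get)

Note that the chapter's decision rule is based on the maximal likelihood ratio, whereas this sketch simply compares average log-likelihoods across the per-emotion GMMs.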

Original language: English
Title of host publication: Affective Information Processing
Publisher: Springer London
Pages: 93-110
Number of pages: 18
ISBN (Print): 9781848003057
DOI: https://doi.org/10.1007/978-1-84800-306-4_6
Publication status: Published - 2009 Dec 1

Fingerprint

Human computer interaction
Interfaces (computer)
Support vector machines
Feature extraction
Neural networks
Feature compensation

All Science Journal Classification (ASJC) codes

  • Computer Science(all)

Cite this

Wu, C-H., Yeh, J. F., & Chuang, Z. J. (2009). Emotion perception and recognition from speech. In Affective Information Processing (pp. 93-110). Springer London. https://doi.org/10.1007/978-1-84800-306-4_6