Emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information and semantic labels (Extended abstract)

Chung-Hsien Wu, Wei Bin Liang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This work presents an approach to emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information (AP) and semantic labels (SLs). For AP-based recognition, acoustic and prosodic features are extracted from the detected emotionally salient segments of the input speech. Three types of models, GMMs, SVMs, and MLPs, are adopted as the base-level classifiers. A Meta Decision Tree (MDT) is then employed for classifier fusion to obtain the AP-based emotion recognition confidence. For SL-based recognition, semantic labels are used to automatically extract Emotion Association Rules (EARs) from the recognized word sequence of the affective speech. A maximum entropy model (MaxEnt) is then utilized to characterize the relationship between emotional states and EARs for emotion recognition. Finally, a weighted product fusion method is used to integrate the AP-based and SL-based recognition results for the final emotion decision. For evaluation, 2,033 utterances covering four emotional states were collected. The experimental results reveal that AP-based recognition using MDT achieved an accuracy of 80.00%, while SL-based recognition obtained an average accuracy of 80.92%. Combining AP information and SLs achieved 83.55% accuracy for emotion recognition.
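The final decision step described above combines the two modalities' confidences by a weighted product. A minimal sketch of that fusion step, assuming each branch outputs a normalized confidence per emotional state and that the AP weight `alpha` is a hypothetical tuning parameter (the paper does not specify its value; in practice it would be set on held-out data):

```python
def weighted_product_fusion(p_ap, p_sl, alpha=0.5):
    # Weighted product of the two modalities' per-emotion confidences:
    # fused_i = p_ap_i^alpha * p_sl_i^(1 - alpha), renormalized to sum to 1.
    fused = [(a ** alpha) * (s ** (1.0 - alpha)) for a, s in zip(p_ap, p_sl)]
    total = sum(fused)
    return [f / total for f in fused]

# Hypothetical confidences over four emotional states from each recognizer
p_ap = [0.6, 0.2, 0.1, 0.1]   # acoustic-prosodic (AP) branch, after MDT fusion
p_sl = [0.3, 0.5, 0.1, 0.1]   # semantic-label (SL) branch, from the MaxEnt model
scores = weighted_product_fusion(p_ap, p_sl, alpha=0.6)
emotion = scores.index(max(scores))  # index of the winning emotional state
```

The product (rather than a weighted sum) means an emotion must be plausible under both modalities to score highly; a near-zero confidence in either branch vetoes that emotion.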

Original language: English
Title of host publication: 2015 International Conference on Affective Computing and Intelligent Interaction, ACII 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 477-483
Number of pages: 7
ISBN (Electronic): 9781479999538
DOI: 10.1109/ACII.2015.7344613
Publication status: Published - 2015 Dec 2
Event: 2015 International Conference on Affective Computing and Intelligent Interaction, ACII 2015 - Xi'an, China
Duration: 2015 Sep 21 - 2015 Sep 24

Publication series

Name: 2015 International Conference on Affective Computing and Intelligent Interaction, ACII 2015



All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Software

Cite this

Wu, C-H., & Liang, W. B. (2015). Emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information and semantic labels (Extended abstract). In 2015 International Conference on Affective Computing and Intelligent Interaction, ACII 2015 (pp. 477-483). [7344613] (2015 International Conference on Affective Computing and Intelligent Interaction, ACII 2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ACII.2015.7344613