Speech emotion recognition with ensemble learning methods

Po Yuan Shih, Chia Ping Chen, Chung-Hsien Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose to apply ensemble learning methods to neural networks to improve the performance of speech emotion recognition tasks. The basic idea is to first divide an unbalanced data set into balanced subsets and then combine the predictions of the models trained on these subsets. Several methods for decomposing the data and exploiting the model predictions are investigated in this study. On the public-domain FAU-Aibo database, which was used in the Interspeech Emotion Challenge evaluation, the best performance we achieve is an unweighted average (UA) recall rate of 45.5% on the 5-class classification task. Furthermore, this performance is achieved with a 40-dimensional feature space. Compared to the baseline system, which uses a 384-dimensional feature vector per example and achieves a UA of 38.9%, this result is very impressive. Indeed, it is one of the best performances on FAU-Aibo within the static modeling framework.
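The decompose-and-combine idea in the abstract can be sketched as follows. Everything in this sketch is an illustrative assumption rather than the paper's actual method: a toy 2-class data set stands in for the 5-class FAU-Aibo features, a nearest-centroid classifier stands in for the paper's neural networks, and majority voting is just one of several possible combination rules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unbalanced 2-class data (illustrative assumption; the paper uses
# 5-class FAU-Aibo acoustic features).
X_min = rng.normal(loc=+2.0, scale=1.0, size=(40, 5))   # minority class (label 1)
X_maj = rng.normal(loc=-2.0, scale=1.0, size=(200, 5))  # majority class (label 0)

# 1. Decompose the majority class into subsets of minority size, so each
#    subset paired with the full minority class forms a balanced set.
subsets = np.array_split(rng.permutation(X_maj), len(X_maj) // len(X_min))

# 2. Train one simple model per balanced subset.  A nearest-centroid
#    classifier is a stand-in for the paper's neural networks.
models = []
for X_sub in subsets:
    c0 = X_sub.mean(axis=0)   # centroid of this subset's majority samples
    c1 = X_min.mean(axis=0)   # centroid of the minority samples
    models.append((c0, c1))

def predict(x):
    # 3. Combine the models' predictions; here, by simple majority vote.
    votes = [int(np.linalg.norm(x - c1) < np.linalg.norm(x - c0))
             for c0, c1 in models]
    return int(sum(votes) > len(votes) / 2)

print(predict(np.full(5, +2.0)))  # near the minority centroid -> 1
print(predict(np.full(5, -2.0)))  # near the majority centroid -> 0
```

Because every subset is balanced, each base model sees the classes in equal proportion, which is what drives the unweighted-average recall improvement the abstract reports.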

Original language: English
Title of host publication: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2756-2760
Number of pages: 5
ISBN (Electronic): 9781509041176
DOI: 10.1109/ICASSP.2017.7952658
Publication status: Published - 2017 Jun 16
Event: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - New Orleans, United States
Duration: 2017 Mar 5 - 2017 Mar 9



All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Shih, P. Y., Chen, C. P., & Wu, C.-H. (2017). Speech emotion recognition with ensemble learning methods. In 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings (pp. 2756-2760). [7952658] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2017.7952658