Speech emotion recognition with ensemble learning methods

Po Yuan Shih, Chia Ping Chen, Chung-Hsien Wu

Research output: Conference contribution

1 citation (Scopus)

Abstract

In this paper, we propose to apply ensemble learning methods to neural networks to improve the performance of speech emotion recognition tasks. The basic idea is to first divide an unbalanced data set into balanced subsets and then combine the predictions of the models trained on these subsets. Several methods for decomposing the data and for exploiting the model predictions are investigated in this study. On the public-domain FAU-Aibo database, which is used in the Interspeech Emotion Challenge evaluation, the best performance we achieve is an unweighted average (UA) recall rate of 45.5% on the 5-class classification task. Furthermore, this performance is achieved with a 40-dimensional feature space. Compared to the baseline system, which uses a 384-dimensional feature vector per example and achieves a UA of 38.9%, this is a very strong result. Indeed, it is one of the best performances on FAU-Aibo within the static modeling framework.
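The decompose-then-combine idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: a toy nearest-centroid classifier stands in for the paper's neural networks, the binary-class setup simplifies FAU-Aibo's 5-class task, and all function names here are hypothetical.

```python
import numpy as np

class NearestCentroid:
    """Tiny stand-in classifier; the paper trains neural networks instead."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Assign each sample to the class with the nearest centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

def balanced_subsets(X, y, seed=0):
    """Divide an unbalanced binary data set into balanced subsets:
    the majority class is partitioned into chunks of minority-class size,
    and each chunk is paired with the full minority class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    min_idx = np.flatnonzero(y == classes[np.argmin(counts)])
    maj_idx = rng.permutation(np.flatnonzero(y == classes[np.argmax(counts)]))
    n = len(min_idx)
    subsets = []
    for start in range(0, len(maj_idx) - n + 1, n):
        idx = np.concatenate([min_idx, maj_idx[start:start + n]])
        subsets.append((X[idx], y[idx]))
    return subsets

def ensemble_predict(models, X):
    """Combine the member models' predictions by majority vote."""
    votes = np.stack([m.predict(X) for m in models])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```

Each subset sees every minority-class example but only a slice of the majority class, so every member model is trained on balanced data; the vote (or, in the paper, a combination of the model outputs) recovers a single prediction.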

Original language: English
Title of host publication: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2756-2760
Number of pages: 5
ISBN (electronic): 9781509041176
DOI: 10.1109/ICASSP.2017.7952658
Publication status: Published - 16 Jun 2017
Event: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - New Orleans, United States
Duration: 5 Mar 2017 - 9 Mar 2017


Fingerprint

Set theory
Speech recognition
Neural networks
Decomposition

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Shih, P. Y., Chen, C. P., & Wu, C.-H. (2017). Speech emotion recognition with ensemble learning methods. In 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings (pp. 2756-2760). [7952658] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2017.7952658