Speech emotion recognition using autoencoder bottleneck features and LSTM

Kun Yi Huang, Chung Hsien Wu, Tsung Hsien Yang, Ming Hsiang Su, Jia Hui Chou

Research output: Conference contribution

1 Citation (Scopus)

Abstract

A complete emotional expression follows a complex temporal course over a conversation. Prior work on utterance-level and segment-level processing does not account for subtle differences in feature characteristics or for historical information. Because the Deep Scattering Spectrum (DSS) captures more detailed energy distributions in the frequency domain than Low-Level Descriptors (LLDs), this work combines LLDs and DSS as the speech features. An autoencoder neural network is then applied to extract bottleneck features for dimensionality reduction. Finally, a long short-term memory (LSTM) network models the temporal variation of speech emotion for emotion recognition. The MHMC emotion database was collected and used for performance evaluation. Experimental results show that the proposed method, using bottleneck features extracted from the combined LLDs and DSS, achieved an emotion recognition accuracy of 98.1%, outperforming systems that use LLDs or DSS alone.
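
The data flow described in the abstract (frame-level LLD and DSS vectors concatenated, compressed by an autoencoder's encoder, then read in order by an LSTM) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the feature sizes, the single tanh encoder layer, the four output classes, and every function and parameter name are assumptions, and the weights are random rather than trained. The autoencoder's decoder and its reconstruction training are omitted.

```python
import math
import random

def linear(W, b, x):
    """Affine map W x + b on plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(frame, W_enc, b_enc):
    """Encoder half of an autoencoder: feature frame -> bottleneck vector."""
    return [math.tanh(z) for z in linear(W_enc, b_enc, frame)]

def lstm_step(x, h, c, W, b):
    """One LSTM step; W maps [x; h] to the stacked gates i, f, o, g."""
    n = len(h)
    z = linear(W, b, x + h)
    i = [sigmoid(v) for v in z[0:n]]            # input gate
    f = [sigmoid(v) for v in z[n:2 * n]]        # forget gate
    o = [sigmoid(v) for v in z[2 * n:3 * n]]    # output gate
    g = [math.tanh(v) for v in z[3 * n:4 * n]]  # candidate cell state
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def init_params(n_lld, n_dss, n_code, n_hidden, n_classes, seed=0):
    """Random, untrained weights; all sizes are hypothetical."""
    rng = random.Random(seed)
    return {
        "hidden": n_hidden,
        "W_enc": rand_matrix(n_code, n_lld + n_dss, rng),
        "b_enc": [0.0] * n_code,
        "W_lstm": rand_matrix(4 * n_hidden, n_code + n_hidden, rng),
        "b_lstm": [0.0] * (4 * n_hidden),
        "W_out": rand_matrix(n_classes, n_hidden, rng),
        "b_out": [0.0] * n_classes,
    }

def classify(lld_frames, dss_frames, params):
    """Concatenate LLD and DSS per frame, encode, run the LSTM, classify."""
    h = [0.0] * params["hidden"]
    c = [0.0] * params["hidden"]
    for lld, dss in zip(lld_frames, dss_frames):
        code = encode(lld + dss, params["W_enc"], params["b_enc"])
        h, c = lstm_step(code, h, c, params["W_lstm"], params["b_lstm"])
    return softmax(linear(params["W_out"], params["b_out"], h))

# Usage with synthetic features: 8 frames, 6 LLD and 10 DSS values per frame.
params = init_params(n_lld=6, n_dss=10, n_code=4, n_hidden=5, n_classes=4)
rng = random.Random(1)
lld = [[rng.random() for _ in range(6)] for _ in range(8)]
dss = [[rng.random() for _ in range(10)] for _ in range(8)]
probs = classify(lld, dss, params)
```

Here `probs` is a probability distribution over the assumed emotion classes. With untrained weights it carries no meaning beyond showing how the 16-dimensional frames shrink to a 4-dimensional bottleneck before the LSTM reads them in sequence.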

Original language: English
Title of host publication: 2016 International Conference on Orange Technologies, ICOT 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-4
Number of pages: 4
ISBN (Electronic): 9781538648315
DOI: https://doi.org/10.1109/ICOT.2016.8278965
Publication status: Published - 1 Feb 2018
Event: 2016 International Conference on Orange Technologies, ICOT 2016 - Melbourne, Australia
Duration: 18 Dec 2016 to 20 Dec 2016

Publication series

Name: 2016 International Conference on Orange Technologies, ICOT 2016
Volume: 2018-January

Other

Other: 2016 International Conference on Orange Technologies, ICOT 2016
Country: Australia
City: Melbourne
Period: 18 Dec 2016 to 20 Dec 2016

Fingerprint

Long-Term Memory
Speech recognition
Short-Term Memory
Emotions
Scattering
Databases
Neural networks
Recognition (Psychology)
Long short-term memory
Processing
Research

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Behavioral Neuroscience
  • Cognitive Neuroscience

Cite this

Huang, K. Y., Wu, C. H., Yang, T. H., Su, M. H., & Chou, J. H. (2018). Speech emotion recognition using autoencoder bottleneck features and LSTM. In 2016 International Conference on Orange Technologies, ICOT 2016 (pp. 1-4). [8278965] (2016 International Conference on Orange Technologies, ICOT 2016; Vol. 2018-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICOT.2016.8278965
Huang, Kun Yi; Wu, Chung Hsien; Yang, Tsung Hsien; Su, Ming Hsiang; Chou, Jia Hui. / Speech emotion recognition using autoencoder bottleneck features and LSTM. 2016 International Conference on Orange Technologies, ICOT 2016. Institute of Electrical and Electronics Engineers Inc., 2018. p. 1-4 (2016 International Conference on Orange Technologies, ICOT 2016).
@inproceedings{9d8c59d5fabc4580a9afb98cf07eee7c,
title = "Speech emotion recognition using autoencoder bottleneck features and LSTM",
abstract = "A complete emotional expression follows a complex temporal course over a conversation. Prior work on utterance-level and segment-level processing does not account for subtle differences in feature characteristics or for historical information. Because the Deep Scattering Spectrum (DSS) captures more detailed energy distributions in the frequency domain than Low-Level Descriptors (LLDs), this work combines LLDs and DSS as the speech features. An autoencoder neural network is then applied to extract bottleneck features for dimensionality reduction. Finally, a long short-term memory (LSTM) network models the temporal variation of speech emotion for emotion recognition. The MHMC emotion database was collected and used for performance evaluation. Experimental results show that the proposed method, using bottleneck features extracted from the combined LLDs and DSS, achieved an emotion recognition accuracy of 98.1{\%}, outperforming systems that use LLDs or DSS alone.",
author = "Huang, {Kun Yi} and Wu, {Chung Hsien} and Yang, {Tsung Hsien} and Su, {Ming Hsiang} and Chou, {Jia Hui}",
year = "2018",
month = "2",
day = "1",
doi = "10.1109/ICOT.2016.8278965",
language = "English",
series = "2016 International Conference on Orange Technologies, ICOT 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "1--4",
booktitle = "2016 International Conference on Orange Technologies, ICOT 2016",
address = "United States",

}

Huang, KY, Wu, CH, Yang, TH, Su, MH & Chou, JH 2018, Speech emotion recognition using autoencoder bottleneck features and LSTM. in 2016 International Conference on Orange Technologies, ICOT 2016, 8278965, 2016 International Conference on Orange Technologies, ICOT 2016, vol. 2018-January, Institute of Electrical and Electronics Engineers Inc., pp. 1-4, 2016 International Conference on Orange Technologies, ICOT 2016, Melbourne, Australia, 16-12-18. https://doi.org/10.1109/ICOT.2016.8278965

Speech emotion recognition using autoencoder bottleneck features and LSTM. / Huang, Kun Yi; Wu, Chung Hsien; Yang, Tsung Hsien; Su, Ming Hsiang; Chou, Jia Hui.

2016 International Conference on Orange Technologies, ICOT 2016. Institute of Electrical and Electronics Engineers Inc., 2018. p. 1-4 8278965 (2016 International Conference on Orange Technologies, ICOT 2016; Vol. 2018-January).

TY - GEN

T1 - Speech emotion recognition using autoencoder bottleneck features and LSTM

AU - Huang, Kun Yi

AU - Wu, Chung Hsien

AU - Yang, Tsung Hsien

AU - Su, Ming Hsiang

AU - Chou, Jia Hui

PY - 2018/2/1

Y1 - 2018/2/1

N2 - A complete emotional expression follows a complex temporal course over a conversation. Prior work on utterance-level and segment-level processing does not account for subtle differences in feature characteristics or for historical information. Because the Deep Scattering Spectrum (DSS) captures more detailed energy distributions in the frequency domain than Low-Level Descriptors (LLDs), this work combines LLDs and DSS as the speech features. An autoencoder neural network is then applied to extract bottleneck features for dimensionality reduction. Finally, a long short-term memory (LSTM) network models the temporal variation of speech emotion for emotion recognition. The MHMC emotion database was collected and used for performance evaluation. Experimental results show that the proposed method, using bottleneck features extracted from the combined LLDs and DSS, achieved an emotion recognition accuracy of 98.1%, outperforming systems that use LLDs or DSS alone.

UR - http://www.scopus.com/inward/record.url?scp=85050507118&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85050507118&partnerID=8YFLogxK

U2 - 10.1109/ICOT.2016.8278965

DO - 10.1109/ICOT.2016.8278965

M3 - Conference contribution

AN - SCOPUS:85050507118

T3 - 2016 International Conference on Orange Technologies, ICOT 2016

SP - 1

EP - 4

BT - 2016 International Conference on Orange Technologies, ICOT 2016

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Huang KY, Wu CH, Yang TH, Su MH, Chou JH. Speech emotion recognition using autoencoder bottleneck features and LSTM. In 2016 International Conference on Orange Technologies, ICOT 2016. Institute of Electrical and Electronics Engineers Inc. 2018. p. 1-4. 8278965. (2016 International Conference on Orange Technologies, ICOT 2016). https://doi.org/10.1109/ICOT.2016.8278965