TY - GEN
T1 - Speech emotion recognition using autoencoder bottleneck features and LSTM
AU - Huang, Kun Yi
AU - Wu, Chung Hsien
AU - Yang, Tsung Hsien
AU - Su, Ming Hsiang
AU - Chou, Jia Hui
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/7/2
Y1 - 2016/7/2
N2 - A complete emotional expression involves a complex temporal course within a conversation. Related research on utterance- and segment-level processing has largely neglected subtle differences in characteristics and historical information. Because the Deep Scattering Spectrum (DSS) captures more detailed energy distributions in the frequency domain than Low-Level Descriptors (LLDs), this work combines LLDs and DSS as the speech features. An autoencoder neural network is then applied to extract bottleneck features for dimensionality reduction. Finally, a long short-term memory (LSTM) network is employed to characterize the temporal variation of speech emotion for emotion recognition. The MHMC emotion database was collected and used for performance evaluation. Experimental results show that the proposed method, using bottleneck features derived from the combination of LLDs and DSS, achieved an emotion recognition accuracy of 98.1%, outperforming systems using LLDs or DSS individually.
AB - A complete emotional expression involves a complex temporal course within a conversation. Related research on utterance- and segment-level processing has largely neglected subtle differences in characteristics and historical information. Because the Deep Scattering Spectrum (DSS) captures more detailed energy distributions in the frequency domain than Low-Level Descriptors (LLDs), this work combines LLDs and DSS as the speech features. An autoencoder neural network is then applied to extract bottleneck features for dimensionality reduction. Finally, a long short-term memory (LSTM) network is employed to characterize the temporal variation of speech emotion for emotion recognition. The MHMC emotion database was collected and used for performance evaluation. Experimental results show that the proposed method, using bottleneck features derived from the combination of LLDs and DSS, achieved an emotion recognition accuracy of 98.1%, outperforming systems using LLDs or DSS individually.
UR - http://www.scopus.com/inward/record.url?scp=85050507118&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050507118&partnerID=8YFLogxK
U2 - 10.1109/ICOT.2016.8278965
DO - 10.1109/ICOT.2016.8278965
M3 - Conference contribution
AN - SCOPUS:85050507118
T3 - 2016 International Conference on Orange Technologies, ICOT 2016
SP - 1
EP - 4
BT - 2016 International Conference on Orange Technologies, ICOT 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 International Conference on Orange Technologies, ICOT 2016
Y2 - 18 December 2016 through 20 December 2016
ER -