Speech emotion recognition using autoencoder bottleneck features and LSTM

Kun Yi Huang, Chung Hsien Wu, Tsung Hsien Yang, Ming Hsiang Su, Jia Hui Chou

Research output: Conference contribution

16 Citations (Scopus)

Abstract

A complete emotional expression follows a complex temporal course in a conversation. Related research on utterance-level and segment-level processing does not consider subtle differences in characteristics or historical information. Because the Deep Scattering Spectrum (DSS) can capture more detailed energy distributions in the frequency domain than Low-Level Descriptors (LLDs), this work combines LLDs and DSS as the speech features. An autoencoder neural network is then applied to extract bottleneck features for dimensionality reduction. Finally, long short-term memory (LSTM) is employed to characterize the temporal variation of speech emotion for emotion recognition. The MHMC emotion database was collected and used for performance evaluation. Experimental results show that the proposed method, using bottleneck features from the combination of LLDs and DSS, achieved an emotion recognition accuracy of 98.1%, outperforming systems using LLDs or DSS individually.
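The pipeline described in the abstract (combine LLD and DSS features, reduce them to bottleneck features with an autoencoder, then model the frame sequence with an LSTM) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation. All dimensions, the number of emotion classes, the function names, and the random weights and features below are hypothetical stand-ins, since the abstract gives none of these details, and nothing here is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the abstract does not state any of them.
N_FRAMES, LLD_DIM, DSS_DIM = 20, 32, 48
BOTTLENECK_DIM, HIDDEN_DIM, N_EMOTIONS = 16, 8, 4


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def combine_features(lld, dss):
    """Concatenate per-frame LLD and DSS vectors along the feature axis."""
    return np.concatenate([lld, dss], axis=1)


def encode_bottleneck(x, w_enc, b_enc):
    """Encoder half of an autoencoder: project each frame to the bottleneck."""
    return np.tanh(x @ w_enc + b_enc)


def lstm_last_hidden(seq, w, u, b):
    """Run a single-layer LSTM over seq (T, D); return the final hidden state."""
    hidden = u.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in seq:
        z = x_t @ w + h @ u + b            # pre-activations, shape (4 * hidden,)
        i, f, o, g = np.split(z, 4)        # input, forget, output gates, candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g                  # update the cell state
        h = o * np.tanh(c)                 # update the hidden state
    return h


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


# Random stand-ins for real speech features and trained weights (assumptions).
lld = rng.normal(size=(N_FRAMES, LLD_DIM))
dss = rng.normal(size=(N_FRAMES, DSS_DIM))
w_enc = rng.normal(scale=0.1, size=(LLD_DIM + DSS_DIM, BOTTLENECK_DIM))
b_enc = np.zeros(BOTTLENECK_DIM)
w = rng.normal(scale=0.1, size=(BOTTLENECK_DIM, 4 * HIDDEN_DIM))
u = rng.normal(scale=0.1, size=(HIDDEN_DIM, 4 * HIDDEN_DIM))
b = np.zeros(4 * HIDDEN_DIM)
w_out = rng.normal(scale=0.1, size=(HIDDEN_DIM, N_EMOTIONS))

frames = combine_features(lld, dss)                    # (N_FRAMES, LLD_DIM + DSS_DIM)
bottleneck = encode_bottleneck(frames, w_enc, b_enc)   # (N_FRAMES, BOTTLENECK_DIM)
probs = softmax(lstm_last_hidden(bottleneck, w, u, b) @ w_out)
print(frames.shape, bottleneck.shape, probs.shape)
```

In a real system the encoder weights would come from training the autoencoder to reconstruct the combined features, and the LSTM and output layer would be trained on labeled emotion data. The sketch only shows how the three stages connect and how the feature dimension shrinks at the bottleneck.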

Original language: English
Title of host publication: 2016 International Conference on Orange Technologies, ICOT 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-4
Number of pages: 4
ISBN (Electronic): 9781538648315
DOIs
Publication status: Published - 2 Jul 2016
Event: 2016 International Conference on Orange Technologies, ICOT 2016 - Melbourne, Australia
Duration: 18 Dec 2016 to 20 Dec 2016

Publication series

Name: 2016 International Conference on Orange Technologies, ICOT 2016
2018-January

Other

Other: 2016 International Conference on Orange Technologies, ICOT 2016
Country/Territory: Australia
City: Melbourne
Period: 16-12-18 to 16-12-20

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Behavioral Neuroscience
  • Cognitive Neuroscience
