Speech emotion recognition using autoencoder bottleneck features and LSTM

Kun Yi Huang, Chung Hsien Wu, Tsung Hsien Yang, Ming Hsiang Su, Jia Hui Chou

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

A complete emotional expression follows a complex temporal course over a conversation. Prior work on utterance-level and segment-level processing does not account for subtle differences in characteristics or for historical information. Because the Deep Scattering Spectrum (DSS) captures more detailed energy distributions in the frequency domain than Low-Level Descriptors (LLDs), this work combines LLDs and DSS as the speech features. An autoencoder neural network is then applied to extract bottleneck features for dimensionality reduction. Finally, a long short-term memory (LSTM) network characterizes the temporal variation of speech emotion for emotion recognition. The MHMC emotion database was collected and used for performance evaluation. Experimental results show that the proposed method, using bottleneck features from the combination of LLDs and DSS, achieved an emotion recognition accuracy of 98.1%, outperforming systems that use LLDs or DSS individually.
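The data flow described in the abstract (combine LLDs and DSS per frame, compress with an autoencoder's encoder, model the sequence with an LSTM) can be sketched as below. This is only an illustrative forward pass under stated assumptions, not the authors' implementation: all dimensions, the number of emotion classes, the function names, and the random weights are hypothetical, the weights are untrained, biases are omitted, and the input frames are synthetic placeholders rather than real speech features.

```python
import math
import random

# All dimensions are hypothetical; the abstract does not state them.
LLD_DIM, DSS_DIM, BOTTLENECK_DIM, HIDDEN_DIM, NUM_EMOTIONS = 6, 4, 3, 5, 4

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def encode(frame, w_enc):
    """Encoder half of an autoencoder: project a frame to the bottleneck."""
    return [math.tanh(a) for a in matvec(w_enc, frame)]

def lstm_step(x, h, c, w):
    """One LSTM step (biases omitted); each gate matrix acts on [x; h]."""
    z = x + h  # list concatenation of input and previous hidden state
    i = [sigmoid(a) for a in matvec(w["i"], z)]    # input gate
    f = [sigmoid(a) for a in matvec(w["f"], z)]    # forget gate
    o = [sigmoid(a) for a in matvec(w["o"], z)]    # output gate
    g = [math.tanh(a) for a in matvec(w["g"], z)]  # candidate cell state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def classify(lld_frames, dss_frames, params):
    """Return class probabilities for one utterance (a sequence of frames)."""
    h = [0.0] * HIDDEN_DIM
    c = [0.0] * HIDDEN_DIM
    for lld, dss in zip(lld_frames, dss_frames):
        frame = lld + dss                          # combine LLDs and DSS
        z = encode(frame, params["enc"])           # bottleneck features
        h, c = lstm_step(z, h, c, params["lstm"])  # temporal modeling
    return softmax(matvec(params["out"], h))

params = {
    "enc": rand_matrix(BOTTLENECK_DIM, LLD_DIM + DSS_DIM),
    "lstm": {g: rand_matrix(HIDDEN_DIM, BOTTLENECK_DIM + HIDDEN_DIM) for g in "ifog"},
    "out": rand_matrix(NUM_EMOTIONS, HIDDEN_DIM),
}

# Synthetic 8-frame utterance (placeholder values, not real speech features).
lld_frames = [[rng.gauss(0, 1) for _ in range(LLD_DIM)] for _ in range(8)]
dss_frames = [[rng.gauss(0, 1) for _ in range(DSS_DIM)] for _ in range(8)]
probs = classify(lld_frames, dss_frames, params)
```

In a real system the encoder weights would come from training the autoencoder to reconstruct its input, and the LSTM and output layer would be trained on labeled emotion data; the sketch shows only how the three stages connect.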

Original language: English
Title of host publication: 2016 International Conference on Orange Technologies, ICOT 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-4
Number of pages: 4
ISBN (Electronic): 9781538648315
Publication status: Published - 2018 Feb 1
Event: 2016 International Conference on Orange Technologies, ICOT 2016 - Melbourne, Australia
Duration: 2016 Dec 18 to 2016 Dec 20

Publication series

Name: 2016 International Conference on Orange Technologies, ICOT 2016
Volume: 2018-January

Other

Other: 2016 International Conference on Orange Technologies, ICOT 2016
Country: Australia
City: Melbourne
Period: 16-12-18 to 16-12-20

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Behavioral Neuroscience
  • Cognitive Neuroscience
