Attentively-Coupled Long Short-Term Memory for Audio-Visual Emotion Recognition

Jia Hao Hsu, Chung Hsien Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Emotion recognition from multiple modalities has attracted growing research attention. Among existing audio-visual emotion recognition methods, few model the emotional fluctuations in the signals, and how to fuse multimodal signals, such as audio and visual signals, remains a challenging issue. In this paper, segments of the audio-visual signals are extracted and used as the recognition unit to characterize emotional fluctuation. An Attentively-Coupled Long Short-Term Memory (ACLSTM) is proposed that combines audio-based and visual-based LSTMs to improve emotion recognition performance. In the ACLSTM, a Coupled LSTM serves as the fusion model, and a neural tensor network (NTN) is employed for attention estimation to obtain the segment-based emotion consistency between audio and visual segments. In experiments on the BAUM-I dataset, the proposed method achieved 70.1% in multimodal emotion recognition, the best result among the previous approaches it was compared with.
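The attention step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the commonly used neural tensor network form (a bilinear tensor term plus a linear term, passed through tanh and projected to a scalar) and assumes the per-segment scores are normalized with a softmax, neither of which the abstract specifies. All names, dimensions, and the random inputs (`ntn_score`, `attention_weights`, `d_a`, `d_v`, `k`) are hypothetical.

```python
import numpy as np

def ntn_score(a, v, W, V, b, u):
    """Score the consistency of one audio/visual segment pair.

    Assumed NTN form: u^T tanh(a^T W[1:k] v + V [a; v] + b).
    """
    # Bilinear term: one scalar a^T W[k] v per tensor slice k.
    bilinear = np.einsum("i,kij,j->k", a, W, v)
    # Linear term over the concatenated segment features.
    linear = V @ np.concatenate([a, v])
    return float(u @ np.tanh(bilinear + linear + b))

def attention_weights(audio_segs, visual_segs, params):
    """Softmax over per-segment NTN scores (the normalization is an assumption)."""
    scores = np.array(
        [ntn_score(a, v, *params) for a, v in zip(audio_segs, visual_segs)]
    )
    exp = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return exp / exp.sum()

# Hypothetical sizes: 4-dim audio features, 3-dim visual features,
# 2 tensor slices, 5 segments per utterance.
rng = np.random.default_rng(0)
d_a, d_v, k, n_segments = 4, 3, 2, 5
params = (
    rng.normal(size=(k, d_a, d_v)),   # W: bilinear tensor
    rng.normal(size=(k, d_a + d_v)),  # V: linear weights
    rng.normal(size=k),               # b: bias
    rng.normal(size=k),               # u: output projection
)
audio = rng.normal(size=(n_segments, d_a))
visual = rng.normal(size=(n_segments, d_v))
weights = attention_weights(audio, visual, params)
```

In a fusion model of the kind the abstract describes, weights like these would presumably scale how much each audio-visual segment pair contributes to the final decision; how the ACLSTM actually applies them is not stated in this record.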

Original language: English
Title of host publication: 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1048-1053
Number of pages: 6
ISBN (Electronic): 9789881476883
Publication status: Published - 2020 Dec 7
Event: 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Virtual, Auckland, New Zealand
Duration: 2020 Dec 7 → 2020 Dec 10

Publication series

Name: 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings

Conference

Conference: 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020
Country/Territory: New Zealand
City: Virtual, Auckland
Period: 20-12-07 → 20-12-10

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Signal Processing
  • Decision Sciences (miscellaneous)
  • Instrumentation
