Mood detection from daily conversational speech using denoising autoencoder and LSTM

Kun Yi Huang, Chung Hsien Wu, Ming Hsiang Su, Hsiang Chi Fu

Research output: Conference contribution

4 citations (Scopus)

Abstract

In current studies, an extended subjective self-report method is generally used for measuring emotions. Even though it is commonly accepted that the speech emotion perceived by the listener is close to the intended emotion conveyed by the speaker, research has indicated that a mismatch still remains between them. In addition, individuals with different personalities generally express emotions differently. Based on these observations, in this study a support vector machine (SVM)-based emotion model is first developed to detect perceived emotion from daily conversational speech. Then, a denoising autoencoder (DAE) is used to construct an emotion conversion model that characterizes the relationship between the perceived emotion and the expressed emotion of a subject with a specific personality. Finally, a long short-term memory (LSTM)-based mood model is constructed to model the temporal fluctuation of speech emotions for mood detection. Experimental results show that the proposed method achieved a detection accuracy of 64.5%, a 5.0% improvement over the HMM-based method.
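The three-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration on synthetic data: the feature dimensions, number of emotion and mood classes, model sizes, and training details are all assumptions for the sketch, not the paper's actual configuration, and the LSTM/mood weights are left untrained purely to show the data flow.

```python
# Sketch of the three-stage pipeline: SVM emotion detection ->
# DAE emotion conversion -> LSTM mood decision (synthetic data;
# all dimensions and hyperparameters are illustrative assumptions).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# --- Stage 1: SVM-based perceived-emotion detection ----------------------
# Toy acoustic feature vectors per utterance, labelled with 3 emotion classes.
X_train = rng.normal(size=(300, 20))
y_train = rng.integers(0, 3, size=300)
svm = SVC(probability=True).fit(X_train, y_train)

utterances = rng.normal(size=(50, 20))      # one day's conversational speech
perceived = svm.predict_proba(utterances)   # (50, 3) emotion posteriors

# --- Stage 2: DAE-style perceived->expressed emotion conversion ----------
# A single-hidden-layer autoencoder trained to reconstruct clean targets
# from corrupted inputs; here the targets are the perceived profiles
# themselves, just to illustrate the denoising training loop.
def train_dae(clean, hidden=8, epochs=200, lr=0.1, noise=0.1):
    d = clean.shape[1]
    W1 = rng.normal(scale=0.1, size=(d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.1, size=(hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        x = clean + rng.normal(scale=noise, size=clean.shape)  # corrupt input
        h = np.tanh(x @ W1 + b1)
        out = h @ W2 + b2
        g = 2 * (out - clean) / len(clean)                     # MSE gradient
        gh = (g @ W2.T) * (1 - h ** 2)
        W2 -= lr * h.T @ g;  b2 -= lr * g.sum(0)
        W1 -= lr * x.T @ gh; b1 -= lr * gh.sum(0)
    return lambda x: np.tanh(x @ W1 + b1) @ W2 + b2

dae = train_dae(perceived)
expressed = dae(perceived)                  # converted emotion profiles

# --- Stage 3: LSTM over the emotion sequence for mood detection ----------
# One forward pass of a single LSTM cell over the day's emotion sequence;
# the final hidden state feeds a linear layer over mood classes.
def lstm_last_state(seq, hidden=4):
    d = seq.shape[1]
    Wx = rng.normal(scale=0.1, size=(d, 4 * hidden))
    Wh = rng.normal(scale=0.1, size=(hidden, 4 * hidden))
    b = np.zeros(4 * hidden)
    h = np.zeros(hidden); c = np.zeros(hidden)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for x in seq:
        i, f, o, g = np.split(x @ Wx + h @ Wh + b, 4)
        c = sig(f) * c + sig(i) * np.tanh(g)   # update cell state
        h = sig(o) * np.tanh(c)                # update hidden state
    return h

h_final = lstm_last_state(expressed)
W_mood = rng.normal(scale=0.1, size=(4, 2))    # 2 hypothetical mood classes
mood = int(np.argmax(h_final @ W_mood))
print("predicted mood class:", mood)
```

In the paper the DAE and LSTM are trained on labelled emotion/mood data; the sketch above only reproduces the shape of the computation, not the training regime or the reported 64.5% accuracy.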

Original language: English
Title of host publication: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5125-5129
Number of pages: 5
ISBN (electronic): 9781509041176
DOI: 10.1109/ICASSP.2017.7953133
Publication status: Published - 16 Jun 2017
Event: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - New Orleans, United States
Duration: 5 Mar 2017 - 9 Mar 2017

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (print): 1520-6149

Other

Other: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017
Country: United States
City: New Orleans
Period: 05/03/17 - 09/03/17

Fingerprint

Support vector machines
Long short-term memory

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Huang, K. Y., Wu, C. H., Su, M. H., & Fu, H. C. (2017). Mood detection from daily conversational speech using denoising autoencoder and LSTM. In 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings (pp. 5125-5129). [7953133] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2017.7953133