Attention-based convolutional neural network and long short-term memory for short-term detection of mood disorders based on elicited speech responses

Kun Yi Huang, Chung-Hsien Wu, Ming Hsiang Su

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Mood disorders, including unipolar depression (UD) and bipolar disorder (BD), have become some of the commonest mental health disorders. The absence of diagnostic markers of BD can cause misdiagnosis of the disorder as UD on initial presentation. Short-term detection, which could be used in early detection and intervention, is desirable. This study proposed an approach for short-term detection of mood disorders based on elicited speech responses. Speech responses of participants were obtained through interviews by a clinician after participants viewed six emotion-eliciting videos. A domain adaptation method based on a hierarchical spectral clustering algorithm was proposed to adapt a labeled emotion database into a collected unlabeled mood database for alleviating the data bias problem in an emotion space. For modeling the local variation of emotions in each response, a convolutional neural network (CNN) with an attention mechanism was used to generate an emotion profile (EP) of each elicited speech response. Finally, long short-term memory (LSTM) was employed to characterize the temporal evolution of EPs of all six speech responses. Moreover, an attention model was applied to the LSTM network for highlighting pertinent speech responses to improve detection performance instead of treating all responses equally. For evaluation, this study elicited emotional speech data from 15 people with BD, 15 people with UD, and 15 healthy controls. Leave-one-group-out cross-validation was employed for the compiled database and proposed method. CNN- and LSTM-based attention models improved the mood disorder detection accuracy of the proposed method by approximately 11%. Furthermore, the proposed method achieved an overall detection accuracy of 75.56%, outperforming support-vector-machine- (62.22%) and CNN-based (66.67%) methods.
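The reported accuracies can be related back to the cohort size with a short calculation. The sketch below assumes that each accuracy is the fraction of all 45 participants (15 BD, 15 UD, 15 healthy controls) classified correctly. The abstract does not state this explicitly, and the helper name `implied_correct` is purely illustrative.

```python
# Cohort size taken from the abstract: 15 BD + 15 UD + 15 healthy controls.
TOTAL = 15 + 15 + 15

def implied_correct(accuracy_percent, total=TOTAL, tol=0.005):
    """Return the number of correct decisions k such that 100*k/total
    matches accuracy_percent to two decimals, or None if no k fits.
    Assumption: accuracy is computed per participant over the whole cohort."""
    for k in range(total + 1):
        if abs(100 * k / total - accuracy_percent) < tol:
            return k
    return None

# Accuracies as reported in the abstract.
reported = {"proposed": 75.56, "CNN": 66.67, "SVM": 62.22}
counts = {name: implied_correct(acc) for name, acc in reported.items()}
print(counts)  # {'proposed': 34, 'CNN': 30, 'SVM': 28}
```

Under this assumption the three figures correspond to 34, 30, and 28 correct decisions out of 45, and an improvement of "approximately 11%" would be consistent with about 5 additional participants (5/45 ≈ 11.11%).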

Original language: English
Pages (from-to): 668-678
Number of pages: 11
Journal: Pattern Recognition
Volume: 88
DOIs: 10.1016/j.patcog.2018.12.016
Publication status: Published - 2019 Apr 1

Fingerprint

Neural networks
Clustering algorithms
Support vector machines
Long short-term memory
Health

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Cite this

@article{867a1249f06a4264b5623558e99c4bb3,
title = "Attention-based convolutional neural network and long short-term memory for short-term detection of mood disorders based on elicited speech responses",
abstract = "Mood disorders, including unipolar depression (UD) and bipolar disorder (BD), have become some of the commonest mental health disorders. The absence of diagnostic markers of BD can cause misdiagnosis of the disorder as UD on initial presentation. Short-term detection, which could be used in early detection and intervention, is desirable. This study proposed an approach for short-term detection of mood disorders based on elicited speech responses. Speech responses of participants were obtained through interviews by a clinician after participants viewed six emotion-eliciting videos. A domain adaptation method based on a hierarchical spectral clustering algorithm was proposed to adapt a labeled emotion database into a collected unlabeled mood database for alleviating the data bias problem in an emotion space. For modeling the local variation of emotions in each response, a convolutional neural network (CNN) with an attention mechanism was used to generate an emotion profile (EP) of each elicited speech response. Finally, long short-term memory (LSTM) was employed to characterize the temporal evolution of EPs of all six speech responses. Moreover, an attention model was applied to the LSTM network for highlighting pertinent speech responses to improve detection performance instead of treating all responses equally. For evaluation, this study elicited emotional speech data from 15 people with BD, 15 people with UD, and 15 healthy controls. Leave-one-group-out cross-validation was employed for the compiled database and proposed method. CNN- and LSTM-based attention models improved the mood disorder detection accuracy of the proposed method by approximately 11{\%}. Furthermore, the proposed method achieved an overall detection accuracy of 75.56{\%}, outperforming support-vector-machine- (62.22{\%}) and CNN-based (66.67{\%}) methods.",
author = "Huang, {Kun Yi} and Wu, Chung-Hsien and Su, {Ming Hsiang}",
year = "2019",
month = "4",
day = "1",
doi = "10.1016/j.patcog.2018.12.016",
language = "English",
volume = "88",
pages = "668--678",
journal = "Pattern Recognition",
issn = "0031-3203",
publisher = "Elsevier Limited",

}

Attention-based convolutional neural network and long short-term memory for short-term detection of mood disorders based on elicited speech responses. / Huang, Kun Yi; Wu, Chung-Hsien; Su, Ming Hsiang.

In: Pattern Recognition, Vol. 88, 01.04.2019, p. 668-678.


TY - JOUR

T1 - Attention-based convolutional neural network and long short-term memory for short-term detection of mood disorders based on elicited speech responses

AU - Huang, Kun Yi

AU - Wu, Chung-Hsien

AU - Su, Ming Hsiang

PY - 2019/4/1

Y1 - 2019/4/1

AB - Mood disorders, including unipolar depression (UD) and bipolar disorder (BD), have become some of the commonest mental health disorders. The absence of diagnostic markers of BD can cause misdiagnosis of the disorder as UD on initial presentation. Short-term detection, which could be used in early detection and intervention, is desirable. This study proposed an approach for short-term detection of mood disorders based on elicited speech responses. Speech responses of participants were obtained through interviews by a clinician after participants viewed six emotion-eliciting videos. A domain adaptation method based on a hierarchical spectral clustering algorithm was proposed to adapt a labeled emotion database into a collected unlabeled mood database for alleviating the data bias problem in an emotion space. For modeling the local variation of emotions in each response, a convolutional neural network (CNN) with an attention mechanism was used to generate an emotion profile (EP) of each elicited speech response. Finally, long short-term memory (LSTM) was employed to characterize the temporal evolution of EPs of all six speech responses. Moreover, an attention model was applied to the LSTM network for highlighting pertinent speech responses to improve detection performance instead of treating all responses equally. For evaluation, this study elicited emotional speech data from 15 people with BD, 15 people with UD, and 15 healthy controls. Leave-one-group-out cross-validation was employed for the compiled database and proposed method. CNN- and LSTM-based attention models improved the mood disorder detection accuracy of the proposed method by approximately 11%. Furthermore, the proposed method achieved an overall detection accuracy of 75.56%, outperforming support-vector-machine- (62.22%) and CNN-based (66.67%) methods.

UR - http://www.scopus.com/inward/record.url?scp=85058813191&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058813191&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2018.12.016

DO - 10.1016/j.patcog.2018.12.016

M3 - Article

VL - 88

SP - 668

EP - 678

JO - Pattern Recognition

JF - Pattern Recognition

SN - 0031-3203

ER -