Mood disorder identification using deep bottleneck features of elicited speech

Kun Yi Huang, Chung-Hsien Wu, Ming Hsiang Su, Chia Hui Chou

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In the diagnosis of mental health disorders, a large proportion of patients with Bipolar Disorder (BD) are likely to be misdiagnosed with Unipolar Depression (UD) on initial presentation. As speech is the most natural way to express emotion, this work focuses on tracking the emotion profile of elicited speech for short-term mood disorder identification. The Deep Scattering Spectrum (DSS) and Low Level Descriptors (LLDs) of the elicited speech signals are extracted as speech features. A hierarchical spectral clustering (HSC) algorithm is employed to adapt the emotion database to the mood disorder database, alleviating the data bias problem. A denoising autoencoder is then used to extract bottleneck features from the DSS and LLDs for better representation. Based on the bottleneck features, a long short-term memory (LSTM) network is applied to generate the time-varying emotion profile sequence. Finally, given the emotion profile sequence, an HMM-based identification and verification model is used to determine the mood disorder. Elicited emotional speech data were collected from 15 BD patients, 15 UD patients and 15 healthy controls for system training and evaluation, and five-fold cross-validation was employed. Experimental results show that the system using the bottleneck features achieved an identification accuracy of 73.33%, an improvement of 8.89% over the system without bottleneck features. Furthermore, the system with the verification mechanism outperformed the one without verification by 4.44%.
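The reported percentages can be related to the cohort size with a little arithmetic. The sketch below is only an illustration under two assumptions that the abstract does not state: that accuracy is scored per speaker over all 45 participants, and that the 8.89% and 4.44% figures are absolute differences in percentage points. The helper names `as_percent` and `speakers_for` are hypothetical.

```python
from fractions import Fraction

# Cohort sizes stated in the abstract.
N_BD, N_UD, N_CONTROL = 15, 15, 15
N_TOTAL = N_BD + N_UD + N_CONTROL  # 45 speakers in total


def as_percent(correct: int, total: int = N_TOTAL) -> float:
    """Return correct/total as a percentage rounded to two decimals."""
    return round(float(Fraction(correct, total) * 100), 2)


def speakers_for(percent: float, total: int = N_TOTAL) -> int:
    """Nearest whole number of speakers matching a percentage.

    Assumption: scoring is per speaker, which the abstract does not confirm.
    """
    return round(percent * total / 100)


# Percentages quoted in the abstract.
reported = {
    "accuracy": 73.33,
    "bottleneck_gain": 8.89,
    "verification_gain": 4.44,
}

# Map each percentage to the nearest speaker count, then back again.
counts = {name: speakers_for(p) for name, p in reported.items()}
round_trip = {name: as_percent(c) for name, c in counts.items()}

if __name__ == "__main__":
    for name in reported:
        print(f"{name}: {reported[name]}% ~ {counts[name]}/{N_TOTAL} "
              f"= {round_trip[name]}%")
```

Under these assumptions each reported figure corresponds to a whole number of speakers out of 45 (33, 4 and 2 respectively), which is consistent with speaker-level scoring but does not prove it.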

Original language: English
Title of host publication: Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1648-1652
Number of pages: 5
Volume: 2018-February
ISBN (Electronic): 9781538615423
DOIs: https://doi.org/10.1109/APSIPA.2017.8282296
Publication status: Published - 2018 Feb 5
Event: 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 - Kuala Lumpur, Malaysia
Duration: 2017 Dec 12 to 2017 Dec 15

Other

Other: 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Country: Malaysia
City: Kuala Lumpur
Period: 17-12-12 to 17-12-15

Fingerprint

Scattering
Clustering algorithms
Identification (control systems)
Health
Long short-term memory

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Human-Computer Interaction
  • Information Systems
  • Signal Processing

Cite this

Huang, K. Y., Wu, C-H., Su, M. H., & Chou, C. H. (2018). Mood disorder identification using deep bottleneck features of elicited speech. In Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 (Vol. 2018-February, pp. 1648-1652). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/APSIPA.2017.8282296
@inproceedings{127d8d9892374636bb7a4eb171aab317,
title = "Mood disorder identification using deep bottleneck features of elicited speech",
author = "Huang, {Kun Yi} and Chung-Hsien Wu and Su, {Ming Hsiang} and Chou, {Chia Hui}",
year = "2018",
month = "2",
day = "5",
doi = "10.1109/APSIPA.2017.8282296",
language = "English",
volume = "2018-February",
pages = "1648--1652",
booktitle = "Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}
