Personality trait perception from speech signals using multiresolution analysis and convolutional neural networks

Ming Hsiang Su, Chung Hsien Wu, Kun Yi Huang, Qian Bei Hong, Hsin Min Wang

Research output: Conference contribution

2 Citations (Scopus)

Abstract

This study presents an approach to personality trait (PT) perception from speech signals using wavelet-based multiresolution analysis and convolutional neural networks (CNNs). First, the wavelet transform is employed to decompose the speech signal into sub-signals at different levels of resolution. Acoustic features are then extracted from the sub-signal at each resolution. Given these acoustic features, a CNN is adopted to generate the profile of the Big Five Inventory-10 (BFI-10), which provides a quantitative measure of the degree of presence or absence of each of the 10 basic BFI items. The BFI-10 profile is further fed into five artificial neural networks (ANNs), one for each of the five personality dimensions: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. To evaluate the proposed method, experiments were conducted on the SSPNet Speaker Personality Corpus (SPC), which comprises 640 clips randomly extracted from French news bulletins and was used in the INTERSPEECH 2012 speaker trait sub-challenge. An average PT perception accuracy of 71.97% was obtained, outperforming both an ANN-based method and the baseline of the INTERSPEECH 2012 speaker trait sub-challenge.
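The decomposition stage described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the paper does not specify its wavelet family or decomposition depth, so an orthonormal Haar wavelet and plain Python lists (standing in for speech samples) are assumed here purely for simplicity.

```python
# Illustrative sketch (assumed details, not from the paper): repeated
# one-level Haar wavelet decomposition, yielding one sub-signal per
# resolution level as in the paper's decomposition stage.

def haar_step(signal):
    """Split a signal into approximation (low-pass) and detail (high-pass)
    coefficients using the orthonormal Haar filter pair."""
    s = 2 ** 0.5
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def multiresolution(signal, levels):
    """Return the detail coefficients at each level plus the final
    approximation: levels + 1 sub-signals, each at a coarser resolution,
    from which acoustic features could then be extracted."""
    out = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        out.append(detail)
    out.append(approx)
    return out

bands = multiresolution([4.0, 2.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0], levels=2)
print(len(bands))  # 3 sub-signals: 2 detail bands + 1 approximation
```

Because the Haar pair is orthonormal, the total energy of the sub-signals equals that of the input, so no information is lost across resolution levels.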
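The second stage maps the 10-item BFI-10 profile to the five trait dimensions. The paper does this with five trained ANNs; as a much simpler stand-in, the sketch below groups the 10 item scores into five pairs (the standard BFI-10 assigns two items per trait, some reverse-keyed) and averages each pair. The item-to-trait layout and the reverse-keyed set used here are hypothetical, for illustration only.

```python
# Illustrative sketch (hypothetical item layout, not the paper's ANNs):
# collapse a 10-item BFI-10 profile into five Big Five trait scores.

TRAITS = ["Openness", "Conscientiousness", "Extraversion",
          "Agreeableness", "Neuroticism"]

def trait_scores(profile, pairs, reverse, scale_max=5.0):
    """profile: 10 item scores; pairs: trait -> (i, j) item indices;
    reverse: indices of reverse-keyed items, flipped before averaging."""
    scores = {}
    for trait, (i, j) in pairs.items():
        a = scale_max + 1 - profile[i] if i in reverse else profile[i]
        b = scale_max + 1 - profile[j] if j in reverse else profile[j]
        scores[trait] = (a + b) / 2
    return scores

# Hypothetical layout: trait k uses items k and k + 5, first of each
# pair reverse-keyed (the real BFI-10 keying differs per item).
PAIRS = {t: (k, k + 5) for k, t in enumerate(TRAITS)}
REVERSE = {0, 1, 2, 3, 4}

profile = [2, 3, 4, 1, 5, 4, 4, 2, 3, 1]  # e.g. CNN-produced item scores, 1-5 scale
print(trait_scores(profile, PAIRS, REVERSE))
```

The paper replaces this fixed averaging with one learned ANN per dimension, which can weight and combine all 10 items rather than just the trait's own pair.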

Original language: English
Title of host publication: Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1532-1536
Number of pages: 5
Volume: 2018-February
ISBN (Electronic): 9781538615423
DOI: 10.1109/APSIPA.2017.8282287
Publication status: Published - 2018 Feb 5
Event: 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 - Kuala Lumpur, Malaysia
Duration: 2017 Dec 12 - 2017 Dec 15

Other

Other: 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Country: Malaysia
City: Kuala Lumpur
Period: 17-12-12 to 17-12-15

Fingerprint

Multiresolution analysis
Neural networks
Acoustics
Wavelet transforms
Experiments

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Human-Computer Interaction
  • Information Systems
  • Signal Processing

Cite this

Su, M. H., Wu, C. H., Huang, K. Y., Hong, Q. B., & Wang, H. M. (2018). Personality trait perception from speech signals using multiresolution analysis and convolutional neural networks. In Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 (Vol. 2018-February, pp. 1532-1536). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/APSIPA.2017.8282287