Personality trait perception from speech signals using multiresolution analysis and convolutional neural networks

Ming Hsiang Su, Chung-Hsien Wu, Kun Yi Huang, Qian Bei Hong, Hsin Min Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

This study presents an approach to personality trait (PT) perception from speech signals using wavelet-based multiresolution analysis and convolutional neural networks (CNNs). First, the wavelet transform is employed to decompose each speech signal into signals at different levels of resolution. Acoustic features are then extracted from the signal at each resolution. Given these acoustic features, a CNN is adopted to generate a profile over the Big Five Inventory-10 (BFI-10), which provides a quantitative measure of the degree of presence or absence of each of the 10 basic BFI items. The BFI-10 profiles are then fed into five artificial neural networks (ANNs), one for each of the five personality dimensions (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism), for PT perception. To evaluate the proposed method, experiments were conducted on the SSPNet Speaker Personality Corpus (SPC), which contains 640 clips randomly extracted from French news bulletins and was used in the INTERSPEECH 2012 speaker trait sub-challenge. The experiments yielded an average PT perception accuracy of 71.97%, outperforming both an ANN-based method and the baseline of the INTERSPEECH 2012 speaker trait sub-challenge.
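The front end described in the abstract, a wavelet decomposition followed by per-resolution acoustic features, can be sketched as follows. This is a minimal illustration using a Haar wavelet and toy descriptors (energy, mean absolute value, standard deviation); the paper's actual wavelet family, feature set, and CNN/ANN stages are not specified here, so all function names and parameters below are illustrative assumptions.

```python
import numpy as np

def haar_decompose(signal, levels=3):
    """One-dimensional Haar wavelet multiresolution decomposition.

    Returns [cA_n, cD_n, ..., cD_1]: the approximation at the coarsest
    level followed by the detail coefficients at each finer resolution.
    """
    coeffs = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        if len(approx) % 2:                      # pad to even length
            approx = np.append(approx, approx[-1])
        pairs = approx.reshape(-1, 2)
        detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
        coeffs.append(detail)
    coeffs.append(approx)
    return coeffs[::-1]

def level_features(coeffs):
    """Toy per-resolution descriptors: energy, mean |x|, std."""
    return np.array([[np.sum(c**2), np.mean(np.abs(c)), np.std(c)]
                     for c in coeffs])

# Toy "speech" signal: a 50 Hz tone plus noise, 512 samples.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 512)
x = np.sin(2 * np.pi * 50 * t) + 0.3 * rng.standard_normal(512)

feats = level_features(haar_decompose(x, levels=3))
print(feats.shape)   # (4, 3): 1 approximation + 3 detail levels, 3 features each
```

In the paper's pipeline, a feature matrix like `feats` (one row per resolution) would be the input to the CNN that produces the 10-dimensional BFI-10 profile, which in turn feeds the five per-trait ANNs. Note that the Haar transform above is orthonormal, so the total energy across all coefficient levels equals the energy of the input signal.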

Original language: English
Title of host publication: Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1532-1536
Number of pages: 5
Volume: 2018-February
ISBN (Electronic): 9781538615423
DOI: 10.1109/APSIPA.2017.8282287
Publication status: Published - 2018 Feb 5
Event: 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 - Kuala Lumpur, Malaysia
Duration: 2017 Dec 12 - 2017 Dec 15



All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Human-Computer Interaction
  • Information Systems
  • Signal Processing

Cite this

Su, M. H., Wu, C-H., Huang, K. Y., Hong, Q. B., & Wang, H. M. (2018). Personality trait perception from speech signals using multiresolution analysis and convolutional neural networks. In Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 (Vol. 2018-February, pp. 1532-1536). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/APSIPA.2017.8282287