Using Prosodic Phrase-Based VQVAE on Audio ALBERT for Speech Emotion Recognition

Jia Hao Hsu, Chung Hsien Wu, Tsung Hsien Yang

Research output: Conference contribution

Abstract

Speech emotion recognition is an important field in human-computer interaction research. Understanding the user's emotions from speech helps the system grasp underlying information about the user, such as satisfaction with the service. This research attempts to detect emotion in user speech recorded by customer service dialogue systems for telecommunication applications. This study proposes a prosodic phrase-based Vector Quantized Variational AutoEncoder (VQVAE) as the feature extraction module for the pre-trained model Audio ALBERT (AALBERT). Two steps are added before fine-tuning the pre-trained AALBERT model: prosodic phrase segmentation and a prosodic phrase-based VQVAE model. Speech segments are extracted using the prosodic phrase segmentation algorithm, under the assumption that each segment contains only a single emotion. The VQVAE model is trained to obtain quantized vectors of the important prosodic phrases. In the experiments, a speech corpus collected by a telecom customer service system was used for evaluation. The ablation study shows that the proposed method effectively improves the performance of the pre-trained model, with accuracy reaching 91.41%. These results suggest that feature extraction using prosodic segmentation and prosodic phrase quantization has potential for speech emotion recognition.
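The abstract does not spell out how the quantization is done. As a rough illustration only, the core vector-quantization step of a generic VQVAE (mapping each encoded vector to its nearest codebook entry) can be sketched as follows. The function name `quantize`, the toy codebook, and the two-dimensional phrase vectors are all hypothetical and are not taken from the paper; a real model would learn the codebook and use much higher-dimensional encoder outputs.

```python
from typing import List, Sequence, Tuple


def quantize(
    vectors: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each vector to its nearest codebook entry (squared Euclidean distance).

    Generic VQ lookup for illustration; not the authors' implementation.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for v in vectors:
        # Squared distance from this vector to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        # Index of the closest entry.
        k = min(range(len(codebook)), key=dists.__getitem__)
        indices.append(k)
        quantized.append(list(codebook[k]))
    return indices, quantized


# Hypothetical toy data: three codebook entries, three "phrase" vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
phrase_vectors = [[0.9, 1.2], [-0.8, 1.7], [0.1, -0.2]]

indices, quantized = quantize(phrase_vectors, codebook)
print(indices)    # codebook index chosen for each phrase vector
print(quantized)  # the corresponding codebook entries
```

In a setup like the one described, the quantized vectors (or their indices) for each prosodic phrase would presumably be passed on to the pre-trained model for fine-tuning; how the paper does this is not stated in the abstract.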

Original language: English
Title of host publication: Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 415-419
Number of pages: 5
ISBN (Electronic): 9786165904773
Publication status: Published - 2022
Event: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022 - Chiang Mai, Thailand
Duration: 7 Nov 2022 to 10 Nov 2022

Publication series

Name: Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022

Conference

Conference: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Country/Territory: Thailand
City: Chiang Mai
Period: 2022-11-07 to 2022-11-10

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
