TY - GEN
T1 - Emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information and semantic labels (Extended abstract)
AU - Wu, Chung Hsien
AU - Liang, Wei Bin
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/12/2
Y1 - 2015/12/2
N2 - This work presents an approach to emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information (AP) and semantic labels (SLs). For AP-based recognition, acoustic and prosodic features are extracted from the detected emotionally salient segments of the input speech. Three types of models, Gaussian mixture models (GMMs), support vector machines (SVMs), and multilayer perceptrons (MLPs), are adopted as the base-level classifiers. A Meta Decision Tree (MDT) is then employed for classifier fusion to obtain the AP-based emotion recognition confidence. For SL-based recognition, semantic labels are used to automatically extract Emotion Association Rules (EARs) from the recognized word sequence of the affective speech. A maximum entropy model (MaxEnt) is then used to characterize the relationship between emotional states and EARs for emotion recognition. Finally, a weighted product fusion method is used to integrate the AP-based and SL-based recognition results for the final emotion decision. For evaluation, 2,033 utterances covering four emotional states were collected. The experimental results show that AP-based recognition using the MDT achieved an accuracy of 80.00%, while SL-based recognition obtained an average recognition accuracy of 80.92%. Combining AP information and SLs achieved 83.55% accuracy for emotion recognition.
AB - This work presents an approach to emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information (AP) and semantic labels (SLs). For AP-based recognition, acoustic and prosodic features are extracted from the detected emotionally salient segments of the input speech. Three types of models, Gaussian mixture models (GMMs), support vector machines (SVMs), and multilayer perceptrons (MLPs), are adopted as the base-level classifiers. A Meta Decision Tree (MDT) is then employed for classifier fusion to obtain the AP-based emotion recognition confidence. For SL-based recognition, semantic labels are used to automatically extract Emotion Association Rules (EARs) from the recognized word sequence of the affective speech. A maximum entropy model (MaxEnt) is then used to characterize the relationship between emotional states and EARs for emotion recognition. Finally, a weighted product fusion method is used to integrate the AP-based and SL-based recognition results for the final emotion decision. For evaluation, 2,033 utterances covering four emotional states were collected. The experimental results show that AP-based recognition using the MDT achieved an accuracy of 80.00%, while SL-based recognition obtained an average recognition accuracy of 80.92%. Combining AP information and SLs achieved 83.55% accuracy for emotion recognition.
UR - http://www.scopus.com/inward/record.url?scp=84964033985&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84964033985&partnerID=8YFLogxK
U2 - 10.1109/ACII.2015.7344613
DO - 10.1109/ACII.2015.7344613
M3 - Conference contribution
AN - SCOPUS:84964033985
T3 - 2015 International Conference on Affective Computing and Intelligent Interaction, ACII 2015
SP - 477
EP - 483
BT - 2015 International Conference on Affective Computing and Intelligent Interaction, ACII 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2015 International Conference on Affective Computing and Intelligent Interaction, ACII 2015
Y2 - 21 September 2015 through 24 September 2015
ER -