TY - JOUR
T1 - Exploiting psychological factors for interaction style recognition in spoken conversation
AU - Wei, Wen Li
AU - Wu, Chung Hsien
AU - Lin, Jen Chun
AU - Li, Han
PY - 2014/3
Y1 - 2014/3
AB - Determining how a speaker is engaged in a conversation is crucial for achieving harmonious interaction between computers and humans. In this study, a fusion approach was developed based on psychological factors to recognize Interaction Style (IS) in spoken conversation, which plays a key role in creating natural dialogue agents. The proposed Fused Cross-Correlation Model (FCCM) provides a unified probabilistic framework to model the relationships among the psychological factors of emotion, personality trait (PT), transient IS, and IS history, for recognizing IS. An emotional arousal-dependent speech recognizer was used to obtain the recognized spoken text for extracting linguistic features to estimate transient likelihood and recognize PT. A temporal course modeling approach and an emotional sub-state language model, based on the temporal phases of an emotional expression, were employed to obtain a better emotion recognition result. The experimental results indicate that the proposed FCCM yields satisfactory results in recognition and also demonstrate that combining psychological factors effectively improves IS recognition accuracy.
UR - http://www.scopus.com/inward/record.url?scp=84898069567&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84898069567&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2014.2300339
DO - 10.1109/TASLP.2014.2300339
M3 - Article
AN - SCOPUS:84898069567
VL - 22
SP - 659
EP - 671
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
SN - 1558-7916
IS - 3
ER -