This chapter introduces current data-fusion strategies for audiovisual signals in bimodal emotion recognition. Face detection is performed with an AdaBoost cascade face detector, which provides the initial facial position and reduces the time needed for error convergence during feature extraction. An Active Appearance Model (AAM) is then employed to extract 68 labeled facial feature points (FPs) from five facial regions, namely the eyebrows, eyes, nose, mouth, and facial contour, for subsequent calculation of facial animation parameters (FAPs). On the audio side, three kinds of primary prosodic features are extracted from each speech frame: pitch, energy, and the formants F1-F5. Finally, a semi-coupled hidden Markov model (SC-HMM), built on a state-based alignment strategy, is proposed to fuse the audiovisual bimodal features for emotion recognition.
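To make the frame-level prosodic features concrete, the following is a minimal NumPy sketch (not the chapter's actual implementation) of extracting two of the three named features, energy and pitch, per speech frame. Energy is computed as frame RMS and pitch via a simple autocorrelation peak search; formant estimation (F1-F5, typically done with LPC analysis) is omitted for brevity. All function names and parameter values here are illustrative assumptions.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (illustrative helper)."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])

def rms_energy(frames):
    """Per-frame energy as root-mean-square amplitude."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

def autocorr_pitch(frame, sr, fmin=80.0, fmax=400.0):
    """Estimate F0 of one frame from the autocorrelation peak
    within a plausible lag range for speech."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sr / fmax)   # smallest plausible pitch period (samples)
    hi = int(sr / fmin)   # largest plausible pitch period (samples)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

# Synthetic "voiced" signal: 1 s of a 200 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200.0 * t)

frames = frame_signal(x, frame_len=512, hop=256)
energy = rms_energy(frames)                          # one value per frame
f0 = np.array([autocorr_pitch(f, sr) for f in frames])  # ~200 Hz per frame
```

In a real system these per-frame values, together with formants, would form the acoustic feature stream that the SC-HMM aligns against the FAP-based visual stream.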