With the increasing role of speech interfaces in human-computer interac¬tion applications, automatically recognizing emotions from human speech becomes more and more important. This chapter begins by introducing the correlations be¬tween basic speech features such as pitch, intensity, formants, MFCC, and so on, and the emotions. Several recognition methods are then described to illustrate the performance of the previously proposed models, including support vector machine (SVM), K-nearest neighbors (KNN), neural networks, and the like. To give a more practical description of an emotion recognition procedure, a new approach to emotion recognition is provided as a case study. In this case study, the Intonation Groups (IGs) of the input speech signals are first defined and extracted for feature extraction. With the assumption of linear mapping between feature spaces in different emotional states, a feature compensation approach is proposed to char¬acterize the feature space with better discriminability among emotional states. The compensation vector with respect to each emotional state is estimated using the Min¬imum Classification Error (MCE) algorithm. The IG-based feature vectors compen¬sated by the compensation vectors are used to train the Gaussian Mixture Models (GMMs) for each emotional state. The emotional state with the GMM having the maximal likelihood ratio is determined as the emotion state output.
All Science Journal Classification (ASJC) codes
- Computer Science(all)