In natural conversation, a complete emotional expression typically follows a complex temporal course. Previous research based on utterance-level or segment-level processing has paid little attention to the underlying structure of emotional speech. This study proposes modeling emotion through a hierarchical affective structure of the emotional utterance, characterized by probabilistic context-free grammars (PCFGs). Emotion profiles obtained from support vector machines (SVMs) are used to divide each utterance into emotionally consistent segments. Vector quantization then converts the emotion profile of each segment into a codeword. A binary tree, in which each node represents a codeword, is constructed to characterize the affective structure of the utterance, and this structure is modeled by a PCFG. For an input utterance, the recognized emotion is that of the PCFG-based emotion model yielding the highest likelihood, computed from the speech segments together with the score of the affective structure. For evaluation, experiments were conducted on the EMO-DB database and on an expanded version of it containing longer utterances. Experimental results show that the proposed method achieved an emotion recognition accuracy of 87.22% for long utterances, outperforming the SVM-based method.
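The vector-quantization step can be illustrated with a minimal sketch. It assumes that each segment's emotion profile is a fixed-length vector of per-emotion SVM scores and that the codebook is learned with plain k-means (Lloyd's algorithm) under squared Euclidean distance; the abstract does not specify the codebook training method, distance measure, or codebook size, so the function names and these choices are illustrative only.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two emotion-profile vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(profile: Sequence[float], codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to the given emotion profile."""
    return min(range(len(codebook)), key=lambda i: squared_distance(profile, codebook[i]))


def train_codebook(profiles: Sequence[Vector], k: int, iterations: int = 20) -> List[Vector]:
    """Learn a k-entry codebook with Lloyd's (k-means) algorithm.

    Assumption: the first k profiles serve as initial centroids, which keeps
    this sketch deterministic; a real system would likely initialize better.
    """
    codebook = [list(p) for p in profiles[:k]]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in profiles:
            clusters[nearest_codeword(p, codebook)].append(list(p))
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster becomes empty
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return codebook


def quantize(segment_profiles: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each segment's emotion profile to its nearest codeword index."""
    return [nearest_codeword(p, codebook) for p in segment_profiles]


# Hypothetical 3-dimensional profiles (e.g. scores for anger, sadness, neutral).
profiles = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.1, 0.85, 0.05], [0.05, 0.9, 0.05]]
codebook = train_codebook(profiles, k=2)
print(quantize(profiles, codebook))  # [0, 0, 1, 1]
```

Under these assumptions, segments dominated by the same emotion collapse onto the same codeword, which is what lets a small grammar over codeword symbols describe an utterance.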
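Scoring an affective-structure tree against one PCFG per emotion can be sketched as follows. This covers only the structure score, not the segment likelihoods that the method combines with it. The sketch assumes a full binary tree whose nodes carry codeword indices, per-emotion grammars given as start-symbol probabilities plus binary rules of the form `a -> b c`, and a small floor probability for unseen rules; the class and function names, the rule format, and the smoothing value are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    """A node of the affective-structure tree; every node carries a codeword index."""
    codeword: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


@dataclass
class EmotionPCFG:
    """Hypothetical per-emotion grammar over codeword symbols.

    start[c]          = P(root symbol is codeword c)
    binary[(a, b, c)] = P(a -> b c)
    """
    start: Dict[int, float]
    binary: Dict[Tuple[int, int, int], float]
    floor: float = 1e-6  # assumed smoothing value for unseen symbols or rules

    def log_prob(self, p: float) -> float:
        return math.log(max(p, self.floor))

    def tree_log_likelihood(self, root: Node) -> float:
        """Log start probability plus log rule probabilities of all internal nodes.

        Assumes a full binary tree: a node either has two children or none.
        """
        total = self.log_prob(self.start.get(root.codeword, 0.0))
        stack = [root]
        while stack:
            node = stack.pop()
            if node.left is not None and node.right is not None:
                rule = (node.codeword, node.left.codeword, node.right.codeword)
                total += self.log_prob(self.binary.get(rule, 0.0))
                stack.extend([node.left, node.right])
        return total


def classify(tree: Node, models: Dict[str, EmotionPCFG]) -> str:
    """Return the emotion whose grammar gives the tree the highest log-likelihood."""
    return max(models, key=lambda emotion: models[emotion].tree_log_likelihood(tree))


# Hypothetical tree: root codeword 0 expands into codewords 0 and 1.
tree = Node(0, Node(0), Node(1))
models = {
    "anger": EmotionPCFG(start={0: 0.7, 1: 0.3}, binary={(0, 0, 1): 0.6}),
    "sadness": EmotionPCFG(start={0: 0.2, 1: 0.8}, binary={(0, 0, 1): 0.1}),
}
print(classify(tree, models))  # anger
```

Working in log space keeps products of many small rule probabilities numerically stable; a full system would add each emotion's segment log-likelihood to this score before taking the maximum.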