This article presents a probabilistic scheme for detecting the interruption point (IP) in spontaneous speech based on inter-syllable boundary-based prosodic features. Because the error rate of spontaneous speech recognition is high, a combined acoustic model that considers both syllable and subsyllable recognition units is first used to determine the inter-syllable boundaries and to output the recognition confidence of the input speech. Based on the finding that IPs always occur at inter-syllable boundaries, a probability distribution of the prosodic features at the current potential IP is estimated. A Conditional Random Field (CRF) model, which takes as input the clustered prosodic features of the current potential IP and of its preceding and succeeding inter-syllable boundaries, then outputs an IP likelihood measure. Finally, the confidence of the recognized speech, the probability distribution of the prosodic features, and the CRF-based IP likelihood measure are integrated to determine the optimal IP sequence of the input spontaneous speech. Pitch reset and lengthening are additionally applied to improve IP detection performance. The Mandarin Conversational Dialogue Corpus is adopted for evaluation. Experimental results show that, under the same experimental conditions, the proposed IP detection approach outperforms the hidden Markov model and the Maximum Entropy model by 10.56% and 6.5%, respectively. In addition, the IP detection error rate is reduced by a further 9.15% when pitch reset and lengthening information is used. These results confirm that the proposed model, built on inter-syllable boundary-based prosodic features, can effectively detect the interruption point in spontaneous Mandarin speech.
|Journal||ACM Transactions on Asian Language Information Processing|
|Publication status||Published - 2011 Mar 1|
All Science Journal Classification (ASJC) codes
- Computer Science(all)