In telephone speech recognition, the acoustic mismatch between training and testing environments often causes severe degradation in recognition performance. This paper presents a keyword-driven, two-level codebook-based stochastic matching (CBSM) algorithm to eliminate this acoustic mismatch. In addition, the unvoiced part of a Mandarin syllable is difficult to recognize correctly. To reduce recognition errors on unvoiced segments, a fuzzy search algorithm is proposed to extract keyword candidates from a syllable lattice. Finally, a keyword relation and a weighting function for keyword combinations are presented for multi-keyword spotting. For multi-keyword spotting in Mandarin speech, 94 right-context-dependent and 38 context-independent subsyllables are used as the basic recognition units, and a corresponding anti-subsyllable model is trained for each subsyllable and used for verification. In this system, 2583 faculty names and 39 department names are selected as the primary and secondary keywords, respectively. On a test set of 3088 conversational speech utterances from 33 speakers (20 male, 13 female), these techniques reduced the recognition error rate for multiple keywords embedded in non-keyword speech from 29.6% to 20.6%.
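The abstract does not describe how CBSM works internally. As a rough illustration of the general idea behind codebook-based mismatch compensation, the sketch below estimates a single additive bias by repeatedly mapping each bias-compensated test frame to its nearest codeword in a clean-speech codebook and averaging the residuals. This is a simplified, single-level stand-in in the spirit of signal bias removal, not the paper's keyword-driven two-level CBSM algorithm. The additive-bias model, the cepstral-style feature vectors, the squared Euclidean distance, and the function names `estimate_bias`, `remove_bias`, and `nearest_codeword` are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(x: Sequence[float], codebook: Sequence[Vector]) -> Vector:
    """Return the codeword closest to x under squared Euclidean distance."""
    return min(codebook, key=lambda c: squared_distance(x, c))


def remove_bias(frames: Sequence[Vector], bias: Sequence[float]) -> List[Vector]:
    """Subtract an additive bias vector from every frame."""
    return [[x - b for x, b in zip(frame, bias)] for frame in frames]


def estimate_bias(
    frames: Sequence[Vector],
    codebook: Sequence[Vector],
    max_iter: int = 10,
    tol: float = 1e-6,
) -> Vector:
    """Iteratively estimate one additive bias (hypothetical, single-level sketch).

    Each pass maps the currently compensated frames to their nearest clean
    codewords and re-estimates the bias as the mean of (frame - codeword).
    Iteration stops when the bias changes by less than `tol` (squared norm).
    """
    dim = len(frames[0])
    bias = [0.0] * dim
    for _ in range(max_iter):
        compensated = remove_bias(frames, bias)
        new_bias = [0.0] * dim
        for frame, comp in zip(frames, compensated):
            # Nearest clean codeword for the compensated frame.
            c = nearest_codeword(comp, codebook)
            for i in range(dim):
                new_bias[i] += frame[i] - c[i]
        new_bias = [v / len(frames) for v in new_bias]
        change = squared_distance(new_bias, bias)
        bias = new_bias
        if change < tol:
            break
    return bias
```

In use, the estimated bias would be subtracted from a test utterance's frames with `remove_bias` before decoding. A keyword-driven, two-level scheme as named in the abstract would refine how codewords are selected and how the bias is estimated, which this sketch does not attempt to model.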