Mandarin speech is known for its tonal characteristic, and prosodic information plays an important role in Mandarin speech recognition. Driven by this property, phonetic and prosodic information are integrated and used for Mandarin telephone speech keyword spotting. A two-stage strategy, with recognition followed by verification, is adopted. For keyword recognition, 132 subsyllable models, two general acoustic filler models and one background/silence model are separately trained and used as the basic recognition units. For utterance verification, 12 anti-subsyllable models, 175 context-dependent prosodic models and five anti-prosodic models are constructed. A keyword verification function combining phonetic-phase and prosodic-phase verification is investigated. Using a test set of 3088 conversational speech utterances from 33 speakers (20 males and 13 females) and a vocabulary of 2583 faculty names, at 8.5% false rejection, the proposed verification method yields an 18.3% false alarm rate. Furthermore, this method correctly rejects 90.9% of non-keywords. Comparison with a baseline system without prosodic-phase verification shows that prosodic information can benefit the verification performance.
|Pages (from - to)||55-61|
|Journal||IEE Proceedings: Vision, Image and Signal Processing|
|Publication status||Published - 2000|