Multi-keyword spotting of telephone speech using a fuzzy search algorithm and keyword-driven two-level CBSM

Chung Hsien Wu, Yeou Jiunn Chen

Research output: Contribution to journal › Article › Peer-review

14 Citations (Scopus)

Abstract

In telephone speech recognition, the acoustic mismatch between training and testing environments often severely degrades recognition performance. This paper presents a keyword-driven two-level codebook-based stochastic matching (CBSM) algorithm to eliminate this acoustic mismatch. Additionally, in Mandarin speech, it is difficult to correctly recognize the unvoiced part of a syllable. To reduce recognition errors in unvoiced segments, a fuzzy search algorithm is proposed to extract keyword candidates from a syllable lattice. Finally, a keyword relation and a weighting function for keyword combinations are presented for multi-keyword spotting. In the multi-keyword spotting of Mandarin speech, 94 right-context-dependent and 38 context-independent subsyllables are used as the basic recognition units. A corresponding anti-subsyllable model is trained for each subsyllable and used for verification. In this system, 2583 faculty names and 39 department names are selected as the primary and secondary keywords, respectively. On a test set of 3088 conversational speech utterances from 33 speakers (20 male, 13 female), these techniques reduced the recognition error rate for multi-keywords embedded in non-keyword speech from 29.6% to 20.6%.
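
The abstract does not spell out how CBSM estimates the mismatch. As a rough illustration of the general idea behind codebook-based stochastic matching, the following sketch estimates a single additive cepstral bias by repeatedly pairing each test frame with its nearest codeword from a clean-speech codebook and averaging the residuals. It is a minimal, single-level, hard-decision (nearest-codeword) approximation under stated assumptions: the bias is additive and constant over the utterance, distances are Euclidean, and the helper names `estimate_bias` and `compensate` are hypothetical. It is not the authors' keyword-driven two-level algorithm.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(x: Sequence[float], codebook: Sequence[Vector]) -> Vector:
    """Return the codeword closest to x (ties go to the earlier codeword)."""
    return min(codebook, key=lambda c: squared_distance(x, c))


def estimate_bias(frames: Sequence[Vector], codebook: Sequence[Vector],
                  max_iter: int = 10) -> Vector:
    """Hard-decision estimate of an utterance-constant additive bias.

    Hypothetical helper (assumption, not the paper's method): each iteration
    removes the current bias estimate, assigns every frame to its nearest
    clean-speech codeword, and sets the new bias to the mean residual
    (frame minus assigned codeword).
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    bias = [0.0] * dim
    for _ in range(max_iter):
        total = [0.0] * dim
        for x in frames:
            shifted = [xi - bi for xi, bi in zip(x, bias)]
            c = nearest_codeword(shifted, codebook)
            for d in range(dim):
                total[d] += x[d] - c[d]
        new_bias = [t / len(frames) for t in total]
        if new_bias == bias:  # estimate no longer changes; stop early
            break
        bias = new_bias
    return bias


def compensate(frames: Sequence[Vector], bias: Sequence[float]) -> List[Vector]:
    """Subtract the estimated bias from every test frame."""
    return [[xi - bi for xi, bi in zip(x, bias)] for x in frames]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [10.0, 10.0]]             # toy "clean" codewords
    frames = [[1.0, -1.0], [11.0, 9.0], [1.0, -1.0]]  # codewords shifted by (1, -1)
    b = estimate_bias(frames, codebook)
    print(b)                      # [1.0, -1.0]
    print(compensate(frames, b))  # [[0.0, 0.0], [10.0, 10.0], [0.0, 0.0]]
```

Treating the bias as constant over the whole utterance keeps the sketch simple; any keyword-driven or second-level refinement would have to be layered on top and is not modeled here.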

Original language: English
Pages (from-to): 197-212
Number of pages: 16
Journal: Speech Communication
Volume: 33
Issue number: 3
Publication status: Published, February 2001

All Science Journal Classification (ASJC) codes

  • Software
  • Modelling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
