TY - JOUR
T1 - Keyword spotting using context dependent PLU-based segmental Bayesian networks
AU - Wu, Chung-Hsien
AU - Chen, Jau Hung
PY - 1996/7/1
Y1 - 1996/7/1
N2 - In this paper, a continuous Mandarin speech keyword spotting system based on context-dependent phone-like units (PLUs) is presented. In this vocabulary-independent system, users can define their own keywords and the most frequently occurring non-keywords without retraining the system. A set of 176 monosyllables and 483 balanced words or sentences is used to establish the context-dependent PLUs, i.e., initials or finals in Mandarin speech. Each PLU is represented by a proposed segmental Bayesian network (SBN) model. In the training process, a modified K-means algorithm is proposed to reduce the training time. The most frequently occurring non-keywords are divided into keyword predecessors and successors. Each type of keyword predecessor and successor is modeled by 6 initial-part SBNs and 2 final-part SBNs as the garbage models. For extraneous speech, 15 initial-part SBNs and 5 final-part SBNs are established as the extraneous speech garbage models. In the recognition process, a final-part preprocessor is used to screen out unreasonable hypotheses in order to reduce the recognition time. Using a test set of 525 conversational speech utterances from 15 speakers (10 males and 5 females), word spotting rates of 97.4% on isolated keywords and 92.0% on keywords embedded in unconstrained extraneous speech were obtained for a user-defined 20-keyword vocabulary.
AB - In this paper, a continuous Mandarin speech keyword spotting system based on context-dependent phone-like units (PLUs) is presented. In this vocabulary-independent system, users can define their own keywords and the most frequently occurring non-keywords without retraining the system. A set of 176 monosyllables and 483 balanced words or sentences is used to establish the context-dependent PLUs, i.e., initials or finals in Mandarin speech. Each PLU is represented by a proposed segmental Bayesian network (SBN) model. In the training process, a modified K-means algorithm is proposed to reduce the training time. The most frequently occurring non-keywords are divided into keyword predecessors and successors. Each type of keyword predecessor and successor is modeled by 6 initial-part SBNs and 2 final-part SBNs as the garbage models. For extraneous speech, 15 initial-part SBNs and 5 final-part SBNs are established as the extraneous speech garbage models. In the recognition process, a final-part preprocessor is used to screen out unreasonable hypotheses in order to reduce the recognition time. Using a test set of 525 conversational speech utterances from 15 speakers (10 males and 5 females), word spotting rates of 97.4% on isolated keywords and 92.0% on keywords embedded in unconstrained extraneous speech were obtained for a user-defined 20-keyword vocabulary.
UR - http://www.scopus.com/inward/record.url?scp=0030191145&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0030191145&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:0030191145
VL - 20
SP - 465
EP - 473
JO - Proceedings of the National Science Council, Republic of China, Part A: Physical Science and Engineering
JF - Proceedings of the National Science Council, Republic of China, Part A: Physical Science and Engineering
SN - 0255-6588
IS - 4
ER -