Utterance verification using prosodic information for Mandarin telephone speech keyword spotting

Yeou Jiunn Chen, Chung-Hsien Wu, Gwo Lang Yan

Research output: Contribution to journal (Conference article)

6 Citations (Scopus)

Abstract

In this paper, prosodic information, a distinctive and important feature of Mandarin speech, is used for utterance verification of Mandarin telephone speech. A two-stage strategy is adopted: recognition followed by verification. For keyword recognition, 59 context-independent subsyllables (22 INITIALs and 37 FINALs in Mandarin speech) and one background/silence model are used as the basic recognition units. For utterance verification, 12 anti-subsyllable HMMs, 175 context-dependent prosodic HMMs, and five anti-prosodic HMMs are constructed. A keyword verification function combining phonetic-phase and prosodic-phase verification is investigated. On a test set of 2400 conversational speech utterances from 20 speakers (12 males and 8 females), the proposed verification method achieved a 17.8% false alarm rate at an 8.5% false rejection rate. Furthermore, the method correctly rejected 90.4% of nonkeywords. Comparison with a baseline system without prosodic-phase verification shows that prosodic information can improve verification performance.
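The abstract does not state the form of the keyword verification function. As a minimal sketch, assuming it is a weighted sum of frame-normalised log-likelihood ratios (keyword models against the anti-models in each phase) compared with a threshold, the code below scores putative keyword hits and computes the two error rates. The `Hypothesis` class, `verification_score`, `error_rates`, the weight, the threshold and the toy scores are all hypothetical and are not taken from the paper. The false alarm rate is taken here as the fraction of non-keyword hits that are accepted, which is one common convention; the paper's exact definition may differ.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Hypothesis:
    """One putative keyword hit from the recognition stage (hypothetical structure)."""
    phonetic_ll: float       # log-likelihood under the keyword's subsyllable HMMs
    anti_phonetic_ll: float  # log-likelihood under the anti-subsyllable HMMs
    prosodic_ll: float       # log-likelihood under the context-dependent prosodic HMMs
    anti_prosodic_ll: float  # log-likelihood under the anti-prosodic HMMs
    num_frames: int          # segment length, used to normalise the ratios
    is_keyword: bool         # ground truth: True if the hit is a correct keyword


def verification_score(h: Hypothesis, weight: float = 0.5) -> float:
    """Assumed form, not the authors' published function:
    (1 - weight) * phonetic LLR + weight * prosodic LLR, both per frame."""
    phonetic_llr = (h.phonetic_ll - h.anti_phonetic_ll) / h.num_frames
    prosodic_llr = (h.prosodic_ll - h.anti_prosodic_ll) / h.num_frames
    return (1.0 - weight) * phonetic_llr + weight * prosodic_llr


def error_rates(hyps: Sequence[Hypothesis], threshold: float,
                weight: float = 0.5) -> Tuple[float, float]:
    """Return (false_rejection_rate, false_alarm_rate); a hit is accepted
    when its score is at or above the threshold."""
    true_hits = [h for h in hyps if h.is_keyword]
    false_hits = [h for h in hyps if not h.is_keyword]
    rejected_true = sum(verification_score(h, weight) < threshold for h in true_hits)
    accepted_false = sum(verification_score(h, weight) >= threshold for h in false_hits)
    frr = rejected_true / len(true_hits) if true_hits else 0.0
    far = accepted_false / len(false_hits) if false_hits else 0.0
    return frr, far


# Toy, made-up scores purely for illustration.
toy = [
    Hypothesis(-500, -560, -200, -230, 100, True),
    Hypothesis(-520, -510, -210, -200, 100, True),
    Hypothesis(-600, -580, -250, -240, 100, False),
    Hypothesis(-400, -410, -150, -140, 50, False),
]

print(error_rates(toy, 0.05, weight=0.0))  # phonetic phase only -> (0.5, 0.5)
print(error_rates(toy, 0.05, weight=0.5))  # phonetic + prosodic -> (0.5, 0.0)
```

Sweeping `threshold` traces the trade-off between the two error rates, so a reported operating point such as 8.5% false rejection is one point on that curve. In the toy data, adding the prosodic term rejects the fourth hit that the phonetic score alone accepts; this only illustrates the mechanism and says nothing about the paper's actual results.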

Original language: English
Pages (from-to): 697-700
Number of pages: 4
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2
Publication status: Published - 1999 Jan 1
Event: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-99) - Phoenix, AZ, USA
Duration: 1999 Mar 15 to 1999 Mar 19

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

@article{d005ba8f0ec04c67be842880760a8d8c,
title = "Utterance verification using prosodic information for Mandarin telephone speech keyword spotting",
abstract = "In this paper, the prosodic information, a very special and important feature in Mandarin speech, is used for Mandarin telephone speech utterance verification. A two-stage strategy, with recognition followed by verification, is adopted. For keyword recognition, 59 context-independent subsyllables, i.e., 22 INITIAL's and 37 FINAL's in Mandarin speech, and one background/silence model, are used as the basic recognition units. For utterance verification, 12 anti-subsyllable HMM's, 175 context-dependent prosodic HMM's, and five anti-prosodic HMM's, are constructed. A keyword verification function combining phonetic-phase and prosodic-phase verification is investigated. Using a test set of 2400 conversational speech utterances from 20 speakers (12 males and 8 females), at 8.5{\%} false rejection, the proposed verification method resulted in 17.8{\%} false alarm rate. Furthermore, this method was able to correctly reject 90.4{\%} of nonkeywords. Comparison with a baseline system without prosodic-phase verification shows that the prosodic information can benefit the verification performance.",
author = "Chen, {Yeou Jiunn} and Chung-Hsien Wu and Yan, {Gwo Lang}",
year = "1999",
month = "1",
day = "1",
language = "English",
volume = "2",
pages = "697--700",
journal = "Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing",
issn = "0736-7791",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Utterance verification using prosodic information for Mandarin telephone speech keyword spotting

AU - Chen, Yeou Jiunn

AU - Wu, Chung-Hsien

AU - Yan, Gwo Lang

PY - 1999/1/1

Y1 - 1999/1/1

N2 - In this paper, the prosodic information, a very special and important feature in Mandarin speech, is used for Mandarin telephone speech utterance verification. A two-stage strategy, with recognition followed by verification, is adopted. For keyword recognition, 59 context-independent subsyllables, i.e., 22 INITIAL's and 37 FINAL's in Mandarin speech, and one background/silence model, are used as the basic recognition units. For utterance verification, 12 anti-subsyllable HMM's, 175 context-dependent prosodic HMM's, and five anti-prosodic HMM's, are constructed. A keyword verification function combining phonetic-phase and prosodic-phase verification is investigated. Using a test set of 2400 conversational speech utterances from 20 speakers (12 males and 8 females), at 8.5% false rejection, the proposed verification method resulted in 17.8% false alarm rate. Furthermore, this method was able to correctly reject 90.4% of nonkeywords. Comparison with a baseline system without prosodic-phase verification shows that the prosodic information can benefit the verification performance.

AB - In this paper, the prosodic information, a very special and important feature in Mandarin speech, is used for Mandarin telephone speech utterance verification. A two-stage strategy, with recognition followed by verification, is adopted. For keyword recognition, 59 context-independent subsyllables, i.e., 22 INITIAL's and 37 FINAL's in Mandarin speech, and one background/silence model, are used as the basic recognition units. For utterance verification, 12 anti-subsyllable HMM's, 175 context-dependent prosodic HMM's, and five anti-prosodic HMM's, are constructed. A keyword verification function combining phonetic-phase and prosodic-phase verification is investigated. Using a test set of 2400 conversational speech utterances from 20 speakers (12 males and 8 females), at 8.5% false rejection, the proposed verification method resulted in 17.8% false alarm rate. Furthermore, this method was able to correctly reject 90.4% of nonkeywords. Comparison with a baseline system without prosodic-phase verification shows that the prosodic information can benefit the verification performance.

UR - http://www.scopus.com/inward/record.url?scp=0032680846&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0032680846&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:0032680846

VL - 2

SP - 697

EP - 700

JO - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

JF - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

SN - 0736-7791

ER -