Discriminative disfluency modeling for spontaneous speech recognition

Chung Hsien Wu, Gwo Lang Yan

Research output: Chapter in Book/Report/Conference proceedingConference contribution

4 Citations (Scopus)

Abstract

Most automatic speech recognizers (ASRs) have concentrated on read speech, which is different from speech with the presence of disfluencies. These ASRs cannot handle the speech with a high rate of disfluencies such as filled pauses, repetition, repairs, false starts, and silence pauses in actual spontaneous speech or dialogues. In this paper, we focus on the modeling of the filled pauses "uh" and "um." The filled pauses contain the characteristics of nasal and lengthening, and the acoustic parameters for these characteristics are analyzed and adopted for disfluency modeling. A Gaussian mixture model (GMM), trained by a discriminative training algorithm that minimizes the recognition error, is proposed. A transition probability density function is defined from the GMM and used to weight the transition probability between the boundaries of fluency and disfluency models in the one-stage algorithm. Experimental result shows that the proposed method yields an improvement rate of 27.3% for disfluency compared to the baseline system.

Original languageEnglish
Title of host publicationEUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology
EditorsBorge Lindberg, Henrik Benner, Paul Dalsgaard, Zheng-Hua Tan
PublisherInternational Speech Communication Association
Pages1955-1958
Number of pages4
ISBN (Electronic)8790834100, 9788790834104
Publication statusPublished - 2001 Jan 1
Event7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001 - Aalborg, Denmark
Duration: 2001 Sep 32001 Sep 7

Publication series

NameEUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology

Other

Other7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001
CountryDenmark
CityAalborg
Period01-09-0301-09-07

All Science Journal Classification (ASJC) codes

  • Communication
  • Linguistics and Language
  • Computer Science Applications
  • Software

Fingerprint Dive into the research topics of 'Discriminative disfluency modeling for spontaneous speech recognition'. Together they form a unique fingerprint.

Cite this