TY - GEN
T1 - Discriminative disfluency modeling for spontaneous speech recognition
AU - Wu, Chung Hsien
AU - Yan, Gwo Lang
PY - 2001
Y1 - 2001
N2 - Most automatic speech recognizers (ASRs) have concentrated on read speech, which differs from speech containing disfluencies. These ASRs cannot handle speech with a high rate of disfluencies such as filled pauses, repetitions, repairs, false starts, and silence pauses in actual spontaneous speech or dialogues. In this paper, we focus on modeling the filled pauses "uh" and "um." The filled pauses exhibit the characteristics of nasality and lengthening, and the acoustic parameters for these characteristics are analyzed and adopted for disfluency modeling. A Gaussian mixture model (GMM), trained by a discriminative training algorithm that minimizes the recognition error, is proposed. A transition probability density function is defined from the GMM and used to weight the transition probability between the boundaries of the fluency and disfluency models in the one-stage algorithm. Experimental results show that the proposed method yields an improvement rate of 27.3% for disfluency compared to the baseline system.
UR - http://www.scopus.com/inward/record.url?scp=85009090967&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85009090967&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85009090967
T3 - EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology
SP - 1955
EP - 1958
BT - EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology
A2 - Lindberg, Borge
A2 - Benner, Henrik
A2 - Dalsgaard, Paul
A2 - Tan, Zheng-Hua
PB - International Speech Communication Association
T2 - 7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001
Y2 - 3 September 2001 through 7 September 2001
ER -