Acoustic Feature Analysis and Discriminative Modeling of Filled Pauses for Spontaneous Speech Recognition

Chung Hsien Wu, Gwo Lang Yan

Research output: Article, peer-reviewed

12 citations (Scopus)

Abstract

Most automatic speech recognizers (ASRs) concentrate on read speech, which is different from spontaneous speech with disfluencies. ASRs cannot deal with speech with a high rate of disfluencies such as filled pauses, repetitions, lengthening, repairs, false starts and silence pauses. In this paper, we focus on the feature analysis and modeling of the filled pauses "ah," "ung," "um," "em," and "hem" in spontaneous speech. The Karhunen-Loève transform (KLT) and linear discriminant analysis (LDA) were adopted to select discriminant features for filled pause detection. In order to suitably determine the number of discriminant features, Bartlett hypothesis testing was adopted. Twenty-six features were selected using Bartlett hypothesis testing. Gaussian mixture models (GMMs), trained with a gradient descent algorithm, were used to improve the filled pause detection performance. The experimental results show that the filled pause detection rates using KLT and LDA were 84.4% and 86.8%, respectively. A significant improvement was obtained in the filled pause detection rate using the discriminative GMM with KLT and LDA. In addition, the LDA features outperformed the KLT features in the detection of filled pauses.
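
As a rough illustration of the LDA step mentioned in the abstract, the following is a minimal sketch of a two-class Fisher discriminant in plain Python. It is not the authors' implementation: the 2-D feature vectors, the class names, and the midpoint threshold are all hypothetical, chosen only to show how a discriminant direction separates "filled pause" frames from other speech. The paper's actual features, its 26-dimensional selection, and its GMM classifier are not reproduced here.

```python
# Toy two-class Fisher LDA in pure Python (illustrative only, 2-D inputs).

def mean(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter(rows, mu):
    """2x2 scatter matrix: sum over samples of (x - mu)(x - mu)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Return w = Sw^{-1} (mu_a - mu_b), the Fisher discriminant direction."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    # Within-class scatter is the sum of the per-class scatter matrices.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(w, x):
    """Project a 2-D vector onto the discriminant direction."""
    return w[0] * x[0] + w[1] * x[1]

# Hypothetical 2-D feature vectors (not data from the paper).
filled_pause = [[2.0, 2.1], [2.5, 2.4], [3.0, 3.2], [2.2, 2.6]]
other_speech = [[0.5, 1.9], [1.0, 2.6], [1.4, 3.1], [0.8, 2.2]]

w = fisher_direction(filled_pause, other_speech)
# Assumed decision rule: threshold halfway between the projected class means.
threshold = (project(w, mean(filled_pause)) + project(w, mean(other_speech))) / 2

def is_filled_pause(x):
    """Classify a vector by which side of the threshold its projection falls."""
    return project(w, x) > threshold
```

Calling `is_filled_pause([2.4, 2.5])` applies the learned direction to a new vector. In this toy data the two classes overlap heavily on the second coordinate, so the discriminant direction leans mostly on the first, which is the kind of feature weighting LDA is meant to find.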

Original language: English
Pages (from - to): 91-104
Number of pages: 14
Journal: Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology
Volume: 36
Issue number: 2-3
Publication status: Published - 1 Jan 2004

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Information Systems
  • Electrical and Electronic Engineering
