Bayesian duration modeling and learning for speech recognition

Jen Tzung Chien, Chih-Hsien Huang

Research output: Contribution to journalConference article

3 Citations (Scopus)

Abstract

We present the Bayesiari duration modeling and learning for speech recognition under nonstationary speaking rates and noise conditions. In this study, the Gaussian, Poisson and gamma distributions are investigated to characterize duration models. The maximum a posteriori (MAP) estimate of gamma duration model is developed. To exploit the sequential learning, we adopt the Poisson duration model incorporated with gamma prior density, which belongs to the conjugate prior family. When the adaptation data are sequentially observed, the gamma posterior density is produced for twofold advantages. One is to determine the optimal quasi-Bayes (QB) duration parameter, which can be merged in HMM's for speech recognition. The other one is to build the updating mechanism of gamma prior statistics for sequential learning. EM algorithm is applied to fulfill parameter estimation. In the experiments, the proposed Bayesian approaches significantly improve the speech recognition performance of Mandarin broadcast news. The batch and sequential learning are investigated for MAP and QB duration models, respectively.

Original languageEnglish
JournalICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume1
Publication statusPublished - 2004 Sep 28
EventProceedings - IEEE International Conference on Acoustics, Speech, and Signal Processing - Montreal, Que, Canada
Duration: 2004 May 172004 May 21

Fingerprint

Speech recognition
Acoustic noise
Parameter estimation
Statistics
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

@article{fceffe90974e417b907c8548f36d87e2,
title = "Bayesian duration modeling and learning for speech recognition",
abstract = "We present the Bayesiari duration modeling and learning for speech recognition under nonstationary speaking rates and noise conditions. In this study, the Gaussian, Poisson and gamma distributions are investigated to characterize duration models. The maximum a posteriori (MAP) estimate of gamma duration model is developed. To exploit the sequential learning, we adopt the Poisson duration model incorporated with gamma prior density, which belongs to the conjugate prior family. When the adaptation data are sequentially observed, the gamma posterior density is produced for twofold advantages. One is to determine the optimal quasi-Bayes (QB) duration parameter, which can be merged in HMM's for speech recognition. The other one is to build the updating mechanism of gamma prior statistics for sequential learning. EM algorithm is applied to fulfill parameter estimation. In the experiments, the proposed Bayesian approaches significantly improve the speech recognition performance of Mandarin broadcast news. The batch and sequential learning are investigated for MAP and QB duration models, respectively.",
author = "Chien, {Jen Tzung} and Chih-Hsien Huang",
year = "2004",
month = "9",
day = "28",
language = "English",
volume = "1",
journal = "Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing",
issn = "0736-7791",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Bayesian duration modeling and learning for speech recognition

AU - Chien, Jen Tzung

AU - Huang, Chih-Hsien

PY - 2004/9/28

Y1 - 2004/9/28

N2 - We present the Bayesiari duration modeling and learning for speech recognition under nonstationary speaking rates and noise conditions. In this study, the Gaussian, Poisson and gamma distributions are investigated to characterize duration models. The maximum a posteriori (MAP) estimate of gamma duration model is developed. To exploit the sequential learning, we adopt the Poisson duration model incorporated with gamma prior density, which belongs to the conjugate prior family. When the adaptation data are sequentially observed, the gamma posterior density is produced for twofold advantages. One is to determine the optimal quasi-Bayes (QB) duration parameter, which can be merged in HMM's for speech recognition. The other one is to build the updating mechanism of gamma prior statistics for sequential learning. EM algorithm is applied to fulfill parameter estimation. In the experiments, the proposed Bayesian approaches significantly improve the speech recognition performance of Mandarin broadcast news. The batch and sequential learning are investigated for MAP and QB duration models, respectively.

AB - We present the Bayesiari duration modeling and learning for speech recognition under nonstationary speaking rates and noise conditions. In this study, the Gaussian, Poisson and gamma distributions are investigated to characterize duration models. The maximum a posteriori (MAP) estimate of gamma duration model is developed. To exploit the sequential learning, we adopt the Poisson duration model incorporated with gamma prior density, which belongs to the conjugate prior family. When the adaptation data are sequentially observed, the gamma posterior density is produced for twofold advantages. One is to determine the optimal quasi-Bayes (QB) duration parameter, which can be merged in HMM's for speech recognition. The other one is to build the updating mechanism of gamma prior statistics for sequential learning. EM algorithm is applied to fulfill parameter estimation. In the experiments, the proposed Bayesian approaches significantly improve the speech recognition performance of Mandarin broadcast news. The batch and sequential learning are investigated for MAP and QB duration models, respectively.

UR - http://www.scopus.com/inward/record.url?scp=4544343001&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4544343001&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:4544343001

VL - 1

JO - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

JF - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

SN - 0736-7791

ER -