TY - JOUR
T1 - Automatic segmentation and identification of mixed-language speech using delta-BIC and LSA-based GMMs
AU - Wu, Chung Hsien
AU - Chiu, Yu Hsien
AU - Shia, Chi Jiun
AU - Lin, Chun Yu
N1 - Funding Information:
Manuscript received January 10, 2003; revised September 23, 2004. This work was supported by the National Science Council of the Republic of China under Contract NSC90-2213-E-006-088. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Ramesh A. Gopinath.
PY - 2006/1
Y1 - 2006/1
N2 - This paper proposes an approach to segmenting and identifying mixed-language speech. A delta Bayesian information criterion (delta-BIC) is first applied to segment the input speech utterance into a sequence of language-dependent segments using acoustic features. A VQ-based bi-gram model is used to characterize the acoustic-phonetic dynamics of two consecutive codewords in a language. Accordingly, the language-specific acoustic-phonetic properties of phone sequences are integrated into the identification process. A Gaussian mixture model (GMM) is used to model codeword occurrence vectors orthonormally transformed using latent semantic analysis (LSA) for each language-dependent segment. A filtering method is used to smooth the hypothesized language sequence and thus eliminate noise-like components of the detected language sequence generated by the maximum likelihood estimation. Finally, a dynamic programming method is used to determine the language boundaries globally. Experimental results show that for Mandarin, English, and Taiwanese, a recall rate of 0.87 for language boundary segmentation was obtained. Based on this recall rate, the proposed approach achieved language identification accuracies of 92.1% and 74.9% for single-language and mixed-language speech, respectively.
UR - http://www.scopus.com/inward/record.url?scp=33745000055&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745000055&partnerID=8YFLogxK
U2 - 10.1109/TSA.2005.852992
DO - 10.1109/TSA.2005.852992
M3 - Article
AN - SCOPUS:33745000055
SN - 1558-7916
VL - 14
SP - 266
EP - 275
JO - IEEE Transactions on Audio, Speech and Language Processing
JF - IEEE Transactions on Audio, Speech and Language Processing
IS - 1
ER -