TY - GEN
T1 - Alternative hypothesis generation using a weighted kernel feature matrix for ASR substitution error correction
AU - Liu, Chao Hong
AU - Wu, Chung Hsien
AU - Sarwono, David
PY - 2012
Y1 - 2012
N2 - Although automatic speech recognition (ASR) has been used successfully in several applications, it remains non-robust and imprecise, especially in harsh environments where the input speech is of low quality. Robust error correction for ASR outputs therefore becomes important alongside improvements to recognition performance. Recent approaches to error correction use linguistic or domain information to generate alternative hypotheses for the ASR outputs and then select the most likely alternative. In this study, the distances between ASR outputs and the potentially correct alternatives are estimated from a weighted, context-dependent, syllable cluster-based kernel feature matrix and then rescaled using multidimensional scaling (MDS). These distances are used to construct an alternative syllable lattice, and dynamic programming is used to obtain the most likely correct output with respect to the original ASR results. Experiments on the MATBN Mandarin Chinese broadcast news corpus show that the proposed method improves the word error rate by about 1.95% compared to the correction pair approach.
UR - http://www.scopus.com/inward/record.url?scp=84874471205&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84874471205&partnerID=8YFLogxK
U2 - 10.1109/ISCSLP.2012.6423475
DO - 10.1109/ISCSLP.2012.6423475
M3 - Conference contribution
AN - SCOPUS:84874471205
SN - 9781467325059
T3 - 2012 8th International Symposium on Chinese Spoken Language Processing, ISCSLP 2012
SP - 1
EP - 5
BT - 2012 8th International Symposium on Chinese Spoken Language Processing, ISCSLP 2012
T2 - 2012 8th International Symposium on Chinese Spoken Language Processing, ISCSLP 2012
Y2 - 5 December 2012 through 8 December 2012
ER -