Due to globalization, multilingual communication is becoming increasingly common. With the growing amount of multilingual speech data and the demand from a wide range of applications, the development of multilingual automatic speech recognition (ASR) systems has become increasingly important. For an ASR system, accents produced by non-native speakers dramatically degrade recognition performance. Moreover, code-switching, the phenomenon of switching languages within a conversation, is common in multilingual communities and also seriously degrades ASR accuracy. The design and development of a code-switching speech database for ASR training is therefore highly desirable. This dissertation presents the procedure for the design and development of a Chinese-English code-switching speech database.

To overcome the recognition problems caused by code-switching, code-switching event detection can be used to improve ASR accuracy. This dissertation presents a new paradigm for code-switching event detection based on latent language space models (LLSMs) and delta-BIC. An LLSM is proposed to characterize a language by modeling the spatial relationships of senones/articulatory features in the eigenspace, using PCA-transformed features. The language likelihood between the LLSM of the input speech and each language-dependent LLSM is estimated using Euclidean distance-based and cosine angle distance-based similarities.

This dissertation also proposes a data-driven approach to phone set construction for code-switching ASR. Acoustic and context-dependent cross-lingual articulatory features (AFs) are incorporated into the estimation of the distance between triphone units for constructing a Chinese-English phone set. The AFs, extracted using a deep neural network, are used for code-switching articulation modeling to alleviate the data sparseness problem. The triphones are finally clustered to obtain a Chinese-English phone set.

Multilingual speech recognition is also confronted with accent-related problems caused by non-native speech. The acoustic properties of accented speech are quite divergent. This dissertation generates highly Mandarin-accented English models for speakers whose mother tongue is Mandarin. A verification method is proposed to extract highly accented speech segments automatically. To deal with the data sparseness problem, the Gaussian components of the highly accented speech models are then generated from the corresponding Gaussian components of the native speech models using a linear transformation function and a decision tree. A discrimination function is further applied to verify the generated accented acoustic models.

Furthermore, English is the language most commonly used by multilingual speakers, and this dissertation creates a global pronunciation map of World Englishes. Successful clustering of accented English can benefit speech recognition, since a suitable accented model can then be chosen for recognition. To predict the inter-speaker pronunciation distances used for clustering, this dissertation investigates invariant pronunciation structure analysis and support vector regression (SVR).

These methods were implemented, and the experimental results show that the proposed approaches achieved improvements in multilingual code-switching and accented speech recognition.
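Delta-BIC, mentioned above for code-switching event detection, is the standard Bayesian information criterion test for a change point: for a window of N frames split at frame i, it compares one Gaussian for the whole window against two Gaussians, one per side, i.e. ΔBIC(i) = ½[N log|Σ| - N₁ log|Σ₁| - N₂ log|Σ₂|] - λP, and a positive value suggests a change. The sketch below is a minimal illustration, not the dissertation's configuration: it assumes diagonal covariances (so the two-Gaussian model adds 2d parameters, giving P = ½·2d·log N), and the penalty weight `lam` and minimum segment length `min_seg` are hypothetical parameters.

```python
import math
from typing import Optional, Sequence, Tuple

Frames = Sequence[Sequence[float]]


def log_det_diag(frames: Frames, floor: float = 1e-6) -> float:
    """Log-determinant of the ML diagonal covariance of `frames` (sum of log variances)."""
    n, dim = len(frames), len(frames[0])
    total = 0.0
    for k in range(dim):
        column = [f[k] for f in frames]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        total += math.log(max(var, floor))  # floor avoids log(0) on constant dimensions
    return total


def delta_bic(frames: Frames, i: int, lam: float = 1.0) -> float:
    """Delta-BIC for a change at frame i; positive values favour two separate Gaussians."""
    n, dim = len(frames), len(frames[0])
    gain = 0.5 * (
        n * log_det_diag(frames)
        - i * log_det_diag(frames[:i])
        - (n - i) * log_det_diag(frames[i:])
    )
    # A second diagonal Gaussian adds `dim` means and `dim` variances.
    penalty = 0.5 * (2 * dim) * math.log(n)
    return gain - lam * penalty


def detect_change(frames: Frames, lam: float = 1.0, min_seg: int = 5) -> Optional[Tuple[int, float]]:
    """Return (best_split, score) if the best split has positive delta-BIC, else None."""
    best: Optional[Tuple[int, float]] = None
    for i in range(min_seg, len(frames) - min_seg + 1):
        score = delta_bic(frames, i, lam)
        if best is None or score > best[1]:
            best = (i, score)
    return best if best is not None and best[1] > 0 else None


# Synthetic 2-D "features": 20 frames near (0, 0) followed by 20 frames near (5, 5).
seg_a = [[0.1 * ((2 * t) % 5 - 2), 0.1 * ((3 * t) % 5 - 2)] for t in range(20)]
seg_b = [[5 + x, 5 + y] for x, y in seg_a]
print(detect_change(seg_a + seg_b))  # best split at frame 20
print(detect_change(seg_a + seg_a))  # None: no change in a homogeneous window
```

In practice such a test is usually run over a sliding window of acoustic features; this sketch scans a single window only.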
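The abstract describes the LLSM comparison only at a high level. The following is a hedged sketch of the two similarity measures it names, under assumptions that are not taken from the dissertation: each unit (for example a senone centroid) is already a point in a shared PCA eigenspace, the "spatial relationships" are encoded as the vector of pairwise distances between those points (the hypothetical `relation_vector`), input and language models list the same units in the same order, and the two similarities are combined with a hypothetical weight `alpha`.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

Point = Sequence[float]


def relation_vector(units: Sequence[Point]) -> List[float]:
    """Hypothetical LLSM encoding: pairwise Euclidean distances between unit
    positions that are assumed to be PCA-projected already."""
    return [math.dist(a, b) for a, b in combinations(units, 2)]


def euclidean_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Map Euclidean distance into (0, 1]; identical vectors score 1."""
    return 1.0 / (1.0 + math.dist(u, v))


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def language_scores(
    input_units: Sequence[Point],
    language_models: Dict[str, Sequence[Point]],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Score each language model against the input by a weighted mix of both similarities."""
    x = relation_vector(input_units)
    scores: Dict[str, float] = {}
    for lang, units in language_models.items():
        y = relation_vector(units)
        scores[lang] = alpha * euclidean_similarity(x, y) + (1 - alpha) * cosine_similarity(x, y)
    return scores


models = {
    "mandarin": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "english": [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)],
}
scores = language_scores([(0.1, 0.0), (1.1, 0.0), (0.1, 1.0)], models)
print(max(scores, key=scores.get))  # mandarin
```

In this toy case the "english" layout is a scaled copy of the "mandarin" one, so cosine similarity alone cannot separate them, while the Euclidean term can; this only illustrates how the two measures behave differently.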
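The abstract states that triphones are clustered into a Chinese-English phone set and that predicted pronunciation distances are used to cluster accented English, but it does not name the clustering algorithm. As one plausible illustration, the sketch below assumes average-linkage agglomerative clustering over a precomputed distance table (standing in for the combined acoustic/AF distances or SVR-predicted distances); the unit labels and distances are invented.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence

Distances = Dict[FrozenSet[str], float]


def average_linkage(names: Sequence[str], dist: Distances, num_clusters: int) -> List[List[str]]:
    """Greedy agglomerative clustering: repeatedly merge the two clusters with the
    smallest average pairwise distance until `num_clusters` remain."""
    if not 1 <= num_clusters <= len(names):
        raise ValueError("num_clusters must be between 1 and len(names)")
    clusters: List[List[str]] = [[n] for n in names]

    def cluster_distance(a: List[str], b: List[str]) -> float:
        # Average of all cross-cluster item distances (average linkage).
        return sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > num_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical units: two Mandarin/English pairs that are acoustically close.
units = ["ch_a", "en_aa", "ch_s", "en_s"]
table = {frozenset(p): 10.0 for p in combinations(units, 2)}
table[frozenset(("ch_a", "en_aa"))] = 1.0
table[frozenset(("ch_s", "en_s"))] = 1.0
print(average_linkage(units, table, 2))  # [['ch_a', 'en_aa'], ['ch_s', 'en_s']]
```

This naive version recomputes all cluster distances on every merge, which is fine for a sketch but would be too slow for a full triphone inventory.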
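The abstract says the accented Gaussian components are generated from the native ones with a linear transformation function and a decision tree, to cope with sparse accented data. The sketch below is a simplified stand-in under stated assumptions: only Gaussian means are transformed, the transform is a per-dimension affine map `accented = scale * native + bias` fit by least squares, and the decision tree is replaced by a given mapping from each Gaussian to a leaf (the hypothetical `leaf_of` dictionary), so that all Gaussians in one leaf share one transform. Gaussian names, leaves, and values are invented.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def fit_diagonal_affine(pairs: Sequence[Tuple[Vector, Vector]]) -> Tuple[Vector, Vector]:
    """Per-dimension least-squares fit of accented = scale * native + bias."""
    dim = len(pairs[0][0])
    scale: Vector = []
    bias: Vector = []
    for k in range(dim):
        xs = [native[k] for native, _ in pairs]
        ys = [accented[k] for _, accented in pairs]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        a = sxy / sxx if sxx > 0 else 1.0  # one pair (no spread): fall back to a pure shift
        scale.append(a)
        bias.append(my - a * mx)
    return scale, bias


def generate_accented_means(
    native_means: Dict[str, Vector],
    leaf_of: Dict[str, str],
    observed_accented: Dict[str, Vector],
) -> Dict[str, Vector]:
    """Generate an accented mean for every native Gaussian by sharing one affine
    transform per (hypothetical) decision-tree leaf."""
    pairs_by_leaf: Dict[str, List[Tuple[Vector, Vector]]] = {}
    for g, accented in observed_accented.items():
        pairs_by_leaf.setdefault(leaf_of[g], []).append((native_means[g], accented))
    transforms = {leaf: fit_diagonal_affine(p) for leaf, p in pairs_by_leaf.items()}

    generated: Dict[str, Vector] = {}
    for g, mu in native_means.items():
        if leaf_of[g] in transforms:
            scale, bias = transforms[leaf_of[g]]
            generated[g] = [s * m + b for s, m, b in zip(scale, mu, bias)]
        else:
            generated[g] = list(mu)  # no accented evidence in this leaf: keep the native mean
    return generated


native = {"g1": [1.0, 2.0], "g2": [3.0, 4.0], "g3": [5.0, 0.0], "g4": [7.0, 7.0]}
leaves = {"g1": "vowel", "g2": "vowel", "g3": "vowel", "g4": "nasal"}
observed = {"g1": [3.0, 0.0], "g2": [7.0, 1.0]}  # accented estimates for g1 and g2 only
print(generate_accented_means(native, leaves, observed)["g3"])  # [11.0, -1.0]
```

Here `g3` has no accented data of its own but receives a generated mean from the transform learned on `g1` and `g2` in the same leaf, which is the data-sharing idea the abstract points to; the verification by a discrimination function is not modeled.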
Date of Award | 2014 Jun 16
Original language | English
Supervisor | Chung-Hsien Wu
A Study on Multilingual and Accented Speech Recognition
沈涵平 (Author). 2014 Jun 16
Student thesis: Doctoral Thesis