Textual Data Augmentation for Code-Switching Speech Recognition with Under-Resourced Language

  • 王 竣煌

Student thesis: Doctoral Thesis

Abstract

Automatic speech recognition is one of the hot topics in speech-related research, and it is being applied in daily life more often than in the past. For a speech recognizer to be accurate, both the method used to construct it and the training corpora are very important. However, not all languages are as widely used as Mandarin and English, so in many cases it is hard to collect a large speech corpus for training. In that situation it is necessary to adapt the method and choose one that requires less training data. In this thesis, Taiwanese is chosen as the under-resourced language; both its speech and text corpora are relatively scarce. In addition, in practical applications of Taiwanese, code-switching between Taiwanese and Mandarin occurs quite frequently, and handling it is one of the goals of this thesis. Fortunately, Taiwanese and Mandarin are similar in pronunciation and grammar. To this end, this thesis proposes a phone-sharing method that uses Mandarin speech to train shared acoustic models. To address the lack of text corpus, this thesis translates a Mandarin corpus into a Taiwanese corpus through word-by-word translation and manually designed rules, with the expectation that the text corpus can be augmented in this way. Moreover, additional translation rules for code-switching are established, and the results are used to train language models. In the experiments, this thesis adopted a lexicon containing shared phones and used Taiwanese and Mandarin speech to jointly train the acoustic models. The word error rate was 26.02%, which was better than that of the baseline trained on the pure Taiwanese corpus. In addition, this thesis used the code-switching text corpus to train the language model and combined it with the shared-phone acoustic model. The word error rate was 29.05%, and the experimental results showed that the speech recognizer was able to recognize code-switching vocabulary.
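For readers unfamiliar with the metric reported above, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used in the thesis; the function name `word_error_rate` and whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference length.

    Assumption: both transcripts are already segmented into
    whitespace-separated words; the thesis may tokenize differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition a figure such as 26.02% means roughly 26 word-level errors per 100 reference words. Because insertions are counted, the value can exceed 100% when the hypothesis is much longer than the reference.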
Date of Award: 2019
Original language: English
Supervisor: Chung-Hsien Wu
