Formosa Speech Recognition Challenge 2020 and Taiwanese across Taiwan Corpus

Yuan Fu Liao, Chia Yu Chang, Hak Khiam Tiun, Huang Lan Su, Hui Lu Khoo, Jane S. Tsay, Le Kun Tan, Peter Kang, Tsun Guan Thiann, Un Gian Iunn, Jyh Her Yang, Chih Neng Liang

Research output: Conference contribution

1 citation (Scopus)

Abstract

Taiwanese (a.k.a. Taiwanese Hokkien, Hoklo, Taigi, Southern Min or Min-Nan) is an endangered language: because of the dominance of Mandarin, the number of Taiwanese speakers continues to drop, especially among the younger generations. To address this problem, a Taiwanese speech-enabled human-computer interface that supports people's daily lives is essential. Therefore, the Formosa Speech in the Wild (FSW) project was established to collect a large-scale Taiwanese across Taiwan (TAT) speech corpus to boost the development of Taiwanese speech recognition (TSR). The Formosa Speech Recognition Challenge 2020 (FSR-2020) was also hosted to promote the corpus and to evaluate the performance of state-of-the-art TSR systems. This paper briefly introduces the TAT corpus and the FSR-2020 challenge, presents the provided data profile and evaluation plan, and reports experimental baseline results. A subset of the TAT corpus, TAT-Vol1, is provided free of charge to all participants under a non-commercial license, and its corresponding Kaldi baseline recipes have been published online. Experimental results show that the combination of the TAT corpus and the baseline recipes is a good resource pack for TSR research and development.

Original language: English
Title of host publication: Proceedings of 2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 65-70
Number of pages: 6
ISBN (electronic): 9781728198965
DOIs
Publication status: Published - 5 Nov 2020
Event: 23rd Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2020 - Virtual, Yangon, Myanmar
Duration: 5 Nov 2020 to 7 Nov 2020

Publication series

Name: Proceedings of 2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2020

Conference

Conference: 23rd Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2020
Country/Territory: Myanmar
City: Virtual, Yangon
Period: 5 Nov 2020 to 7 Nov 2020

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
  • Linguistics and Language
