TY - JOUR
T1 - Development of a Taiwanese Speech Synthesis System Using Hidden Markov Models and a Robust Tonal Phoneme Corpus
AU - Sher, Yung Ji
AU - Hsu, Ming Chun
AU - Chiu, Yu-Hsien
AU - Chen, Yeou Jiunn
AU - Wu, Chung Hsien
AU - Wu, Jiunn Liang
N1 - Publisher Copyright: © 2024 Institute of Information Science. All rights reserved.
PY - 2024/3
Y1 - 2024/3
AB - The number of young native speakers of Taiwanese, the variant of Southern Min spoken in Taiwan, has decreased. Technological advancements such as text-to-speech (TTS) systems could help arrest this decline. The aim of this study was to design a robust tonal phoneme corpus and a speech synthesis system for Modern Literal Taiwanese (MLT). MLT subsyllables were analyzed using phonetics and phonology to establish tonal phoneme models. These robust tonal phoneme models and hidden Markov models were used to construct an MLT TTS synthesis system. Algorithm-based training resulted in 869 balanced sentences containing 12,544 syllables, with each sentence containing an average of 14.4 syllables. In total, 218 sentences, which included rare phonemes, were manually drafted to supplement the corpus. The synthesized phonemes were deemed to have high intelligibility and could be included in the developed TTS system. According to the HTK speech recognition tool, the overall phoneme recognition rate was 96.47%. Testers, who were native Taiwanese speakers, assigned the synthesized sentences a mean opinion score of 4, indicating that they sounded natural. This developed system and the results described herein can inspire future developments in speech technology and computational linguistics.
UR - https://www.scopus.com/pages/publications/85188026801
UR - https://www.scopus.com/pages/publications/85188026801#tab=citedBy
U2 - 10.6688/JISE.202403_40(2).0006
DO - 10.6688/JISE.202403_40(2).0006
M3 - Article
AN - SCOPUS:85188026801
SN - 1016-2364
VL - 40
SP - 283
EP - 302
JO - Journal of Information Science and Engineering
JF - Journal of Information Science and Engineering
IS - 2
ER -