Mandarin Electrolaryngeal Speech Voice Conversion with Sequence-to-Sequence Modeling

Ming Chi Yen, Wen Chin Huang, Kazuhiro Kobayashi, Yu Huai Peng, Shu Wei Tsai, Yu Tsao, Tomoki Toda, Jyh Shing Roger Jang, Hsin Min Wang

Research output: Conference contribution

8 Citations (Scopus)

Abstract

Electrolaryngeal speech (EL speech) is typically produced with an electrolarynx device that generates excitation signals to substitute for human vocal fold vibrations. Because the excitation signals cannot perfectly characterize sound sources generated by vocal folds, the naturalness and intelligibility of EL speech are inevitably worse than those of natural speech (NL speech). To improve speech naturalness, statistical models, such as Gaussian mixture models and deep-learning-based models, have been employed for EL speech voice conversion (ELVC). The ELVC task aims to convert EL speech into NL speech through an ELVC model. To implement a frame-wise ELVC system, accurate feature alignment is crucial for model training. However, the abnormal acoustic characteristics of EL speech cause misalignments and accordingly limit ELVC performance. To address this issue, we propose a novel ELVC system based on sequence-to-sequence (seq2seq) modeling with text-to-speech (TTS) pretraining. The seq2seq model uses an attention mechanism to perform representation learning and alignment concurrently. Meanwhile, TTS pretraining enables efficient training with limited data. Experimental results show that the proposed ELVC system yields notable improvements in terms of standardized evaluation metrics and subjective listening tests over a well-known frame-wise ELVC system.
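The abstract does not specify the attention formulation, so the following is only a minimal sketch of the general idea, assuming generic scaled dot-product attention rather than the authors' actual model. The names `soft_alignment`, `attend`, `el_frames`, and `nl_queries` are hypothetical, and the toy vectors stand in for encoded EL frames and decoder states. It illustrates how attention weights act as a learned soft alignment between sequences of different lengths, so no separate frame-wise alignment step is needed.

```python
import math


def soft_alignment(queries, keys):
    """Return attention weights: one row per output step, one column per input frame."""
    dim = len(keys[0])
    weights = []
    for q in queries:
        # Scaled dot-product score between this output step and every input frame.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])  # softmax over input frames
    return weights


def attend(queries, keys):
    """Return (context vectors, weights); each context is a weighted sum of input frames."""
    weights = soft_alignment(queries, keys)
    dim = len(keys[0])
    context = [
        [sum(w * k[j] for w, k in zip(row, keys)) for j in range(dim)]
        for row in weights
    ]
    return context, weights


# Toy data (hypothetical): 3 encoded input frames, 2 decoder steps, dimension 2.
el_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
nl_queries = [[2.0, 0.0], [0.0, 2.0]]
context, weights = attend(nl_queries, el_frames)
```

Each row of `weights` sums to 1 and indicates which input frames a given output step draws on. The input and output lengths differ here (3 versus 2), which is the situation a frame-wise system would have to resolve with an explicit alignment step.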

Original language: English
Title of host publication: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 650-657
Number of pages: 8
ISBN (Electronic): 9781665437394
DOIs
Publication status: Published - 2021
Event: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Cartagena, Colombia
Duration: 13 Dec 2021 to 17 Dec 2021

Publication series

Name: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings

Conference

Conference: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021
Country/Territory: Colombia
City: Cartagena
Period: 21-12-13 to 21-12-17

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Linguistics and Language
