Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion

Yi Syuan Liou, Wen Chin Huang, Ming Chi Yen, Shu Wei Tsai, Yu Huai Peng, Tomoki Toda, Yu Tsao, Hsin Min Wang

Research output: Conference contribution

1 citation (Scopus)

Abstract

Voice conversion (VC) is an effective approach to electrolaryngeal (EL) speech enhancement, a task that aims to improve the quality of the artificial voice produced by an electrolarynx device. In frame-based VC methods, time alignment must be performed before model training, and the dynamic time warping (DTW) algorithm is widely adopted to compute the best time alignment between each utterance pair. The validity of DTW rests on the assumption that the same phonemes uttered by the source and target speakers have similar features and can therefore be mapped by measuring a pre-defined distance between source and target speech frames. However, the special characteristics of EL speech can break this assumption, resulting in sub-optimal DTW alignment. In this work, we propose to use lip images for time alignment, assuming that the lip movements of laryngectomees remain as normal as those of healthy people. We investigate two naive lip representations and distance metrics, and experimental results demonstrate that the proposed method significantly outperforms audio-only alignment in terms of both objective and subjective evaluations.
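As a rough illustration of the idea in the abstract, the sketch below computes a DTW path from per-frame lip features instead of acoustic features, then uses that path to pair up source (EL) and target acoustic frames for training. It is not the authors' implementation. The lip feature arrays, the two distance functions (Euclidean and cosine, standing in for the paper's unspecified "naive" representations and metrics), the DTW step pattern, and the assumption that lip and acoustic frames already share one frame rate are all illustrative assumptions; `euclidean_cost`, `cosine_cost`, `dtw_path`, and `align_with_lips` are hypothetical names.

```python
import numpy as np


def euclidean_cost(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distance between frames: (N, D) x (M, D) -> (N, M)."""
    diff = src[:, None, :] - tgt[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def cosine_cost(src: np.ndarray, tgt: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Pairwise cosine distance (1 - cosine similarity) between frames."""
    src_n = src / (np.linalg.norm(src, axis=1, keepdims=True) + eps)
    tgt_n = tgt / (np.linalg.norm(tgt, axis=1, keepdims=True) + eps)
    return 1.0 - src_n @ tgt_n.T


def dtw_path(cost: np.ndarray):
    """Classic DTW with steps (1,0), (0,1), (1,1); returns (total cost, path).

    The path is a list of (source_frame, target_frame) index pairs, 0-based.
    """
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost with inf border
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    # Backtrack from the end, preferring the diagonal step on ties.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
        path.append((i - 1, j - 1))
    path.reverse()
    return float(acc[n, m]), path


def align_with_lips(src_acoustic, tgt_acoustic, src_lip, tgt_lip, metric="euclidean"):
    """Align acoustic frames using a DTW path computed on lip features only.

    Assumption: lip and acoustic frames are already at the same frame rate
    (e.g. video features upsampled beforehand), so indices correspond.
    """
    cost_fn = euclidean_cost if metric == "euclidean" else cosine_cost
    _, path = dtw_path(cost_fn(src_lip, tgt_lip))
    src_idx = np.array([p[0] for p in path])
    tgt_idx = np.array([p[1] for p in path])
    return src_acoustic[src_idx], tgt_acoustic[tgt_idx]
```

For example, if the target lip sequence is a time-stretched copy of the source, `dtw_path(euclidean_cost(src_lip, tgt_lip))` with `src_lip = [[0], [1], [2]]` and `tgt_lip = [[0], [0], [1], [2], [2]]` returns a total cost of 0.0 and the path `[(0, 0), (0, 1), (1, 2), (2, 3), (2, 4)]`. The design point is that the warping path comes from the visual stream, while the frames it pairs up are the acoustic features the VC model is trained on.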

Original language: English
Host publication title: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1234-1238
Number of pages: 5
ISBN (Electronic): 9789881476890
Publication status: Published - 2021
Event: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Tokyo, Japan
Duration: 14 Dec 2021 to 17 Dec 2021

Publication series

Name: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings

Conference

Conference: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Country/Territory: Japan
City: Tokyo
Period: 21-12-14 to 21-12-17

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Instrumentation
