Mandarin Electrolaryngeal Speech Voice Conversion using Cross-domain Features

Hsin Hao Chen, Yung Lun Chien, Ming Chi Yen, Shu Wei Tsai, Yu Tsao, Tai Shih Chi, Hsin Min Wang

Research output: Contribution to journal › Conference article › peer-review

Abstract

Patients who have had their entire larynx removed, including the vocal folds, owing to throat cancer may experience difficulties in speaking. In such cases, electrolarynx devices are often prescribed to produce speech, commonly referred to as electrolaryngeal speech (EL speech). However, the quality and intelligibility of EL speech are poor. EL voice conversion (ELVC) aims to address this problem by improving the intelligibility and quality of EL speech. In this paper, we propose a novel ELVC system that incorporates cross-domain features, specifically spectral features and self-supervised learning (SSL) embeddings. The experimental results show that applying cross-domain features can notably improve the conversion performance for the ELVC task compared with utilizing only traditional spectral features.

Original language: English
Pages (from-to): 5018-5022
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
Publication status: Published - 2023
Event: 24th International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: 2023 Aug 20 to 2023 Aug 24

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
