Voice Conversion using Precise Speech Alignment based on Spectral Property and Eigen-Codeword Distribution

Yi Chin Huang, Chung Hsien Wu, Chung Han Lee, Yu Ting Chao

Research output: Contribution to conference, Paper, peer-review

Abstract

Voice conversion methods are widely used to convert the speech signals uttered by a source speaker to those of a target speaker. Frame-based voice conversion, however, generally suffers from incorrect alignment when only spectral distance is used, and therefore generates improper conversion results. In a parallel phone sequence, alignment by minimum spectral distance between the frame-based feature vectors of the source and target phone sequences is theoretically impractical, since the spectral properties of the source and target phones are inherently different. Nevertheless, if the feature vectors of the phone sequence are transformed into codewords in an eigen space, the eigen-codeword occurrence distribution curves of the source and target phone sequences are likely to be similar. By integrating the codeword occurrence distribution into the distance estimation, a more precise frame alignment based on dynamic time warping can be obtained. With the precise alignment, voice conversion functions can be properly constructed. Objective and subjective evaluations were conducted, and comparisons with spectral distance-based alignment confirm the improved performance of the proposed method.
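The alignment idea in the abstract can be illustrated with a minimal sketch of dynamic time warping whose frame distance mixes a spectral term with a second, distribution-based term. This is not the authors' implementation: the abstract does not say how the eigen-codeword occurrence distribution enters the distance, so the per-frame scalar curve (`src_curve`, `tgt_curve`), the linear mixing `weight`, and the function name `dtw_align` are all assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(src, tgt, src_curve, tgt_curve, weight=0.5):
    """Align two frame sequences with dynamic time warping.

    src, tgt: lists of spectral feature vectors, one per frame.
    src_curve, tgt_curve: one scalar per frame, a hypothetical stand-in
        for the codeword occurrence distribution curve (assumption).
    weight: hypothetical mixing weight between the two distance terms.
    Returns (total_cost, path), where path is a list of (i, j) frame pairs.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j] = minimum accumulated cost of aligning src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            spectral = euclidean(src[i - 1], tgt[j - 1])
            distribution = abs(src_curve[i - 1] - tgt_curve[j - 1])
            # Assumed combination: a simple weighted sum of both terms.
            d = (1.0 - weight) * spectral + weight * distribution
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path
```

Setting `weight=0.0` reduces the sketch to alignment by spectral distance alone, which is the baseline the abstract compares against.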

Original language: English
Pages: 62-67
Number of pages: 6
Publication status: Published - 2010
Event: 7th ISCA Tutorial and Research Workshop on Speech Synthesis, SSW 2010 - Kyoto, Japan
Duration: 2010 Sep 22 to 2010 Sep 24

Conference

Conference: 7th ISCA Tutorial and Research Workshop on Speech Synthesis, SSW 2010
Country/Territory: Japan
City: Kyoto
Period: 10-09-22 to 10-09-24

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Cultural Studies
