Voice Conversion using Precise Speech Alignment based on Spectral Property and Eigen-Codeword Distribution

Yi Chin Huang, Chung Hsien Wu, Chung Han Lee, Yu Ting Chao

Research output: Paper, peer-reviewed

Abstract

Voice conversion methods are widely used to convert speech uttered by a source speaker into the voice of a target speaker. Frame-based voice conversion, however, generally suffers from incorrect alignment when only spectral distance is used, and therefore generates improper conversion results. In a parallel phone sequence, aligning the source and target phone sequences by the minimum spectral distance between their frame-based feature vectors is theoretically impractical, since the spectral properties of the source and target phones are inherently different. Nevertheless, if the feature vectors of the phone sequence are transformed into codewords in an eigen space, the eigen-codeword occurrence distribution curves of the source and target phone sequences are likely to be similar. By integrating the codeword occurrence distribution into the distance estimation, a more precise frame alignment based on dynamic time warping can be obtained. With this precise alignment, voice conversion functions can be properly constructed. Objective and subjective evaluations were conducted, and comparison with spectral-distance-based alignment confirms the improved performance of the proposed method.
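The alignment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the eigen-codeword occurrence distribution is computed or how it enters the distance, so the per-frame distribution vectors (`src_dist`, `tgt_dist`), the linear mixing parameter `weight`, and the function names `combined_cost` and `dtw_path` are all assumptions made for illustration. The sketch only shows how a second distance term might be folded into the local cost of a standard dynamic time warping recursion.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_cost(src_spec, tgt_spec, src_dist, tgt_dist, weight=0.5):
    """Build a local-cost function for DTW.

    Assumption: each frame has a spectral vector and a (hypothetical)
    codeword-occurrence-distribution vector, and the two distances are
    mixed linearly by `weight`. The paper may combine them differently.
    """
    def cost(i, j):
        spectral = euclidean(src_spec[i], tgt_spec[j])
        distribution = euclidean(src_dist[i], tgt_dist[j])
        return weight * spectral + (1.0 - weight) * distribution
    return cost


def dtw_path(n, m, cost):
    """Standard DTW over an n-by-m grid; returns aligned (i, j) frame pairs."""
    inf = float("inf")
    acc = [[inf] * m for _ in range(n)]
    acc[0][0] = cost(0, 0)
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = cost(i, j) + best

    # Backtrack from the last frame pair to the first one.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path
```

Calling `dtw_path(len(src_spec), len(tgt_spec), combined_cost(...))` returns the frame pairs on which a conversion function could then be trained. Setting `weight=1.0` reduces the sketch to purely spectral-distance alignment, the baseline the abstract compares against.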

Original language: English
Pages: 62-67
Number of pages: 6
Publication status: Published - 2010
Event: 7th ISCA Tutorial and Research Workshop on Speech Synthesis, SSW 2010 - Kyoto, Japan
Duration: 22 Sep 2010 to 24 Sep 2010

Conference

Conference: 7th ISCA Tutorial and Research Workshop on Speech Synthesis, SSW 2010
Country/Territory: Japan
City: Kyoto
Period: 2010-09-22 to 2010-09-24

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Cultural Studies
