Statistics Pooling Time Delay Neural Network Based on X-Vector for Speaker Verification

Qian Bei Hong, Chung Hsien Wu, Hsin Min Wang, Chien Lin Huang

Research output: Conference contribution

10 Citations (Scopus)

Abstract

This paper aims to improve the x-vector speaker embedding so that it captures more detailed information for speaker verification. We propose a statistics pooling time delay neural network (TDNN), in which statistics pooling is integrated into each layer of the TDNN structure to account for the variation of temporal context in the frame-level transformation. The proposed feature vector, named the stats-vector, is compared with the baseline x-vector features on the VoxCeleb dataset and the Speakers in the Wild (SITW) dataset for speaker verification. The experimental results show that the proposed stats-vector with score fusion achieved the best performance on the VoxCeleb1 dataset. Furthermore, considering the interference from other speakers in the recordings, we found that the proposed stats-vector efficiently reduced this interference and improved speaker verification performance on the SITW dataset.
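
To make the idea of per-layer statistics pooling concrete, the following is a minimal sketch and not the authors' implementation. It assumes that statistics pooling means taking the mean and standard deviation of frame-level features over time, as in the standard x-vector recipe, and that the pooled vectors from each layer are simply concatenated. The concatenation step and the function names `stats_pool` and `pool_every_layer` are hypothetical and chosen only for illustration; the abstract does not specify how the per-layer statistics are combined.

```python
import numpy as np


def stats_pool(frames: np.ndarray) -> np.ndarray:
    """Collapse a (num_frames, dim) matrix into a single (2 * dim,) vector.

    The first half is the per-dimension mean over time; the second half is
    the per-dimension (population) standard deviation over time.
    """
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    return np.concatenate([mean, std])


def pool_every_layer(layer_outputs):
    """Assumed aggregation: pool each layer's frame-level output, then concatenate.

    `layer_outputs` is a list of (num_frames_i, dim_i) arrays, one per TDNN
    layer. Frame counts may differ between layers because of temporal context.
    """
    return np.concatenate([stats_pool(h) for h in layer_outputs])


# Toy example: two "layers" with different frame counts and widths.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(100, 8)), rng.normal(size=(96, 16))]
embedding_input = pool_every_layer(layers)
print(embedding_input.shape)  # (2 * 8 + 2 * 16,) = (48,)
```

Because the mean and standard deviation are computed over the time axis, the output size depends only on the layer widths and not on the utterance length, which is what allows a fixed-size speaker embedding to be derived from variable-length recordings.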

Original language: English
Title of host publication: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6849-6853
Number of pages: 5
ISBN (Electronic): 9781509066315
Publication status: Published - May 2020
Event: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain
Duration: May 4, 2020 to May 8, 2020

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2020-May
ISSN (Print): 1520-6149

Conference

Conference: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Country/Territory: Spain
City: Barcelona
Period: 2020-05-04 to 2020-05-08

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
