Sequential speaker embedding and transfer learning for text-independent speaker identification

Qian Bei Hong, Chung Hsien Wu, Ming Hsiang Su, Hsin Min Wang

Research output: Conference contribution

1 Citation (Scopus)

Abstract

This study proposes a speaker identification approach built on a convolutional neural network (CNN) model that combines sequential speaker embedding with transfer learning. First, a CNN-based universal background model (UBM) is constructed, and a transfer learning mechanism is applied to obtain speaker embeddings from a small amount of enrollment data. Second, to account for the temporal variation of acoustic features within a speaker's utterance, sequential speaker embeddings are generated to capture the temporal characteristics of that speaker's speech features. The King-ASR series database was used for UBM training, and the LibriSpeech corpus was used for evaluation. The proposed method achieved an equal error rate (EER) of 6.89%, outperforming the x-vector and PLDA method (8.25%). The effect of the number of enrolled speakers on identification was also examined. As the number of enrolled speakers increased from 50 to 1172, the identification accuracy of the proposed method fell from 82.99% to 73.26%, whereas the accuracy of the x-vector and PLDA method fell far more sharply, from 83.17% to 60.95%.
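The abstract reports results as an equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER is commonly estimated from trial scores. It is not the authors' evaluation code: the function name `equal_error_rate` and the toy scores are hypothetical, and it assumes that a higher score means the trial is more likely to come from the claimed speaker.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Returns (eer, threshold) for the threshold where the false acceptance
    rate (FAR) and false rejection rate (FRR) are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = None
    best = None
    for t in thresholds:
        # False rejection: a genuine (target) trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor (non-target) trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            # With finitely many trials FAR and FRR rarely match exactly,
            # so report their average at the closest point.
            best = ((far + frr) / 2, t)
    return best


# Hypothetical toy scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On real evaluations with many trials, the same quantity is usually read off a ROC or DET curve with interpolation. The threshold sweep above is only the simplest form of that idea.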

Original language: English
Title of host publication: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 827-832
Number of pages: 6
ISBN (Electronic): 9781728132488
DOIs
Publication status: Published - Nov 2019
Event: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 - Lanzhou, China
Duration: 18 Nov 2019 to 21 Nov 2019

Publication series

Name: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

Conference

Conference: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Country/Territory: China
City: Lanzhou
Period: 19-11-18 to 19-11-21

All Science Journal Classification (ASJC) codes

  • Information Systems
