Sequential speaker embedding and transfer learning for text-independent speaker identification

Qian Bei Hong, Chung Hsien Wu, Ming Hsiang Su, Hsin Min Wang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This study proposes an approach to speaker identification built on a convolutional neural network (CNN) model that combines sequential speaker embedding with transfer learning. First, a CNN-based universal background model (UBM) is constructed, and a transfer learning mechanism is applied to obtain speaker embeddings from a small amount of enrollment data. Second, to account for the temporal variation of acoustic features within a speaker's utterance, sequential speaker embeddings are generated to capture the temporal characteristics of that speaker's speech features. The King-ASR series database was used for UBM training, and the LibriSpeech corpus was used for evaluation. The proposed method, using sequential speaker embedding and transfer learning, achieved an equal error rate (EER) of 6.89%, outperforming the method based on x-vector and PLDA (8.25%). The effect of the number of enrolled speakers on identification was also examined. As the number of enrolled speakers increased from 50 to 1172, the identification accuracy of the proposed method degraded from 82.99% to 73.26%, whereas that of the x-vector and PLDA method degraded far more sharply, from 83.17% to 60.95%.
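The abstract reports results as an equal error rate but does not say how it was computed. As a rough illustration only, the EER is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it by sweeping a decision threshold over the observed scores. The function name `equal_error_rate` and the toy scores are hypothetical and are not taken from the paper.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    target_scores: similarity scores for same-speaker trials.
    impostor_scores: similarity scores for different-speaker trials.
    Both inputs are illustrative assumptions, not the paper's data.
    """
    if not target_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap, eer = None, None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        # False acceptance: an impostor trial scores at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # False rejection: a target trial scores below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy scores: one target trial (0.4) falls below one impostor trial (0.6).
    targets = [0.9, 0.8, 0.7, 0.4]
    impostors = [0.6, 0.3, 0.2, 0.1]
    print(f"EER = {equal_error_rate(targets, impostors):.2%}")
```

With only a handful of trials the two error curves may not cross exactly, so this sketch returns the average of the two rates at the closest threshold. Evaluation toolkits typically interpolate instead.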

Original language: English
Title of host publication: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 827-832
Number of pages: 6
ISBN (Electronic): 9781728132488
Publication status: Published - 2019 Nov
Event: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 - Lanzhou, China
Duration: 2019 Nov 18 to 2019 Nov 21

Publication series

Name: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

Conference

Conference: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Country: China
City: Lanzhou
Period: 19-11-18 to 19-11-21

All Science Journal Classification (ASJC) codes

  • Information Systems
