Combining Deep Embeddings of Acoustic and Articulatory Features for Speaker Identification

Qian Bei Hong, Chung Hsien Wu, Hsin Min Wang, Chien Lin Huang

Research output: Conference contribution

Abstract

In this study, deep embeddings of acoustic and articulatory features are combined for speaker identification. First, a convolutional neural network (CNN)-based universal background model (UBM) is constructed to generate acoustic feature (AC) embeddings. In addition, because articulatory features (AFs) represent important phonological properties of speech production, a multilayer perceptron (MLP)-based model is constructed to extract AF embeddings. The extracted AC and AF embeddings are concatenated into a combined feature vector, which a fully connected neural network uses for speaker identification. The proposed system was evaluated on three corpora (King-ASR, LibriSpeech, and SITW), with experiments designed according to the properties of each dataset. All three corpora were used to evaluate the effect of the AF embedding, and the results showed that adding the AF embedding to the input feature vector improved speaker identification performance. The LibriSpeech corpus was used to evaluate the effect of the number of enrolled speakers. The proposed system achieved an equal error rate (EER) of 7.80%, outperforming the method based on x-vector with PLDA (8.25%). The effect of signal mismatch was further evaluated using the SITW corpus. The proposed system achieved an EER of 25.19%, outperforming the other baseline methods.
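The fusion step described in the abstract (concatenating the AC and AF embeddings and passing the result to a fully connected network) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the embedding sizes, the single hidden layer, the ReLU activation, the random weights, and the function names `fuse_embeddings` and `speaker_posteriors` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def fuse_embeddings(ac_emb, af_emb):
    """Concatenate AC and AF embeddings along the feature axis."""
    return np.concatenate([ac_emb, af_emb], axis=-1)


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def speaker_posteriors(x, w1, b1, w2, b2):
    """Hypothetical classifier: one ReLU hidden layer, then softmax over speakers."""
    h = np.maximum(0.0, x @ w1 + b1)
    return softmax(h @ w2 + b2)


# All sizes below are assumptions, not values from the paper.
n_utt, ac_dim, af_dim, hidden, n_spk = 4, 512, 128, 256, 10

rng = np.random.default_rng(0)
ac = rng.standard_normal((n_utt, ac_dim))  # stand-in for CNN-UBM AC embeddings
af = rng.standard_normal((n_utt, af_dim))  # stand-in for MLP AF embeddings

x = fuse_embeddings(ac, af)  # shape: (n_utt, ac_dim + af_dim)

# Untrained random weights, used only to show the shapes involved.
w1 = rng.standard_normal((ac_dim + af_dim, hidden)) * 0.01
b1 = np.zeros(hidden)
w2 = rng.standard_normal((hidden, n_spk)) * 0.01
b2 = np.zeros(n_spk)

post = speaker_posteriors(x, w1, b1, w2, b2)  # one row of posteriors per utterance
pred = post.argmax(axis=1)                    # index of the most likely speaker
```

In a trained system the weights would be learned from enrolled speakers' data; here they only demonstrate how the concatenated vector flows through a fully connected classifier.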

Original language: English
Title of host publication: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7589-7593
Number of pages: 5
ISBN (electronic): 9781509066315
Publication status: Published - May 2020
Event: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain
Duration: 4 May 2020 to 8 May 2020

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
2020-May
ISSN (print): 1520-6149

Conference

Conference: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Country: Spain
City: Barcelona
Period: 2020-05-04 to 2020-05-08

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
