Speaker clustering using decision tree-based phone cluster models with multi-space probability distributions

Han Ping Shen, Jui Feng Yeh, Chung-Hsien Wu

Research output: Contribution to journal (Article)

5 Citations (Scopus)

Abstract

This paper presents an approach to speaker clustering using decision tree-based phone cluster models (DT-PCMs). In this approach, phone clustering is first applied to construct universal phone cluster models that accommodate the acoustic characteristics of different speakers. Because the pitch feature is highly speaker-related and beneficial for speaker identification, decision trees are constructed based on multi-space probability distributions (MSDs), which can model both pitch and cepstral features for voiced and unvoiced speech simultaneously. In speaker clustering based on DT-PCMs, the contextual, phonetic, and prosodic features of each input speech segment are used to select the speaker-related MSDs from the MSD decision trees to construct the initial phone cluster models. The maximum-likelihood linear regression (MLLR) method is then employed to adapt the initial models into speaker-adapted phone cluster models according to the input speech segment. Finally, an agglomerative clustering algorithm is applied to all speaker-adapted phone cluster models, each representing one input speech segment, for speaker clustering. In addition, an efficient estimation method for phone model merging is proposed for model parameter combination. Experimental results show that the MSD-based DT-PCMs outperform the conventional GMM- and HMM-based approaches for speaker clustering on the RT09 tasks.
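
The final clustering stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: each speaker-adapted phone cluster model is replaced by a single diagonal Gaussian, the distance is a symmetric KL divergence, and the merge step is plain count-weighted moment matching rather than the paper's efficient estimation method. The names `SegmentModel`, `symmetric_kl`, `merge`, and `agglomerative_cluster` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentModel:
    """Hypothetical stand-in for one speaker-adapted model (a diagonal Gaussian)."""
    mean: List[float]
    var: List[float]
    count: float        # number of frames behind the estimate
    members: List[int]  # indices of the input segments in this cluster


def symmetric_kl(a: SegmentModel, b: SegmentModel) -> float:
    """Symmetric KL divergence KL(a||b) + KL(b||a) between diagonal Gaussians."""
    total = 0.0
    for ma, va, mb, vb in zip(a.mean, a.var, b.mean, b.var):
        d2 = (ma - mb) ** 2
        total += 0.5 * (va / vb + vb / va - 2.0) + 0.5 * d2 * (1.0 / va + 1.0 / vb)
    return total


def merge(a: SegmentModel, b: SegmentModel) -> SegmentModel:
    """Count-weighted moment matching (an assumption, not the paper's merging method)."""
    n = a.count + b.count
    wa, wb = a.count / n, b.count / n
    mean = [wa * ma + wb * mb for ma, mb in zip(a.mean, b.mean)]
    var = [
        wa * (va + (ma - m) ** 2) + wb * (vb + (mb - m) ** 2)
        for ma, va, mb, vb, m in zip(a.mean, a.var, b.mean, b.var, mean)
    ]
    return SegmentModel(mean, var, n, a.members + b.members)


def agglomerative_cluster(models: List[SegmentModel], num_clusters: int) -> List[SegmentModel]:
    """Repeatedly merge the closest pair of models until num_clusters remain."""
    clusters = list(models)
    while len(clusters) > max(num_clusters, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = symmetric_kl(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


if __name__ == "__main__":
    # Four toy one-dimensional segments: two near 0.0 and two near 5.0.
    segments = [
        SegmentModel([0.0], [1.0], 100, [0]),
        SegmentModel([0.2], [1.0], 100, [1]),
        SegmentModel([5.0], [1.0], 100, [2]),
        SegmentModel([5.1], [1.0], 100, [3]),
    ]
    for cluster in agglomerative_cluster(segments, 2):
        print(sorted(cluster.members), cluster.mean)
```

The sketch stops at a fixed number of clusters for simplicity; a practical system would more likely stop on a distance or model-selection criterion, which the abstract does not specify.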

Original language: English
Article number: 5613154
Pages (from-to): 1289-1300
Number of pages: 12
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 19
Issue number: 5
DOIs: 10.1109/TASL.2010.2090144
Publication status: Published - 2011 May 25

Fingerprint

Decision trees
Probability distributions
Phonetics
Speech analysis
Merging
Linear regression
Clustering algorithms
Maximum likelihood
Regression analysis
Acoustics

All Science Journal Classification (ASJC) codes

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering

Cite this

@article{8037ea942b7e4b23a4120fca2a1c2e90,
title = "Speaker clustering using decision tree-based phone cluster models with multi-space probability distributions",
abstract = "This paper presents an approach to speaker clustering using decision tree-based phone cluster models (DT-PCMs). In this approach, phone clustering is first applied to construct universal phone cluster models that accommodate the acoustic characteristics of different speakers. Because the pitch feature is highly speaker-related and beneficial for speaker identification, decision trees are constructed based on multi-space probability distributions (MSDs), which can model both pitch and cepstral features for voiced and unvoiced speech simultaneously. In speaker clustering based on DT-PCMs, the contextual, phonetic, and prosodic features of each input speech segment are used to select the speaker-related MSDs from the MSD decision trees to construct the initial phone cluster models. The maximum-likelihood linear regression (MLLR) method is then employed to adapt the initial models into speaker-adapted phone cluster models according to the input speech segment. Finally, an agglomerative clustering algorithm is applied to all speaker-adapted phone cluster models, each representing one input speech segment, for speaker clustering. In addition, an efficient estimation method for phone model merging is proposed for model parameter combination. Experimental results show that the MSD-based DT-PCMs outperform the conventional GMM- and HMM-based approaches for speaker clustering on the RT09 tasks.",
author = "Shen, {Han Ping} and Yeh, {Jui Feng} and Wu, Chung-Hsien",
year = "2011",
month = "5",
day = "25",
doi = "10.1109/TASL.2010.2090144",
language = "English",
volume = "19",
pages = "1289--1300",
journal = "IEEE Transactions on Audio, Speech and Language Processing",
issn = "1558-7916",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "5",

}

Speaker clustering using decision tree-based phone cluster models with multi-space probability distributions. / Shen, Han Ping; Yeh, Jui Feng; Wu, Chung-Hsien.

In: IEEE Transactions on Audio, Speech and Language Processing, Vol. 19, No. 5, 5613154, 25.05.2011, p. 1289-1300.

TY - JOUR

T1 - Speaker clustering using decision tree-based phone cluster models with multi-space probability distributions

AU - Shen, Han Ping

AU - Yeh, Jui Feng

AU - Wu, Chung-Hsien

PY - 2011/5/25

Y1 - 2011/5/25

N2 - This paper presents an approach to speaker clustering using decision tree-based phone cluster models (DT-PCMs). In this approach, phone clustering is first applied to construct universal phone cluster models that accommodate the acoustic characteristics of different speakers. Because the pitch feature is highly speaker-related and beneficial for speaker identification, decision trees are constructed based on multi-space probability distributions (MSDs), which can model both pitch and cepstral features for voiced and unvoiced speech simultaneously. In speaker clustering based on DT-PCMs, the contextual, phonetic, and prosodic features of each input speech segment are used to select the speaker-related MSDs from the MSD decision trees to construct the initial phone cluster models. The maximum-likelihood linear regression (MLLR) method is then employed to adapt the initial models into speaker-adapted phone cluster models according to the input speech segment. Finally, an agglomerative clustering algorithm is applied to all speaker-adapted phone cluster models, each representing one input speech segment, for speaker clustering. In addition, an efficient estimation method for phone model merging is proposed for model parameter combination. Experimental results show that the MSD-based DT-PCMs outperform the conventional GMM- and HMM-based approaches for speaker clustering on the RT09 tasks.

UR - http://www.scopus.com/inward/record.url?scp=79956265528&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79956265528&partnerID=8YFLogxK

U2 - 10.1109/TASL.2010.2090144

DO - 10.1109/TASL.2010.2090144

M3 - Article

AN - SCOPUS:79956265528

VL - 19

SP - 1289

EP - 1300

JO - IEEE Transactions on Audio, Speech and Language Processing

JF - IEEE Transactions on Audio, Speech and Language Processing

SN - 1558-7916

IS - 5

M1 - 5613154

ER -