TY - JOUR
T1 - Speaker clustering using decision tree-based phone cluster models with multi-space probability distributions
AU - Shen, Han Ping
AU - Yeh, Jui Feng
AU - Wu, Chung Hsien
N1 - Funding Information:
Manuscript received July 22, 2009; revised December 22, 2009; accepted October 11, 2010. Date of publication October 28, 2010; date of current version May 13, 2011. This work was supported by the National Science Council, Taiwan, under Contract NSC 95-2221-E-006-181-MY3. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Haizhou Li.
PY - 2011
Y1 - 2011
N2 - This paper presents an approach to speaker clustering using decision tree-based phone cluster models (DT-PCMs). In this approach, phone clustering is first applied to construct universal phone cluster models that accommodate acoustic characteristics from different speakers. Since the pitch feature is highly speaker-related and beneficial for speaker identification, decision trees are constructed based on multi-space probability distributions (MSDs), which can model both pitch and cepstral features for voiced and unvoiced speech simultaneously. In speaker clustering based on DT-PCMs, the contextual, phonetic, and prosodic features of each input speech segment are used to select the speaker-related MSDs from the MSD decision trees to construct the initial phone cluster models. The maximum-likelihood linear regression (MLLR) method is then employed to adapt the initial models to the speaker-adapted phone cluster models according to the input speech segment. Finally, the agglomerative clustering algorithm is applied to all speaker-adapted phone cluster models, each representing one input speech segment, for speaker clustering. In addition, an efficient estimation method for phone model merging is proposed for model parameter combination. Experimental results show that the MSD-based DT-PCMs outperform the conventional GMM- and HMM-based approaches for speaker clustering on the RT09 tasks.
AB - This paper presents an approach to speaker clustering using decision tree-based phone cluster models (DT-PCMs). In this approach, phone clustering is first applied to construct universal phone cluster models that accommodate acoustic characteristics from different speakers. Since the pitch feature is highly speaker-related and beneficial for speaker identification, decision trees are constructed based on multi-space probability distributions (MSDs), which can model both pitch and cepstral features for voiced and unvoiced speech simultaneously. In speaker clustering based on DT-PCMs, the contextual, phonetic, and prosodic features of each input speech segment are used to select the speaker-related MSDs from the MSD decision trees to construct the initial phone cluster models. The maximum-likelihood linear regression (MLLR) method is then employed to adapt the initial models to the speaker-adapted phone cluster models according to the input speech segment. Finally, the agglomerative clustering algorithm is applied to all speaker-adapted phone cluster models, each representing one input speech segment, for speaker clustering. In addition, an efficient estimation method for phone model merging is proposed for model parameter combination. Experimental results show that the MSD-based DT-PCMs outperform the conventional GMM- and HMM-based approaches for speaker clustering on the RT09 tasks.
UR - http://www.scopus.com/inward/record.url?scp=79956265528&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79956265528&partnerID=8YFLogxK
U2 - 10.1109/TASL.2010.2090144
DO - 10.1109/TASL.2010.2090144
M3 - Article
AN - SCOPUS:79956265528
SN - 1558-7916
VL - 19
SP - 1289
EP - 1300
JO - IEEE Transactions on Audio, Speech and Language Processing
JF - IEEE Transactions on Audio, Speech and Language Processing
IS - 5
M1 - 5613154
ER -