Interruption point detection of spontaneous speech using inter-syllable boundary-based prosodic features

Chung-Hsien Wu, Wei Bin Liang, Jui Feng Yeh

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

This article presents a probabilistic scheme for detecting the interruption point (IP) in spontaneous speech using prosodic features extracted at inter-syllable boundaries. Because the error rate of spontaneous speech recognition is high, a combined acoustic model that considers both syllable and subsyllable recognition units is first used to determine the inter-syllable boundaries and to output the recognition confidence of the input speech. Based on the finding that IPs always occur at inter-syllable boundaries, a probability distribution of the prosodic features at the current potential IP is estimated. A Conditional Random Field (CRF) model, which uses the clustered prosodic features of the current potential IP and of its preceding and succeeding inter-syllable boundaries, outputs the IP likelihood measure. Finally, the confidence of the recognized speech, the probability distribution of the prosodic features, and the CRF-based IP likelihood measure are integrated to determine the optimal IP sequence of the input spontaneous speech. Pitch reset and lengthening are also applied to improve IP detection performance. The Mandarin Conversational Dialogue Corpus is adopted for evaluation. Experimental results show that the proposed IP detection approach outperforms the hidden Markov model and the Maximum Entropy model by 10.56% and 6.5%, respectively, under the same experimental conditions. In addition, the IP detection error rate can be further reduced by 9.15% using pitch reset and lengthening information. The experimental results confirm that the proposed model based on inter-syllable boundary-based prosodic features can effectively detect the interruption point in spontaneous Mandarin speech.
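
The final integration step described in the abstract can be illustrated with a small sketch. The abstract does not say how the three scores are combined, so the log-linear combination, the weights, the threshold, and the names `combine_scores` and `detect_ips` below are all assumptions made for illustration, not the authors' method. The sketch also scores each boundary independently, whereas the paper determines an optimal IP sequence.

```python
import math

def combine_scores(confidence, prosody_prob, crf_likelihood,
                   weights=(1.0, 1.0, 1.0)):
    """Weighted sum of log scores for one inter-syllable boundary.

    Hypothetical combination rule: each input is assumed to lie in (0, 1].
    """
    w_conf, w_pros, w_crf = weights
    return (w_conf * math.log(confidence)
            + w_pros * math.log(prosody_prob)
            + w_crf * math.log(crf_likelihood))

def detect_ips(boundaries, threshold=math.log(0.125)):
    """Return indices of boundaries whose combined log score meets the threshold.

    The threshold value is an arbitrary illustrative choice.
    """
    return [i for i, scores in enumerate(boundaries)
            if combine_scores(*scores) >= threshold]

# Made-up scores per boundary:
# (recognition confidence, prosodic feature probability, CRF likelihood)
boundaries = [
    (0.90, 0.20, 0.10),
    (0.80, 0.70, 0.90),
    (0.95, 0.10, 0.05),
]
ips = detect_ips(boundaries)
```

With these made-up scores, only the second boundary has a combined score above the threshold. A closer approximation of the described approach would replace the per-boundary threshold with a search over whole label sequences.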

Original language: English
Article number: 6
Journal: ACM Transactions on Asian Language Information Processing
Volume: 10
Issue number: 1
DOI: 10.1145/1929908.1929914
Publication status: Published - 2011 Mar 1

Fingerprint

Probability distributions
Error detection
Hidden Markov models
Speech recognition
Entropy
Acoustics

All Science Journal Classification (ASJC) codes

  • Computer Science (all)

Cite this

@article{c13ba4ce75614154bf68d7427acc1dee,
title = "Interruption point detection of spontaneous speech using inter-syllable boundary-based prosodic features",
abstract = "This article presents a probabilistic scheme for detecting the interruption point (IP) in spontaneous speech using prosodic features extracted at inter-syllable boundaries. Because the error rate of spontaneous speech recognition is high, a combined acoustic model that considers both syllable and subsyllable recognition units is first used to determine the inter-syllable boundaries and to output the recognition confidence of the input speech. Based on the finding that IPs always occur at inter-syllable boundaries, a probability distribution of the prosodic features at the current potential IP is estimated. A Conditional Random Field (CRF) model, which uses the clustered prosodic features of the current potential IP and of its preceding and succeeding inter-syllable boundaries, outputs the IP likelihood measure. Finally, the confidence of the recognized speech, the probability distribution of the prosodic features, and the CRF-based IP likelihood measure are integrated to determine the optimal IP sequence of the input spontaneous speech. Pitch reset and lengthening are also applied to improve IP detection performance. The Mandarin Conversational Dialogue Corpus is adopted for evaluation. Experimental results show that the proposed IP detection approach outperforms the hidden Markov model and the Maximum Entropy model by 10.56{\%} and 6.5{\%}, respectively, under the same experimental conditions. In addition, the IP detection error rate can be further reduced by 9.15{\%} using pitch reset and lengthening information. The experimental results confirm that the proposed model based on inter-syllable boundary-based prosodic features can effectively detect the interruption point in spontaneous Mandarin speech.",
author = "Wu, Chung-Hsien and Liang, {Wei Bin} and Yeh, {Jui Feng}",
year = "2011",
month = "3",
day = "1",
doi = "10.1145/1929908.1929914",
language = "English",
volume = "10",
journal = "ACM Transactions on Asian Language Information Processing",
issn = "1530-0226",
publisher = "Association for Computing Machinery (ACM)",
number = "1",
}

Interruption point detection of spontaneous speech using inter-syllable boundary-based prosodic features. / Wu, Chung-Hsien; Liang, Wei Bin; Yeh, Jui Feng.

In: ACM Transactions on Asian Language Information Processing, Vol. 10, No. 1, 6, 01.03.2011.

TY - JOUR

T1 - Interruption point detection of spontaneous speech using inter-syllable boundary-based prosodic features

AU - Wu, Chung-Hsien

AU - Liang, Wei Bin

AU - Yeh, Jui Feng

PY - 2011/3/1

Y1 - 2011/3/1

N2 - This article presents a probabilistic scheme for detecting the interruption point (IP) in spontaneous speech using prosodic features extracted at inter-syllable boundaries. Because the error rate of spontaneous speech recognition is high, a combined acoustic model that considers both syllable and subsyllable recognition units is first used to determine the inter-syllable boundaries and to output the recognition confidence of the input speech. Based on the finding that IPs always occur at inter-syllable boundaries, a probability distribution of the prosodic features at the current potential IP is estimated. A Conditional Random Field (CRF) model, which uses the clustered prosodic features of the current potential IP and of its preceding and succeeding inter-syllable boundaries, outputs the IP likelihood measure. Finally, the confidence of the recognized speech, the probability distribution of the prosodic features, and the CRF-based IP likelihood measure are integrated to determine the optimal IP sequence of the input spontaneous speech. Pitch reset and lengthening are also applied to improve IP detection performance. The Mandarin Conversational Dialogue Corpus is adopted for evaluation. Experimental results show that the proposed IP detection approach outperforms the hidden Markov model and the Maximum Entropy model by 10.56% and 6.5%, respectively, under the same experimental conditions. In addition, the IP detection error rate can be further reduced by 9.15% using pitch reset and lengthening information. The experimental results confirm that the proposed model based on inter-syllable boundary-based prosodic features can effectively detect the interruption point in spontaneous Mandarin speech.

UR - http://www.scopus.com/inward/record.url?scp=79953253065&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79953253065&partnerID=8YFLogxK

U2 - 10.1145/1929908.1929914

DO - 10.1145/1929908.1929914

M3 - Article

VL - 10

JO - ACM Transactions on Asian Language Information Processing

JF - ACM Transactions on Asian Language Information Processing

SN - 1530-0226

IS - 1

M1 - 6

ER -