Miscommunication handling in spoken dialog systems based on error-aware dialog state detection

研究成果: Article

2 引文 (Scopus)

摘要

With the exponential growth in computing power and progress in speech recognition technology, spoken dialog systems (SDSs) with which a user interacts through natural speech has been widely used in human-computer interaction. However, error-prone automatic speech recognition (ASR) results usually lead to inappropriate semantic interpretation so that miscommunication happens easily. This paper presents an approach to error-aware dialog state (DS) detection for robust miscommunication handling in an SDS. Non-understanding (Non-U) and misunderstanding (Mis-U) are considered for miscommunication handling in this study. First, understanding evidence (UE), derived from the recognition confidence, is adopted for Non-U detection followed by Non-U recovery. For Mis-U with the recognized sentence containing uncertain recognized words, the partial sentences obtained by removing potentially misrecognized words from the input utterance are organized, based on regular expressions, as a tree structure to tolerate the deletion or rejection of keywords resulting from misrecognition for Mis-U DS modeling. Latent semantic analysis is then employed to consider the verified words and their n-grams for DS detection, including Mis-U and predefined Base DSs. Historical information-based n-grams are employed to find the most likely DS for the SDS. Several experiments were performed with a dialog corpus for the restaurant reservation task. The experimental results show that the proposed approach achieved a promising performance for Non-U recovery and Mis-U repair as well as a satisfactory task success rate for the dialogs using the proposed method.

原文English
文章編號9
期刊Eurasip Journal on Audio, Speech, and Music Processing
2017
發行號1
DOIs
出版狀態Published - 2017 十二月 1

指紋

Speech recognition
sentences
semantics
Semantics
speech recognition
Recovery
Human computer interaction
recovery
deletion
Repair
rejection
confidence
Experiments
interactions

All Science Journal Classification (ASJC) codes

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering

引用此文

@article{3554eb1781324fb38f082c40829dca64,
title = "Miscommunication handling in spoken dialog systems based on error-aware dialog state detection",
abstract = "With the exponential growth in computing power and progress in speech recognition technology, spoken dialog systems (SDSs) with which a user interacts through natural speech has been widely used in human-computer interaction. However, error-prone automatic speech recognition (ASR) results usually lead to inappropriate semantic interpretation so that miscommunication happens easily. This paper presents an approach to error-aware dialog state (DS) detection for robust miscommunication handling in an SDS. Non-understanding (Non-U) and misunderstanding (Mis-U) are considered for miscommunication handling in this study. First, understanding evidence (UE), derived from the recognition confidence, is adopted for Non-U detection followed by Non-U recovery. For Mis-U with the recognized sentence containing uncertain recognized words, the partial sentences obtained by removing potentially misrecognized words from the input utterance are organized, based on regular expressions, as a tree structure to tolerate the deletion or rejection of keywords resulting from misrecognition for Mis-U DS modeling. Latent semantic analysis is then employed to consider the verified words and their n-grams for DS detection, including Mis-U and predefined Base DSs. Historical information-based n-grams are employed to find the most likely DS for the SDS. Several experiments were performed with a dialog corpus for the restaurant reservation task. The experimental results show that the proposed approach achieved a promising performance for Non-U recovery and Mis-U repair as well as a satisfactory task success rate for the dialogs using the proposed method.",
author = "Wu, {Chung Hsien} and Su, {Ming Hsiang} and Liang, {Wei Bin}",
year = "2017",
month = "12",
day = "1",
doi = "10.1186/s13636-017-0107-3",
language = "English",
volume = "2017",
journal = "Eurasip Journal on Audio, Speech, and Music Processing",
issn = "1687-4714",
publisher = "Springer Publishing Company",
number = "1",

}

TY - JOUR

T1 - Miscommunication handling in spoken dialog systems based on error-aware dialog state detection

AU - Wu, Chung Hsien

AU - Su, Ming Hsiang

AU - Liang, Wei Bin

PY - 2017/12/1

Y1 - 2017/12/1

N2 - With the exponential growth in computing power and progress in speech recognition technology, spoken dialog systems (SDSs) with which a user interacts through natural speech has been widely used in human-computer interaction. However, error-prone automatic speech recognition (ASR) results usually lead to inappropriate semantic interpretation so that miscommunication happens easily. This paper presents an approach to error-aware dialog state (DS) detection for robust miscommunication handling in an SDS. Non-understanding (Non-U) and misunderstanding (Mis-U) are considered for miscommunication handling in this study. First, understanding evidence (UE), derived from the recognition confidence, is adopted for Non-U detection followed by Non-U recovery. For Mis-U with the recognized sentence containing uncertain recognized words, the partial sentences obtained by removing potentially misrecognized words from the input utterance are organized, based on regular expressions, as a tree structure to tolerate the deletion or rejection of keywords resulting from misrecognition for Mis-U DS modeling. Latent semantic analysis is then employed to consider the verified words and their n-grams for DS detection, including Mis-U and predefined Base DSs. Historical information-based n-grams are employed to find the most likely DS for the SDS. Several experiments were performed with a dialog corpus for the restaurant reservation task. The experimental results show that the proposed approach achieved a promising performance for Non-U recovery and Mis-U repair as well as a satisfactory task success rate for the dialogs using the proposed method.

AB - With the exponential growth in computing power and progress in speech recognition technology, spoken dialog systems (SDSs) with which a user interacts through natural speech has been widely used in human-computer interaction. However, error-prone automatic speech recognition (ASR) results usually lead to inappropriate semantic interpretation so that miscommunication happens easily. This paper presents an approach to error-aware dialog state (DS) detection for robust miscommunication handling in an SDS. Non-understanding (Non-U) and misunderstanding (Mis-U) are considered for miscommunication handling in this study. First, understanding evidence (UE), derived from the recognition confidence, is adopted for Non-U detection followed by Non-U recovery. For Mis-U with the recognized sentence containing uncertain recognized words, the partial sentences obtained by removing potentially misrecognized words from the input utterance are organized, based on regular expressions, as a tree structure to tolerate the deletion or rejection of keywords resulting from misrecognition for Mis-U DS modeling. Latent semantic analysis is then employed to consider the verified words and their n-grams for DS detection, including Mis-U and predefined Base DSs. Historical information-based n-grams are employed to find the most likely DS for the SDS. Several experiments were performed with a dialog corpus for the restaurant reservation task. The experimental results show that the proposed approach achieved a promising performance for Non-U recovery and Mis-U repair as well as a satisfactory task success rate for the dialogs using the proposed method.

UR - http://www.scopus.com/inward/record.url?scp=85019153593&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85019153593&partnerID=8YFLogxK

U2 - 10.1186/s13636-017-0107-3

DO - 10.1186/s13636-017-0107-3

M3 - Article

AN - SCOPUS:85019153593

VL - 2017

JO - Eurasip Journal on Audio, Speech, and Music Processing

JF - Eurasip Journal on Audio, Speech, and Music Processing

SN - 1687-4714

IS - 1

M1 - 9

ER -