TY - JOUR
T1 - Response selection and automatic message-response expansion in retrieval-based QA systems using semantic dependency pair model
AU - Su, Ming Hsiang
AU - Wu, Chung Hsien
AU - Huang, Kun Yi
AU - Lin, Wu Hsuan
N1 - Funding Information:
This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant No. MOST 104-2221-E-006-051-MY3. Authors’ addresses: M.-H. Su, C.-H. Wu (corresponding author), K.-Y. Huang, and W.-H. Lin, Department of Computer Science and Information Engineering, National Cheng Kung University, No.1, University Road, Tainan City 701, Taiwan (R.O.C.); emails: {huntfox.su, chunghsienwu, iamkyh77, dds45612}@gmail.com. © 2018 ACM 2375-4699/2018/11-ART3 $15.00 https://doi.org/10.1145/3229184
Publisher Copyright:
© 2018 ACM
PY - 2018/11
Y1 - 2018/11
N2 - This article presents an approach to response selection and message-response (MR) database expansion for a retrieval-based question answering (QA) system in the constrained domain of emotional support and comforting, using unstructured data from psychological consultation websites. First, an initial MR database is constructed manually from articles collected from these websites. The Chinese Knowledge and Information Processing probabilistic context-free grammar is used to obtain the semantic dependency graphs (SDGs) of all messages and responses in the initial MR database. For each sentence in the MR database, all semantic dependencies, each composed of two words and their semantic relation, are extracted from the sentence's SDG to form a semantic dependency set. A matrix whose elements represent the correlation between the semantic dependencies of the messages and those of their corresponding responses is then constructed as a semantic dependency pair model (SDPM) for response selection. Because the number of MR pairs on psychological consultation websites grows daily, the MR database in the QA system must be expanded to meet users' needs. For MR database expansion, unstructured data are collected automatically from the message board. Supervised latent Dirichlet allocation is applied to the collected data for event detection, and the event-based delta Bayesian Information Criterion is then used to segment message and response articles. Each extracted message segment is fed to the constructed retrieval-based QA system to find the best-matched response segment, and a matching score is estimated to verify whether the new MR pair is suitable for inclusion in the expanded MR database. Fivefold cross-validation was employed to evaluate the performance of the proposed SDPM-based QA system over the expanded MR database. Compared with the vector space model-based method, the Okapi BM25 model, and the deep learning-based sequence-to-sequence model with attention, the proposed approach achieved more favorable performance according to a statistical significance test. The retrieval accuracy based on MR expansion was also evaluated, and a satisfactory result was obtained, confirming the effectiveness of the expanded MR database. In addition, user satisfaction with the proposed system was evaluated using Cronbach's alpha, and the satisfaction score of the proposed SDPM was higher than those of the comparison methods.
UR - http://www.scopus.com/inward/record.url?scp=85056766752&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85056766752&partnerID=8YFLogxK
U2 - 10.1145/3229184
DO - 10.1145/3229184
M3 - Article
AN - SCOPUS:85056766752
SN - 2375-4699
VL - 18
JO - ACM Transactions on Asian and Low-Resource Language Information Processing
JF - ACM Transactions on Asian and Low-Resource Language Information Processing
IS - 1
M1 - 3
ER -