This article presents an approach to response selection and message-response (MR) database expansion from unstructured data on psychological consultation websites, for a retrieval-based question answering (QA) system in a constrained domain of emotional support and comforting. First, we manually construct an initial MR database from articles collected from the psychological consultation websites. The Chinese Knowledge and Information Processing probabilistic context-free grammar is adopted to obtain the semantic dependency graphs (SDGs) of all the messages and responses in the initial MR database. For each sentence in the MR database, all the semantic dependencies, each composed of two words and their semantic relation, are extracted from the SDG of the sentence to form a semantic dependency set. Finally, a matrix whose elements represent the correlation between the semantic dependencies of the messages and those of their corresponding responses is constructed as a semantic dependency pair model (SDPM) for response selection. Moreover, because the number of MR pairs on psychological consultation websites grows daily, the MR database in the QA system should be expanded to meet users' needs. For MR database expansion, unstructured data from the message board are collected automatically. On the collected data, supervised latent Dirichlet allocation is adopted for event detection, and the event-based delta Bayesian Information Criterion is then used to segment message and response articles. Each extracted message segment is fed to the constructed retrieval-based QA system to find the best-matched response segment, and the matching score is estimated to verify whether the new MR pair is suitable for inclusion in the expanded MR database. Fivefold cross-validation was employed to evaluate the performance of the proposed SDPM-based retrieval QA system over the expanded MR database.
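The semantic dependency pair model described above can be illustrated with a small sketch. The abstract does not state how the correlation entries are computed, so this example assumes a simple co-occurrence estimate (the fraction of times a response dependency appears alongside a given message dependency) and scores a candidate response by the average correlation over all dependency pairs. The function names, the relation labels, and the toy data are hypothetical and are not taken from the article.

```python
from collections import Counter, defaultdict

# A semantic dependency is represented here as a (word, word, relation) tuple.
# The relation labels used below are made up for illustration.


def build_sdpm(mr_pairs):
    """Build a sparse message-dependency x response-dependency matrix.

    Assumption: each entry is the relative co-occurrence frequency of a
    response dependency given a message dependency over the MR database.
    """
    pair_counts = defaultdict(Counter)
    for msg_deps, resp_deps in mr_pairs:
        for m in set(msg_deps):
            for r in set(resp_deps):
                pair_counts[m][r] += 1
    model = {}
    for m, counter in pair_counts.items():
        total = sum(counter.values())
        model[m] = {r: count / total for r, count in counter.items()}
    return model


def score(model, msg_deps, resp_deps):
    """Average correlation over all (message dep, response dep) pairs."""
    msg_deps, resp_deps = set(msg_deps), set(resp_deps)
    if not msg_deps or not resp_deps:
        return 0.0
    total = sum(model.get(m, {}).get(r, 0.0)
                for m in msg_deps for r in resp_deps)
    return total / (len(msg_deps) * len(resp_deps))


def select_response(model, msg_deps, candidates):
    """Return the candidate response with the highest matching score."""
    return max(candidates, key=lambda cand: score(model, msg_deps, cand))


# Toy MR database: each pair holds the dependency sets of a message
# and of its response.
mr_pairs = [
    ([("feel", "sad", "State")], [("take", "rest", "Goal")]),
    ([("lose", "job", "Patient")], [("find", "support", "Goal")]),
]
model = build_sdpm(mr_pairs)

message = [("feel", "sad", "State")]
candidates = [[("find", "support", "Goal")], [("take", "rest", "Goal")]]
best = select_response(model, message, candidates)
```

In this sketch the same score could also serve as the matching score used to decide whether a newly extracted MR pair enters the expanded database, for example by comparing it with a threshold; the threshold value would be another assumption.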
Compared with the vector space model-based method, the Okapi BM25 model, and the deep learning-based sequence-to-sequence model with attention, the proposed approach achieved more favorable performance according to a statistical significance test. The retrieval accuracy based on MR expansion was also evaluated, and a satisfactory result was obtained, confirming the effectiveness of the expanded MR database. In addition, user satisfaction with the proposed system was evaluated using Cronbach's alpha, and the satisfaction score of the proposed SDPM was higher than those of the compared methods.
|Journal||ACM Transactions on Asian and Low-Resource Language Information Processing|
|Publication status||Published - 2018 Nov|
All Science Journal Classification (ASJC) codes
- Computer Science(all)