Word co-occurrence augmented topic model in short text

Guan Bin Chen, Hung Yu Kao

Research output: Article, peer-reviewed

5 Citations (Scopus)

Abstract

The large amount of text on the Internet makes it hard for people to grasp its meaning in a limited time. Topic models (e.g., LDA and PLSA) have been proposed to summarize long texts into several topic terms. In recent years, short-text media such as Twitter have become very popular. However, directly applying a traditional topic model to a short-text corpus usually yields incoherent topics, because a short document does not contain enough words to reveal word co-occurrence patterns. In this paper, we address the lack of local word co-occurrence in LDA by proposing an improved word co-occurrence method to enhance topic models. We generate new virtual documents by re-organizing the words in the documents and use them to enhance traditional LDA. The experimental results show that our re-organized LDA (RO-LDA) method achieves better results on both a noisy tweet dataset and a regular news dataset. Moreover, our proposed augmented model does not need any external data. Because our methods are based only on the original topic model, they can easily be applied to other existing LDA-based models.
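The abstract does not spell out how words are re-organized into virtual documents, so the following is only a minimal sketch under one assumed scheme: for each word, collect every word that shares a short document with it, and append those collections to the corpus as extra documents. The function names `build_virtual_documents` and `augment_corpus`, the `min_len` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_virtual_documents(docs, min_len=2):
    """Assumed scheme: one virtual document per word, made of its co-occurring words."""
    context = defaultdict(list)
    for doc in docs:
        for i, word in enumerate(doc):
            # Every other token in the same short document counts as context.
            context[word].extend(doc[:i] + doc[i + 1:])
    # Drop virtual documents too short to carry any co-occurrence signal.
    return {w: ctx for w, ctx in context.items() if len(ctx) >= min_len}


def augment_corpus(docs, min_len=2):
    """Return the original documents followed by the virtual ones."""
    virtual = build_virtual_documents(docs, min_len)
    return list(docs) + [virtual[w] for w in sorted(virtual)]


# Toy tokenized short texts (illustrative only, not from the paper's datasets).
tweets = [
    ["topic", "model", "tweet"],
    ["tweet", "short", "text"],
    ["topic", "model", "lda"],
]
augmented = augment_corpus(tweets)
```

Under this assumption, the augmented corpus could be passed unchanged to any standard LDA implementation, which is in keeping with the abstract's point that no external data and no change to the underlying topic model are needed.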

Original language: English
Pages (from - to): S55-S70
Journal: Intelligent Data Analysis
Volume: 21
Issue number: S1
Publication status: Published - 2017

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
