Re-organized topic modeling for micro-blogging data

Guan Bin Chen, Hung Yu Kao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)


The large amount of text on the Internet makes it hard for people to grasp its meaning in a limited time. Topic models (e.g. LDA and PLSA) have been proposed to summarize long texts into several topic terms. In recent years, short-text media such as tweets have become very popular. However, directly applying traditional topic models to a short-text corpus usually yields incoherent topics, because a short document does not contain enough words to reveal word co-occurrence patterns. In this paper, we address the lack of local word co-occurrence in LDA by proposing an improved word co-occurrence method that enhances topic models. We generate new virtual documents by re-organizing the words in the documents and then simply apply traditional LDA to them. The experimental results show that our RO-LDA method performs well on both a noisy tweet dataset and a regular news title dataset. Moreover, our method has two advantages: it needs no external data, and it is built on the original topic model without modifying the model itself, so it can easily be applied to other existing LDA-based models.
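
The abstract does not say how the words are re-organized, so the following is only a minimal sketch of the general idea under a stated assumption: short documents that share a word are pooled into one longer virtual document keyed by that word. The function name `build_virtual_documents`, the `min_docs` threshold, and the sample data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_virtual_documents(docs, min_docs=2):
    """Pool short documents that share a word into one virtual document.

    ASSUMPTION: this grouping rule is illustrative only; the abstract
    does not describe the re-organization scheme used by RO-LDA.
    """
    # Map each word to the list of documents that contain it.
    by_word = defaultdict(list)
    for tokens in docs:
        for word in set(tokens):
            by_word[word].append(tokens)

    # Keep only words shared by at least `min_docs` documents and
    # concatenate those documents into a single token list.
    virtual = {}
    for word, group in by_word.items():
        if len(group) >= min_docs:
            virtual[word] = [t for tokens in group for t in tokens]
    return virtual


# Hypothetical tokenized short texts (e.g. tweets or news titles).
short_docs = [
    ["apple", "iphone", "launch"],
    ["apple", "stock", "price"],
    ["rain", "storm", "warning"],
    ["storm", "flood", "alert"],
]

virtual_docs = build_virtual_documents(short_docs)
for key, tokens in sorted(virtual_docs.items()):
    print(key, tokens)
```

Each virtual document is longer than the short texts it was built from, so it exposes more word co-occurrences. The resulting token lists can be passed to an unmodified LDA implementation, which matches the abstract's point that the model itself is left unchanged.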

Original language: English
Title of host publication: Proceedings of the ASE BigData and SocialInformatics 2015, ASE BD and SI 2015
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450337359
Publication status: Published - 2015 Oct 7
Event: ASE BigData and SocialInformatics, ASE BD and SI 2015 - Kaohsiung, Taiwan
Duration: 2015 Oct 7 → 2015 Oct 9

Publication series

Name: ACM International Conference Proceeding Series


Other: ASE BigData and SocialInformatics, ASE BD and SI 2015

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
