Re-organized topic modeling for micro-blogging data

Guan Bin Chen, Hung Yu Kao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

The large volume of text on the Internet makes it hard for people to grasp its meaning in a limited time. Topic models (e.g., LDA and PLSA) have been proposed to summarize long texts into a few topic terms. In recent years, short-text media such as tweets have become very popular. However, directly applying a traditional topic model to a short-text corpus usually yields incoherent topics, because a short document contains too few words to reveal word co-occurrence patterns. In this paper, we address the lack of local word co-occurrence in LDA by proposing a method that improves word co-occurrence to enhance topic models. We generate new virtual documents by re-organizing the words in the original documents and then apply traditional LDA to them. The experimental results show that our RO-LDA method performs well on both a noisy tweet dataset and a regular news-title dataset. Our method has two advantages: it needs no external data, and because it builds on the original topic model without modifying the model itself, it can easily be applied to other existing LDA-based models.
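
The abstract does not say how the words are re-organized into virtual documents, so the following is only a minimal sketch under a stated assumption: short documents that share a word are concatenated into one virtual document for that word. The function name `build_virtual_documents`, the `min_docs` parameter, and the sample tweets are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_virtual_documents(docs, min_docs=2):
    """Group short documents into one virtual document per shared word.

    ASSUMPTION: this word-pivot scheme is illustrative only; the abstract
    does not describe how RO-LDA actually re-organizes words.
    """
    by_word = defaultdict(list)
    for tokens in docs:
        for word in set(tokens):        # count each document once per word
            by_word[word].append(tokens)

    virtual = {}
    for word, group in by_word.items():
        if len(group) >= min_docs:      # skip words seen in too few documents
            # Concatenate every document that contains the pivot word.
            virtual[word] = [t for tokens in group for t in tokens]
    return virtual


# Hypothetical tokenized tweets, invented for this example.
tweets = [
    ["apple", "iphone", "launch"],
    ["apple", "stock", "price"],
    ["stock", "market", "crash"],
]
virtual_docs = build_virtual_documents(tweets)
```

Each virtual document is longer than any single tweet, so it carries more word co-occurrence evidence. Because the output is just another tokenized corpus, it could be passed to an unmodified LDA implementation, which matches the abstract's point that the model itself is left unchanged.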

Original language: English
Title of host publication: Proceedings of the ASE BigData and SocialInformatics 2015, ASE BD and SI 2015
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450337359
DOI: https://doi.org/10.1145/2818869.2818875
Publication status: Published - 2015 Oct 7
Event: ASE BigData and SocialInformatics, ASE BD and SI 2015 - Kaohsiung, Taiwan
Duration: 2015 Oct 7 → 2015 Oct 9

Publication series

Name: ACM International Conference Proceeding Series
Volume: 07-09-Ocobert-2015

Other

Other: ASE BigData and SocialInformatics, ASE BD and SI 2015
Country: Taiwan
City: Kaohsiung
Period: 15-10-07 → 15-10-09

Fingerprint

Internet

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Chen, G. B., & Kao, H. Y. (2015). Re-organized topic modeling for micro-blogging data. In Proceedings of the ASE BigData and SocialInformatics 2015, ASE BD and SI 2015 [a35] (ACM International Conference Proceeding Series; Vol. 07-09-Ocobert-2015). Association for Computing Machinery. https://doi.org/10.1145/2818869.2818875
Chen, Guan Bin ; Kao, Hung Yu. / Re-organized topic modeling for micro-blogging data. Proceedings of the ASE BigData and SocialInformatics 2015, ASE BD and SI 2015. Association for Computing Machinery, 2015. (ACM International Conference Proceeding Series).
@inproceedings{a1d8c2e05f674f479e65d98e55da33c2,
title = "Re-organized topic modeling for micro-blogging data",
author = "Chen, {Guan Bin} and Kao, {Hung Yu}",
year = "2015",
month = "10",
day = "7",
doi = "10.1145/2818869.2818875",
language = "English",
series = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery",
booktitle = "Proceedings of the ASE BigData and SocialInformatics 2015, ASE BD and SI 2015",

}

Chen, GB & Kao, HY 2015, Re-organized topic modeling for micro-blogging data. in Proceedings of the ASE BigData and SocialInformatics 2015, ASE BD and SI 2015., a35, ACM International Conference Proceeding Series, vol. 07-09-Ocobert-2015, Association for Computing Machinery, ASE BigData and SocialInformatics, ASE BD and SI 2015, Kaohsiung, Taiwan, 15-10-07. https://doi.org/10.1145/2818869.2818875

Re-organized topic modeling for micro-blogging data. / Chen, Guan Bin; Kao, Hung Yu.

Proceedings of the ASE BigData and SocialInformatics 2015, ASE BD and SI 2015. Association for Computing Machinery, 2015. a35 (ACM International Conference Proceeding Series; Vol. 07-09-Ocobert-2015).

TY - GEN

T1 - Re-organized topic modeling for micro-blogging data

AU - Chen, Guan Bin

AU - Kao, Hung Yu

PY - 2015/10/7

Y1 - 2015/10/7

UR - http://www.scopus.com/inward/record.url?scp=84959933756&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959933756&partnerID=8YFLogxK

U2 - 10.1145/2818869.2818875

DO - 10.1145/2818869.2818875

M3 - Conference contribution

AN - SCOPUS:84959933756

T3 - ACM International Conference Proceeding Series

BT - Proceedings of the ASE BigData and SocialInformatics 2015, ASE BD and SI 2015

PB - Association for Computing Machinery

ER -

Chen GB, Kao HY. Re-organized topic modeling for micro-blogging data. In Proceedings of the ASE BigData and SocialInformatics 2015, ASE BD and SI 2015. Association for Computing Machinery. 2015. a35. (ACM International Conference Proceeding Series). https://doi.org/10.1145/2818869.2818875