A clustering scheme for large high-dimensional document datasets

Jung Yi Jiang, Jing Wen Chen, Shie Jue Lee

Research output: Conference contribution

Abstract

Scalability and high dimensionality are two common problems in document clustering. We present a novel scheme to deal with these problems. Given a set of documents, we partition the set into several parts. We take one part and cluster its documents into groups. Using the obtained groups, we reduce the number of features by a certain ratio. We then add another part, cluster the documents into groups based on the reduced features, and further reduce the number of remaining features. This process is iterated until all parts have been used. Experimental results show that the proposed scheme is effective for clustering large high-dimensional document datasets.
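The loop described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm or the feature-ranking criterion, so the sketch swaps in a plain k-means with a deterministic farthest-point start and ranks features by the variance of their per-cluster means. The names `kmeans`, `feature_scores`, `incremental_cluster`, and the parameters `n_parts`, `k`, and `keep_ratio` are all hypothetical.

```python
def dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=20):
    """Plain k-means (an assumed stand-in for the paper's clustering step)."""
    # Deterministic farthest-point initialisation: start from the first row,
    # then repeatedly add the row farthest from the centroids chosen so far.
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(dist(r, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(r, centroids[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def feature_scores(rows, labels, k):
    """Score each feature by the variance of its per-cluster means
    (an assumed criterion; the abstract does not specify one)."""
    means = []
    for c in range(k):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        if members:
            means.append([sum(col) / len(members) for col in zip(*members)])
    scores = []
    for j in range(len(rows[0])):
        vals = [m[j] for m in means]
        mu = sum(vals) / len(vals)
        scores.append(sum((v - mu) ** 2 for v in vals) / len(vals))
    return scores


def incremental_cluster(docs, n_parts, k, keep_ratio):
    """Add one part at a time, cluster all documents seen so far on the
    currently kept features, then shrink the feature set by keep_ratio."""
    part_size = -(-len(docs) // n_parts)  # ceiling division, so at most n_parts parts
    parts = [docs[i:i + part_size] for i in range(0, len(docs), part_size)]
    kept = list(range(len(docs[0])))  # indices of features still in use
    seen, labels = [], []
    for part in parts:
        seen.extend(part)
        rows = [[d[j] for j in kept] for d in seen]
        labels = kmeans(rows, k)
        scores = feature_scores(rows, labels, k)
        n_keep = max(1, int(len(kept) * keep_ratio))
        best = sorted(range(len(kept)), key=lambda j: scores[j], reverse=True)[:n_keep]
        kept = [kept[j] for j in sorted(best)]
    return labels, kept
```

Calling `incremental_cluster(docs, n_parts=3, k=2, keep_ratio=0.5)` on a list of equal-length term-weight vectors returns one cluster label per document together with the indices of the surviving features. Because each round clusters on fewer features than the last, the cost per round grows more slowly than it would on the full feature set, which is the point of the scheme as the abstract describes it.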

Original language: English
Title of host publication: Advances in Computation and Intelligence - Second International Symposium, ISICA 2007, Proceedings
Publisher: Springer Verlag
Pages: 511-519
Number of pages: 9
ISBN (Print): 9783540745808
Publication status: Published - 2007
Event: 2nd International Symposium on Intelligence Computation and Applications, ISICA 2007 - Wuhan, China
Duration: 21 Sep 2007 to 23 Sep 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4683 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Symposium on Intelligence Computation and Applications, ISICA 2007
Country/Territory: China
City: Wuhan
Period: 2007-09-21 to 2007-09-23

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
