A clustering scheme for large high-dimensional document datasets

Jung Yi Jiang, Jing Wen Chen, Shie Jue Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Scalability and high dimensionality are two common problems associated with document clustering. We present a novel scheme to deal with these problems. Given a set of documents, we partition the set into several parts. We take one part and cluster its documents into groups. Using the obtained groups, we reduce the number of features by a certain ratio. Then we add another part, cluster the documents into groups based on the reduced features, and further reduce the number of remaining features. This process is iterated until all parts are used. Experimental results have shown that our proposed scheme is effective for clustering large high-dimensional document datasets.
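The abstract does not name the clustering algorithm or the feature-reduction criterion, so the following Python sketch only illustrates the shape of the incremental loop under loudly stated assumptions: plain k-means seeded with the first k documents stands in for the clustering step, each feature is scored by the variance of its values across cluster centroids, and "add another part" is read as re-clustering all documents seen so far. The functions `kmeans`, `spread`, `incremental_cluster` and the parameters `n_parts`, `k`, `keep_ratio` are hypothetical names, not the authors' code.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def kmeans(docs: Sequence[Vector], feats: Sequence[int], k: int,
           iters: int = 20) -> Tuple[List[int], List[Vector]]:
    """Plain k-means (an assumed stand-in) over the features in `feats`.

    The first k documents seed the centroids so the sketch is deterministic.
    """
    pts = [[d[f] for f in feats] for d in docs]
    k = min(k, len(pts))
    cents = [list(p) for p in pts[:k]]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new = [min(range(k),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, cents[c])))
               for p in pts]
        if new == labels:
            break  # assignments unchanged: converged
        labels = new
        # Update step: each centroid becomes the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == c]
            if members:
                cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, cents


def spread(cents: Sequence[Vector], j: int) -> float:
    """Hypothetical feature score: variance of feature j across centroids."""
    col = [c[j] for c in cents]
    mean = sum(col) / len(col)
    return sum((x - mean) ** 2 for x in col) / len(col)


def incremental_cluster(docs: Sequence[Vector], n_parts: int, k: int,
                        keep_ratio: float) -> Tuple[List[int], List[int]]:
    """Cluster part by part, shrinking the feature set after each round.

    Returns the final cluster labels and the feature indices they used.
    """
    size = -(-len(docs) // n_parts)  # ceiling division
    parts = [docs[i:i + size] for i in range(0, len(docs), size)]
    feats = list(range(len(docs[0])))
    seen: List[Vector] = []
    labels: List[int] = []
    for idx, part in enumerate(parts):
        seen.extend(part)  # assumed reading of "add another part"
        labels, cents = kmeans(seen, feats, k)
        if idx < len(parts) - 1:
            # Keep the keep_ratio fraction of features whose centroid
            # values differ most across clusters (assumed criterion).
            n_keep = max(1, int(len(feats) * keep_ratio))
            order = sorted(range(len(feats)),
                           key=lambda j: spread(cents, j), reverse=True)
            feats = sorted(feats[j] for j in order[:n_keep])
    return labels, feats


if __name__ == "__main__":
    # Toy term-frequency vectors: features 0 and 1 separate two topics,
    # features 2 and 3 are constant and carry no information.
    docs = [[5, 0, 1, 1], [0, 5, 1, 1], [4, 1, 1, 1],
            [1, 4, 1, 1], [5, 1, 1, 1], [0, 4, 1, 1]]
    labels, feats = incremental_cluster(docs, n_parts=3, k=2, keep_ratio=0.5)
    print(labels, feats)  # [0, 1, 0, 1, 0, 1] [0]
```

In the toy run, the constant features 2 and 3 are dropped after the first part because their centroid values do not differ; in the second round features 0 and 1 tie, and Python's stable sort keeps the lower index. Each later clustering therefore works on fewer dimensions, which is the cost saving the scheme aims for.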

Original language: English
Title of host publication: Advances in Computation and Intelligence - Second International Symposium, ISICA 2007, Proceedings
Publisher: Springer Verlag
Pages: 511-519
Number of pages: 9
ISBN (Print): 9783540745808
Publication status: Published - 2007
Event: 2nd International Symposium on Intelligence Computation and Applications, ISICA 2007 - Wuhan, China
Duration: 2007 Sept 21 to 2007 Sept 23

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4683 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Symposium on Intelligence Computation and Applications, ISICA 2007
Country/Territory: China
City: Wuhan
Period: 07-09-21 to 07-09-23

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
