TY - GEN
T1 - A clustering scheme for large high-dimensional document datasets
AU - Jiang, Jung Yi
AU - Chen, Jing Wen
AU - Lee, Shie Jue
PY - 2007
Y1 - 2007
N2 - Scalability and high dimensionality are two common problems associated with document clustering. We present a novel scheme to deal with these problems. Given a set of documents, we partition the set into several parts. We use one part and cluster the constituent documents into groups. Using the obtained groups, we reduce the number of features by a certain ratio. Then we add another part, cluster the documents into groups based on the reduced features, and further reduce the number of the remaining features. This process is iterated until all parts are used. Experimental results have shown that our proposed scheme is effective for clustering large high-dimensional document datasets.
AB - Scalability and high dimensionality are two common problems associated with document clustering. We present a novel scheme to deal with these problems. Given a set of documents, we partition the set into several parts. We use one part and cluster the constituent documents into groups. Using the obtained groups, we reduce the number of features by a certain ratio. Then we add another part, cluster the documents into groups based on the reduced features, and further reduce the number of the remaining features. This process is iterated until all parts are used. Experimental results have shown that our proposed scheme is effective for clustering large high-dimensional document datasets.
UR - http://www.scopus.com/inward/record.url?scp=38049087587&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38049087587&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-74581-5_56
DO - 10.1007/978-3-540-74581-5_56
M3 - Conference contribution
AN - SCOPUS:38049087587
SN - 9783540745808
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 511
EP - 519
BT - Advances in Computation and Intelligence - Second International Symposium, ISICA 2007, Proceedings
PB - Springer Verlag
T2 - 2nd International Symposium on Intelligence Computation and Applications, ISICA 2007
Y2 - 21 September 2007 through 23 September 2007
ER -