A fuzzy self-constructing feature clustering algorithm for text classification

Jung Yi Jiang, Ren Jia Liou, Shie Jue Lee

Research output: Article, peer-reviewed

114 citations (Scopus)

Abstract

Feature clustering is a powerful method for reducing the dimensionality of feature vectors in text classification. In this paper, we propose a fuzzy similarity-based self-constructing algorithm for feature clustering. The words in the feature vector of a document set are grouped into clusters based on a similarity test, so that words similar to each other fall into the same cluster. Each cluster is characterized by a membership function with a statistical mean and deviation. Once all the words have been fed in, a desired number of clusters has been formed automatically. We then have one extracted feature for each cluster: a weighted combination of the words contained in that cluster. With this algorithm, the derived membership functions closely match and properly describe the real distribution of the training data. In addition, the user need not specify the number of extracted features in advance, so trial and error in determining an appropriate number of extracted features can be avoided. Experimental results show that our method can run faster and obtain better extracted features than other methods.
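The incremental procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: each word is represented by a short numeric pattern vector, the membership function is a product of Gaussians, `rho` is a hypothetical similarity threshold, `sigma0` is a hypothetical initial deviation, and the extracted features use a hard weighting in which each word contributes only to its own cluster.

```python
import math


def membership(x, mean, dev):
    """Gaussian-style membership of pattern x in a cluster (product over dimensions)."""
    return math.exp(-sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, mean, dev)))


def self_constructing_clusters(patterns, rho=0.5, sigma0=0.25):
    """Feed word patterns in one at a time; clusters are created on demand.

    rho and sigma0 are illustrative names and values, not taken from the paper.
    """
    clusters = []    # per cluster: count, running sums, mean, deviation
    assignment = []  # assignment[i] = index of the cluster holding word i
    for x in patterns:
        scores = [membership(x, c["mean"], c["dev"]) for c in clusters]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < rho:
            # Similarity test failed for every existing cluster: start a new one.
            clusters.append({
                "n": 1,
                "sum": list(x),
                "sumsq": [xi * xi for xi in x],
                "mean": list(x),
                "dev": [sigma0] * len(x),
            })
            assignment.append(len(clusters) - 1)
        else:
            # Similarity test passed: add the word and update mean and deviation.
            c = clusters[best]
            c["n"] += 1
            c["sum"] = [s + xi for s, xi in zip(c["sum"], x)]
            c["sumsq"] = [s + xi * xi for s, xi in zip(c["sumsq"], x)]
            c["mean"] = [s / c["n"] for s in c["sum"]]
            # sigma0 is added so the deviation never collapses to zero.
            c["dev"] = [
                math.sqrt(max(sq / c["n"] - m * m, 0.0)) + sigma0
                for sq, m in zip(c["sumsq"], c["mean"])
            ]
            assignment.append(best)
    return clusters, assignment


def extract_features(doc, assignment, num_clusters):
    """Hard weighting: each extracted feature sums the counts of its cluster's words."""
    features = [0.0] * num_clusters
    for count, j in zip(doc, assignment):
        features[j] += count
    return features


# Four hypothetical words, each described by a two-component pattern.
patterns = [(0.9, 0.1), (0.85, 0.15), (0.1, 0.9), (0.2, 0.8)]
clusters, assignment = self_constructing_clusters(patterns)
doc = [3, 1, 0, 2]  # word counts of one document over the same four words
reduced = extract_features(doc, assignment, len(clusters))
```

In this toy run the number of clusters is never specified up front: it follows from the threshold and the order in which the words are fed in, which is the property the abstract highlights. A soft weighting, in which every word contributes to every cluster in proportion to its membership, would be an equally plausible choice for `extract_features`.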

Original language: English
Article number: 5530315
Pages (from-to): 335-349
Number of pages: 15
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 23
Issue number: 3
DOIs
Publication status: Published - 31 Jan 2011

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
