A confidence-based hierarchical feature clustering algorithm for text classification

Jung Yi Jiang, Kai Tai Yin, Shie Jue Lee

Research output: Conference contribution

Abstract

In this paper, we propose a novel feature reduction approach that groups words hierarchically into clusters, which can then be used as new features for document classification. Initially, each word constitutes a cluster. We calculate the mutual confidence between every pair of distinct words. The pair of clusters containing the two words with the highest mutual confidence is combined into a new cluster. This merging process is repeated until the mutual confidences of all unprocessed pairs of words are smaller than a predefined threshold or only one cluster remains. In this way, a hierarchy of word clusters is obtained. The user can then choose the clusters at a given level of the hierarchy to use as new features for document classification. Experimental results have shown that our method can perform better than other methods.
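The merging procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define "mutual confidence", so the sketch assumes it is the smaller of the two association-rule confidences conf(a -> b) and conf(b -> a), computed from document co-occurrence. The function names `mutual_confidence`, `cluster_words`, and `to_features` are hypothetical.

```python
from itertools import combinations


def mutual_confidence(docs_a, docs_b):
    """ASSUMED definition: min of conf(a -> b) and conf(b -> a).

    docs_a and docs_b are the sets of document indices containing each word.
    """
    both = len(docs_a & docs_b)
    if both == 0:
        return 0.0
    return min(both / len(docs_a), both / len(docs_b))


def cluster_words(documents, threshold):
    """Greedily merge word clusters; return (clusters, merge_history)."""
    # Map each word to the set of documents that contain it.
    postings = {}
    for idx, doc in enumerate(documents):
        for word in set(doc):
            postings.setdefault(word, set()).add(idx)

    # Initially, each word constitutes its own cluster.
    cluster_of = {w: frozenset([w]) for w in postings}

    # Score every pair of distinct words, highest mutual confidence first.
    pairs = sorted(
        ((mutual_confidence(postings[a], postings[b]), a, b)
         for a, b in combinations(sorted(postings), 2)),
        key=lambda t: (-t[0], t[1], t[2]),
    )

    history = []  # one (score, left_cluster, right_cluster) entry per merge
    n_clusters = len(postings)
    for score, a, b in pairs:
        # Stop when remaining pairs fall below the threshold
        # or only one cluster exists.
        if score < threshold or n_clusters == 1:
            break
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:
            continue  # both words already share a cluster
        merged = ca | cb
        for w in merged:
            cluster_of[w] = merged
        history.append((score, ca, cb))
        n_clusters -= 1

    return set(cluster_of.values()), history


def to_features(doc, clusters):
    """Count, per cluster, how many tokens of doc fall into that cluster."""
    ordered = sorted(clusters, key=sorted)  # stable feature order
    return [sum(doc.count(w) for w in c) for c in ordered]


documents = [
    ["ball", "goal", "team"],
    ["ball", "goal"],
    ["stock", "market"],
    ["stock", "market", "team"],
]
clusters, history = cluster_words(documents, threshold=0.8)
print(sorted(sorted(c) for c in clusters))
print(to_features(documents[0], clusters))
```

With this toy corpus and a threshold of 0.8, "ball" and "goal" merge, "market" and "stock" merge, and "team" stays alone, so each document is reduced from five word features to three cluster features. The `history` list records the merges in order, which is one simple way to keep the hierarchy so that a different level can be chosen later.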

Original language: English
Title of host publication: Proceedings The 2007 International Conference on Intelligent Pervasive Computing, IPC 2007
Pages: 161-164
Number of pages: 4
DOIs
Publication status: Published - 2007
Event: 2007 International Conference on Intelligent Pervasive Computing, IPC 2007 - Jeju Island, Korea, Republic of
Duration: 11 Oct 2007 to 13 Oct 2007

Publication series

Name: Proceedings The 2007 International Conference on Intelligent Pervasive Computing, IPC 2007

Conference

Conference: 2007 International Conference on Intelligent Pervasive Computing, IPC 2007
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 2007-10-11 to 2007-10-13

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • Computer Networks and Communications
  • Software
