A confidence-based hierarchical feature clustering algorithm for text classification

Jung Yi Jiang, Kai Tai Yin, Shie Jue Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose a novel feature reduction approach that groups words hierarchically into clusters, which can then be used as new features for document classification. Initially, each word constitutes its own cluster. We calculate the mutual confidence between every pair of distinct words. The two clusters containing the pair of words with the highest mutual confidence are merged into a new cluster. This merging is repeated until the mutual confidences of all unprocessed word pairs fall below a predefined threshold or only one cluster remains. In this way, a hierarchy of word clusters is obtained. The user can then choose the clusters at a particular level of the hierarchy to use as new features for document classification. Experimental results show that our method can perform better than other methods.
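
The merging procedure described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the abstract does not define "mutual confidence", so the sketch assumes it is the smaller of the two association-rule confidences P(b|a) and P(a|b) estimated from document co-occurrence. The function names `mutual_confidence`, `cluster_words`, and `to_features`, and the representation of each document as a set of words, are likewise hypothetical.

```python
from itertools import combinations


def mutual_confidence(docs, a, b):
    """ASSUMED definition (not given in the abstract): the smaller of
    conf(a -> b) and conf(b -> a), estimated from document co-occurrence."""
    n_a = sum(1 for d in docs if a in d)
    n_b = sum(1 for d in docs if b in d)
    n_ab = sum(1 for d in docs if a in d and b in d)
    if n_a == 0 or n_b == 0:
        return 0.0
    return min(n_ab / n_a, n_ab / n_b)


def cluster_words(docs, threshold):
    """Return the hierarchy as a list of levels; each level is a set of
    frozensets (word clusters). Level 0 has one cluster per word."""
    vocab = sorted(set().union(*docs))
    cluster_of = {w: frozenset([w]) for w in vocab}

    # Score every pair of distinct words once, highest confidence first
    # (ties broken alphabetically so the result is deterministic).
    pairs = sorted(
        ((mutual_confidence(docs, a, b), a, b) for a, b in combinations(vocab, 2)),
        key=lambda t: (-t[0], t[1], t[2]),
    )

    levels = [set(cluster_of.values())]
    for score, a, b in pairs:
        # Stop when all remaining pairs are below the threshold
        # or when only one cluster is left.
        if score < threshold or len(set(cluster_of.values())) == 1:
            break
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:
            continue  # the two words already share a cluster
        merged = ca | cb
        for w in merged:
            cluster_of[w] = merged
        levels.append(set(cluster_of.values()))
    return levels


def to_features(doc, clusters):
    """Map a document (set of words) to one count per cluster, with
    clusters ordered alphabetically by their sorted members."""
    ordered = sorted(clusters, key=lambda c: sorted(c))
    return [len(doc & c) for c in ordered]


if __name__ == "__main__":
    docs = [
        {"cat", "dog"},
        {"cat", "dog"},
        {"stock", "market"},
        {"stock", "market", "dog"},
    ]
    levels = cluster_words(docs, threshold=0.5)
    for i, level in enumerate(levels):
        print(i, sorted(sorted(c) for c in level))
    print(to_features(docs[3], levels[-1]))
```

Each entry in the returned list corresponds to one level of the hierarchy, so choosing a level amounts to choosing an index; `to_features` then turns a document into a vector with one dimension per cluster at that level. Scoring every word pair is quadratic in the vocabulary size, which is acceptable for a sketch but would need a more careful treatment on a real corpus.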

Original language: English
Title of host publication: Proceedings The 2007 International Conference on Intelligent Pervasive Computing, IPC 2007
Pages: 161-164
Number of pages: 4
Publication status: Published - 2007
Event: 2007 International Conference on Intelligent Pervasive Computing, IPC 2007 - Jeju Island, Korea, Republic of
Duration: 2007 Oct 11 to 2007 Oct 13

Publication series

Name: Proceedings The 2007 International Conference on Intelligent Pervasive Computing, IPC 2007

Conference

Conference: 2007 International Conference on Intelligent Pervasive Computing, IPC 2007
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 07-10-11 to 07-10-13

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • Computer Networks and Communications
  • Software
