TY - GEN
T1 - A confidence-based hierarchical feature clustering algorithm for text classification
AU - Jiang, Jung Yi
AU - Yin, Kai Tai
AU - Lee, Shie Jue
PY - 2007
Y1 - 2007
N2 - In this paper, we propose a novel feature reduction approach to group words hierarchically into clusters which can then be used as new features for document classification. Initially, each word constitutes a cluster. We calculate the mutual confidence between any two different words. The pair of clusters containing the two words with the highest mutual confidence is combined into a new cluster. This merging process is iterated until all the mutual confidences between the unprocessed pairs of words are smaller than a predefined threshold or only one cluster exists. In this way, a hierarchy of word clusters is obtained. The user can choose the clusters at a certain level to be used as new features for document classification. Experimental results have shown that our method can perform better than other methods.
AB - In this paper, we propose a novel feature reduction approach to group words hierarchically into clusters which can then be used as new features for document classification. Initially, each word constitutes a cluster. We calculate the mutual confidence between any two different words. The pair of clusters containing the two words with the highest mutual confidence is combined into a new cluster. This merging process is iterated until all the mutual confidences between the unprocessed pairs of words are smaller than a predefined threshold or only one cluster exists. In this way, a hierarchy of word clusters is obtained. The user can choose the clusters at a certain level to be used as new features for document classification. Experimental results have shown that our method can perform better than other methods.
UR - http://www.scopus.com/inward/record.url?scp=50249178200&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=50249178200&partnerID=8YFLogxK
U2 - 10.1109/IPC.2007.35
DO - 10.1109/IPC.2007.35
M3 - Conference contribution
AN - SCOPUS:50249178200
SN - 0769530060
SN - 9780769530062
T3 - Proceedings The 2007 International Conference on Intelligent Pervasive Computing, IPC 2007
SP - 161
EP - 164
BT - Proceedings The 2007 International Conference on Intelligent Pervasive Computing, IPC 2007
T2 - 2007 International Conference on Intelligent Pervasive Computing, IPC 2007
Y2 - 11 October 2007 through 13 October 2007
ER -