In this paper, we propose a novel feature reduction approach that groups words hierarchically into clusters, which can then be used as new features for document classification. Initially, each word constitutes its own cluster. We calculate the mutual confidence between every pair of distinct words. The two clusters containing the pair of words with the highest mutual confidence are merged into a new cluster. This merging process is repeated until the mutual confidences of all unprocessed pairs of words fall below a predefined threshold or only one cluster remains. In this way, a hierarchy of word clusters is obtained. The user can then choose a level of the hierarchy and use the clusters at that level as new features for document classification. Experimental results show that our method can perform better than other methods.
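
The merging procedure described above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper and rests on several assumptions. Mutual confidence is not defined in this abstract, so the sketch takes it to be the product of the two directional confidences of a word pair, computed from document co-occurrence. Documents are represented as sets of words, ties between equally scored pairs are broken alphabetically, and the names `mutual_confidence` and `cluster_words` are hypothetical.

```python
from itertools import combinations


def mutual_confidence(docs_a, docs_b):
    """Assumed definition: conf(a -> b) * conf(b -> a) over document sets."""
    if not docs_a or not docs_b:
        return 0.0
    both = len(docs_a & docs_b)
    return (both / len(docs_a)) * (both / len(docs_b))


def cluster_words(documents, threshold):
    """Return the hierarchy as a list of levels, each a set of word clusters.

    documents: iterable of sets of words (an assumed input format).
    threshold: merging stops once the best remaining pair scores below it.
    """
    # Map each word to the indices of the documents that contain it.
    postings = {}
    for index, doc in enumerate(documents):
        for word in doc:
            postings.setdefault(word, set()).add(index)

    words = sorted(postings)
    # Score every pair of distinct words once, highest score first.
    pairs = sorted(
        ((mutual_confidence(postings[a], postings[b]), a, b)
         for a, b in combinations(words, 2)),
        key=lambda item: (-item[0], item[1], item[2]),
    )

    # Initially, each word constitutes its own cluster.
    cluster_of = {word: frozenset([word]) for word in words}
    levels = [set(cluster_of.values())]

    for score, a, b in pairs:
        # Stop when the remaining pairs are too weak or one cluster is left.
        if score < threshold or len(levels[-1]) == 1:
            break
        first, second = cluster_of[a], cluster_of[b]
        if first == second:
            continue  # both words already belong to the same cluster
        merged = first | second
        for word in merged:
            cluster_of[word] = merged
        levels.append(set(cluster_of.values()))

    return levels


if __name__ == "__main__":
    docs = [
        {"cat", "dog"},
        {"cat", "dog"},
        {"stock", "bond"},
        {"stock", "bond", "cat"},
    ]
    for depth, level in enumerate(cluster_words(docs, threshold=0.5)):
        print(depth, sorted(sorted(cluster) for cluster in level))
```

Each entry of the returned list corresponds to one level of the hierarchy, so choosing the clusters at a given level amounts to indexing into that list. Any other definition of mutual confidence can be substituted without changing the merging loop.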