TY - JOUR
T1 - A general grid-clustering approach
AU - Yue, Shihong
AU - Wei, Miaomiao
AU - Wang, Jeen Shing
AU - Wang, Huaxiang
N1 - Funding Information: The authors would like to thank the anonymous referees for their helpful comments and suggestions to improve the presentation of this paper. This work was supported by the National Science Foundation of China under Grant Nos. 60772080, 60532020.
PY - 2008/7/1
Y1 - 2008/7/1
AB - Hierarchical clustering is an important part of cluster analysis. Based on various theories, numerous hierarchical clustering algorithms have been developed, and new clustering algorithms continue to appear in the literature. Both divisive and agglomerative hierarchical clustering algorithms are known to play a pivotal role in data-based models, and have been successfully applied to clustering very large datasets. However, hierarchical clustering is parameter-sensitive. When the user does not know how to choose the input parameters, the clustering results may be undesirable. In this paper, we propose a general grid-clustering approach (GGCA) under a common assumption about hierarchical clustering. The key features of the GGCA include: (1) the combination of the divisive and the agglomerative clustering algorithms into a unifying generative framework; (2) the determination, for the first time, of a key input parameter, the optimal grid size; and (3) the application of a two-phase merging process to aggregate all data objects. Consequently, the GGCA is a non-parametric algorithm that does not require users to input parameters, and it exhibits excellent performance in dealing with clusters that are poorly separated and diverse in shape. Experimental results comparing the proposed GGCA with existing methods show the superiority of the GGCA.
UR - https://www.scopus.com/pages/publications/43249118570
UR - https://www.scopus.com/pages/publications/43249118570#tab=citedBy
U2 - 10.1016/j.patrec.2008.02.019
DO - 10.1016/j.patrec.2008.02.019
M3 - Article
AN - SCOPUS:43249118570
SN - 0167-8655
VL - 29
SP - 1372
EP - 1384
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
IS - 9
ER -