TY - JOUR
T1 - Density conscious subspace clustering for high-dimensional data
AU - Chu, Yi Hong
AU - Huang, Jen Wei
AU - Chuang, Kun Ta
AU - Yang, De Nian
AU - Chen, Ming Syan
N1 - Funding Information:
The work was supported in part by the National Science Council of Taiwan, ROC, under Contracts NSC95-2752-E-002-006-PAE. In addition, the authors would like to thank K. Kailing, H.-P. Kriegel, and P. Kroger for their offering the codes of CLIQUE and SUBCLU, whose variations are used in their experimental studies.
PY - 2010/1
Y1 - 2010/1
N2 - Instead of finding clusters in the full feature space, subspace clustering is an emergent task which aims at detecting clusters embedded in subspaces. Most of previous works in the literature are density-based approaches, where a cluster is regarded as a high-density region in a subspace. However, the identification of dense regions in previous works lacks of considering a critical problem, called the density divergence problem in this paper, which refers to the phenomenon that the region densities vary in different subspace cardinalities. Without considering this problem, previous works utilize a density threshold to discover the dense regions in all subspaces, which incurs the serious loss of clustering accuracy (either recall or precision of the resulting clusters) in different subspace cardinalities. To tackle the density divergence problem, in this paper, we devise a novel subspace clustering model to discover the clusters based on the relative region densities in the subspaces, where the clusters are regarded as regions whose densities are relatively high as compared to the region densities in a subspace. Based on this idea, different density thresholds are adaptively determined to discover the clusters in different subspace cardinalities. Due to the infeasibility of applying previous techniques in this novel clustering model, we also devise an innovative algorithm, referred to as DENCOS (DENsity COnscious Subspace clustering), to adopt a divide-and-conquer scheme to efficiently discover clusters satisfying different density thresholds in different subspace cardinalities. As validated by our extensive experiments on various data sets, DENCOS can discover the clusters in all subspaces with high quality, and the efficiency of DENCOS outperformes previous works.
AB - Instead of finding clusters in the full feature space, subspace clustering is an emergent task which aims at detecting clusters embedded in subspaces. Most of previous works in the literature are density-based approaches, where a cluster is regarded as a high-density region in a subspace. However, the identification of dense regions in previous works lacks of considering a critical problem, called the density divergence problem in this paper, which refers to the phenomenon that the region densities vary in different subspace cardinalities. Without considering this problem, previous works utilize a density threshold to discover the dense regions in all subspaces, which incurs the serious loss of clustering accuracy (either recall or precision of the resulting clusters) in different subspace cardinalities. To tackle the density divergence problem, in this paper, we devise a novel subspace clustering model to discover the clusters based on the relative region densities in the subspaces, where the clusters are regarded as regions whose densities are relatively high as compared to the region densities in a subspace. Based on this idea, different density thresholds are adaptively determined to discover the clusters in different subspace cardinalities. Due to the infeasibility of applying previous techniques in this novel clustering model, we also devise an innovative algorithm, referred to as DENCOS (DENsity COnscious Subspace clustering), to adopt a divide-and-conquer scheme to efficiently discover clusters satisfying different density thresholds in different subspace cardinalities. As validated by our extensive experiments on various data sets, DENCOS can discover the clusters in all subspaces with high quality, and the efficiency of DENCOS outperformes previous works.
UR - http://www.scopus.com/inward/record.url?scp=72949091449&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=72949091449&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2008.224
DO - 10.1109/TKDE.2008.224
M3 - Article
AN - SCOPUS:72949091449
SN - 1041-4347
VL - 22
SP - 16
EP - 30
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 1
M1 - 4663070
ER -