Self-tuning clustering: An adaptive clustering method for transaction data

Ching Huang Yun, Kun Ta Chuang, Ming Syan Chen

Research output: Conference contribution

1 Citation (Scopus)

Abstract

In this paper, we devise an efficient algorithm for clustering market-basket data items. Market-basket data analysis has been well studied in the context of mining association rules, where the goal is to discover the set of large items, i.e., the items that are frequently purchased across all transactions. Clustering, in essence, is meant to divide a set of data items into appropriate groups such that items in the same group are as similar to one another as possible. In view of the nature of market-basket data, we present a measurement called the small-large (SL) ratio, which is essentially the ratio of the number of small items to the number of large items. Clearly, the smaller the SL ratio of a cluster, the more similar the items in that cluster are to one another. Then, by utilizing a self-tuning technique that adaptively tunes the input and output SL ratio thresholds, we develop an efficient clustering algorithm, algorithm STC (Self-Tuning Clustering), for clustering market-basket data. The objective of algorithm STC is: "Given a database of transactions, determine a clustering such that the average SL ratio is minimized." We conduct several experiments on real data and on a synthetic workload to study performance. Our experimental results show that, by utilizing the self-tuning technique to adaptively minimize the input and output SL ratio thresholds, algorithm STC performs very well. Specifically, algorithm STC not only requires significantly less execution time than prior works but also produces clustering results of very good quality.
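To make the SL ratio and STC's objective concrete, here is a minimal Python sketch built on stated assumptions. The abstract does not define "large" and "small" precisely, so we assume that an item is large in a cluster when its support within that cluster (the fraction of the cluster's transactions containing it) is at least a hypothetical threshold `min_support`, and small otherwise. The SL ratio of a transaction is then taken as its count of small items divided by its count of large items, and `average_sl_ratio` averages this over every transaction in a clustering, which is one plausible reading of the "average SL ratio" that STC minimizes. The helper names are illustrative only, and the code does not implement STC's self-tuning threshold search.

```python
from collections import Counter
from typing import FrozenSet, List, Set

# Hypothetical types: a transaction is a set of purchased item names,
# and a clustering is a list of clusters (each a list of transactions).
Transaction = FrozenSet[str]
Cluster = List[Transaction]


def large_items(cluster: Cluster, min_support: float) -> Set[str]:
    """Items whose support inside `cluster` is at least `min_support`.

    Assumed definition: support = fraction of the cluster's transactions
    that contain the item. Every other item is treated as small.
    """
    counts = Counter(item for t in cluster for item in t)
    n = len(cluster)
    return {item for item, c in counts.items() if c / n >= min_support}


def sl_ratio(transaction: Transaction, large: Set[str]) -> float:
    """Number of small items divided by number of large items in one transaction."""
    n_large = sum(1 for item in transaction if item in large)
    n_small = len(transaction) - n_large
    if n_large == 0:
        # Assumed convention: a transaction with no large items is maximally dissimilar.
        return float("inf")
    return n_small / n_large


def average_sl_ratio(clustering: List[Cluster], min_support: float) -> float:
    """Mean SL ratio over every transaction in every cluster (assumed averaging)."""
    ratios: List[float] = []
    for cluster in clustering:
        large = large_items(cluster, min_support)
        ratios.extend(sl_ratio(t, large) for t in cluster)
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    breakfast = [frozenset({"bread", "milk"}),
                 frozenset({"bread", "milk", "eggs"}),
                 frozenset({"bread", "milk"})]
    snacks = [frozenset({"beer", "chips"}),
              frozenset({"beer", "chips", "salsa"})]

    print(average_sl_ratio([breakfast, snacks], min_support=0.6))  # 0.2
    print(average_sl_ratio([breakfast + snacks], min_support=0.6))  # inf
```

Keeping the bread/milk baskets apart from the beer/chips baskets yields a low average SL ratio. Merging them into one cluster leaves the beer/chips baskets with no large items, so under the convention chosen here their ratio, and hence the average, becomes infinite. This matches the abstract's intuition that a lower SL ratio indicates a more coherent cluster.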

Original language: English
Host publication title: Data Warehousing and Knowledge Discovery - 4th International Conference, DaWaK 2002, Proceedings
Pages: 42-51
Number of pages: 10
Publication status: Published - 1 December 2002
Event: 4th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2002 - Aix-en-Provence, France
Duration: 4 September 2002 to 6 September 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2454 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 4th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2002
Country: France
City: Aix-en-Provence
Period: 2002-09-04 to 2002-09-06

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)
