TY - JOUR
T1 - Mining frequent patterns in a varying-size sliding window of online transactional data streams
AU - Chen, Hui
AU - Shu, Lihchyun
AU - Xia, Jiali
AU - Deng, Qingshan
N1 - Funding Information:
This work was partly supported by the National Science Council of Taiwan under Grant NSC 100-2221-E-006-159, and the Jiangxi Province Department of Education Fund under Grant GJJ10119.
PY - 2012/12/15
Y1 - 2012/12/15
N2 - In some data stream applications, the information embedded in the data arriving in the most recent time period is of particular interest. This paper proposes a method for efficiently mining the frequent patterns in a varying-size sliding window of online data streams. To highlight recent frequent patterns in the data stream, a time decay model is used to differentiate the patterns of recently generated transactions from historical transactions. The derived concrete bounds of the decay factor can achieve either 100% recall or 100% precision. A summary data structure, named SWP-tree, is proposed for capturing the content of the transactions in the sliding window by scanning the stream only once. In order to speed up online processing of new transactions, the information of frequent patterns recorded in the SWP-tree is updated incrementally. To make the mining operation efficient, the SWP-tree is periodically pruned by identifying insignificant patterns, which include two kinds of obsolete pattern and two kinds of infrequent pattern. Since the sliding window can change its size, the effect of window size is examined. The performance of the proposed technique is evaluated via simulation experiments. The results show that the proposed method is both efficient and scalable, and that it outperforms comparable algorithms.
AB - In some data stream applications, the information embedded in the data arriving in the most recent time period is of particular interest. This paper proposes a method for efficiently mining the frequent patterns in a varying-size sliding window of online data streams. To highlight recent frequent patterns in the data stream, a time decay model is used to differentiate the patterns of recently generated transactions from historical transactions. The derived concrete bounds of the decay factor can achieve either 100% recall or 100% precision. A summary data structure, named SWP-tree, is proposed for capturing the content of the transactions in the sliding window by scanning the stream only once. In order to speed up online processing of new transactions, the information of frequent patterns recorded in the SWP-tree is updated incrementally. To make the mining operation efficient, the SWP-tree is periodically pruned by identifying insignificant patterns, which include two kinds of obsolete pattern and two kinds of infrequent pattern. Since the sliding window can change its size, the effect of window size is examined. The performance of the proposed technique is evaluated via simulation experiments. The results show that the proposed method is both efficient and scalable, and that it outperforms comparable algorithms.
UR - http://www.scopus.com/inward/record.url?scp=84864776901&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84864776901&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2012.05.007
DO - 10.1016/j.ins.2012.05.007
M3 - Article
AN - SCOPUS:84864776901
SN - 0020-0255
VL - 215
SP - 15
EP - 36
JO - Information Sciences
JF - Information Sciences
ER -