Mining frequent patterns in a varying-size sliding window of online transactional data streams

Hui Chen, Lih-Chyun Shu, Jiali Xia, Qingshan Deng

Research output: Contribution to journal › Article

33 Citations (Scopus)

Abstract

In some data stream applications, the information embedded in the data arriving in the most recent time period is of particular interest. This paper proposes a method for efficiently mining frequent patterns in a varying-size sliding window of online data streams. To highlight recent frequent patterns in the data stream, a time decay model is used to differentiate the patterns of recently generated transactions from those of historical transactions. The derived concrete bounds of the decay factor can achieve either 100% recall or 100% precision. A summary data structure, named SWP-tree, is proposed for capturing the content of the transactions in the sliding window by scanning the stream only once. To speed up online processing of new transactions, the frequent-pattern information recorded in the SWP-tree is updated incrementally. To make the mining operation efficient, the SWP-tree is periodically pruned by identifying insignificant patterns, which include two kinds of obsolete patterns and two kinds of infrequent patterns. Since the sliding window can change its size, the effect of window size is examined. The performance of the proposed technique is evaluated via simulation experiments. The results show that the proposed method is both efficient and scalable, and that it outperforms comparable algorithms.
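The abstract does not give the SWP-tree layout or the paper's exact decay formula, so the following is only a minimal Python sketch of the general idea under stated assumptions: a transaction of age a (in time units) is weighted by d**a for a hypothetical decay factor d in (0, 1], the window keeps the most recent `size` transactions and can be resized at any time, and an itemset's decayed support is its weighted count divided by the total weight in the window. The class `DecayedSlidingWindow` and its methods are illustrative names, not the authors' API, and itemsets are enumerated by brute force rather than stored in a prefix tree.

```python
from collections import deque, defaultdict
from itertools import combinations


class DecayedSlidingWindow:
    """Hypothetical sketch (not the paper's SWP-tree): keeps the most recent
    transactions in a window whose size can change, and weights each
    transaction by decay ** age so that recent transactions count more."""

    def __init__(self, size, decay):
        if size < 1:
            raise ValueError("window size must be positive")
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay factor must lie in (0, 1]")
        self.size = size
        self.decay = decay
        self.window = deque()  # (arrival time, frozenset of items), oldest first

    def add(self, time, items):
        """Append a new transaction, then evict the oldest ones if the window overflows."""
        self.window.append((time, frozenset(items)))
        self._evict()

    def resize(self, size):
        """Change the window size; shrinking discards the oldest transactions."""
        if size < 1:
            raise ValueError("window size must be positive")
        self.size = size
        self._evict()

    def _evict(self):
        # Drop the oldest transactions while the window holds more than `size` of them.
        while len(self.window) > self.size:
            self.window.popleft()

    def decayed_supports(self, now, max_len=2):
        """Return {itemset: decayed support ratio} for itemsets of up to max_len items."""
        weighted = defaultdict(float)
        total = 0.0
        for time, items in self.window:
            w = self.decay ** (now - time)  # older transactions get smaller weights
            total += w
            for k in range(1, max_len + 1):
                for combo in combinations(sorted(items), k):
                    weighted[frozenset(combo)] += w
        return {s: c / total for s, c in weighted.items()} if total else {}

    def frequent(self, now, min_support, max_len=2):
        """Keep itemsets whose decayed support reaches min_support; the rest are pruned."""
        return {
            s: r
            for s, r in self.decayed_supports(now, max_len).items()
            if r >= min_support
        }
```

For example, with `size=3`, `decay=0.5` and transactions {a,b}, {a,c}, {a,b}, {b,c} arriving at times 1 to 4, the window keeps the last three; at time 4 their weights are 0.25, 0.5 and 1.0, so {b} has decayed support 1.5/1.75 ≈ 0.857 while {a} drops to 0.75/1.75 ≈ 0.429 and is pruned at a 0.5 threshold. Recomputing supports on every query keeps the sketch short; the paper instead maintains pattern information incrementally in the SWP-tree.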

Original language: English
Pages (from-to): 15-36
Number of pages: 22
Journal: Information Sciences
Volume: 215
DOI: 10.1016/j.ins.2012.05.007
Publication status: Published - 2012 Dec 15

Fingerprint

Frequent Pattern Mining
Sliding Window
Data Streams
Frequent Pattern
Transactions
Data structures
Concretes
Scanning
Processing
Mining
Experiments
Decay
Differentiate
Simulation Experiment
Speedup

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

Cite this

@article{2dfe9661aa8e40179700fc0bc1cba7e5,
title = "Mining frequent patterns in a varying-size sliding window of online transactional data streams",
abstract = "In some data stream applications, the information embedded in the data arriving in the most recent time period is of particular interest. This paper proposes a method for efficiently mining frequent patterns in a varying-size sliding window of online data streams. To highlight recent frequent patterns in the data stream, a time decay model is used to differentiate the patterns of recently generated transactions from those of historical transactions. The derived concrete bounds of the decay factor can achieve either 100{\%} recall or 100{\%} precision. A summary data structure, named SWP-tree, is proposed for capturing the content of the transactions in the sliding window by scanning the stream only once. To speed up online processing of new transactions, the frequent-pattern information recorded in the SWP-tree is updated incrementally. To make the mining operation efficient, the SWP-tree is periodically pruned by identifying insignificant patterns, which include two kinds of obsolete patterns and two kinds of infrequent patterns. Since the sliding window can change its size, the effect of window size is examined. The performance of the proposed technique is evaluated via simulation experiments. The results show that the proposed method is both efficient and scalable, and that it outperforms comparable algorithms.",
author = "Hui Chen and Lih-Chyun Shu and Jiali Xia and Qingshan Deng",
year = "2012",
month = "12",
day = "15",
doi = "10.1016/j.ins.2012.05.007",
language = "English",
volume = "215",
pages = "15--36",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",

}

Mining frequent patterns in a varying-size sliding window of online transactional data streams. / Chen, Hui; Shu, Lih-Chyun; Xia, Jiali; Deng, Qingshan.

In: Information Sciences, Vol. 215, 15.12.2012, p. 15-36.


TY - JOUR

T1 - Mining frequent patterns in a varying-size sliding window of online transactional data streams

AU - Chen, Hui

AU - Shu, Lih-Chyun

AU - Xia, Jiali

AU - Deng, Qingshan

PY - 2012/12/15

Y1 - 2012/12/15

N2 - In some data stream applications, the information embedded in the data arriving in the most recent time period is of particular interest. This paper proposes a method for efficiently mining frequent patterns in a varying-size sliding window of online data streams. To highlight recent frequent patterns in the data stream, a time decay model is used to differentiate the patterns of recently generated transactions from those of historical transactions. The derived concrete bounds of the decay factor can achieve either 100% recall or 100% precision. A summary data structure, named SWP-tree, is proposed for capturing the content of the transactions in the sliding window by scanning the stream only once. To speed up online processing of new transactions, the frequent-pattern information recorded in the SWP-tree is updated incrementally. To make the mining operation efficient, the SWP-tree is periodically pruned by identifying insignificant patterns, which include two kinds of obsolete patterns and two kinds of infrequent patterns. Since the sliding window can change its size, the effect of window size is examined. The performance of the proposed technique is evaluated via simulation experiments. The results show that the proposed method is both efficient and scalable, and that it outperforms comparable algorithms.

AB - In some data stream applications, the information embedded in the data arriving in the most recent time period is of particular interest. This paper proposes a method for efficiently mining frequent patterns in a varying-size sliding window of online data streams. To highlight recent frequent patterns in the data stream, a time decay model is used to differentiate the patterns of recently generated transactions from those of historical transactions. The derived concrete bounds of the decay factor can achieve either 100% recall or 100% precision. A summary data structure, named SWP-tree, is proposed for capturing the content of the transactions in the sliding window by scanning the stream only once. To speed up online processing of new transactions, the frequent-pattern information recorded in the SWP-tree is updated incrementally. To make the mining operation efficient, the SWP-tree is periodically pruned by identifying insignificant patterns, which include two kinds of obsolete patterns and two kinds of infrequent patterns. Since the sliding window can change its size, the effect of window size is examined. The performance of the proposed technique is evaluated via simulation experiments. The results show that the proposed method is both efficient and scalable, and that it outperforms comparable algorithms.

UR - http://www.scopus.com/inward/record.url?scp=84864776901&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84864776901&partnerID=8YFLogxK

U2 - 10.1016/j.ins.2012.05.007

DO - 10.1016/j.ins.2012.05.007

M3 - Article

AN - SCOPUS:84864776901

VL - 215

SP - 15

EP - 36

JO - Information Sciences

JF - Information Sciences

SN - 0020-0255

ER -