Frequent Sequential Pattern and High Utility Pattern Mining in the Data Stream Environments

  • 畢 杰

Student thesis: Doctoral Thesis

Abstract

In this dissertation, we address frequent sequential pattern mining and high utility pattern mining in data stream environments. In data stream mining, it is desirable for algorithms to operate in a single pass over the data. Algorithms must also be efficient in execution time and should use as little memory as possible.

We first explore frequent sequential pattern mining in multiple streams. Specifically, we propose an efficient algorithm named PSP-AMS to progressively mine frequent sequential patterns across multiple streams. In a multiple-stream environment, a sequential pattern may appear across multiple streams. In addition, users would like to see patterns based on recent data rather than old data. We therefore use a progressive mining approach to find across-streams sequential patterns. PSP-AMS uses a novel data structure, PSP-MS-tree, to insert new items, update current items, and delete obsolete items. By maintaining a PSP-MS-tree, PSP-AMS efficiently finds the frequent sequential patterns across multiple streams.

Next, we explore high utility pattern mining. High utility pattern mining uses utility value as the measure of importance, whereas frequent pattern mining uses frequency. Several studies in the literature propose algorithms for mining high utility patterns. We observed that some of these algorithms perform well on sparse datasets, whereas others perform well on dense datasets. To address this issue, we propose a novel algorithm called DMHUPS, in conjunction with a data structure called IUData List, to efficiently mine high utility patterns on both sparse and dense datasets. In addition, DMHUPS simultaneously calculates utility and tighter extension upper-bound values for multiple promising candidates.

We then explore high utility pattern mining in a data stream environment. To deal with this environment, we use a sliding window approach and propose an effective single-pass, one-phase algorithm, SOHUPDS, for mining high utility patterns in a data stream. Moreover, we propose a data structure, IUDataListSW, which stores information about length-1 itemsets for the current sliding window. SOHUPDS uses IUDataListSW to efficiently mine high utility patterns over a data stream.
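To make the notions of utility and sliding window concrete, the following is a minimal brute-force sketch in Python. It is not DMHUPS or SOHUPDS and does not use IUData List or IUDataListSW; it only illustrates the problem those algorithms solve. It assumes the common definition in which the utility of an itemset in a transaction is the sum of quantity times unit profit over its items. All names here (`itemset_utility`, `high_utility_itemsets`, `mine_stream`, `unit_profit`) and the sample data are hypothetical.

```python
from collections import deque
from itertools import combinations


def itemset_utility(itemset, transaction, unit_profit):
    """Utility of `itemset` in one transaction (assumed definition):
    sum of quantity * unit profit, or 0 if any item is missing."""
    if not all(item in transaction for item in itemset):
        return 0
    return sum(transaction[item] * unit_profit[item] for item in itemset)


def high_utility_itemsets(window, unit_profit, min_utility):
    """Enumerate every itemset over the window's items and keep those whose
    total utility reaches `min_utility`. Exponential; illustration only."""
    items = sorted({item for t in window for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            total = sum(itemset_utility(itemset, t, unit_profit) for t in window)
            if total >= min_utility:
                result[itemset] = total
    return result


def mine_stream(stream, unit_profit, window_size, min_utility):
    """Slide a fixed-size window over the stream, reading each transaction
    once, and yield the high utility itemsets of each full window."""
    window = deque(maxlen=window_size)  # the oldest transaction drops out automatically
    for transaction in stream:
        window.append(transaction)
        if len(window) == window_size:
            yield high_utility_itemsets(window, unit_profit, min_utility)


# Hypothetical sample data: each transaction maps item -> purchased quantity.
unit_profit = {"a": 5, "b": 2, "c": 1}
stream = [
    {"a": 1, "b": 2},
    {"b": 3, "c": 4},
    {"a": 2, "c": 1},
]
results = list(mine_stream(stream, unit_profit, window_size=2, min_utility=9))
for i, res in enumerate(results):
    print(f"window {i}: {res}")
```

In the first window of this sample, the itemset {a} alone has utility 5 and misses the threshold, while its superset {a, b} has utility 9 and reaches it. Utility is therefore not downward-closed the way frequency is, which is one reason high utility pattern mining relies on upper-bound values for pruning rather than on the itemset's own utility.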
Date of Award: 2019
Original language: English
Supervisor: Jen-Wei Huang (Supervisor)
