A Study on Utility-based Episode Mining Methodologies

  • 林 鈺峰

Student thesis: Doctoral Thesis


Frequent episode mining is a fundamental research topic in data mining. It refers to discovering all episodes that appear in complex event sequences with frequencies no less than a user-specified minimum support threshold. Frequent episode mining has been applied to various domains, such as financial analysis, anomaly detection, and vital sign analysis. One emerging research issue in this field is utility-based episode mining, which differs from frequent episode mining in that it considers not only the basic properties of items (i.e., frequency) but also their utility (e.g., weight, profit, and value). Hence, it is a more effective technique through which deeper insights can be gained.

Although some studies have been devoted to utility-based episode mining, the following deficiencies have been identified in existing methods. First, simply deriving rules from the set of high utility episodes may not produce useful or meaningful utility-based episode rules for users; in addition, generating the rules with current methods may be computationally costly. Second, it is very difficult for users to specify an appropriate minimum utility threshold and to directly obtain the most valuable high utility episodes. This is because the complexity of utility-based complex event sequences involves multiple factors, e.g., the distribution of the events and utilities, the density of the complex event sequences, and the lengths of the episodes. Third, the prediction models constructed from episodes (e.g., frequent episodes, high utility episodes) do not consider simultaneous optimization of the parameters and selection of model subsets. Hence, they might be ineffective in terms of profitability and accuracy.

To resolve the issues mentioned above, this dissertation addresses a series of novel utility-based episode mining problems, including (1) high utility episode rule mining, (2) top-k high utility episode mining, and (3) construction of an optimized model using high utility episodes and genetic algorithms.

In the first research topic, we propose an algorithm called UBER-Mine and a compact tree structure called UR-Tree to efficiently discover the complete set of high utility episode rules in complex event sequences. To further demonstrate the effectiveness of our proposed method, we devise an episode-based investment model called SISTEM that is able to automatically determine multi-event episodes and associated profitable complex events embedded in stock price data. In addition, we propose an extended version called IV-UBER, constructed using high utility episode rules in the context of investment. The results show that high utility episode rules can be successfully applied to the challenging problem of predicting stock movement, and that IV-UBER outperforms several state-of-the-art methods.

In the second research topic, we propose an algorithm called TKUE for efficiently discovering top-k high utility episodes from complex event sequences. To demonstrate the effectiveness and efficiency of our method in real-life applications, we also conduct an analysis of bike rentals in a city. Experiments show that TKUE has good scalability and can effectively discover the key events affecting bike rentals.

Regarding the third research topic, we propose a novel method named HUEM-GAO that generates a prediction model from high utility episode rules and uses genetic algorithms to optimize this model. The problem of stock movement prediction is employed as an example application. The results show that our HUEM-GAO method outperforms well-known machine learning methods in terms of average return and precision.

The above research topics are proposed to provide users with important and concise results. This dissertation contributes to advancing the research in utility-based episode mining and provides an effective solution for high-potential applications.
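As a rough illustration of the difference between a minimum utility threshold and a top-k query, the following Python sketch scores two-event serial episodes in a toy complex event sequence. It is a simplified stand-in under stated assumptions, not UBER-Mine, TKUE, or the utility definition used in the dissertation: the data, the function names (`episode_utilities`, `high_utility`, `top_k`), and the `max_span` parameter are all hypothetical, and an episode's utility is assumed here to be the sum of the event utilities over every occurrence.

```python
from itertools import product
import heapq

# Hypothetical toy data: (time point, {event: utility}) pairs, sorted by time.
# Several events may occur at the same time point (a "complex" event sequence).
SEQUENCE = [
    (1, {"A": 2, "B": 1}),
    (2, {"C": 5}),
    (3, {"A": 3}),
    (4, {"C": 4, "B": 2}),
]

def episode_utilities(sequence, max_span):
    """Total utility of each 2-event serial episode (x then y).

    Assumption: an occurrence is any pair where y happens strictly after x
    and at most max_span time points later; its utility is u(x) + u(y).
    """
    totals = {}
    for i, (t1, events1) in enumerate(sequence):
        for t2, events2 in sequence[i + 1:]:
            if t2 - t1 > max_span:
                break  # sequence is time-sorted, so later points are too far
            for (x, ux), (y, uy) in product(events1.items(), events2.items()):
                totals[(x, y)] = totals.get((x, y), 0) + ux + uy
    return totals

def high_utility(totals, min_utility):
    """Threshold-based query: keep episodes with utility >= min_utility."""
    return {ep: u for ep, u in totals.items() if u >= min_utility}

def top_k(totals, k):
    """Top-k query: the k highest-utility episodes, no threshold required."""
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])

totals = episode_utilities(SEQUENCE, max_span=1)
print(high_utility(totals, min_utility=8))
print(top_k(totals, k=2))
```

On this toy data both queries return the episodes A then C (utility 14) and C then A (utility 8), but the threshold query only does so because 8 happened to be a suitable cut-off; the top-k form lets the user ask for a number of results instead of guessing a threshold.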
Date of Award: 2015 Jun 1
Original language: English
Supervisor: Sun-Yuan Hsieh
