On the Guarantee of Peak Memory Consumption in Map-Reduce Frequent Pattern Mining

  • 李 高丞

Student thesis: Master's Thesis

Abstract

In recent years the design paradigm of data mining algorithms has moved far beyond the traditional execution in single powerful server into the Map-Reduce-based execution in the scalable fault-tolerance and self-configurable clusters such as Hadoop or Spark The parallelism has been recognized as the cure to resolve the performance bottleneck raising from the dramatic growth of data amount nowadays In the literature the state-of-the-art Map-Reduce mining algorithms almost pursue the reduction of the turnaround time by completely migrating the sophisticated index such as the FP-Tree developed in non-parallel version of FPGrowth in the distributed platform since the pruning effectiveness or search efficiency of these in-memory structure has been comprehensively demonstrated Surprisingly we in this paper explore that these in-memory structures will become the critical challenge in the big data era The inseparable nature of some complicated index leads to the unacceptable memory consumption in some computing nodes resulting in the ”out-of-memory” fatal error The computational bottleneck from data increase could be resolved by the Map-Reduce manner but however the memory usage will be inevitably aggravated since data increase will enlarge the size of centralized indexes To remedy the out-of-memory challenge we explore in this paper a practically important Map-Reduce mining framework to retrieve top-k frequent patterns in the presence of the memory constraint Specifically we focus on the memory issue which attempts to specify an available memory size in the Hadoop platform that can be utilized for mining frequent itemsets We devise a novel Framework called MFMR to efficiently discover frequent patterns while complying with the goal of user-controllable memory upper bound In our experimental studies on real data and synthetic data the MFMR framework shows its prominent advantages including the memory consumption and execution time demonstrating its promising applicability
Date of Award2017 Aug 1
Original languageEnglish
SupervisorKun-Ta Chuang (Supervisor)

Cite this

'