DPSP: Distributed Progressive Sequential Pattern mining on the cloud

Jen Wei Huang, Su Chen Lin, Ming Syan Chen

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

22 Citations (Scopus)

Abstract

The progressive sequential pattern mining problem has been studied in previous work. As the amount of data grows, however, traditional algorithms running on a single machine have trouble scaling up, so mining progressive sequential patterns intrinsically suffers from a scalability problem. To address it, we design a distributed mining algorithm, DPSP (Distributed Progressive Sequential Pattern mining), which is implemented on top of the Hadoop platform, a cloud computing environment. DPSP uses Map/Reduce jobs to delete obsolete itemsets, update the current candidate sequential patterns, and report the up-to-date frequent sequential patterns within each period of interest (POI). The experimental results show that DPSP scales well and consequently improves both the performance and the practicability of mining algorithms.
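To make the three Map/Reduce steps named above more concrete, here is a minimal single-process Python sketch. It is an illustration under stated assumptions, not the authors' DPSP implementation. It assumes a hypothetical data model in which each sequence ID maps to a list of (timestamp, itemset) pairs with distinct timestamps per sequence. It treats the POI as the time window (now - poi, now] and, for brevity, counts only length-1 and length-2 sequential patterns. The names `delete_obsolete`, `map_sequence`, `shuffle`, `reduce_support`, `mine_window` and `example_db` are all hypothetical. On Hadoop, the map and reduce functions would run as distributed tasks and the framework would perform the shuffle.

```python
from collections import defaultdict
from typing import Iterator

# Hypothetical data model (an assumption, not from the paper):
# sequence id -> list of (timestamp, itemset), timestamps distinct per sequence.
Pattern = tuple[str, ...]
Sequences = dict[str, list[tuple[int, frozenset[str]]]]


def delete_obsolete(db: Sequences, now: int, poi: int) -> Sequences:
    """Drop itemsets that fall outside the POI window (now - poi, now]."""
    live: Sequences = {}
    for sid, elems in db.items():
        recent = [(t, s) for t, s in elems if now - poi < t <= now]
        if recent:
            live[sid] = sorted(recent, key=lambda e: e[0])
    return live


def map_sequence(elems: list[tuple[int, frozenset[str]]]) -> Iterator[tuple[Pattern, int]]:
    """Map: emit (candidate, 1) once per distinct length-1 or length-2
    sequential pattern contained in one time-ordered sequence."""
    found: set[Pattern] = set()
    for i, (_, first) in enumerate(elems):
        found.update((a,) for a in first)
        for _, later in elems[i + 1:]:
            # item a occurs in an earlier itemset than item b
            found.update((a, b) for a in first for b in later)
    for pattern in found:
        yield pattern, 1


def shuffle(pairs) -> dict[Pattern, list[int]]:
    """Group emitted values by key, as the MapReduce framework would."""
    groups: dict[Pattern, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_support(groups: dict[Pattern, list[int]], min_sup: int) -> dict[Pattern, int]:
    """Reduce: sum the per-sequence counts and keep frequent patterns."""
    support = {p: sum(vs) for p, vs in groups.items()}
    return {p: s for p, s in support.items() if s >= min_sup}


def mine_window(db: Sequences, now: int, poi: int, min_sup: int) -> dict[Pattern, int]:
    """One step: delete obsolete itemsets, then map, shuffle and reduce."""
    live = delete_obsolete(db, now, poi)
    pairs = (kv for elems in live.values() for kv in map_sequence(elems))
    return reduce_support(shuffle(pairs), min_sup)


example_db: Sequences = {
    "c1": [(1, frozenset({"a"})), (2, frozenset({"b"})), (3, frozenset({"c"}))],
    "c2": [(2, frozenset({"a"})), (3, frozenset({"c"}))],
    "c3": [(1, frozenset({"b"})), (3, frozenset({"a", "c"}))],
}

# A POI of length 2 at time 3 keeps only timestamps 2 and 3.
print(sorted(mine_window(example_db, now=3, poi=2, min_sup=2).items()))
# prints [(('a',), 2), (('c',), 3)]
```

Raising `poi` to 3 retains the itemsets at timestamp 1. In that case `mine_window(example_db, now=3, poi=3, min_sup=2)` also reports ('b',), ('a', 'c') and ('b', 'c'), which shows how the deletion step controls the result as the POI slides forward. This sketch recounts the whole window at each step. It does not model how DPSP maintains and updates its candidate sequential patterns across POIs.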

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 14th Pacific-Asia Conference, PAKDD 2010, Proceedings
Pages: 27-34
Number of pages: 8
Edition: PART 2
DOI: https://doi.org/10.1007/978-3-642-13672-6_3
Publication status: Published - 2010 Dec 1
Event: 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2010 - Hyderabad, India
Duration: 2010 Jun 21 to 2010 Jun 24

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 6119 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2010
Country: India
City: Hyderabad
Period: 2010-06-21 to 2010-06-24

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

    Huang, J. W., Lin, S. C., & Chen, M. S. (2010). DPSP: Distributed Progressive Sequential Pattern mining on the cloud. In Advances in Knowledge Discovery and Data Mining - 14th Pacific-Asia Conference, PAKDD 2010, Proceedings (PART 2 ed., pp. 27-34). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6119 LNAI, No. PART 2). https://doi.org/10.1007/978-3-642-13672-6_3