TY - GEN
T1 - DPSP
T2 - 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2010
AU - Huang, Jen Wei
AU - Lin, Su Chen
AU - Chen, Ming Syan
PY - 2010/12/1
Y1 - 2010/12/1
N2 - The progressive sequential pattern mining problem has been discussed in previous research. As the amount of data grows, traditional algorithms running on a single machine run into scalability problems, so mining progressive sequential patterns intrinsically suffers from the same scalability problem. In view of this, we design a distributed mining algorithm to address the scalability problem of mining progressive sequential patterns. The proposed algorithm, DPSP (Distributed Progressive Sequential Pattern mining algorithm), is implemented on top of the Hadoop platform, which realizes the cloud computing environment. We propose Map/Reduce jobs in DPSP to delete obsolete itemsets, update current candidate sequential patterns, and report up-to-date frequent sequential patterns within each period of interest (POI). The experimental results show that DPSP scales well and consequently increases the performance and practicality of mining algorithms.
AB - The progressive sequential pattern mining problem has been discussed in previous research. As the amount of data grows, traditional algorithms running on a single machine run into scalability problems, so mining progressive sequential patterns intrinsically suffers from the same scalability problem. In view of this, we design a distributed mining algorithm to address the scalability problem of mining progressive sequential patterns. The proposed algorithm, DPSP (Distributed Progressive Sequential Pattern mining algorithm), is implemented on top of the Hadoop platform, which realizes the cloud computing environment. We propose Map/Reduce jobs in DPSP to delete obsolete itemsets, update current candidate sequential patterns, and report up-to-date frequent sequential patterns within each period of interest (POI). The experimental results show that DPSP scales well and consequently increases the performance and practicality of mining algorithms.
UR - http://www.scopus.com/inward/record.url?scp=79956324856&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79956324856&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-13672-6_3
DO - 10.1007/978-3-642-13672-6_3
M3 - Conference contribution
AN - SCOPUS:79956324856
SN - 3642136710
SN - 9783642136719
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 27
EP - 34
BT - Advances in Knowledge Discovery and Data Mining - 14th Pacific-Asia Conference, PAKDD 2010, Proceedings
Y2 - 21 June 2010 through 24 June 2010
ER -