Computational efficiency is usually a concern in large-scale social network mining tasks that involve billions of entities. Cloud computing is widely regarded as a feasible solution to this problem. In this work, we present an open-source graph mining library, the MapReduce Graph Mining Framework (MGMF), designed to be a robust and efficient MapReduce-based graph mining tool. We first divide graph mining algorithms into several categories and then design a MapReduce framework for the algorithms in each category. The experimental results show that MGMF is 3–20 times more efficient than PEGASUS, a state-of-the-art library for graph mining on MapReduce. Moreover, it covers a broader variety of graph mining algorithms. Furthermore, we design a model that generates large-scale social networks with a power-law degree distribution by parallelizing the mechanism of preferential attachment, which makes it possible to produce billion-sized scale-free networks in minutes. Our open-source library can be downloaded from http://mslab.csie.ntu.edu.tw/~noahsark/MGMF/.
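For readers unfamiliar with preferential attachment, the mechanism mentioned above can be illustrated with a minimal single-machine sketch. This is not MGMF's parallel MapReduce generator, whose design is not described in this abstract; it is the standard sequential form, in which each new node links to existing nodes with probability proportional to their current degree. The function names, the `edges_per_node` parameter, and the seed clique are assumptions made for illustration only.

```python
import random
from collections import Counter


def preferential_attachment_edges(num_nodes, edges_per_node=2, seed=0):
    """Sequential preferential attachment (illustrative, not MGMF's method)."""
    rng = random.Random(seed)
    m = edges_per_node
    # Assumed starting point: a small clique on the first m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    # Each node appears in this list once per incident edge, so a uniform
    # draw from it selects a node with probability proportional to its degree.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, num_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def degree_counts(edges):
    """Return a Counter mapping each node to its degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    edges = preferential_attachment_edges(10_000, edges_per_node=2, seed=1)
    deg = degree_counts(edges)
    # A few high-degree hubs alongside many low-degree nodes is the
    # qualitative signature of a heavy-tailed degree distribution.
    print("edges:", len(edges))
    print("max degree:", max(deg.values()))
```

The sequential loop is the obstacle to scaling: each new node's choice depends on the degrees produced by all earlier nodes, which is the dependency a parallel generator has to work around.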
All Science Journal Classification (ASJC) codes
- Information Systems
- Media Technology
- Human-Computer Interaction
- Computer Science Applications