Federated MapReduce to transparently run applications on multicluster environment

Chun Yu Wang, Tzu Li Tai, Shu Jui-Shing, Chang Jyh-Biau, Shieh Ce-Kuen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

In the Cloud era, data is generated everywhere, and efficiently analyzing 'Big Data', characterized by large volume, fast generation, and variety, is a critical issue. MapReduce is a simplified distributed parallel data processing model that has been widely applied in areas such as web indexing, clustering, and classification. However, sensitive data such as network logs or mail is often distributed among independent organizations; such data must remain private and cannot be aggregated for centralized analysis. We propose Federated MapReduce (Fed-MR), a framework for analyzing geographically distributed data among independent organizations while avoiding data movement. In contrast to previous work, Fed-MR retains the simplicity of the MapReduce programming model to provide a transparent way to run original MapReduce jobs across multiple clusters without any extra programming burden. Fed-MR also integrates multiple clusters in different locations to form hierarchical Top-Region relationships. Experiments show that, compared with a single cluster with the same number of worker nodes, computation time increased by an average of only 30% for WordCount and 10% for Grep. Fed-MR therefore incurs reasonable performance overhead when analyzing data across Internet-connected clusters, and it requires no additional Global Reduce function, unlike traditional hierarchical MapReduce frameworks.
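
The abstract does not spell out how Fed-MR executes a job, so the following is only a minimal sketch of the general idea of reusing an unmodified WordCount map and reduce pair across several clusters without writing a separate Global Reduce function. Everything in it is an assumption made for illustration: the function names (`wc_map`, `wc_reduce`, `run_single_cluster`, `run_federated`), the region names, and in particular the choice to keep raw records inside each region and forward only intermediate key/value pairs to a top-level reduce. It is not Fed-MR's actual mechanism.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

KV = Tuple[str, int]
MapFn = Callable[[str], Iterator[KV]]
ReduceFn = Callable[[str, Iterable[int]], KV]


def wc_map(line: str) -> Iterator[KV]:
    """Ordinary WordCount map: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1


def wc_reduce(key: str, values: Iterable[int]) -> KV:
    """Ordinary WordCount reduce: sum the counts for one word."""
    return key, sum(values)


def group_and_reduce(pairs: Iterable[KV], reduce_fn: ReduceFn) -> List[KV]:
    """Group intermediate pairs by key, then apply the user's reduce once per key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return [reduce_fn(key, values) for key, values in sorted(groups.items())]


def run_single_cluster(records: Iterable[str], map_fn: MapFn, reduce_fn: ReduceFn) -> List[KV]:
    """Baseline: all records sit in one cluster."""
    pairs = [kv for record in records for kv in map_fn(record)]
    return group_and_reduce(pairs, reduce_fn)


def run_federated(regions: Dict[str, List[str]], map_fn: MapFn, reduce_fn: ReduceFn) -> List[KV]:
    """Hypothetical federated run: map inside each region, reduce at the top.

    Assumption: raw records never leave their region; only the intermediate
    (key, value) pairs are forwarded. The same map_fn and reduce_fn are used
    as in the single-cluster job, so no extra Global Reduce is written.
    """
    forwarded: List[KV] = []
    for _region_name, records in regions.items():
        forwarded.extend(kv for record in records for kv in map_fn(record))
    return group_and_reduce(forwarded, reduce_fn)


# Hypothetical log lines held by two independent organizations.
regions = {
    "org_a": ["error disk full", "login ok"],
    "org_b": ["error net down", "login ok ok"],
}

federated_result = run_federated(regions, wc_map, wc_reduce)
all_records = [line for records in regions.values() for line in records]
single_result = run_single_cluster(all_records, wc_map, wc_reduce)
```

In this toy setup the federated run and the single-cluster run produce the same word counts, which is the sense in which the job stays "transparent" to the programmer: the user-supplied functions are identical in both cases, and only the runner differs.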

Original language: English
Title of host publication: Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014
Editors: Peter Chen, Peter Chen, Hemant Jain
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 296-303
Number of pages: 8
ISBN (Electronic): 9781479950577
Publication status: Published - 2014 Sept 22
Event: 3rd IEEE International Congress on Big Data, BigData Congress 2014 - Anchorage, United States
Duration: 2014 Jun 27 to 2014 Jul 2

Publication series

Name: Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014

Other

Other: 3rd IEEE International Congress on Big Data, BigData Congress 2014
Country/Territory: United States
City: Anchorage
Period: 14-06-27 to 14-07-02

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
