Federated MapReduce to transparently run applications on multicluster environment

Chun Yu Wang, Tzu Li Tai, Shu Jui-Shing, Chang Jyh-Biau, Shieh Ce-Kuen

Research output: Conference contribution

10 citations (Scopus)

Abstract

In the Cloud era, data is generated everywhere, and efficiently analyzing "Big Data", which is characterized by large volume, fast generation, and variety, is among the most critical issues. MapReduce is a simplified distributed parallel data processing model that has been widely applied in areas such as web indexing, clustering, and classification. However, sensitive data such as network logs or e-mails is often distributed among independent organizations. This data must remain private and cannot be aggregated for centralized analysis. We propose Federated MapReduce (Fed-MR), a framework for analyzing geographically distributed data among independent organizations while avoiding data movement. In contrast to previous work, Fed-MR retains the simplicity of MapReduce programming, providing a transparent way to run original MapReduce jobs across multiple clusters without any extra programming burden. Fed-MR also integrates multiple clusters in different locations into hierarchical Top-Region relationships. Experiments showed that, compared to a single cluster with the same number of worker nodes, computation time increased by only 30% on average for WordCount and 10% for Grep. Fed-MR therefore incurs reasonable performance overhead when analyzing data across Internet-connected clusters, while requiring no additional Global Reduce function as traditional hierarchical MapReduce frameworks do.
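The abstract does not describe how Fed-MR actually schedules work between Top and Region clusters, so the following is only a toy, in-process sketch of the general hierarchical idea, not the authors' implementation. It assumes that each region runs the unmodified job on its own records, that only reduced (key, value) pairs leave a region, and that the Top level reapplies the same user reduce function instead of a separately written Global Reduce. The function names and region names below are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

KV = Tuple[str, int]
MapFn = Callable[[Any], List[KV]]
ReduceFn = Callable[[str, List[int]], KV]


def wordcount_map(line: str) -> List[KV]:
    # Emit (word, 1) for every whitespace-separated token in the line.
    return [(word, 1) for word in line.split()]


def wordcount_reduce(key: str, values: List[int]) -> KV:
    # Summation is associative and commutative, so partial sums can be summed again.
    return (key, sum(values))


def run_mapreduce(records: Iterable[Any], map_fn: MapFn, reduce_fn: ReduceFn) -> Dict[str, int]:
    # Single-cluster MapReduce: map, group values by key (the "shuffle"), then reduce.
    groups: Dict[str, List[int]] = {}
    for record in records:
        for key, value in map_fn(record):
            groups.setdefault(key, []).append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


def passthrough_map(pair: KV) -> List[KV]:
    # Top-level "map" just forwards regional (key, value) results unchanged.
    return [pair]


def run_federated(regions: Dict[str, List[str]], map_fn: MapFn, reduce_fn: ReduceFn) -> Dict[str, int]:
    # Region level: each region runs the job on its own records only;
    # raw records never leave the region, only reduced (key, value) pairs do.
    regional_outputs = [run_mapreduce(records, map_fn, reduce_fn) for records in regions.values()]
    # Top level: merge regional outputs and apply the SAME user reduce function,
    # so no separate "global reduce" has to be written (an assumption of this sketch).
    merged_pairs = [pair for output in regional_outputs for pair in output.items()]
    return run_mapreduce(merged_pairs, passthrough_map, reduce_fn)


if __name__ == "__main__":
    # Hypothetical region names and data, for illustration only.
    regions = {
        "region-a": ["big data big", "map reduce"],
        "region-b": ["big cluster", "reduce data"],
    }
    federated = run_federated(regions, wordcount_map, wordcount_reduce)
    centralized = run_mapreduce(
        [line for lines in regions.values() for line in lines],
        wordcount_map,
        wordcount_reduce,
    )
    assert federated == centralized
    print(sorted(federated.items()))
    # prints [('big', 3), ('cluster', 1), ('data', 2), ('map', 1), ('reduce', 2)]
```

Reusing the reduce function at the Top level only gives the same answer as a centralized run when that function is associative and commutative, as WordCount's summation is. A reduce such as computing a median over all values could not be reapplied to regional results this way.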

Original language: English
Host publication title: Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014
Editors: Peter Chen, Peter Chen, Hemant Jain
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 296-303
Number of pages: 8
ISBN (Electronic): 9781479950577
Publication status: Published - 22 Sep 2014
Event: 3rd IEEE International Congress on Big Data, BigData Congress 2014 - Anchorage, United States
Duration: 27 Jun 2014 → 2 Jul 2014

Publication series

Name: Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014

Other

Other: 3rd IEEE International Congress on Big Data, BigData Congress 2014
Country/Territory: United States
City: Anchorage
Period: 14-06-27 → 14-07-02

All Science Journal Classification (ASJC) codes

  • Computer Science Applications

