Idempotent Task Cache System for Handling Intermediate Data Skew in MapReduce on Cloud Computing

Tzu Chi Huang, Kuo Chih Chu, Jia Hui Lin, Ce Kuen Shieh

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

1 Citation (Scopus)

Abstract

MapReduce systems are becoming a popular platform for developing cloud applications, and MapReduce is the de facto standard programming model for such applications. However, a MapReduce system may suffer from intermediate data skew, which degrades performance, because input data is unpredictable and the Map function of an application may generate different quantities of intermediate data depending on the application's algorithm. This paper proposes the Idempotent Task Cache System (ITCS) to handle intermediate data skew. With ITCS, a MapReduce system avoids the negative performance impact of intermediate data skew by using caches to skip the heavy workload of processing skewed intermediate data in certain Reduce tasks. Experiments with several popular applications show that a MapReduce system with ITCS not only alleviates the performance penalty when intermediate data skew occurs, but also greatly outperforms native MapReduce systems that run without ITCS.

Original language: English
Title of host publication: Proceedings - 2016 International Computer Symposium, ICS 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 531-536
Number of pages: 6
ISBN (Electronic): 9781509034383
Publication status: Published - 2017 Feb 16
Event: 2016 International Computer Symposium, ICS 2016 - Chiayi, Taiwan
Duration: 2016 Dec 15 to 2016 Dec 17

Publication series

Name: Proceedings - 2016 International Computer Symposium, ICS 2016

Other

Other: 2016 International Computer Symposium, ICS 2016
Country: Taiwan
City: Chiayi
Period: 16-12-15 to 16-12-17

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Computer Networks and Communications
  • Computer Science Applications
