The MapReduce programming model is designed to process large data sets through parallel computing across multiple computer nodes (CNs). As data volumes grow considerably (in most cases the data are collected from sensors), optimizing task assignment becomes important for improving the performance of MapReduce. This problem is even more difficult in heterogeneous clouds, in which the CNs differ in their capabilities and available resources. In this paper, the context-aware task assignment (CATA) approach is proposed to improve the performance of MapReduce in two ways. First, CATA takes the resource demands of different types of jobs into account. Second, CATA assigns tasks to CNs in proportion to their capabilities and available resources. The experimental results show that CATA reduces the job execution time by 10% to 40%.
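The idea of resource-proportional assignment can be illustrated with a minimal sketch. This is not the CATA algorithm itself, whose details the abstract does not give; it only assumes that each CN's capabilities and available resources have already been collapsed into a single non-negative capacity score. The function name `proportional_assignment`, the node names, and the scores are hypothetical, and the largest-remainder rounding is a choice made here for illustration.

```python
from typing import Dict


def proportional_assignment(num_tasks: int, capacity: Dict[str, float]) -> Dict[str, int]:
    """Split num_tasks among nodes in proportion to a capacity score.

    Illustrative only: `capacity` is an assumed per-node score, not a
    quantity defined by the source.
    """
    if num_tasks < 0:
        raise ValueError("num_tasks must be non-negative")
    if any(c < 0 for c in capacity.values()):
        raise ValueError("capacity scores must be non-negative")
    total = sum(capacity.values())
    if total <= 0:
        raise ValueError("at least one node must have positive capacity")

    # Ideal (fractional) share of tasks for each node.
    quotas = {node: num_tasks * c / total for node, c in capacity.items()}
    # Start from the integer part of each share.
    shares = {node: int(q) for node, q in quotas.items()}

    # Hand out the tasks lost to rounding, largest fractional part first.
    leftover = num_tasks - sum(shares.values())
    by_remainder = sorted(capacity, key=lambda n: quotas[n] - shares[n], reverse=True)
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical heterogeneous cluster: cn1 has twice the capacity of the others.
    print(proportional_assignment(16, {"cn1": 8.0, "cn2": 4.0, "cn3": 4.0}))
```

In this sketch a node with twice the capacity score receives roughly twice as many tasks, and the shares always sum to the number of tasks. A context-aware scheme as described above would additionally vary the score by job type, for example by weighting CPU more heavily for compute-bound jobs.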
All Science Journal Classification (ASJC) codes
- Materials Science(all)