TY - GEN
T1 - Virtual Hadoop
T2 - 2016 Research in Adaptive and Convergent Systems, RACS 2016
AU - Chen, Yi Wei
AU - Hung, Shih Hao
AU - Tu, ChiaHeng
AU - Yeh, Chih Wei
PY - 2016/10/11
Y1 - 2016/10/11
N2 - Hadoop is a widely used software framework for handling massive data. As heterogeneous computing gains its momentum, variants of Hadoop have been developed to Offoad the computation of the Hadoop applications onto the heterogeneous processors, such as GPUS, DSPs, and FPGA. Unfortunately, these variants do not support on-demand resource scaling for the deadline-aware applications in a sophisticated heterogeneous computing environment. In this work, we developed a framework called Virtual Hadoop, which scales out the required computing resources for the applications automatically to meet the given real-time requirements. We extended the methods of resource inference and allocation for the heterogeneous computing environments. On top of these methods, an auto-scaling mechanism was developed to dynamically allocate resources on-demand based on profile data and performance models for the application execution to meet the given time requirements. In addition, Virtual Hadoop can utilize Docker containers to facilitate the auto-scaling mechanism, where a container encapsulates a Hadoop node with the capability to leverage heterogeneous computing engines. Our experimental results reveal the efficiency of Virtual Hadoop, and hopefully the experiences and discussion presented in this paper will ease the adoption of heterogeneous computing for Efficient big data processing. Copyright is held by the owner/author(s). Publication rights licensed to ACM.
AB - Hadoop is a widely used software framework for handling massive data. As heterogeneous computing gains its momentum, variants of Hadoop have been developed to Offoad the computation of the Hadoop applications onto the heterogeneous processors, such as GPUS, DSPs, and FPGA. Unfortunately, these variants do not support on-demand resource scaling for the deadline-aware applications in a sophisticated heterogeneous computing environment. In this work, we developed a framework called Virtual Hadoop, which scales out the required computing resources for the applications automatically to meet the given real-time requirements. We extended the methods of resource inference and allocation for the heterogeneous computing environments. On top of these methods, an auto-scaling mechanism was developed to dynamically allocate resources on-demand based on profile data and performance models for the application execution to meet the given time requirements. In addition, Virtual Hadoop can utilize Docker containers to facilitate the auto-scaling mechanism, where a container encapsulates a Hadoop node with the capability to leverage heterogeneous computing engines. Our experimental results reveal the efficiency of Virtual Hadoop, and hopefully the experiences and discussion presented in this paper will ease the adoption of heterogeneous computing for Efficient big data processing. Copyright is held by the owner/author(s). Publication rights licensed to ACM.
UR - http://www.scopus.com/inward/record.url?scp=85006817321&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85006817321&partnerID=8YFLogxK
U2 - 10.1145/2987386.2987408
DO - 10.1145/2987386.2987408
M3 - Conference contribution
AN - SCOPUS:85006817321
T3 - Proceedings of the 2016 Research in Adaptive and Convergent Systems, RACS 2016
SP - 201
EP - 206
BT - Proceedings of the 2016 Research in Adaptive and Convergent Systems, RACS 2016
PB - Association for Computing Machinery, Inc
Y2 - 11 October 2016 through 14 October 2016
ER -