Parallelizing R in Hadoop (a work-in-progress study)

Yen Zhou Huang, Yu Ling Chen, Chia Ping Tsai, Hung-Chang Hsiao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

R is a popular programming language that is widely adopted by data scientists. However, standard R can only be executed in a single-machine environment. Although R can be linked to Hadoop through tools such as RHadoop, R users must then develop their R scripts on top of the MapReduce framework. This demands a high level of skill from R programmers, who must parallelize their R programs in terms of Map and Reduce jobs, and it kills the motivation to perform R computation in distributed environments whose capacity outpaces that of a single machine. We present an implementation for parallelizing R in Hadoop in this paper. Our objective is to allow R users to run their R scripts, which are developed in a single-machine environment, in Hadoop without modification. While this research work is still ongoing, we report our preliminary experiences in this paper on how to hide the complexity of migrating and running such R scripts in Hadoop.

Original language: English
Title of host publication: Proceedings - 2015 IEEE International Conference on Smart City, SmartCity 2015, Held Jointly with 8th IEEE International Conference on Social Computing and Networking, SocialCom 2015, 5th IEEE International Conference on Sustainable Computing and Communications, SustainCom 2015, 2015 International Conference on Big Data Intelligence and Computing, DataCom 2015, 5th International Symposium on Cloud and Service Computing, SC2 2015
Editors: Xingang Liu, Peicheng Wang, Yufeng Wang, Mianxiong Dong, Robert C. H. Hsu, Feng Xia, Yuhui Deng
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1114-1116
Number of pages: 3
ISBN (Electronic): 9781509018932
Publication status: Published - 2015 Jan 1
Event: IEEE International Conference on Smart City, SmartCity 2015 - Chengdu, China
Duration: 2015 Dec 19 → 2015 Dec 21

Publication series

Name: Proceedings - 2015 IEEE International Conference on Smart City, SmartCity 2015, Held Jointly with 8th IEEE International Conference on Social Computing and Networking, SocialCom 2015, 5th IEEE International Conference on Sustainable Computing and Communications, SustainCom 2015, 2015 International Conference on Big Data Intelligence and Computing, DataCom 2015, 5th International Symposium on Cloud and Service Computing, SC2 2015

Other

Other: IEEE International Conference on Smart City, SmartCity 2015
Country: China
City: Chengdu
Period: 15-12-19 → 15-12-21

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Signal Processing
  • Computer Networks and Communications
  • Modelling and Simulation
  • Sociology and Political Science
  • Urban Studies
