TY - GEN
T1 - Bridging the Gap between Big Data System Software Stack and Applications
T2 - 2018 IEEE International Conference on Big Data, Big Data 2018
AU - Tsai, Chia Ping
AU - Hsiao, Hung Chang
AU - Chao, Yu Chang
AU - Hsu, Michael
AU - Chang, Andy R.K.
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
AB - We present in this paper two novel infrastructural services based on Hadoop for big data storage and computing in a Taiwanese semiconductor wafer fabrication foundry. The two services include Hadoop data service (HDS) and distributed R language computing service (DRS), which have been built and operated in production systems for 3.5 years. They evolve over time by incrementally accommodating users' requirements. HDS is a web-based distributed big data storage facility. Users simply rely on HDS to access data objects stored in Hadoop with the HTTP protocol. In addition, HDS is scalable and reliable. Moreover, HDS is efficient and effective by intelligently selecting either Hadoop distributed file system (HDFS) or database (HBase) for publishing data objects. Specifically, HDS is transparent to existing analytics and data inquiry applications, such as Spark and Hive. While HDS is a unified storage for supporting sequential and random data accesses for big data in the wafer fabrication foundry, DRS is a distributed computing framework for typical R language users. R users employ DRS to enjoy data-parallel computations, effortlessly and seamlessly. Similar to HDS, DRS can be horizontally scaled out. It guarantees the completion of computational jobs even with failures. In particular, it adaptively reallocates computational resources on the fly, minimizing job execution time and maximizing utilization of allocated resources. This paper discusses the design and implementation features for HDS and DRS. It also demonstrates their performance metrics.
UR - http://www.scopus.com/inward/record.url?scp=85062624885&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062624885&partnerID=8YFLogxK
U2 - 10.1109/BigData.2018.8621954
DO - 10.1109/BigData.2018.8621954
M3 - Conference contribution
AN - SCOPUS:85062624885
T3 - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
SP - 1865
EP - 1874
BT - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
A2 - Abe, Naoki
A2 - Liu, Huan
A2 - Pu, Calton
A2 - Hu, Xiaohua
A2 - Ahmed, Nesreen
A2 - Qiao, Mu
A2 - Song, Yang
A2 - Kossmann, Donald
A2 - Liu, Bing
A2 - Lee, Kisung
A2 - Tang, Jiliang
A2 - He, Jingrui
A2 - Saltz, Jeffrey
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 10 December 2018 through 13 December 2018
ER -