Bridging the Gap between Big Data System Software Stack and Applications: The Case of Semiconductor Wafer Fabrication Foundries

Chia Ping Tsai, Hung-Chang Hsiao, Yu Chang Chao, Michael Hsu, Andy R.K. Chang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We present in this paper two novel infrastructural services based on Hadoop for big data storage and computing in a Taiwanese semiconductor wafer fabrication foundry. The two services are the Hadoop data service (HDS) and the distributed R language computing service (DRS), which have been built and operated in production systems for 3.5 years. They have evolved over time by incrementally accommodating users' requirements. HDS is a web-based distributed big data storage facility. Users simply rely on HDS to access data objects stored in Hadoop over HTTP. In addition, HDS is scalable and reliable. Moreover, HDS is efficient and effective because it intelligently selects either the Hadoop distributed file system (HDFS) or the HBase database for publishing data objects. Specifically, HDS is transparent to existing analytics and data inquiry applications, such as Spark and Hive. While HDS is a unified storage service supporting sequential and random data accesses for big data in the wafer fabrication foundry, DRS is a distributed computing framework for typical R language users. R users employ DRS to run data-parallel computations effortlessly and seamlessly. Like HDS, DRS can be horizontally scaled out. It guarantees the completion of computational jobs even in the presence of failures. In particular, it adaptively reallocates computational resources on the fly, minimizing job execution time and maximizing utilization of allocated resources. This paper discusses the design and implementation features of HDS and DRS. It also reports their performance metrics.
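
As an illustration only, the choice between HDFS and HBase mentioned in the abstract might be sketched in Python as follows. The abstract does not state the selection criterion, so everything here is an assumption: the size threshold, the `Backend` enum, and the `choose_backend` function are hypothetical names, and the policy simply follows the common pattern of keeping small, randomly accessed objects in HBase and large, sequentially read objects in HDFS.

```python
from enum import Enum

# Hypothetical cutoff: the abstract does not give the actual selection rule.
SMALL_OBJECT_LIMIT_BYTES = 10 * 1024 * 1024  # assumed 10 MiB


class Backend(Enum):
    HBASE = "hbase"  # assumed target for small, randomly accessed objects
    HDFS = "hdfs"    # assumed target for large, sequentially read objects


def choose_backend(object_size_bytes: int,
                   limit: int = SMALL_OBJECT_LIMIT_BYTES) -> Backend:
    """Pick a storage backend from object size alone (assumed policy)."""
    if object_size_bytes < 0:
        raise ValueError("object size must be non-negative")
    # Objects at or below the limit go to HBase, everything larger to HDFS.
    return Backend.HBASE if object_size_bytes <= limit else Backend.HDFS


if __name__ == "__main__":
    for size in (4 * 1024, 512 * 1024 * 1024):
        print(size, choose_backend(size).value)
```

A size-only rule keeps the sketch short; a production service would likely weigh other factors as well, which the abstract does not describe.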

Original language: English
Title of host publication: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
Editors: Yang Song, Bing Liu, Kisung Lee, Naoki Abe, Calton Pu, Mu Qiao, Nesreen Ahmed, Donald Kossmann, Jeffrey Saltz, Jiliang Tang, Jingrui He, Huan Liu, Xiaohua Hu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1865-1874
Number of pages: 10
ISBN (Electronic): 9781538650356
DOI: https://doi.org/10.1109/BigData.2018.8621954
Publication status: Published - 2019 Jan 22
Event: 2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States
Duration: 2018 Dec 10 to 2018 Dec 13

Publication series

Name: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

Conference

Conference: 2018 IEEE International Conference on Big Data, Big Data 2018
Country: United States
City: Seattle
Period: 2018-12-10 to 2018-12-13

Fingerprint

Foundries
Semiconductor materials
Fabrication
HTTP
Distributed computer systems
Electric sparks
Network protocols
Big data

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Information Systems

Cite this

Tsai, C. P., Hsiao, H-C., Chao, Y. C., Hsu, M., & Chang, A. R. K. (2019). Bridging the Gap between Big Data System Software Stack and Applications: The Case of Semiconductor Wafer Fabrication Foundries. In Y. Song, B. Liu, K. Lee, N. Abe, C. Pu, M. Qiao, N. Ahmed, D. Kossmann, J. Saltz, J. Tang, J. He, H. Liu, ... X. Hu (Eds.), Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018 (pp. 1865-1874). [8621954] (Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2018.8621954
Tsai, Chia Ping ; Hsiao, Hung-Chang ; Chao, Yu Chang ; Hsu, Michael ; Chang, Andy R.K. / Bridging the Gap between Big Data System Software Stack and Applications : The Case of Semiconductor Wafer Fabrication Foundries. Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018. editor / Yang Song ; Bing Liu ; Kisung Lee ; Naoki Abe ; Calton Pu ; Mu Qiao ; Nesreen Ahmed ; Donald Kossmann ; Jeffrey Saltz ; Jiliang Tang ; Jingrui He ; Huan Liu ; Xiaohua Hu. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 1865-1874 (Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018).
@inproceedings{ee1e359d13b541938fdab9712595fafa,
title = "Bridging the Gap between Big Data System Software Stack and Applications: The Case of Semiconductor Wafer Fabrication Foundries",
abstract = "We present in this paper two novel infrastructural services based on Hadoop for big data storage and computing in a Taiwanese semiconductor wafer fabrication foundry. The two services are the Hadoop data service (HDS) and the distributed R language computing service (DRS), which have been built and operated in production systems for 3.5 years. They have evolved over time by incrementally accommodating users' requirements. HDS is a web-based distributed big data storage facility. Users simply rely on HDS to access data objects stored in Hadoop over HTTP. In addition, HDS is scalable and reliable. Moreover, HDS is efficient and effective because it intelligently selects either the Hadoop distributed file system (HDFS) or the HBase database for publishing data objects. Specifically, HDS is transparent to existing analytics and data inquiry applications, such as Spark and Hive. While HDS is a unified storage service supporting sequential and random data accesses for big data in the wafer fabrication foundry, DRS is a distributed computing framework for typical R language users. R users employ DRS to run data-parallel computations effortlessly and seamlessly. Like HDS, DRS can be horizontally scaled out. It guarantees the completion of computational jobs even in the presence of failures. In particular, it adaptively reallocates computational resources on the fly, minimizing job execution time and maximizing utilization of allocated resources. This paper discusses the design and implementation features of HDS and DRS. It also reports their performance metrics.",
author = "Tsai, {Chia Ping} and Hung-Chang Hsiao and Chao, {Yu Chang} and Michael Hsu and Chang, {Andy R.K.}",
year = "2019",
month = "1",
day = "22",
doi = "10.1109/BigData.2018.8621954",
language = "English",
series = "Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "1865--1874",
editor = "Yang Song and Bing Liu and Kisung Lee and Naoki Abe and Calton Pu and Mu Qiao and Nesreen Ahmed and Donald Kossmann and Jeffrey Saltz and Jiliang Tang and Jingrui He and Huan Liu and Xiaohua Hu",
booktitle = "Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018",
address = "United States",

}

Tsai, CP, Hsiao, H-C, Chao, YC, Hsu, M & Chang, ARK 2019, Bridging the Gap between Big Data System Software Stack and Applications: The Case of Semiconductor Wafer Fabrication Foundries. in Y Song, B Liu, K Lee, N Abe, C Pu, M Qiao, N Ahmed, D Kossmann, J Saltz, J Tang, J He, H Liu & X Hu (eds), Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018, 8621954, Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018, Institute of Electrical and Electronics Engineers Inc., pp. 1865-1874, 2018 IEEE International Conference on Big Data, Big Data 2018, Seattle, United States, 18-12-10. https://doi.org/10.1109/BigData.2018.8621954

Bridging the Gap between Big Data System Software Stack and Applications : The Case of Semiconductor Wafer Fabrication Foundries. / Tsai, Chia Ping; Hsiao, Hung-Chang; Chao, Yu Chang; Hsu, Michael; Chang, Andy R.K.

Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018. ed. / Yang Song; Bing Liu; Kisung Lee; Naoki Abe; Calton Pu; Mu Qiao; Nesreen Ahmed; Donald Kossmann; Jeffrey Saltz; Jiliang Tang; Jingrui He; Huan Liu; Xiaohua Hu. Institute of Electrical and Electronics Engineers Inc., 2019. p. 1865-1874 8621954 (Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018).

TY - GEN

T1 - Bridging the Gap between Big Data System Software Stack and Applications

T2 - The Case of Semiconductor Wafer Fabrication Foundries

AU - Tsai, Chia Ping

AU - Hsiao, Hung-Chang

AU - Chao, Yu Chang

AU - Hsu, Michael

AU - Chang, Andy R.K.

PY - 2019/1/22

Y1 - 2019/1/22

AB - We present in this paper two novel infrastructural services based on Hadoop for big data storage and computing in a Taiwanese semiconductor wafer fabrication foundry. The two services are the Hadoop data service (HDS) and the distributed R language computing service (DRS), which have been built and operated in production systems for 3.5 years. They have evolved over time by incrementally accommodating users' requirements. HDS is a web-based distributed big data storage facility. Users simply rely on HDS to access data objects stored in Hadoop over HTTP. In addition, HDS is scalable and reliable. Moreover, HDS is efficient and effective because it intelligently selects either the Hadoop distributed file system (HDFS) or the HBase database for publishing data objects. Specifically, HDS is transparent to existing analytics and data inquiry applications, such as Spark and Hive. While HDS is a unified storage service supporting sequential and random data accesses for big data in the wafer fabrication foundry, DRS is a distributed computing framework for typical R language users. R users employ DRS to run data-parallel computations effortlessly and seamlessly. Like HDS, DRS can be horizontally scaled out. It guarantees the completion of computational jobs even in the presence of failures. In particular, it adaptively reallocates computational resources on the fly, minimizing job execution time and maximizing utilization of allocated resources. This paper discusses the design and implementation features of HDS and DRS. It also reports their performance metrics.

UR - http://www.scopus.com/inward/record.url?scp=85062624885&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062624885&partnerID=8YFLogxK

U2 - 10.1109/BigData.2018.8621954

DO - 10.1109/BigData.2018.8621954

M3 - Conference contribution

AN - SCOPUS:85062624885

T3 - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

SP - 1865

EP - 1874

BT - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

A2 - Song, Yang

A2 - Liu, Bing

A2 - Lee, Kisung

A2 - Abe, Naoki

A2 - Pu, Calton

A2 - Qiao, Mu

A2 - Ahmed, Nesreen

A2 - Kossmann, Donald

A2 - Saltz, Jeffrey

A2 - Tang, Jiliang

A2 - He, Jingrui

A2 - Liu, Huan

A2 - Hu, Xiaohua

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Tsai CP, Hsiao H-C, Chao YC, Hsu M, Chang ARK. Bridging the Gap between Big Data System Software Stack and Applications: The Case of Semiconductor Wafer Fabrication Foundries. In Song Y, Liu B, Lee K, Abe N, Pu C, Qiao M, Ahmed N, Kossmann D, Saltz J, Tang J, He J, Liu H, Hu X, editors, Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018. Institute of Electrical and Electronics Engineers Inc. 2019. p. 1865-1874. 8621954. (Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018). https://doi.org/10.1109/BigData.2018.8621954