The Case of a Novel Operational Distributed Storage Service for Big Data in a Semiconductor Wafer Fabrication Foundry

Andy R.K. Chang, Yu Ling Chen, Yen Zhou Huang, Hung-Chang Hsiao, Michael Hsu, Chia Chee Lee, Hsin Yin Lee, Wei An Shih, Huan Ping Su, Chia Ping Tsai, Kuan Po Tseng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents a novel Hadoop-based infrastructural service for big data storage and computing in a Taiwanese semiconductor wafer fabrication foundry. The service, named Hadoop data service (HDS), has been built and operated in production systems for 3.5 years, and it has evolved over time by incrementally accommodating users' requirements. HDS is a web-based distributed big data storage facility: users access data objects stored in Hadoop through HDS over HTTP. HDS is scalable and reliable, and it publishes data objects efficiently by intelligently selecting either the Hadoop distributed file system (HDFS) or the HBase database. HDS is also transparent to existing analytics and data inquiry applications, such as Spark and Hive. The paper discusses the design and implementation features of HDS and reports its performance metrics.
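
The abstract does not state how HDS chooses between HDFS and HBase. As a rough illustration only, the sketch below assumes a size-based rule: small objects go to HBase and large ones to HDFS. The names `DataObject`, `select_backend`, and the 10 MiB cutoff are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the abstract gives no selection criterion or threshold.
SMALL_OBJECT_LIMIT_BYTES = 10 * 1024 * 1024


@dataclass(frozen=True)
class DataObject:
    """Illustrative stand-in for a data object published through the service."""
    name: str
    size_bytes: int


def select_backend(obj: DataObject, limit: int = SMALL_OBJECT_LIMIT_BYTES) -> str:
    """Return "hbase" for objects up to `limit` bytes, otherwise "hdfs" (assumed rule)."""
    if obj.size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "hbase" if obj.size_bytes <= limit else "hdfs"
```

Under this assumption, a 1 KiB object would be routed to HBase and a 50 MiB object to HDFS; a real deployment might weigh other factors such as access pattern or object count.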

Original language: English
Title of host publication: Proceedings - 2018 IEEE 24th International Conference on Parallel and Distributed Systems, ICPADS 2018
Publisher: IEEE Computer Society
Pages: 1028-1033
Number of pages: 6
ISBN (Electronic): 9781538673089
DOIs: https://doi.org/10.1109/PADSW.2018.8644546
Publication status: Published - 2019 Feb 19
Event: 24th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2018 - Singapore, Singapore
Duration: 2018 Dec 11 → 2018 Dec 13

Publication series

Name: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
Volume: 2018-December
ISSN (Print): 1521-9097

Conference

Conference: 24th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2018
Country: Singapore
City: Singapore
Period: 18-12-11 → 18-12-13

Fingerprint

Foundries
Semiconductor materials
Fabrication
HTTP
Electric sparks
Network protocols
Big data

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

Cite this

Chang, A. R. K., Chen, Y. L., Huang, Y. Z., Hsiao, H-C., Hsu, M., Lee, C. C., ... Tseng, K. P. (2019). The Case of a Novel Operational Distributed Storage Service for Big Data in a Semiconductor Wafer Fabrication Foundry. In Proceedings - 2018 IEEE 24th International Conference on Parallel and Distributed Systems, ICPADS 2018 (pp. 1028-1033). [8644546] (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS; Vol. 2018-December). IEEE Computer Society. https://doi.org/10.1109/PADSW.2018.8644546
Chang, Andy R.K. ; Chen, Yu Ling ; Huang, Yen Zhou ; Hsiao, Hung-Chang ; Hsu, Michael ; Lee, Chia Chee ; Lee, Hsin Yin ; Shih, Wei An ; Su, Huan Ping ; Tsai, Chia Ping ; Tseng, Kuan Po. / The Case of a Novel Operational Distributed Storage Service for Big Data in a Semiconductor Wafer Fabrication Foundry. Proceedings - 2018 IEEE 24th International Conference on Parallel and Distributed Systems, ICPADS 2018. IEEE Computer Society, 2019. pp. 1028-1033 (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS).
@inproceedings{a834f3633ef3417ea0b7dea60fae1b12,
title = "The Case of a Novel Operational Distributed Storage Service for Big Data in a Semiconductor Wafer Fabrication Foundry",
abstract = "We present in this paper a novel infrastructural service based on Hadoop for big data storage and computing in a Taiwan's semiconductor wafer fabrication foundry. The service is named Hadoop data service (HDS), which has been built and operated in production systems for 3.5 years. It evolves over time by incrementally accommodating users' requirements. HDS is a web-based distributed big data storage facility. Users simply rely on HDS to access data objects stored in Hadoop with the HTTP protocol. In addition, HDS is scalable and reliable. Moreover, HDS is efficient and effective by intelligently selecting either Hadoop distributed file system (HDFS) or database (HBase) for publishing data objects. Specifically, HDS is transparent to existing analytics and data inquiry applications, such as Spark and Hive. This paper discusses the design and implementation features for HDS. The performance metrics of HDS are also demonstrated.",
author = "Chang, {Andy R.K.} and Chen, {Yu Ling} and Huang, {Yen Zhou} and Hung-Chang Hsiao and Michael Hsu and Lee, {Chia Chee} and Lee, {Hsin Yin} and Shih, {Wei An} and Su, {Huan Ping} and Tsai, {Chia Ping} and Tseng, {Kuan Po}",
year = "2019",
month = "2",
day = "19",
doi = "10.1109/PADSW.2018.8644546",
language = "English",
series = "Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS",
publisher = "IEEE Computer Society",
pages = "1028--1033",
booktitle = "Proceedings - 2018 IEEE 24th International Conference on Parallel and Distributed Systems, ICPADS 2018",
address = "United States",
}

Chang, ARK, Chen, YL, Huang, YZ, Hsiao, H-C, Hsu, M, Lee, CC, Lee, HY, Shih, WA, Su, HP, Tsai, CP & Tseng, KP 2019, The Case of a Novel Operational Distributed Storage Service for Big Data in a Semiconductor Wafer Fabrication Foundry. in Proceedings - 2018 IEEE 24th International Conference on Parallel and Distributed Systems, ICPADS 2018., 8644546, Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS, vol. 2018-December, IEEE Computer Society, pp. 1028-1033, 24th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2018, Singapore, Singapore, 18-12-11. https://doi.org/10.1109/PADSW.2018.8644546

The Case of a Novel Operational Distributed Storage Service for Big Data in a Semiconductor Wafer Fabrication Foundry. / Chang, Andy R.K.; Chen, Yu Ling; Huang, Yen Zhou; Hsiao, Hung-Chang; Hsu, Michael; Lee, Chia Chee; Lee, Hsin Yin; Shih, Wei An; Su, Huan Ping; Tsai, Chia Ping; Tseng, Kuan Po.

Proceedings - 2018 IEEE 24th International Conference on Parallel and Distributed Systems, ICPADS 2018. IEEE Computer Society, 2019. p. 1028-1033 8644546 (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS; Vol. 2018-December).


TY - GEN

T1 - The Case of a Novel Operational Distributed Storage Service for Big Data in a Semiconductor Wafer Fabrication Foundry

AU - Chang, Andy R.K.

AU - Chen, Yu Ling

AU - Huang, Yen Zhou

AU - Hsiao, Hung-Chang

AU - Hsu, Michael

AU - Lee, Chia Chee

AU - Lee, Hsin Yin

AU - Shih, Wei An

AU - Su, Huan Ping

AU - Tsai, Chia Ping

AU - Tseng, Kuan Po

PY - 2019/2/19

Y1 - 2019/2/19

N2 - We present in this paper a novel infrastructural service based on Hadoop for big data storage and computing in a Taiwan's semiconductor wafer fabrication foundry. The service is named Hadoop data service (HDS), which has been built and operated in production systems for 3.5 years. It evolves over time by incrementally accommodating users' requirements. HDS is a web-based distributed big data storage facility. Users simply rely on HDS to access data objects stored in Hadoop with the HTTP protocol. In addition, HDS is scalable and reliable. Moreover, HDS is efficient and effective by intelligently selecting either Hadoop distributed file system (HDFS) or database (HBase) for publishing data objects. Specifically, HDS is transparent to existing analytics and data inquiry applications, such as Spark and Hive. This paper discusses the design and implementation features for HDS. The performance metrics of HDS are also demonstrated.

AB - We present in this paper a novel infrastructural service based on Hadoop for big data storage and computing in a Taiwan's semiconductor wafer fabrication foundry. The service is named Hadoop data service (HDS), which has been built and operated in production systems for 3.5 years. It evolves over time by incrementally accommodating users' requirements. HDS is a web-based distributed big data storage facility. Users simply rely on HDS to access data objects stored in Hadoop with the HTTP protocol. In addition, HDS is scalable and reliable. Moreover, HDS is efficient and effective by intelligently selecting either Hadoop distributed file system (HDFS) or database (HBase) for publishing data objects. Specifically, HDS is transparent to existing analytics and data inquiry applications, such as Spark and Hive. This paper discusses the design and implementation features for HDS. The performance metrics of HDS are also demonstrated.

UR - http://www.scopus.com/inward/record.url?scp=85063332079&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063332079&partnerID=8YFLogxK

U2 - 10.1109/PADSW.2018.8644546

DO - 10.1109/PADSW.2018.8644546

M3 - Conference contribution

AN - SCOPUS:85063332079

T3 - Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS

SP - 1028

EP - 1033

BT - Proceedings - 2018 IEEE 24th International Conference on Parallel and Distributed Systems, ICPADS 2018

PB - IEEE Computer Society

ER -

Chang ARK, Chen YL, Huang YZ, Hsiao H-C, Hsu M, Lee CC et al. The Case of a Novel Operational Distributed Storage Service for Big Data in a Semiconductor Wafer Fabrication Foundry. In Proceedings - 2018 IEEE 24th International Conference on Parallel and Distributed Systems, ICPADS 2018. IEEE Computer Society. 2019. p. 1028-1033. 8644546. (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS). https://doi.org/10.1109/PADSW.2018.8644546