TY - JOUR
T1 - A SPARQL query processing system using map-phase-multi join for big data in clouds
AU - Huang, Sheng Wei
AU - Yu, Chia Ho
AU - Shieh, Ce Kuen
AU - Tsai, Ming Fong
N1 - Funding Information:
This work is supported by the Ministry of Science and Technology of R.O.C. under grants MOST 104-2221-E-006-067-MY2 and MOST 105-2221-E-035-065.
Publisher Copyright:
Copyright © 2017 Inderscience Enterprises Ltd.
PY - 2017
Y1 - 2017
N2 - Big data refers to datasets that are so large and complex that they are hard to store and analyse with traditional data processing tools. Linked data is one approach to dealing with big data, with the data stored and processed in a TripleStore. For huge datasets, a TripleStore requires more scalable techniques. The MapReduce programming model is the most representative cloud technology. Several approaches use MapReduce to serve SPARQL queries, but they still exhibit unacceptable performance on complex queries. In this paper, we propose a map-phase-multi-join algorithm for processing SPARQL queries. Using multi-join, job initialisation time is reduced by avoiding iterative MapReduce jobs. Furthermore, map-phase join saves bandwidth by preventing join-less data from being transferred among computing nodes. We also design a storage schema and a join-order rule that enhance the performance of our system. The evaluation results show that our system outperforms traditional join approaches in most queries.
AB - Big data refers to datasets that are so large and complex that they are hard to store and analyse with traditional data processing tools. Linked data is one approach to dealing with big data, with the data stored and processed in a TripleStore. For huge datasets, a TripleStore requires more scalable techniques. The MapReduce programming model is the most representative cloud technology. Several approaches use MapReduce to serve SPARQL queries, but they still exhibit unacceptable performance on complex queries. In this paper, we propose a map-phase-multi-join algorithm for processing SPARQL queries. Using multi-join, job initialisation time is reduced by avoiding iterative MapReduce jobs. Furthermore, map-phase join saves bandwidth by preventing join-less data from being transferred among computing nodes. We also design a storage schema and a join-order rule that enhance the performance of our system. The evaluation results show that our system outperforms traditional join approaches in most queries.
UR - http://www.scopus.com/inward/record.url?scp=85032862270&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85032862270&partnerID=8YFLogxK
U2 - 10.1504/IJIPT.2017.087555
DO - 10.1504/IJIPT.2017.087555
M3 - Article
AN - SCOPUS:85032862270
SN - 1743-8209
VL - 10
SP - 177
EP - 188
JO - International Journal of Internet Protocol Technology
JF - International Journal of Internet Protocol Technology
IS - 3
ER -