TY - JOUR
T1 - A topology-based scaling mechanism for Apache Storm
AU - Shieh, Ce Kuen
AU - Huang, Sheng Wei
AU - Sun, Li Da
AU - Tsai, Ming Fong
AU - Chilamkurti, Naveen
N1 - Funding Information:
This work is supported by the Ministry of Science and Technology of the ROC under grant nos. MOST 104-2221-E-006-067-MY2 and MOST 104-2221-E-035-021. The authors wish to thank Mr. Chui-Ming Chiu for advice on implementation, and two anonymous reviewers and the editor for their comments.
Publisher Copyright:
Copyright © 2016 John Wiley & Sons, Ltd.
PY - 2017/5/1
Y1 - 2017/5/1
AB - As more and more well-known companies, such as Twitter, Yahoo, and Alibaba, start to focus on real-time big data applications, how to build a platform for processing real-time data has become an important issue. Among real-time processing systems, Apache Storm is the most well-known and representative open-source, distributed, real-time computation system. In Storm, a computation is implemented as a topology, a graph in which nodes are operators and edges represent the data flows between operators. Scalability is an important issue in big data processing and analysis systems. Storm provides a rebalance mechanism for scalability, which can adjust the parallelism of a running topology. However, the rebalance command has some drawbacks, such as restricted resource usage and suspension of topology execution. In this paper, we propose a topology-based scaling mechanism for Apache Storm. When a topology is overloaded, it scales either by adjusting the number of cloned topologies or by being replaced with a new topology that has more tasks. Scaling with the topology-based mechanism eliminates the resource usage restriction and the execution suspension, and the procedure is launched automatically. The experimental results show that our topology-based scaling mechanism can improve the scaling performance of Storm.
UR - http://www.scopus.com/inward/record.url?scp=84977564148&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84977564148&partnerID=8YFLogxK
U2 - 10.1002/nem.1933
DO - 10.1002/nem.1933
M3 - Article
AN - SCOPUS:84977564148
SN - 1055-7148
VL - 27
JO - International Journal of Network Management
JF - International Journal of Network Management
IS - 3
M1 - e1933
ER -