TY - JOUR
T1 - BotCluster
T2 - A session-based P2P botnet clustering system on NetFlow
AU - Wang, Chun Yu
AU - Ou, Chi Lung
AU - Zhang, Yu En
AU - Cho, Feng Min
AU - Chen, Pin Hao
AU - Chang, Jyh Biau
AU - Shieh, Ce Kuen
N1 - Funding Information:
The authors are grateful to the Ministry of Science and Technology, Taiwan, for financial support (this research was funded by contract MOST-103-22-E-006-144-MY3), to the National Center for High-Performance Computing, Taiwan, for providing NetFlow logs, and to VirusTotal for providing the malicious IP checking service.
Publisher Copyright:
© 2018 Elsevier B.V.
PY - 2018/11/9
Y1 - 2018/11/9
AB - This study presents a session-based P2P botnet clustering system, implemented on MapReduce, for aggregating malicious hosts within NetFlow traffic logs. The proposed botnet detection system, designated BotCluster, merges unidirectional NetFlow records into bidirectional sessions and then applies a 3-level grouping to cluster similar sessions into groups exhibiting similar behavior. In addition, BotCluster exploits the similarity and regularity inherent in botnet communication to eliminate unrelated sessions and retain large irregular sessions. The clustered groups can be regarded as collections of malicious behavior, because only man-made malware would generate such large numbers of similar patterns in network traces. The performance of BotCluster is evaluated using real-world NetFlow traffic logs collected from two university campuses in Taiwan (NCKU and CCU). The datasets are 239 GB and 137 GB in size, respectively, and together contain approximately 2.4 billion flows and approximately 18 million IP addresses. The precision of the BotCluster detection results is evaluated using the VirusTotal blacklist service. The results show that BotCluster achieves a detection precision of 96.23% and 86.62% for the NCKU and CCU datasets, respectively. Finally, when applied to a combined dataset containing the NetFlow logs of both campuses, BotCluster achieves an average precision of 97.58%. In other words, given a sufficiently long observation period, BotCluster can detect even stealthy and concealed bots with a high degree of reliability.
UR - http://www.scopus.com/inward/record.url?scp=85053308182&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85053308182&partnerID=8YFLogxK
U2 - 10.1016/j.comnet.2018.08.014
DO - 10.1016/j.comnet.2018.08.014
M3 - Article
AN - SCOPUS:85053308182
SN - 1389-1286
VL - 145
SP - 175
EP - 189
JO - Computer Networks
JF - Computer Networks
ER -