This study presents a session-based P2P botnet clustering system, implemented on MapReduce, for aggregating malicious hosts within NetFlow traffic logs. The proposed system, designated BotCluster, merges unidirectional NetFlow records into bidirectional sessions and then applies a three-level grouping to cluster similar sessions into groups with similar behavior. In addition, BotCluster eliminates unrelated sessions and keeps the large irregular sessions by exploiting the similarity and regularity inherent in botnet communication. The resulting groups can be regarded as collections of malicious behavior, because only malware would generate such a large number of similar patterns in network traces. The performance of BotCluster is evaluated using real-world NetFlow traffic logs collected from two university campuses in Taiwan (NCKU and CCU). The datasets have sizes of 239 GB and 137 GB, respectively, and together contain approximately 2.4 billion flows and approximately 18 million IP addresses. The precision of the BotCluster detection results is evaluated using the VirusTotal blacklist service. BotCluster achieves a detection precision of 96.23% and 86.62% for the NCKU and CCU datasets, respectively. Finally, when applied to a combined dataset containing the NetFlow logs of both campuses, BotCluster achieves an average precision of 97.58%. In other words, given a sufficient observation duration, BotCluster can detect even stealthy and concealed bots with a high degree of reliability.
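The session-merging step mentioned in the abstract can be illustrated with a minimal sketch. It is not BotCluster's actual implementation: the `Flow` record, its field names, and the helpers `session_key` and `merge_flows` are hypothetical. The sketch runs in memory rather than on MapReduce and ignores session timeouts. It assumes only that two unidirectional flows belong to the same session when they share a protocol and the same pair of endpoints in opposite directions.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical unidirectional flow record (not a real NetFlow schema)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    packets: int
    bytes: int


def session_key(flow):
    """Return a direction-independent key so A->B and B->A map to one session."""
    a = (flow.src_ip, flow.src_port)
    b = (flow.dst_ip, flow.dst_port)
    low, high = sorted((a, b))  # canonical endpoint order
    return (flow.proto, low, high)


def merge_flows(flows):
    """Aggregate unidirectional flows into bidirectional session summaries."""
    sessions = defaultdict(lambda: {"flows": 0, "packets": 0, "bytes": 0})
    for flow in flows:
        summary = sessions[session_key(flow)]
        summary["flows"] += 1
        summary["packets"] += flow.packets
        summary["bytes"] += flow.bytes
    return dict(sessions)


if __name__ == "__main__":
    example = [
        Flow("10.0.0.1", 5000, "10.0.0.2", 6881, "udp", 3, 300),
        Flow("10.0.0.2", 6881, "10.0.0.1", 5000, "udp", 2, 200),
    ]
    for key, summary in merge_flows(example).items():
        print(key, summary)
```

In a MapReduce setting, the same canonical key could plausibly serve as the map output key, with the per-session aggregation performed in the reducer; that mapping is an assumption here, not a detail stated in the abstract.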
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications