TY - JOUR
T1 - Tackling Evolving Botnet Threats: A Gradual Self-Training Neural Network Approach
AU - Lo, Ta Chun
AU - Chang, Jyh Biau
AU - Lo, Shao Hsuan
AU - Kao, Bai Jun
AU - Shieh, Ce Kuen
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2024
Y1 - 2024
AB - Botnets pose a significant challenge to network security but are difficult to detect because of their dynamic and evolving nature, which limits the effectiveness of conventional supervised neural network detection methods. To address this problem, the present study proposes a novel neural network-based self-training framework for botnet detection, in which a trained classifier generates pseudo-labels from unlabeled data and is then iteratively refined using a combined dataset containing both the original training data and the pseudo-labeled data. Although not all of the generated pseudo-labels are applicable to every botnet, the self-training framework can label, with high confidence, unseen botnets whose behaviors are similar to those of known botnets. Several strategies are proposed for enhancing the robustness of the classification performance by minimizing the number of incorrect pseudo-labels, mitigating the effects of erroneous pseudo-labels on the overall performance of the network, and optimizing the proportion of unlabeled data for labeling. Experiments conducted on synthetic datasets confirm the superiority of the proposed method over the base model, particularly when the training data constitutes only a small portion of the total dataset. Subsequent experiments also demonstrate the efficacy of the framework in detecting unseen botnet variants and its strong performance on real-world campus network traffic.
UR - http://www.scopus.com/inward/record.url?scp=85193510542&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85193510542&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3401853
DO - 10.1109/ACCESS.2024.3401853
M3 - Article
AN - SCOPUS:85193510542
SN - 2169-3536
VL - 12
SP - 69397
EP - 69409
JO - IEEE Access
JF - IEEE Access
ER -