TY - GEN
T1 - An information retrieval approach for malware classification based on Windows API calls
AU - Cheng, Julia Yu Chin
AU - Tsai, Tzung Shian
AU - Yang, Chu Sing
PY - 2013
Y1 - 2013
N2 - Automated malware toolkits allow for easy generation of new malicious programs. These new executables carry similar malicious code and demonstrate similar malicious behavior on infected hosts. In order to improve the efficiency of malware detection, distinguishing known malware from new species of malware has become a critical issue in the security industry. In this paper, we propose a new approach to precisely classify malicious executables by employing information retrieval theory. Dynamic analysis of a sample's sequence of Windows API function calls produces corresponding parameters and values, which are used as input to a standard TF-IDF weighting scheme to identify malware families by their behavioral characteristics. Irrelevance reduction is developed to filter out non-relevant features and improve the accuracy of malware classification. Finally, a similarity measure is used to determine the malware family most similar to the tested samples.
AB - Automated malware toolkits allow for easy generation of new malicious programs. These new executables carry similar malicious code and demonstrate similar malicious behavior on infected hosts. In order to improve the efficiency of malware detection, distinguishing known malware from new species of malware has become a critical issue in the security industry. In this paper, we propose a new approach to precisely classify malicious executables by employing information retrieval theory. Dynamic analysis of a sample's sequence of Windows API function calls produces corresponding parameters and values, which are used as input to a standard TF-IDF weighting scheme to identify malware families by their behavioral characteristics. Irrelevance reduction is developed to filter out non-relevant features and improve the accuracy of malware classification. Finally, a similarity measure is used to determine the malware family most similar to the tested samples.
UR - http://www.scopus.com/inward/record.url?scp=84907275724&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84907275724&partnerID=8YFLogxK
U2 - 10.1109/ICMLC.2013.6890868
DO - 10.1109/ICMLC.2013.6890868
M3 - Conference contribution
AN - SCOPUS:84907275724
T3 - Proceedings - International Conference on Machine Learning and Cybernetics
SP - 1678
EP - 1683
BT - Proceedings - International Conference on Machine Learning and Cybernetics
PB - IEEE Computer Society
T2 - 12th International Conference on Machine Learning and Cybernetics, ICMLC 2013
Y2 - 14 July 2013 through 17 July 2013
ER -