Automated malware toolkits allow for easy generation of new malicious programs. These new executables carry similar malicious code and demonstrate similar malicious behavior on infected hosts. In order to speed up the efficiency of mal ware detection, discriminating a malware as known or a new species of malware has become a critical issue in the security industry. In this paper, we propose a new approach to precisely classify malicious executables by employing information retrieval theory. Dynamic analysis of a sample's sequence of Windows API function calls produces corresponding parameters and values which is used as input to a standard TF-IDF weighting scheme to identify malware families by their behavior characteristics. Irrelevance reduction is developed to filter out non-relevant features and improve accuracy of malware classification. Finally, a similarity measure is used to determine the most similar malware family to the tested samples.