Packet classification is a key component of today's network architecture, supporting packet forwarding and other network functions. With the growth of the Internet and the emergence of software-defined networking (SDN), methods designed for traditional 5-dimensional rule sets are no longer sufficient for current rule sets, whose rules have 12 or more dimensions. The main problem is to classify packets against such rule sets at high throughput. To achieve high throughput, we study existing GPU implementations, some of which use a single hash table and others a binary range tree to perform the search. In the 12-dimensional rule sets defined by OpenFlow 1.0, 8 fields are specified as either an exact value or a wildcard, so a single hash table or a binary range tree is inefficient. Another problem with implementing packet classification on a GPU is that the input data and results must be transferred over the PCI-E bus, which incurs long bus latency. In this paper, we propose a modified hash table to process the fields that contain only exact values or wildcards, and we use a compression method to reduce memory consumption. In addition, we implement the proposed method on an APU that uses Heterogeneous System Architecture to avoid the bus latency. According to experimental results on an AMD A10-7850 APU, our method achieves a throughput of 1814 MPPS and supports rule sets of more than 12K 12-dimensional rules. The achieved throughput is 10 times that of the methods on legacy GPUs.
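
To make the exact-value-or-wildcard matching problem concrete, the following is a minimal CPU-side sketch in Python. It is not the modified hash table proposed in this paper; it substitutes a generic tuple-space-style scheme in which rules are grouped by which fields are exact, with one hash table per group. The class name `ExactOrWildcardTable`, the `WILDCARD` marker, the three-field rules, and the priority convention (larger wins) are all assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical marker for a "don't care" field (an assumption of this sketch).
WILDCARD = None


class ExactOrWildcardTable:
    """Groups rules by which fields are exact; one hash table per group."""

    def __init__(self, num_fields: int) -> None:
        self.num_fields = num_fields
        # mask (indices of exact fields) -> {exact values -> (priority, action)}
        self.tables: Dict[Tuple[int, ...],
                          Dict[Tuple[int, ...], Tuple[int, str]]] = {}

    def add_rule(self, fields: Sequence[Optional[int]],
                 priority: int, action: str) -> None:
        if len(fields) != self.num_fields:
            raise ValueError("rule has the wrong number of fields")
        # The mask records which positions must match exactly.
        mask = tuple(i for i, v in enumerate(fields) if v is not WILDCARD)
        key = tuple(fields[i] for i in mask)
        table = self.tables.setdefault(mask, {})
        current = table.get(key)
        # Keep only the highest-priority rule for an identical key.
        if current is None or priority > current[0]:
            table[key] = (priority, action)

    def classify(self, packet: Sequence[int]) -> Optional[str]:
        best: Optional[Tuple[int, str]] = None
        # One hash probe per distinct wildcard pattern.
        for mask, table in self.tables.items():
            hit = table.get(tuple(packet[i] for i in mask))
            if hit is not None and (best is None or hit[0] > best[0]):
                best = hit
        return best[1] if best is not None else None


if __name__ == "__main__":
    table = ExactOrWildcardTable(num_fields=3)
    table.add_rule((80, WILDCARD, 6), priority=10, action="forward")
    table.add_rule((WILDCARD, WILDCARD, 6), priority=1, action="drop")
    print(table.classify((80, 5, 6)))
    print(table.classify((443, 5, 6)))
```

In this sketch the lookup cost grows with the number of distinct wildcard patterns rather than with the number of rules, which illustrates why treating exact-or-wildcard fields separately from range fields can be attractive.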