For images of an object captured from multiple viewpoints, some visual words in a visual phrase may be occluded by the object itself, which degrades visual phrase extraction performance. This paper presents an approach to robust visual phrase extraction using graph mining for content-based image retrieval. In this study, the co-occurrence of two visual words is estimated over all category-related images in a database. The appearance frequencies of the visual words in each image are then used to construct a relation graph of visual words. Graph mining is applied to the visual word relation graphs to mine frequent dense subgraphs, from which the visual phrases are extracted. Experiments were conducted on the Caltech101 database, and the results show that the extracted visual phrases are robust and achieve better retrieval performance than the pair-wise visual phrase approach.
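The pipeline summarized above (per-image relation graphs built from visual word frequencies, then frequent dense subgraphs mined across the images of one category) can be illustrated with a small sketch. This is not the paper's algorithm. It makes several assumptions. Each image is represented as a dictionary mapping visual word IDs to counts. Two words are linked in an image's relation graph when both occur at least `min_count` times. An edge is "frequent" when it appears in at least `min_support` images. Maximal cliques (found with Bron–Kerbosch) stand in for the paper's unspecified dense-subgraph mining step. The function names and thresholds are hypothetical.

```python
from itertools import combinations
from collections import Counter


def relation_graph(word_counts, min_count=1):
    """Hypothetical per-image relation graph: link every pair of visual words
    that both occur at least min_count times in the image (assumed criterion)."""
    present = sorted(w for w, c in word_counts.items() if c >= min_count)
    # Edges are stored as sorted tuples so the same pair matches across images.
    return set(combinations(present, 2))


def frequent_edges(graphs, min_support):
    """Keep edges that occur in at least min_support per-image graphs."""
    support = Counter(edge for g in graphs for edge in g)
    return {e for e, s in support.items() if s >= min_support}


def maximal_cliques(edges):
    """Bron-Kerbosch (without pivoting) over an undirected edge set.
    Cliques are used here as a stand-in for 'dense subgraphs'."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return cliques


def extract_phrases(images, min_support, min_size=3, min_count=1):
    """Candidate visual phrases: maximal cliques of frequent co-occurrence edges."""
    graphs = [relation_graph(img, min_count) for img in images]
    edges = frequent_edges(graphs, min_support)
    return sorted(sorted(c) for c in maximal_cliques(edges) if len(c) >= min_size)


# Toy category of four images (made-up visual word counts, for illustration only).
images = [
    {"w1": 3, "w2": 2, "w3": 1, "w7": 1},
    {"w1": 1, "w2": 4, "w3": 2},
    {"w1": 2, "w2": 1, "w3": 5, "w9": 2},
    {"w2": 1, "w5": 2},
]

if __name__ == "__main__":
    print(extract_phrases(images, min_support=2))
```

With `min_support=2`, only the pairs among `w1`, `w2`, and `w3` recur across images, so the sketch yields the single phrase `['w1', 'w2', 'w3']`. Because support is counted across many images instead of being required in each one, a phrase can still be mined even when one of its words is occluded in a few views. This is the intuition behind mining over all category-related images.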