The paper proposes an unsupervised convolutional neural network (UCNN) that solves clustering and representation learning jointly in an iterative manner. The key idea is that better feature representations of images lead to more accurate clustering, while better clustering in turn benefits feature learning. Given an input image set, the method first randomly picks k samples and extracts their features as the initial cluster centroids, using a UCNN initialized with a representation model pre-trained on the ImageNet dataset. Mini-batch k-means is then performed: mini-batches of images are randomly sampled from the input set and each sample is assigned a cluster label, until all images are processed. Subsequently, the parameters of the UCNN and the cluster centroids are updated jointly and iteratively using stochastic gradient descent. Experimental results show that the proposed method outperforms state-of-the-art clustering schemes in terms of accuracy and memory complexity on large-scale image sets containing millions of images.
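To make the clustering step concrete, here is a minimal sketch of mini-batch k-means on fixed feature vectors, written in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the feature vectors stand in for CNN outputs, the centroid update is the standard per-centroid running average rather than the joint SGD update of network parameters and centroids described in the summary, and the names `minibatch_kmeans`, `nearest_centroid`, and the `init` parameter are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(feature, centroids):
    """Index of the centroid closest to the given feature vector."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(feature, centroids[j]))


def minibatch_kmeans(features, k, batch_size, epochs=5, init=None, seed=0):
    """Cluster fixed feature vectors with mini-batch k-means.

    Assumption: `features` are precomputed vectors (stand-ins for CNN
    outputs); no network is trained here.
    """
    rng = random.Random(seed)
    if init is None:
        # Initial centroids: the features of k randomly picked samples.
        picks = rng.sample(range(len(features)), k)
        centroids = [list(features[i]) for i in picks]
    else:
        centroids = [list(c) for c in init]
    counts = [0] * k  # number of samples each centroid has absorbed so far

    for _ in range(epochs):
        order = list(range(len(features)))
        rng.shuffle(order)
        # Walk through the shuffled data in mini-batches until all samples are seen.
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Assignment step: label every sample in the batch first.
            assigned = [nearest_centroid(features[i], centroids) for i in batch]
            # Update step: move each centroid toward its assigned samples
            # with a per-centroid learning rate of 1 / count.
            for i, j in zip(batch, assigned):
                counts[j] += 1
                lr = 1.0 / counts[j]
                centroids[j] = [(1 - lr) * c + lr * x
                                for c, x in zip(centroids[j], features[i])]

    # Final pass: label every sample against the final centroids.
    labels = [nearest_centroid(f, centroids) for f in features]
    return centroids, labels
```

Calling `minibatch_kmeans(features, k=2, batch_size=4)` returns the final centroids and one cluster label per input vector. In the method summarized above, the features themselves would change as the network is trained, so a loop like this would be interleaved with gradient updates rather than run once on fixed vectors.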