With the rapid progress of generative adversarial networks (GANs), a photo-realistic image can now be generated easily from a low-dimensional random vector. However, such generated images can be used to synthesize depictions of people with radical content, which could affect society. Because many GAN-based techniques for producing photo-realistic facial images are already available, collecting training images from every possible generative model is difficult; consequently, a learning-based detector may fail to detect fake images produced by a generative model that was excluded from training. To overcome this shortcoming, we propose a two-step pairwise learning approach that learns common fake features across training images generated by different generative models. First, the triplet loss is used to model the relation between fake and real images and to learn discriminative features for determining whether an image is real or fake. Then, we propose a novel coupled network to accurately capture both local and global features of fake and real images. Experimental results demonstrate that the proposed method outperforms baseline supervised learning methods for fake facial image detection.
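For readers unfamiliar with the triplet loss mentioned above, the following is a minimal sketch of its standard margin-based form, not the implementation used in this work. The Euclidean distance, the margin value of 1.0, the two-dimensional toy feature vectors, and the function names `euclidean` and `triplet_loss` are all illustrative assumptions; the abstract does not specify them.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (margin value is an assumption).

    The loss is zero once the anchor is closer to the positive (same class)
    than to the negative (other class) by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical toy features: two "real" images close together, one "fake" far away.
real_a = [0.0, 0.0]
real_b = [0.0, 1.0]
fake = [3.0, 4.0]

# Well-separated triplet: the real pair is close and the fake is far.
loss_separated = triplet_loss(real_a, real_b, fake)

# Violated triplet: the fake is treated as the positive, so the loss is large.
loss_violated = triplet_loss(real_a, fake, real_b)
```

In this toy setting the first triplet already satisfies the margin, so it contributes no loss, while the second is penalized. Minimizing such a loss over many triplets pulls same-class features together and pushes real and fake features apart.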