TY - GEN
T1 - SyncGAN
T2 - 2018 IEEE International Conference on Multimedia and Expo, ICME 2018
AU - Chen, Wen Cheng
AU - Chen, Chien Wen
AU - Hu, Min Chun
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/10/8
Y1 - 2018/10/8
N2 - Generative adversarial networks (GANs) have achieved impressive success in cross-domain generation, but they face difficulty in cross-modal generation due to the lack of a common distribution between heterogeneous data. Most existing conditional cross-modal GANs adopt a unidirectional transfer strategy and have achieved preliminary success on text-to-image transfer. Instead of learning the transfer between different modalities, we aim to learn a synchronous latent space representing the cross-modal common concept. A novel network component named the synchronizer is proposed in this work to judge whether paired data are synchronous (i.e., corresponding) or not, which constrains the latent space of the generators in the GANs. Our GAN model, named SyncGAN, can successfully generate synchronous data (e.g., a pair of image and sound) from identical random noise. To transform data from one modality to another, we recover the latent code by inverting the mappings of a generator and use it to generate data of a different modality. In addition, the proposed model supports semi-supervised learning, which makes it more flexible for practical applications.
AB - Generative adversarial networks (GANs) have achieved impressive success in cross-domain generation, but they face difficulty in cross-modal generation due to the lack of a common distribution between heterogeneous data. Most existing conditional cross-modal GANs adopt a unidirectional transfer strategy and have achieved preliminary success on text-to-image transfer. Instead of learning the transfer between different modalities, we aim to learn a synchronous latent space representing the cross-modal common concept. A novel network component named the synchronizer is proposed in this work to judge whether paired data are synchronous (i.e., corresponding) or not, which constrains the latent space of the generators in the GANs. Our GAN model, named SyncGAN, can successfully generate synchronous data (e.g., a pair of image and sound) from identical random noise. To transform data from one modality to another, we recover the latent code by inverting the mappings of a generator and use it to generate data of a different modality. In addition, the proposed model supports semi-supervised learning, which makes it more flexible for practical applications.
UR - http://www.scopus.com/inward/record.url?scp=85061452974&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061452974&partnerID=8YFLogxK
U2 - 10.1109/ICME.2018.8486594
DO - 10.1109/ICME.2018.8486594
M3 - Conference contribution
AN - SCOPUS:85061452974
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2018 IEEE International Conference on Multimedia and Expo, ICME 2018
PB - IEEE Computer Society
Y2 - 23 July 2018 through 27 July 2018
ER -