In Neural networks : the official journal of the International Neural Network Society
Generative adversarial networks have been extensively studied in recent years and powered a wide range of applications, ranging from image generation, image-to-image translation, to text-to-image generation, and visual recognition. These methods typically model the mapping from latent space to image with single or multiple generators. However, they have obvious drawbacks: (i) ignoring the multi-modal structure of images, and (ii) lacking model interpretability. Importantly, the existing methods mostly assume one or more generators can cover all image modes even if we do not know the structure of data. Thus, mode dropping and collapse often take place along with GANs training. Despite the importance of exploring the data structure in generation, it has been almost unexplored. In this work, aiming at generating multi-modal images and interpreting model explicitly, we explore the theory on how to integrate GANs with data structure prior, and propose latent Dirichlet allocation based generative adversarial networks (LDAGAN). This framework is extended to combine with a variety of state-of-the-art single-generator GANs and achieves improved performance. Extensive experiments on synthetic and real datasets demonstrate the efficacy of LDAGAN for multi-modal image generation. An implementation of LDAGAN is available at https://github.com/Sumching/LDAGAN.
Pan Lili, Cheng Shen, Liu Jian, Tang Peijun, Wang Bowen, Ren Yazhou, Xu Zenglin
Generative adversarial networks (GANs), Latent Dirichlet allocation (LDA), Model interpretability, Multi-modal structure prior