---
layout: page
title: Stacked GANs
permalink: /stackedgans/
ordinal: 5
---
Stacked GANs are a top-down stack of GANs, each trained to generate “plausible” lower-level representations conditioned on higher-level representations. Prior to this, the bottom-up, discriminative approach of CNNs had been quite successful at learning useful representations from data, whereas top-down generative models, which help explain the data distribution, had seen much less success: even with state-of-the-art DNNs, generation quality for data with large variations was still poor.
A bottom-up DNN pre-trained for classification is referred to as the encoder $E$; its intermediate representations are denoted $h_i$, with $h_0 = x$ (the input image) and $h_N = y$ (the label).
Our goal is to train a top-down generator $G$ that inverts $E$: each generator $G_i$ takes the higher-level representation $h_{i+1}$ together with a noise vector $z_i$ and produces a plausible lower-level representation $\hat h_i$.
We first train each GAN independently and then train them jointly in an end-to-end manner.
During training, each $G_i$ is conditioned on the encoder's real representation $h_{i+1}$ while the GANs are trained independently, and on the generated $\hat h_{i+1}$ from the generator above during joint training.
Huang, Xun, et al. describe this as:
> Intuitively, the total variations of images could be decomposed into multiple levels, with higher-level semantic variations (e.g., attributes, object categories, rough shapes) and lower-level variations (e.g., detailed contours and textures, background clutters). This allows using different noise variables to represent different levels of variations.
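To make the setup concrete, here is a minimal PyTorch-style sketch (not the paper's architecture; the two-level depth, layer sizes, and module names are all illustrative assumptions) of an encoder that exposes its intermediate representations and one generator per level that consumes a higher-level representation plus level-specific noise:

```python
# Illustrative two-level stack: depth and layer sizes are assumptions, not the
# paper's architecture.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Bottom-up encoder E exposing its intermediate representations."""
    def __init__(self):
        super().__init__()
        self.e1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())  # x  -> h1
        self.e2 = nn.Linear(256, 10)                              # h1 -> h2 (label logits)

    def forward(self, x):
        h1 = self.e1(x)
        h2 = self.e2(h1)
        return h1, h2

class LevelGenerator(nn.Module):
    """G_i: maps (h_{i+1}, z_i) to a plausible lower-level representation."""
    def __init__(self, higher_dim, noise_dim, lower_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(higher_dim + noise_dim, 256), nn.ReLU(),
            nn.Linear(256, lower_dim),
        )

    def forward(self, h_higher, z):
        return self.net(torch.cat([h_higher, z], dim=1))

# One GAN per level: G1 generates h1 from (h2, z1), G0 generates x from (h1, z0).
E  = Encoder()
G1 = LevelGenerator(higher_dim=10, noise_dim=50, lower_dim=256)
G0 = LevelGenerator(higher_dim=256, noise_dim=50, lower_dim=784)
```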
The training procedure is shown in the Figure. Each generator is trained with a linear combination of three loss terms: an adversarial loss, a conditional loss, and an entropy loss, with the adversarial term weighted by a parameter $\lambda$:
$${\cal L}_{G_i}=\lambda {\cal L}_{G_i}^{adv}+{\cal L}_{G_i}^{cond}+{\cal L}_{G_i}^{ent}$$
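As a rough sketch of how this combination would look in code (the function and argument names are assumptions; only the linear combination mirrors the equation above):

```python
# Combine the three per-generator loss terms; `adv_weight` plays the role of lambda.
def generator_loss(loss_adv, loss_cond, loss_ent, adv_weight=1.0):
    return adv_weight * loss_adv + loss_cond + loss_ent
```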
For each generator $G_i$, we introduce a representation discriminator $D_i$ that distinguishes generated representations $\hat h_i$ from real representations $h_i$. The discriminator is trained with the loss:
$$\displaystyle {\cal L}_{D_i}={\mathbb E}_{h_i\sim P_{data, E}}[-\log(D_i(h_i))] + {\mathbb E}_{z_i\sim P_{z_i},\, h_{i+1}\sim P_{data, E}}[-\log(1-D_i(G_i(h_{i+1}, z_i)))]$$
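A hedged sketch of this discriminator objective, assuming `D_i` outputs a probability (e.g. through a final sigmoid) and `G_i` takes `(h_{i+1}, z_i)`; the helper name and the small epsilon are illustrative:

```python
import torch

def discriminator_loss(D_i, G_i, h_i, h_ip1, z_i, eps=1e-8):
    d_real = D_i(h_i)                          # D_i(h_i)
    d_fake = D_i(G_i(h_ip1, z_i).detach())     # D_i(G_i(h_{i+1}, z_i)), detached from G_i
    loss_real = -torch.log(d_real + eps).mean()        # E[-log D_i(h_i)]
    loss_fake = -torch.log(1.0 - d_fake + eps).mean()  # E[-log(1 - D_i(G_i(h_{i+1}, z_i)))]
    return loss_real + loss_fake
```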
$G_i$, in turn, is trained to fool the representation discriminator, with the adversarial loss defined by:
$$\displaystyle {\cal L}_{G_i}^{adv}={\mathbb E}_{h_{i+1}\sim P_{data, E},\, z_i\sim P_{z_i}}[-\log(D_i(G_i(h_{i+1}, z_i)))]$$
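The matching generator-side term, under the same assumptions:

```python
import torch

def generator_adv_loss(D_i, G_i, h_ip1, z_i, eps=1e-8):
    d_fake = D_i(G_i(h_ip1, z_i))
    return -torch.log(d_fake + eps).mean()     # E[-log D_i(G_i(h_{i+1}, z_i))]
```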
To sample images, all $G_i$s are stacked together in a top-down manner, as shown in the Figure. We have the data distribution conditioned on the class label:
$$p_G(\hat x\mid y)=p_G(\hat h_0\mid \hat h_N)=\prod_{0\le i\le N-1} p_{G_i}(\hat h_i\mid \hat h_{i+1})$$
where each $p_{G_i}(\hat h_i\mid \hat h_{i+1})$ is modeled by the generator $G_i$.
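A small sketch of that top-down sampling loop, assuming `generators` is ordered from the top level down, each generator takes `(condition, noise)` as in the earlier sketch, and the noise dimensions are placeholders:

```python
import torch

def sample(generators, y_onehot, noise_dims):
    h = y_onehot                               # hat h_N: the class-label condition
    for G_i, nz in zip(generators, noise_dims):
        z_i = torch.randn(h.size(0), nz)       # fresh noise at every level
        h = G_i(h, z_i)                        # hat h_i drawn given hat h_{i+1}
    return h                                   # hat h_0 = generated image hat x
```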
To prevent each $G_i$ from ignoring its condition $h_{i+1}$ and generating $\hat h_i$ from the noise alone, we add the conditional loss ${\cal L}_{G_i}^{cond}$: the generated $\hat h_i$ is fed back into the encoder, and the recovered higher-level representation is required to stay close to $h_{i+1}$ (Euclidean distance for intermediate representations, cross-entropy when the condition is the label).
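A sketch of this conditional loss at an intermediate level, assuming `E_i` denotes the encoder block that maps level $i$ back up to level $i+1$; the Euclidean case is shown, and a cross-entropy term would replace it when the condition is the label:

```python
import torch.nn.functional as F

def conditional_loss(E_i, G_i, h_ip1, z_i):
    h_hat = G_i(h_ip1, z_i)                # generated lower-level representation
    recovered = E_i(h_hat)                 # E_i(G_i(h_{i+1}, z_i))
    return F.mse_loss(recovered, h_ip1)    # keep the recovered representation near h_{i+1}
```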
We would also like the generated representations to be sufficiently diverse rather than completely deterministic, so we introduce the entropy loss ${\cal L}_{G_i}^{ent}$. We want the conditional entropy $H(\hat h_i\mid h_{i+1})$ to be as high as possible, but optimizing it directly turns out to be intractable; hence we maximize a variational lower bound on the conditional entropy by using an auxiliary distribution $Q_i(z_i\mid \hat h_i)$ to approximate the true posterior $P_i(z_i\mid \hat h_i)$, training $G_i$ and $Q_i$ to minimize:
$${\cal L}^{ent}_{G_i}={\mathbb E}_{z_i\sim P_{z_i}}[{\mathbb E}_{\hat h_i\sim G_i(\hat h_i\mid z_i)}[-\log Q_i(z_i\mid \hat h_i)]]$$
It can be proved that minimizing this loss is equivalent to maximizing a variational lower bound on the conditional entropy $H(\hat h_i\mid h_{i+1})$.
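As a final sketch, the auxiliary distribution can be realized as a small network $Q_i$ that predicts the noise back from the generated representation; modeling $Q_i$ as a fixed-variance Gaussian (an assumption made here for simplicity) reduces $-\log Q_i(z_i\mid \hat h_i)$ to a squared-error term up to constants:

```python
import torch
import torch.nn as nn

class AuxiliaryQ(nn.Module):
    """Predicts the mean of Q_i(z_i | generated representation); sizes are illustrative."""
    def __init__(self, rep_dim, noise_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(rep_dim, 128), nn.ReLU(),
                                 nn.Linear(128, noise_dim))

    def forward(self, h_hat):
        return self.net(h_hat)

def entropy_loss(Q_i, G_i, h_ip1, z_i):
    h_hat = G_i(h_ip1, z_i)
    z_pred = Q_i(h_hat)
    # -log Q_i(z_i | h_hat) for a unit-variance Gaussian Q_i, constants dropped
    return ((z_pred - z_i) ** 2).mean()
```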