reproducing published results #8
Hi, thanks for reporting this issue. I checked the code and found a mistake I made when refactoring the project. Here's the fix (lines 185 to 191 in 287679f):
Now it should be able to achieve results similar to the paper. Besides, 520000 iters is more than enough; images tend to look dull at this stage (though FID seems to be fine), and we got decent results at around 300000 iters.
Thank you very much for your prompt reply and fix. I am running again now and will come back to you as soon as I have iteration 300000 complete.
I've now got results at iteration 300,000 which I'll attach below. The dynamic range of the colours isn't my main concern; it's the non-physical distortions that make these images stand out as unreal (my son calls them Halloween faces). Only a few of these faces approach the quality in the paper and examples (out of 32 generated), so it seems that there is still something substantial left to fix. If it's only an issue with code refactoring, maybe I can help? I understand the need to release clean code, but your code runs fast enough on old GPUs and I'm prepared to run the code that works and then change it slowly towards the released code until we find the difference. 2020-05-20 11:36:50 [300000 / 520000] SD: 1 Diff: -5.0607 loss_D: 5.1545 loss_G: 0.0938 time:100899.85
I noticed that the result you show comes from a randomly sampled batch, and indeed in this case not all of them look real. This issue is primarily caused by the entangled latent space, as its granularity doesn't match that of the image space, and this is not addressed in DCGAN or our work. One possible solution is to switch to StyleGAN, which implicitly learns to disentangle the latent space before it is fed into the network. I re-trained the model with this refactored version and attached a batch of sampled results here, along with a link to the corresponding model (for this one I switched off the reparam layer): https://drive.google.com/file/d/1yTdZnHvttF43Cb5-P1-I0t-XUDDaZFD-/view?usp=sharing
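(For readers unfamiliar with the StyleGAN idea mentioned above, here is a minimal sketch, not taken from this repo or from StyleGAN's code: a small mapping MLP that transforms z into an intermediate latent w before it conditions the generator, which is what helps disentangle the latent space. All names and sizes are illustrative assumptions.)

```python
# Hedged sketch of a StyleGAN-style mapping network: z -> w.
# Not this repo's code; layer count and sizes are assumptions.
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    def __init__(self, z_size=128, w_size=128, n_layers=4):
        super().__init__()
        layers = []
        for i in range(n_layers):
            layers += [nn.Linear(z_size if i == 0 else w_size, w_size),
                       nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)

# w would then condition the generator (e.g. via AdaIN) instead of raw z.
mapping = MappingNetwork()
w = mapping(torch.randn(32, 128))
```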
I just realised that there might be some problem with your training data. Please refer to this link to prepare CelebA for training: https://github.com/tkarras/progressive_growing_of_gans#preparing-datasets-for-training. It centres and aligns the samples, which is important for structured images like faces.
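(For anyone hitting the same problem: below is a rough sketch of a centred square crop plus resize of the aligned img_align_celeba images. It is only an approximation under assumed paths and sizes; the official route is the progressive_growing_of_gans dataset tool linked above.)

```python
# Hedged sketch: centred square crop + resize of aligned CelebA images.
# Paths are hypothetical; the official preparation uses the
# progressive_growing_of_gans dataset tool instead of this script.
import os
from PIL import Image

SRC = "/tmp/CelebA/img_align_celeba"   # assumed location of aligned images
DST = "/tmp/CelebA/cropped256"         # assumed output folder
os.makedirs(DST, exist_ok=True)

for name in sorted(os.listdir(SRC)):
    img = Image.open(os.path.join(SRC, name))
    w, h = img.size                    # 178 x 218 for aligned CelebA
    s = min(w, h)
    left, top = (w - s) // 2, (h - s) // 2
    img = img.crop((left, top, left + s, top + s)).resize((256, 256), Image.LANCZOS)
    img.save(os.path.join(DST, name))
```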
Well, I have to say that I'm impressed by your remote debugging skills. My immediate reaction was to think that I wouldn't be so stupid as not to use the standard eye-aligned database, but of course I was that stupid (it's a little more complicated, but not a lot). I wanted to write a script that prevents others from making such a silly mistake, the core of which is:

So, back to results. I now get images like the one below. I still wouldn't visually rate them the same as either your result in the paper or the ones you show above. To get closer I'll try the "switched off reparam layer" setting; from a quick look that looks like setting "--use_adaptive_reparam False". I'll run a couple more seeds with this change later today.

I may still be confused: in your paper, when you say "Figure 8: Images sampled from RealnessGAN", I had interpreted that as random samples from the generator, but your paragraph two above talks about the entangledness of the embedding space and that some distortion is expected (like in the examples in the same comment). If you have time I would like to understand how you sampled for the paper, as I thought the way to sample from a GAN was to generate adversarial examples.

You are right, I do need to implement FID; I can't really compare my own work or that of others without it. Thanks again for all your help. Tony
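(On the FID point, a minimal sketch follows. It assumes the third-party pytorch-fid package, version 0.2 or later, which is not part of this repo; the two paths are hypothetical placeholders for folders of real and generated images.)

```python
# Hedged sketch: FID via the third-party `pytorch-fid` package
# (pip install pytorch-fid, assumes version >= 0.2). Paths are hypothetical.
import torch
from pytorch_fid.fid_score import calculate_fid_given_paths

fid = calculate_fid_given_paths(
    ["/tmp/CelebA/cropped256",                    # real (cropped) images
     "./OUTPUT/RealnessGAN-CelebA256-1_Extra"],   # generated samples
    batch_size=50,
    device="cuda" if torch.cuda.is_available() else "cpu",
    dims=2048,                                    # standard Inception pool3 features
)
print(f"FID: {fid:.2f}")
```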
I know this is a little late, but have you tried truncating the Gaussian noise sampling? It's a trick used in StyleGAN and BigGAN to get better quality samples.
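(A minimal sketch of that truncation trick, not tied to this repo's code: latent entries whose magnitude exceeds a threshold are resampled, so the generator only sees "typical" z vectors. The function name and threshold are illustrative assumptions.)

```python
# Hedged sketch of the truncation trick: resample latent entries whose
# magnitude exceeds a threshold before feeding z to the generator.
import torch

def truncated_noise(batch_size, z_size, threshold=0.5, device="cpu"):
    z = torch.randn(batch_size, z_size, device=device)
    while True:
        mask = z.abs() > threshold
        if not mask.any():
            return z
        z[mask] = torch.randn(int(mask.sum()), device=device)

# Usage: lower thresholds trade sample diversity for visual fidelity.
z = truncated_noise(32, 128, threshold=0.5)  # z_size=128 matches the logged config
```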
Thank you very much for such an interesting paper and for making your code available.
It was a pleasure to be able to run the code so easily on CelebA.
I've run the code twice (once with seed 1 on a GTX TITAN X, then with seed 2 on an RTX 2060) and I attach the results (fake_samples_iter520000.png). By eye, these are far from the quality of images/CelebA_snapshot.png, so my issue is: should I expect the code to reproduce the results in the paper? Maybe the config is cut down so that anyone can run it, in which case can you supply the config you used? Or maybe I have a problem in how I've run it (partial logs appended).
Thanks,
Tony
2020-05-14 15:34:01 Namespace(CIFAR10=False, D_h_size=32, D_updates=1, G_h_size=32, G_updates=1, adam_eps=1e-08, batch_size=32, beta1=0.5, beta2=0.999, cuda=True, decay=0, effective_batch_size=32, extra_folder='./OUTPUT/RealnessGAN-CelebA256-1_Extra', gen_every=10000, gen_extra_images=5000, image_size=256, input_folder='/tmp/CelebA', load_ckpt=None, lr_D=0.0002, lr_G=0.0002, n_channels=3, n_gpu=1, negative_skew=-1.0, num_outcomes=51, num_workers=8, output_folder='./OUTPUT/RealnessGAN-CelebA256-1', positive_skew=1.0, print_every=1000, relativisticG=True, seed=1, total_iters=520000, use_adaptive_reparam=True, weight_decay=0, z_size=128)
***** start training iter 519999 *******
2020-05-17 04:29:24 [520000 / 520000] SD: 1 Diff: -5.2646 loss_D: 5.1961 loss_G: -0.0685 time:219322.53
2020-05-17 04:29:51 Model saved.
2020-05-17 04:32:50 Finished generating extra samples at iteration 520000