Generated samples look bad, but training seems ok #34
Comments
I believe 681cee9 introduced an issue. That code produces models with these weird outputs, but the previous commit does not. I tried this with two TensorFlow versions: 1.0.0 and 1.3.0, which was the latest version as of November 2017. I'll look into this a bit more... Some arguments changed for ...
I should add that I modified 681cee9 slightly so that ...
Very helpful comment! I'm also facing the same problem. But whenever I change the legacy_seq2seq.rnn_decoder and MultiRNNCell arguments, I get a ValueError. Has anyone solved this problem?
Any news on this issue? Did anyone manage to solve it?
Hi. I am using TensorFlow 1.4 and I am able to reproduce this issue. Did any of you manage to fix it? TIA
I tried the other repos that @grisaitis suggested. You may get better results with them.
Hi. Found the bug. In line 58 of model.py, the cell always takes zero_state as its initial state. This is fine while training, but not when generating samples.

While training, we pass 50 batches of 300 timesteps. Inside tf.contrib.legacy_seq2seq.rnn_decoder, the function automatically feeds the state output of each timestep in as the state input of the next one. For the next minibatch, the states are reinitialised to zero before starting.

While generating samples, we generate one point at a time. This means that every time we run the session, tf.contrib.legacy_seq2seq.rnn_decoder is called for a single timestep. Now note the bug in line 58: every time we call rnn_decoder, we set the initial state to zero_state, so each timestep of the generated sample is produced without knowledge of the previous state output.

The fix: every time we call sess.run to generate a sample point, we need to fetch the previous state and pass it in when generating the next point, instead of always passing zero_state. I was able to generate proper results with that fix :) Hope it helps. A minimal sketch of such a sampling loop is below.
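A minimal sketch of the sampling loop described above, assuming the model exposes input_data, state_in (built from cell.zero_state so it can be fed), state_out, and a mixture-parameter tensor output_params; these names, and the draw_point helper, are assumptions based on this thread rather than the repo's exact identifiers:

```python
import numpy as np
import tensorflow as tf

def draw_point(params):
    """Hypothetical helper: sample one (dx, dy, end_of_stroke) point from the
    model's mixture-density output. The real version depends on the output format."""
    raise NotImplementedError

def sample(sess, model, cell, num_steps=800):
    # Start from the zero state exactly once, before the loop.
    prev_state = sess.run(cell.zero_state(1, tf.float32))
    prev_x = np.zeros((1, 1, 3), dtype=np.float32)  # a single (dx, dy, eos) input point
    strokes = []

    for _ in range(num_steps):
        feed = {model.input_data: prev_x}
        # Feed the previous step's state back in, layer by layer
        # (assumes a MultiRNNCell of LSTM cells, i.e. a tuple of LSTMStateTuples).
        for i, (c, h) in enumerate(model.state_in):
            feed[c] = prev_state[i].c
            feed[h] = prev_state[i].h

        # Fetch the mixture parameters and the new state; the new state becomes
        # prev_state for the next step instead of being reset to zeros.
        params, prev_state = sess.run([model.output_params, model.state_out], feed)
        next_x = draw_point(params)
        strokes.append(next_x)
        prev_x = np.asarray(next_x, dtype=np.float32).reshape(1, 1, 3)

    return np.array(strokes)
```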
@partha2409 did you try that fix? As far as I understand, in the sample function we set the previous state equal to next_state, which is one of the outputs of the network. I do not believe this was the problem.
Hi, I tried the fix and it worked. You are partially correct... in the sample function, on line 275 we assign the previous state to next_state, and on line 235 we pass the new state value in through the state_in variable. However, note that on line 58 the state_in variable is not used at all and the zero state is always used instead. It should be similar to line 57, which is commented out, where the state_in variable is passed to rnn_decoder.
You are right! Well spotted! What performance did you get during training? Which parameters did you use?
We're facing a similar issue. Could you explain the exact change that you made in the code, @partha2409? And also, what results are you getting after the fix? I tried uncommenting the line above as you mentioned, but that does not seem to work either.
As mentioned earlier, the problem occurs in the rnn_decoder call: instead of always passing the zero state as the initial state, it could take the state_in tensor, so that the previous state fed in during sampling is actually used. A sketch of the change is below.
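A sketch of what that change might look like in model.py, under the assumption that state_in is created from cell.zero_state (so it serves both as the default zero initial state and as a feedable tensor); the variable names follow this thread's description, not necessarily the actual file:

```python
import tensorflow as tf

def build_decoder(cell, inputs, batch_size):
    # state_in doubles as the default (zero) initial state and as the tensor the
    # sampling loop overrides via feed_dict with the previous step's state.
    state_in = cell.zero_state(batch_size, tf.float32)

    # Buggy call described in the thread (roughly line 58): the decoder always
    # starts from a fresh zero state, so sampled points have no memory.
    # outputs, state_out = tf.contrib.legacy_seq2seq.rnn_decoder(
    #     inputs, cell.zero_state(batch_size, tf.float32), cell)

    # Fixed call (roughly the commented-out line 57): start from state_in, which
    # the sampling loop fills with the previous state.
    outputs, state_out = tf.contrib.legacy_seq2seq.rnn_decoder(inputs, state_in, cell)
    return outputs, state_in, state_out
```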
I'm getting crappy-looking samples from networks that seem to train properly. Loss converges nicely (see last screenshot), but samples don't compare with @hardmaru's results.
Here are some examples... Any ideas for what could be causing this?
This is after training for 10920 iterations (30 * 364) with python train.py --rnn_size 400 --num_layers 3. Default args produce similar results. Training and validation loss look fine:
Has anyone been able to train and produce great results with a recent TensorFlow? I wonder if some defaults have changed, or whether interface changes are resulting in a bad setup for the loss function.
I think the loss function is the issue because loss values "look good" but actual results look bad. I think the loss function is optimizing for the wrong thing, basically.
Any ideas appreciated. My next steps are to review how the loss is defined and maybe compare it with other implementations (https://greydanus.github.io/2016/08/21/handwriting/, https://github.com/snowkylin/rnn-handwriting-generation)