Generated samples look bad, but training seems ok #34
Comments
I believe 681cee9 introduced an issue. That code produces models with these weird outputs, but the previous commit does not. I tried this with two TensorFlow versions: 1.0.0 and 1.3.0, which was the latest version as of November 2017. I'll look into this a bit more... Some arguments changed for ...
I should add that I modified 681cee9 slightly so that ...
Very helpful comment! I'm also facing the same problem. But whenever I change the legacy_seq2seq.rnn_decoder and MultiRNNCell arguments, I get a ValueError. Has anyone solved this problem?
Any news on this issue? Did anyone manage to solve it?
Hi. I am using TensorFlow 1.4 and I am able to reproduce this issue. Did any of you manage to fix it? TIA
I tried the other repos that @grisaitis suggested. You may get better results with them.
Hi. Found the bug. In line 58 of model.py, the cell always takes zero_state as its initial state. This is fine while training, but not when generating samples.

While training, we pass 50 batches of 300 timesteps. Inside tf.contrib.legacy_seq2seq.rnn_decoder, the function automatically feeds the state output of each timestep in as the state input of the next one. For the next minibatch, the states are reinitialised to zero before starting.

While generating samples, we generate one point at a time. This means that every time we run the session, tf.contrib.legacy_seq2seq.rnn_decoder is called for a single timestep. Now note the bug in line 58: every time we call rnn_decoder, we set the initial state to zero_state, so each timestep of the generated sample is produced without knowledge of the previous state output.

The fix: every time we call sess.run to generate a sample point, we need to fetch the previous state and pass it in when generating the next point, instead of always passing zero_state. I was able to generate proper results with that fix :) Hope it helps. A minimal sketch of such a sampling loop is below.
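A minimal sketch of the sampling loop described above, assuming the model exposes input_data, state_in (built from cell.zero_state so it can be fed), state_out, and a mixture-parameter tensor output_params; these names, and the draw_point helper, are assumptions based on this thread rather than the repo's exact identifiers:

```python
import numpy as np
import tensorflow as tf

def draw_point(params):
    """Hypothetical helper: sample one (dx, dy, end_of_stroke) point from the
    model's mixture-density output. The real version depends on the output format."""
    raise NotImplementedError

def sample(sess, model, cell, num_steps=800):
    # Start from the zero state exactly once, before the loop.
    prev_state = sess.run(cell.zero_state(1, tf.float32))
    prev_x = np.zeros((1, 1, 3), dtype=np.float32)  # a single (dx, dy, eos) input point
    strokes = []

    for _ in range(num_steps):
        feed = {model.input_data: prev_x}
        # Feed the previous step's state back in, layer by layer
        # (assumes a MultiRNNCell of LSTM cells, i.e. a tuple of LSTMStateTuples).
        for i, (c, h) in enumerate(model.state_in):
            feed[c] = prev_state[i].c
            feed[h] = prev_state[i].h

        # Fetch the mixture parameters and the new state; the new state becomes
        # prev_state for the next step instead of being reset to zeros.
        params, prev_state = sess.run([model.output_params, model.state_out], feed)
        next_x = draw_point(params)
        strokes.append(next_x)
        prev_x = np.asarray(next_x, dtype=np.float32).reshape(1, 1, 3)

    return np.array(strokes)
```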
@partha2409 did you try that fix? As far as I understand, in the sample function we set the previous state equal to next_state, which is one of the outputs of the network. I do not believe this was the problem.
Hi, I tried the fix and it worked. You are partially correct... in the sample function, on line 275 we assign the previous state to next_state, and on line 235 we pass the new state value in through the state_in variable. However, note that on line 58 the state_in variable is not used at all and the zero state is always used instead. It should be similar to line 57, which is commented out, where the state_in variable is passed to rnn_decoder.
You are right! Well spotted! What performance did you get during training? Which parameters did you use?
We're facing a similar issue. Could you explain the exact change that you made in the code, @partha2409? And also, what results are you getting after the fix? I tried uncommenting the line above as you mentioned, but that does not seem to work either.
As mentioned earlier, the problem occurs in the rnn_decoder call: instead of always passing the zero state as the initial state, it could take the state_in tensor, so that the previous state fed in during sampling is actually used. A sketch of the change is below.
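A sketch of what that change might look like in model.py, under the assumption that state_in is created from cell.zero_state (so it serves both as the default zero initial state and as a feedable tensor); the variable names follow this thread's description, not necessarily the actual file:

```python
import tensorflow as tf

def build_decoder(cell, inputs, batch_size):
    # state_in doubles as the default (zero) initial state and as the tensor the
    # sampling loop overrides via feed_dict with the previous step's state.
    state_in = cell.zero_state(batch_size, tf.float32)

    # Buggy call described in the thread (roughly line 58): the decoder always
    # starts from a fresh zero state, so sampled points have no memory.
    # outputs, state_out = tf.contrib.legacy_seq2seq.rnn_decoder(
    #     inputs, cell.zero_state(batch_size, tf.float32), cell)

    # Fixed call (roughly the commented-out line 57): start from state_in, which
    # the sampling loop fills with the previous state.
    outputs, state_out = tf.contrib.legacy_seq2seq.rnn_decoder(inputs, state_in, cell)
    return outputs, state_in, state_out
```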
I'm getting crappy-looking samples from networks that seem to train properly. Loss converges nicely (see last screenshot), but samples don't compare with @hardmaru's results.
Here are some examples... Any ideas for what could be causing this?
This is after training for 10920 iterations (30 * 364) with python train.py --rnn_size 400 --num_layers 3. Default args produce similar results. Training and validation loss look fine:
Has anyone been able to train and produce great results with a recent TensorFlow? I wonder if some defaults have changed, or whether interface changes are resulting in a bad setup for the loss function.
I think the loss function is the issue because loss values "look good" but actual results look bad. I think the loss function is optimizing for the wrong thing, basically.
Any ideas appreciated. My next steps are to review how the loss is defined and maybe compare it with other implementations (https://greydanus.github.io/2016/08/21/handwriting/, https://github.com/snowkylin/rnn-handwriting-generation)