Difference between user and sequence representation #155
Ok, I'm starting to understand. I missed the padding that you apply to the input. You mask the input so that the task is essentially 'predicting the last input token'. However, I am not following why you still mask the input at the end and use the last input representation…
While predicting, it would make more sense to me to use the whole input, non-padded, and the whole output afterwards, not only the portion related to the last input token. By that I mean, using the full…
Ah ok, I think I finally understand. I will leave this issue open, though, to see if somebody can confirm this.
This sounds about right: the representation at step…
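For what it's worth, here is a minimal sketch of the padding/shifting idea discussed above, in plain PyTorch rather than the project's actual code (the tensor names and the front-padding scheme are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

# Toy batch of item-id sequences (0 is the padding id).
item_sequences = torch.tensor([[3, 7, 2, 5]])        # shape: (batch, seq_len)

# Front-pad by one step and drop the last item: the LSTM input at position t
# is item t-1, so the state at position t has only seen earlier items.
lstm_inputs = F.pad(item_sequences, (1, 0))[:, :-1]  # [[0, 3, 7, 2]]
targets = item_sequences                             # [[3, 7, 2, 5]]

embedding = torch.nn.Embedding(10, 8, padding_idx=0)
lstm = torch.nn.LSTM(8, 8, batch_first=True)

hidden_states, _ = lstm(embedding(lstm_inputs))      # (batch, seq_len, 8)

# Training: the state at position t is scored against targets[:, t], so every
# position predicts the next observed item.
# Prediction: only the final state matters, because it is the one that has
# seen the whole (unshifted) input and is used to score candidate next items.
sequence_representation = hidden_states[:, -1, :]    # (batch, 8)
```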
It'd be cool if we could have a custom hidden layer with that behavior to add more non-linearities and transformations to the model.
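Such a layer is not part of the project; a rough sketch of what it could look like is an extra non-linear projection applied to the per-step recurrent outputs before scoring (the class name and shapes below are hypothetical):

```python
import torch

class NonLinearHead(torch.nn.Module):
    """Hypothetical extra transformation on top of the per-step LSTM outputs."""

    def __init__(self, dim):
        super().__init__()
        self.proj = torch.nn.Sequential(
            torch.nn.Linear(dim, dim),
            torch.nn.ReLU(),
            torch.nn.Linear(dim, dim),
        )

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, dim) LSTM outputs; the same head is
        # applied independently at every step before scoring items.
        return self.proj(hidden_states)
```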
What's the big advantage over training only one time step at a time? By that I mean, each…
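My understanding, which is the usual argument for this setup rather than anything stated in the project docs: one forward pass over the shifted sequence yields a prediction, and therefore a loss term, at every position, so a sequence of length L contributes L training signals instead of one. Roughly:

```python
import torch
import torch.nn.functional as F

batch, seq_len, dim, num_items = 2, 4, 8, 10

hidden_states = torch.randn(batch, seq_len, dim)         # per-step representations
item_embeddings = torch.randn(num_items, dim)            # candidate item embeddings
targets = torch.randint(0, num_items, (batch, seq_len))  # next item at every step

# One pass scores every position against every item; averaging the loss over
# all (batch, position) pairs gives seq_len training signals per sequence
# instead of a single one.
scores = hidden_states @ item_embeddings.t()             # (batch, seq_len, num_items)
loss = F.cross_entropy(scores.reshape(-1, num_items), targets.reshape(-1))
```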
Can anyone explain to me why the user and sequence representations are calculated in this way? It seems like the last state of the LSTM is the sequence representation and the rest is the user representation. I'm not following how this works.
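A minimal sketch in plain PyTorch that mirrors the behaviour described in the question (not the library's actual implementation; the class and variable names are assumptions): the LSTM output at every step except the last serves as the per-step "user representation", while the output at the last step, which has seen the entire sequence, serves as the "sequence representation" used at prediction time.

```python
import torch

class SequenceNet(torch.nn.Module):
    """Illustrative only: per-step states vs. the final state."""

    def __init__(self, num_items, dim):
        super().__init__()
        self.embeddings = torch.nn.Embedding(num_items, dim, padding_idx=0)
        self.lstm = torch.nn.LSTM(dim, dim, batch_first=True)

    def forward(self, item_sequences):
        # item_sequences: (batch, seq_len) of item ids, 0 used for padding.
        states, _ = self.lstm(self.embeddings(item_sequences))
        user_representations = states[:, :-1]    # states at every step but the last
        sequence_representation = states[:, -1]  # last LSTM state: the whole sequence
        return user_representations, sequence_representation
```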