Multithread Variable Input Size Fix #160
base: master
Conversation
Nice!
I think we never save a model before forwarding it, so it seems reasonable as a change. The lazy initialization won't work with the threading issue here.
@soumith that has happened to me at least once in the past (torch/nn#712 (comment)), but I'd say it's a fair enough change :)
I kept both the explicit initialization in the constructor and the lazy initialization in createIODescriptors. The explicit initialization is needed so that the threads share the same tensor: if there were only the lazy initialization, each thread would create its own copy of iSize, but they would still end up using the same descriptors, which leads to the descriptors sometimes not being the right size. However, when a model is created with nn modules and converted using cudnn.convert, __init is not called, so iSize is not initialized. I left the lazy initialization in so this does not crash, but in that case the original bug would still exist.
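To make the point concrete, here is a minimal sketch (not the actual patch) of the pattern being described: iSize is created eagerly in the constructor so that threads sharing the module also share the same size record, and createIODescriptors only rebuilds descriptors when that recorded size changes. The class name and the placeholder body are illustrative, not cudnn.torch code.

```lua
require 'nn'

-- Minimal sketch only: eager iSize creation plus size-change detection.
local SketchConvolution, parent = torch.class('SketchConvolution', 'nn.Module')

function SketchConvolution:__init()
   parent.__init(self)
   -- One tensor instance per module; threads sharing the module share it.
   self.iSize = torch.LongTensor(4):fill(0)
end

function SketchConvolution:createIODescriptors(input)
   -- Rebuild the cuDNN descriptors only when the input geometry changes.
   local changed = false
   for d = 1, 4 do
      if input:size(d) ~= self.iSize[d] then changed = true end
   end
   if changed then
      for d = 1, 4 do self.iSize[d] = input:size(d) end
      -- ... (re)create iDesc/oDesc here ...
   end
end
```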
Removed the lazy initialization of iSize based on the discussion above and added some hacky code to convert. It still seems to work fine based on my tests on AlexNet. Please let me know if there is a better way, or if you would prefer not to merge this change.
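The actual change to convert isn't shown in this thread; a rough, hypothetical illustration of the kind of hook being described (the helper name is mine, not the library's) could look like this:

```lua
-- Hypothetical helper: cudnn.convert swaps nn modules for cudnn ones without
-- calling __init, so the conversion step has to supply iSize itself.
local function ensureISize(m)
   if torch.typename(m):find('cudnn') and m.iSize == nil then
      m.iSize = torch.LongTensor(4):fill(0)
   end
end

-- e.g. after converting, walk the network once:
-- for _, m in ipairs(net:listModules()) do ensureISize(m) end
```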
Force-pushed from f58cf88 to 60d84f9 (compare)
A month ago I took a stab at refactoring the Convolution classes; a very outdated version is here: https://github.com/borisfom/cudnn.torch/tree/R5_exp (FullConvolution has changed fundamentally since then). However, it may give you some ideas. The biggest ROI on reuse, I believe, can be achieved by extracting the part that uses the cudnnFind functions: this can be generalized across most classes, not just convolutions. Also, the selection of algo/workspace can be further improved by using the new cuDNN FindEx functions.
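As a sketch of the reuse idea (not code from either branch), the shared piece could be reduced to a small helper that benchmarks candidate algorithms and returns the fastest, with each layer type supplying only its own benchmark closure:

```lua
-- Illustrative only: generic "pick the fastest algorithm" helper.
local function pickFastest(candidates, benchmark)
   local bestAlgo, bestTime = nil, math.huge
   for _, algo in ipairs(candidates) do
      local ok, elapsed = pcall(benchmark, algo)   -- benchmark returns seconds
      if ok and elapsed < bestTime then
         bestAlgo, bestTime = algo, elapsed
      end
   end
   return bestAlgo, bestTime
end
```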
I found some neat examples of taking advantage of Lua's binding of C functions by name in the new RNN class contributed by the NVIDIA folks here: https://github.com/borisfom/cudnn.torch/blob/R5/RNN.lua (createDescriptors, etc.).
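The pattern referred to here is roughly the following: the FFI namespace is indexed by a function name given as a string, so descriptor-creation code can be written generically. This is a hedged sketch; the exact names (cudnn.C, the status check) may differ from the code in RNN.lua.

```lua
require 'cudnn'
local ffi = require 'ffi'

-- Sketch of calling cuDNN C functions by name through the FFI namespace.
-- Assumes the FFI-bound libcudnn namespace is exposed as cudnn.C.
local function errcheck(fname, ...)
   local status = cudnn.C[fname](...)
   if status ~= ffi.C.CUDNN_STATUS_SUCCESS then
      error('Error in cuDNN calling ' .. fname)
   end
end

-- e.g. errcheck('cudnnCreateTensorDescriptor', descPtr)
```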
Force-pushed from 60d84f9 to 6c51cae (compare)
Add an option to exclude modules from conversion
Update cudnn.convert in README
nn.Module.replace in cudnn.convert
Merge lost R4 changes (cudnn.convert exclusion function)
Update version check for cudnn v5 in CMakeLists
disable rnn dropout
refactoring tests, phase 1
[fix] errcheck is undeclared
working double precision
It seems like Lua 5.3 doesn't like it when you put floats into long tensors. Simply taking the floor explicitly (which is what Lua 5.2 does implicitly) seems to work; see the sketch after this commit list.
Lua 5.3 compatibility
Prevent BatchNorm from backward in evaluate mode
Add cudnn.externalizeString
fix output params for cudnnGetFilterNdDescriptor
resetStates() also reset grad{Hidden,Cell}Output
Volumetric softmax and cross entropy criterion
Use the same instance of iSize to fix #155
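A minimal illustration of the Lua 5.3 issue mentioned in the commit list above (the exact error text may vary by Torch version):

```lua
require 'torch'

-- Under Lua 5.3, 7 / 2 is the float 3.5; assigning it into a LongTensor
-- fails, whereas Lua 5.2 truncated it implicitly.
local t = torch.LongTensor(1)
local x = 7 / 2
-- t[1] = x              -- errors under Lua 5.3
t[1] = math.floor(x)     -- works under both 5.2 and 5.3
print(t[1])              -- 3
```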
After fixing this for SpatialConvolution, I went ahead and made the same changes to the other modules. Are there any tests that I can run to make sure this did not break anything?