Multithread Variable Input Size Fix #160
base: master
Conversation
Nice!
I think we never save a model before forwarding it, so it seems reasonable as a change. The lazy initialization won't work with the threading issue here.
@soumith that has happened to me at least once in the past (torch/nn#712 (comment)), but I'd say it's a fair enough change :)
I kept both the explicit initialization in the constructor and the lazy initialization in createIODescriptors. The explicit initialization is needed so that the threads share the same tensor: if there were only the lazy initialization, each thread would create its own copy of iSize, but they would still end up using the same descriptors, which leads to the descriptors sometimes not being the right size. However, when a model is created with nn modules and converted using cudnn.convert, __init is not called, so iSize is not initialized. I left the lazy initialization in so this does not crash, but in that case the original bug would still exist.
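To make the point concrete, here is a minimal sketch (not the actual patch) of the pattern being described: iSize is created eagerly in the constructor so that threads sharing the module also share the same size record, and createIODescriptors only rebuilds descriptors when that recorded size changes. The class name and the placeholder body are illustrative, not cudnn.torch code.

```lua
require 'nn'

-- Minimal sketch only: eager iSize creation plus size-change detection.
local SketchConvolution, parent = torch.class('SketchConvolution', 'nn.Module')

function SketchConvolution:__init()
   parent.__init(self)
   -- One tensor instance per module; threads sharing the module share it.
   self.iSize = torch.LongTensor(4):fill(0)
end

function SketchConvolution:createIODescriptors(input)
   -- Rebuild the cuDNN descriptors only when the input geometry changes.
   local changed = false
   for d = 1, 4 do
      if input:size(d) ~= self.iSize[d] then changed = true end
   end
   if changed then
      for d = 1, 4 do self.iSize[d] = input:size(d) end
      -- ... (re)create iDesc/oDesc here ...
   end
end
```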
Removed the lazy initialization of iSize based on the discussion above and added some hacky code to convert. It still seems to work fine based on my tests on AlexNet. Please let me know if there is a better way, or if you would prefer not to merge this change.
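The actual change to convert isn't shown in this thread; a rough, hypothetical illustration of the kind of hook being described (the helper name is mine, not the library's) could look like this:

```lua
-- Hypothetical helper: cudnn.convert swaps nn modules for cudnn ones without
-- calling __init, so the conversion step has to supply iSize itself.
local function ensureISize(m)
   if torch.typename(m):find('cudnn') and m.iSize == nil then
      m.iSize = torch.LongTensor(4):fill(0)
   end
end

-- e.g. after converting, walk the network once:
-- for _, m in ipairs(net:listModules()) do ensureISize(m) end
```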
Force-pushed from f58cf88 to 60d84f9 (compare)
A month ago I took a stab at refactoring the Convolution classes; a very outdated version is here: https://github.com/borisfom/cudnn.torch/tree/R5_exp (FullConvolution has changed fundamentally since then). However, it may give you some ideas. The biggest ROI on reuse, I believe, can be achieved by extracting the part that uses the cudnnFind functions: this can be generalized across most classes, not just convolutions. Also, the selection of algo/workspace can be further improved by using the new cuDNN FindEx functions.
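As a sketch of the reuse idea (not code from either branch), the shared piece could be reduced to a small helper that benchmarks candidate algorithms and returns the fastest, with each layer type supplying only its own benchmark closure:

```lua
-- Illustrative only: generic "pick the fastest algorithm" helper.
local function pickFastest(candidates, benchmark)
   local bestAlgo, bestTime = nil, math.huge
   for _, algo in ipairs(candidates) do
      local ok, elapsed = pcall(benchmark, algo)   -- benchmark returns seconds
      if ok and elapsed < bestTime then
         bestAlgo, bestTime = algo, elapsed
      end
   end
   return bestAlgo, bestTime
end
```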
I found some neat examples of taking advantage of Lua's binding of C functions by name in the new RNN class contributed by the NVIDIA folks here: https://github.com/borisfom/cudnn.torch/blob/R5/RNN.lua (createDescriptors, etc.).
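The pattern referred to here is roughly the following: the FFI namespace is indexed by a function name given as a string, so descriptor-creation code can be written generically. This is a hedged sketch; the exact names (cudnn.C, the status check) may differ from the code in RNN.lua.

```lua
require 'cudnn'
local ffi = require 'ffi'

-- Sketch of calling cuDNN C functions by name through the FFI namespace.
-- Assumes the FFI-bound libcudnn namespace is exposed as cudnn.C.
local function errcheck(fname, ...)
   local status = cudnn.C[fname](...)
   if status ~= ffi.C.CUDNN_STATUS_SUCCESS then
      error('Error in cuDNN calling ' .. fname)
   end
end

-- e.g. errcheck('cudnnCreateTensorDescriptor', descPtr)
```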
Force-pushed from 60d84f9 to 6c51cae (compare)
Add an option to exclude modules from conversion
Update cudnn.convert in README
nn.Module.replace in cudnn.convert
Merge lost R4 changes (cudnn.convert exclusion function)
Update version check for cudnn v5 in CMakeLists
disable rnn dropout
refactoring tests, phase 1
[fix] errcheck is undeclared
working double precision
It seems like Lua 5.3 doesn't like it when you put floats into long tensors. Simply taking the floor explicitly (which is what Lua 5.2 does implicitly) seems to work; see the sketch after this commit list.
Lua 5.3 compatibility
Prevent BatchNorm from backward in evaluate mode
Add cudnn.externalizeString
fix output params for cudnnGetFilterNdDescriptor
resetStates() also reset grad{Hidden,Cell}Output
Volumetric softmax and cross entropy criterion
Use the same instance of iSize to fix #155
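A minimal illustration of the Lua 5.3 issue mentioned in the commit list above (the exact error text may vary by Torch version):

```lua
require 'torch'

-- Under Lua 5.3, 7 / 2 is the float 3.5; assigning it into a LongTensor
-- fails, whereas Lua 5.2 truncated it implicitly.
local t = torch.LongTensor(1)
local x = 7 / 2
-- t[1] = x              -- errors under Lua 5.3
t[1] = math.floor(x)     -- works under both 5.2 and 5.3
print(t[1])              -- 3
```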
After fixing this for SpatialConvolution, I went ahead and made the same changes to the other modules. Are there any tests that I can run to make sure this did not break anything?