I'm not very familiar with how cudnn.SpatialConvolution works, but it seems that when an input of a different size comes in, the module's output gets a new reference, so copies of the module on other threads end up referring to an outdated output?
Here's a snippet:
```lua
require 'cunn'
require 'cudnn'

local model = cudnn.SpatialConvolution(1,1,1,1):cuda()
local model2 = nn.SpatialConvolution(1,1,1,1):cuda()

local nThreads = 8
torch.setnumthreads(nThreads)

local Threads = require 'threads'
Threads.serialization('threads.sharedserialize')
local mutex_id = Threads.Mutex():id()

local threads = Threads(nThreads,
   function()
      require 'cunn'
      require 'cudnn'
   end,
   function()
      _model = model
      _model2 = model2
      _mutex = (require 'threads').Mutex(mutex_id)
   end
)

print("Start")
for t=1,100 do
   for i=2,10 do
      threads:addjob(
         function()
            _mutex:lock()
            local inputs = torch.rand(i, 1, 1, 1):cuda()
            local outputs = _model:forward(inputs)
            --local outputs = _model2:forward(inputs)
            if i ~= outputs:size(1) then
               print("mismatch!", inputs:size(1), outputs:size(1))
            end
            _mutex:unlock()
         end
      )
   end
   threads:synchronize()
end
print("Done")
```
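When I run it, I see "mismatch!" being printed, i.e. the batch size of the output does not match the batch size of the input, even though every forward call is guarded by the mutex. The cunn SpatialConvolution doesn't seem to have this problem (use _model2 instead of _model in that snippet).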
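To make the suspicion above concrete, here is a minimal plain-Lua sketch of the kind of stale reference I have in mind. It does not use torch or cudnn at all: the `forward` function and the `original` and `copy` tables are hypothetical stand-ins, and I have not confirmed that cudnn.SpatialConvolution actually behaves this way.

```lua
-- Hypothetical stand-in for a module's forward pass; NOT the cudnn code.
local function forward(mod, n)
   -- Allocate a fresh output table whenever the requested size changes.
   if #mod.output ~= n then
      mod.output = {}   -- new table: other holders of the old one are not updated
      for k = 1, n do
         mod.output[k] = 0
      end
   end
   return mod.output
end

local original = { output = {} }
-- A second holder of the same state, e.g. a per-thread copy of the module
-- (assumption: the copy shares the output object but not the table field).
local copy = { output = original.output }

forward(original, 3)

-- The original now sees the new output, the copy still sees the old one.
print(#original.output, #copy.output)
```

If something like this happens inside the module, a thread holding the copy would keep reading an output whose size belongs to an earlier call, which would match the mismatch printed by the snippet.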