RuntimeError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0 #173
Comments
I have the same error, using PyTorch 0.4 + Python 3.5.
With Python 3.5 and PyTorch 0.3.0 there is no problem.
I have the same error if I switch lines 96 and 97.
@xscjun change the line:
Has anyone solved this problem? Please help me, thanks.
The “pos” -> torch.Size([32, 8732])
Then it worked.
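For anyone wondering why the mask indexing fails, here is a minimal sketch, assuming a batch of 32 images with 8732 priors each (the shapes in the error message). The tensors are random placeholders, not the repository's real loss values, and the variable names only mirror the ones quoted in this thread.

```python
import torch

num, num_priors = 32, 8732  # shapes taken from the error message

# Placeholder per-prior confidence loss, flattened to [num * num_priors, 1].
loss_c = torch.rand(num * num_priors, 1)
# Placeholder boolean mask of positive priors, shape [num, num_priors].
pos = torch.rand(num, num_priors) > 0.9

# Indexing the flattened tensor with the 2-D mask is rejected by PyTorch >= 0.4.
try:
    loss_c[pos] = 0
    mismatch_raised = False
except (IndexError, RuntimeError):
    mismatch_raised = True

# Reshape first so both tensors are [num, num_priors], then apply the mask.
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0  # filter out positive boxes for now
```

Once loss_c has the same shape as pos, the boolean mask lines up element by element, which seems to be what the reordering of the two lines achieves.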
I have the same error. How did you finally solve it?
I have the same error, so how did you figure it out in the end?
Which file should be updated?
Change the data type of N to FloatTensor.
You may try to update your file
@usherbob With Python 3.6 + PyTorch 0.4.1, I added "loss_c = loss_c.view(pos.size()[0], pos.size()[1]) #add line", but now I have another issue. RuntimeError: copy_if failed to synchronize: device-side assert triggered
Finally, I succeeded.
I changed it like this, but there was still a RuntimeError:
Changing the order of lines 97 and 98 throws a new error for me.
Any suggestions? PS: I also tried converting the loss to double as mentioned above and still got the same error! Solved.
Great, but there is a small bug: it is line 114, not line 144.
If your PyTorch version is 0.4.1, you can make the following change:
I solved the problem for PyTorch version 1.0.1. The solution is the following steps 1-3:
The loss is increasing, as shown below: timer: 2.2050 sec. Please help me solve the issue.
Thanks, that is useful for me, but step 3 is: lines 183, 184, 188, 191, 5 items, loss_x.data[0] >> loss_x.data or loss.data[0] >> loss.data
Would loss_x.data[0] >> loss_x.item() be better?
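A small sketch of the difference, assuming the loss is a 0-dim tensor as it is on PyTorch 1.0 and later (to my understanding). The value 2.5 is a made-up placeholder, not a real training loss.

```python
import torch

# Placeholder scalar loss; in train.py this would come from the criterion.
loss = torch.tensor(2.5)

# Indexing a 0-dim tensor with [0] raises IndexError on recent PyTorch releases.
try:
    _ = loss.data[0]
    index_failed = False
except IndexError:
    index_failed = True

# .item() returns a plain Python float that is safe to accumulate or print.
value = loss.item()
```

Using .item() also avoids keeping a tensor around when the value is only needed for logging, whereas loss.data still returns a tensor.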
@TianSong1991 Thanks a lot. PyTorch 1.0 + Python 3.5: success!
Much obliged!
But the loss is nan.
I have the same problem. Why is the loss nan?
Hi, why isn't loss_l divided by N?
I've encountered the same one here. Have you solved this problem?
The learning rate is too big.
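To illustrate how a learning rate that is too large can end in nan, here is a toy sketch in plain Python. It minimises f(w) = w**2 and has nothing to do with the SSD model itself; the function name run_sgd and both learning rates are made up for the example, and a large learning rate is only one possible cause of nan.

```python
import math

def run_sgd(lr, steps=500, w=1.0):
    """Minimise f(w) = w**2 with plain gradient descent and return the final loss."""
    for _ in range(steps):
        grad = 2.0 * w       # derivative of w**2
        w = w - lr * grad    # gradient descent update
    return w * w

stable = run_sgd(lr=0.1)     # each step multiplies w by 0.8, so w shrinks towards 0
diverged = run_sgd(lr=5.5)   # each step multiplies w by -10 until it overflows

print(stable, math.isnan(diverged))
```

With the large step, w overflows to infinity and the next update computes inf - inf, which is nan, and every later value stays nan.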
On 09/22/2019 11:52, HaoWu1993 wrote:
Pytorch version:
>>> import torch
>>> print(torch.__version__)
1.1.0
Python version:
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
multibox_loss.py:
Switch the two lines 97,98:
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0 # filter out pos boxes for now
Change line 114:
N = num_pos.data.sum() -> N = num_pos.data.sum().double()
and change the following two lines to:
loss_l = loss_l.double()
loss_c = loss_c.double()
train.py
loss_l.data[0] >> loss_l.data
loss_c.data[0] >> loss_c.data
loss.data[0] >> loss.data
And here is my output:
timer: 11.9583 sec.
iter 0 || Loss: 11728.9388 || timer: 0.2955 sec.
iter 10 || Loss: nan || timer: 0.2843 sec.
iter 20 || Loss: nan || timer: 0.2890 sec.
iter 30 || Loss: nan || timer: 0.2934 sec.
iter 40 || Loss: nan || timer: 0.2865 sec.
iter 50 || Loss: nan || timer: 0.2855 sec.
iter 60 || Loss: nan || timer: 0.2889 sec.
iter 70 || Loss: nan || timer: 0.2857 sec.
iter 80 || Loss: nan || timer: 0.2843 sec.
iter 90 || Loss: nan || timer: 0.2835 sec.
iter 100 || Loss: nan || timer: 0.2846 sec.
iter 110 || Loss: nan || timer: 0.2946 sec.
iter 120 || Loss: nan || timer: 0.2860 sec.
iter 130 || Loss: nan || timer: 0.2846 sec.
iter 140 || Loss: nan || timer: 0.2962 sec.
iter 150 || Loss: nan || timer: 0.2989 sec.
iter 160 || Loss: nan || timer: 0.2857 sec.
I've encountered the same one here. Have you solved this problem?
I think the loss is far too large; you should add two lines:
I didn't change line 114, and then the nan loss disappeared.
The loss values loc_loss and conf_loss are far too large (out of range), so you could use this code:
N = num_pos.data.sum().double()
loss_l = loss_l.double()
loss_c = loss_c.double()
loss_l /= N
loss_c /= N
And in train.py, you should use the following two lines instead of your code:
loc_loss += loss_l.item()
conf_loss += loss_c.item()
With best wishes and good luck.
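The normalisation and accumulation described above can be sketched as follows. The numbers are made-up placeholders, and num_pos here is a hypothetical per-image count of positive priors rather than the tensor the repository actually builds.

```python
import torch

# Placeholder summed losses; real values come from MultiBoxLoss.
loss_l = torch.tensor(5000.0)
loss_c = torch.tensor(12000.0)
# Hypothetical number of positive priors per image in a batch of three.
num_pos = torch.tensor([[3], [5], [2]])

N = num_pos.sum().double()        # total positive priors in the batch (10 here)
loss_l = loss_l.double() / N      # normalise the localisation loss
loss_c = loss_c.double() / N      # normalise the confidence loss

# Accumulate plain Python floats for logging.
loc_loss = 0.0
conf_loss = 0.0
loc_loss += loss_l.item()
conf_loss += loss_c.item()
```

Dividing by N keeps the reported loss on a per-positive-prior scale, so its magnitude does not grow with the number of matched boxes in the batch.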
Good! It works very well! Thank you!
@haibochina What?
I think PRs are welcome.
Thank you @haibochina. About the loss=nan issue, your method is very good!
I also had a nan loss issue after fixing multibox_loss.py. In my case it was because I was trying to use custom annotations and loading them as
If anyone else is trying to do the same thing, the correct format is
Training works now.
Because the loss was too big, I changed line 115 to
solve the issue.
@TianSong1991, I followed your solution and got it running normally, but after a while (after iter 90) the loss exploded to nan. Did you experience the same thing?
With @TianSong1991's solution, except step 3 changed to the following:
What are your torch version and Python version?
When you encounter this, maybe you can change to lr=1e-4. When I changed it, then: timer: 10.1423 sec.
Thanks, this answer solves my problem.
"I solved the problem for PyTorch version 1.0.1. The solution is the following steps 1-3:" This answer also solved my problem.
Thank you. I've solved the problem. Thank you again.
I solved the problem for PyTorch version 1.0.1. The solution is the following steps 1-3:
Steps 1 and 2 change multibox_loss.py!
Step 1: switch the two lines 97 and 98:
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0 # filter out pos boxes for now
Step 2: change line 114, N = num_pos.data.sum(), to
N = num_pos.data.sum().double()
loss_l = loss_l.double()
loss_c = loss_c.double()
Step 3 changes train.py!
Step 3: change lines 188, 189, 193, 196:
loss_l.data[0] >> loss_l.data
loss_c.data[0] >> loss_c.data
loss.data[0] >> loss.data
This answer also solved my problem.
More precisely,
loss_l = loss_l.double()/N
loss_c = loss_c.double()/N
:)
If loss is
or batch_size is too small, or both.
There is still a problem. In step 1, it should be changed like this:
otherwise the loss will be 'nan'.
If you are using PyTorch 2, please follow this:
This solved my problem!
@sonukiller I'm still getting nan loss even with your suggestion and the previous one. Do you suggest removing all the .data attributes and replacing Variable with a plain torch.tensor?
rps@rps:~/桌面/ssd.pytorch$ python3 train.py
/home/rps/桌面/ssd.pytorch/ssd.py:34: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
self.priors = Variable(self.priorbox.forward(), volatile=True)
/home/rps/桌面/ssd.pytorch/layers/modules/l2norm.py:17: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
init.constant(self.weight,self.gamma)
Loading base network...
Initializing weights...
train.py:214: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
init.xavier_uniform(param)
Loading the dataset...
Training SSD on: VOC0712
Using the specified args:
Namespace(basenet='vgg16_reducedfc.pth', batch_size=32, cuda=True, dataset='VOC', dataset_root='/home/rps/data/VOCdevkit/', gamma=0.1, lr=0.001, momentum=0.9, num_workers=4, resume=None, save_folder='weights/', start_iter=0, visdom=False, weight_decay=0.0005)
train.py:169: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
targets = [Variable(ann.cuda(), volatile=True) for ann in targets]
Traceback (most recent call last):
File "train.py", line 255, in <module>
train()
File "train.py", line 178, in train
loss_l, loss_c = criterion(out, targets)
File "/home/rps/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/rps/桌面/ssd.pytorch/layers/modules/multibox_loss.py", line 97, in forward
loss_c[pos] = 0 # filter out pos boxes for now
RuntimeError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0
Can anyone help, please?
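The deprecation warnings in the log are separate from the RuntimeError, but they can be cleaned up along the lines the warnings themselves suggest. A rough sketch, assuming a hypothetical Conv2d layer and random placeholder tensors in place of the real network and prior boxes:

```python
import torch
import torch.nn as nn

layer = nn.Conv2d(3, 8, kernel_size=3)  # hypothetical layer for illustration

# In-place initialisers (trailing underscore) replace the deprecated names.
nn.init.xavier_uniform_(layer.weight)
nn.init.constant_(layer.bias, 0.0)

# torch.no_grad() replaces Variable(..., volatile=True).
with torch.no_grad():
    priors = torch.rand(8732, 4)          # placeholder for the prior boxes
    out = layer(torch.rand(1, 3, 8, 8))   # no autograd graph is recorded here
```

Tensors created or computed inside the no_grad block do not track gradients, which is the effect volatile=True used to have.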