Can it be used in DDP? #6

chengjianhong · 2021-04-24T03:53:44Z

Hi, I use GradNorm in my segmentation and classification task. I want to train it with DistributedDataParallel, but I get the error: "RuntimeError: derivative for batch_norm_backward_elemt is not implemented". Can you give me some advice?

  Lgard.backward()
  File "/homeb/jhcheng/anaconda3/envs/py37-torch/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/homeb/jhcheng/anaconda3/envs/py37-torch/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: derivative for batch_norm_backward_elemt is not implemented

lthilnklover · 2021-12-02T09:55:01Z

I'm having a similar problem. I'm trying to differentiate the gradient norm with DDP, but I get the same error message. It works fine (I think) on a single GPU.

It also works with DDP when SyncBatchNorm is not used, so I am guessing that this problem is related to SyncBatchNorm.
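For anyone trying to narrow this down, here is a minimal single-process sketch of the pattern described above: taking the norm of a gradient and then differentiating it, which requires a second backward pass through the BatchNorm layer. It is not the repository's code. The toy model, the choice of `model[0].weight` as the layer whose gradient norm is taken, and the helper names `has_sync_batchnorm` and `grad_norm_double_backward` are all assumptions made for illustration. The check for `nn.SyncBatchNorm` is only a quick way to see whether a model falls into the case this thread reports as failing.

```python
import torch
import torch.nn as nn


def has_sync_batchnorm(model: nn.Module) -> bool:
    """Return True if any submodule is an nn.SyncBatchNorm layer."""
    return any(isinstance(m, nn.SyncBatchNorm) for m in model.modules())


def grad_norm_double_backward(model: nn.Sequential, x: torch.Tensor) -> torch.Tensor:
    """Backpropagate through the norm of d(loss)/d(weight) of the first layer."""
    loss = model(x).pow(2).mean()
    weight = model[0].weight  # hypothetical choice of the layer to inspect
    # create_graph=True keeps the graph of the first backward pass so that
    # the gradient norm itself can be differentiated afterwards.
    (grad,) = torch.autograd.grad(loss, weight, create_graph=True)
    grad_norm = grad.norm()
    # Second backward pass: this differentiates through BatchNorm's backward.
    grad_norm.backward()
    return grad_norm.detach()


torch.manual_seed(0)
# Toy model (an assumption): plain BatchNorm1d, run in a single process on CPU.
model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.Linear(8, 1))
x = torch.randn(16, 4)

print("uses SyncBatchNorm:", has_sync_batchnorm(model))
norm = grad_norm_double_backward(model, x)
print("gradient norm:", norm.item())
```

If `has_sync_batchnorm` returns True for your DDP model (for example after calling `nn.SyncBatchNorm.convert_sync_batchnorm`), that would be consistent with the observation above that the error only shows up when SyncBatchNorm is involved.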

danieltudosiu · 2022-07-01T13:58:12Z

I'm facing the same issue!
