Can it be used in DDP? #6

Open
chengjianhong opened this issue Apr 24, 2021 · 2 comments

Comments

@chengjianhong

Hi, I use GradNorm in my segmentation and classification task. I want to train it with DistributedDataParallel, but I get the error "RuntimeError: derivative for batch_norm_backward_elemt is not implemented". Can you give me some advice?

  Lgard.backward()
  File "/homeb/jhcheng/anaconda3/envs/py37-torch/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/homeb/jhcheng/anaconda3/envs/py37-torch/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: derivative for batch_norm_backward_elemt is not implemented
@lthilnklover

lthilnklover commented Dec 2, 2021

I am having a similar problem. I am trying to differentiate the gradient norm with DDP, but I get the same error message. It works fine (I think) on a single GPU.

It also works with DDP when SyncBatchNorm is not used, so I am guessing that this problem is related to SyncBatchNorm.
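
A minimal sketch of the workaround this observation suggests, assuming the model takes 4D image input: swap every SyncBatchNorm layer back to a plain BatchNorm2d before taking the gradient of the gradient norm. The helper `revert_sync_batchnorm` is hypothetical (it is not part of PyTorch or of this repository), and the tiny model and loss are placeholders. The sketch runs on CPU in a single process, so it only shows that the double backward pass goes through plain BatchNorm; it does not reproduce the DDP error itself.

```python
import torch
import torch.nn as nn


def revert_sync_batchnorm(module: nn.Module) -> nn.Module:
    """Hypothetical helper: recursively replace SyncBatchNorm with BatchNorm2d.

    Assumes 4D (N, C, H, W) inputs; use BatchNorm1d/3d for other shapes.
    """
    converted = module
    if isinstance(module, nn.SyncBatchNorm):
        converted = nn.BatchNorm2d(
            module.num_features,
            eps=module.eps,
            momentum=module.momentum,
            affine=module.affine,
            track_running_stats=module.track_running_stats,
        )
        with torch.no_grad():
            if module.affine:
                converted.weight.copy_(module.weight)
                converted.bias.copy_(module.bias)
            if module.track_running_stats:
                converted.running_mean.copy_(module.running_mean)
                converted.running_var.copy_(module.running_var)
                converted.num_batches_tracked.copy_(module.num_batches_tracked)
    # Recurse into children; a SyncBatchNorm layer has none.
    for name, child in module.named_children():
        converted.add_module(name, revert_sync_batchnorm(child))
    return converted


torch.manual_seed(0)

# Placeholder model containing a SyncBatchNorm layer.
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.SyncBatchNorm(4),
    nn.ReLU(),
)
model = revert_sync_batchnorm(model)

x = torch.randn(2, 3, 8, 8)
loss = model(x).pow(2).mean()  # placeholder task loss

# GradNorm-style step: keep the graph so the gradient norm is differentiable.
shared = model[0].weight
(grad,) = torch.autograd.grad(loss, shared, create_graph=True)
grad_norm = grad.norm(2)

# Double backward through plain BatchNorm2d.
grad_norm.backward()
```

The trade-off is that, after the swap, each process normalizes with its own batch statistics instead of statistics synchronized across GPUs.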

@danieltudosiu

I am facing the same issue!
