is accelerate called twice on a 3090 ? #3002

Open
CodeAlexx opened this issue Dec 5, 2024 · 0 comments

Trying to do a full fine-tune of flux schnell and getting this error:

gradient accumulation steps = 1
total optimization steps: 200000
steps: 0%| | 0/200000 [00:00<?, ?it/s]
epoch 1/1
2024-12-05 08:22:50 INFO epoch is incremented. current_epoch: 0, train_util.py:715
epoch: 1
Traceback (most recent call last):
  File "/home/alex/kohya_ss/sd-scripts/flux_train.py", line 850, in <module>
    train(args)
  File "/home/alex/kohya_ss/sd-scripts/flux_train.py", line 683, in train
    accelerator.backward(loss)
  File "/home/alex/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2155, in backward
    self.scaler.scale(loss).backward(**kwargs)
  File "/home/alex/kohya_ss/venv/lib/python3.10/site-packages/torch/_tensor.py", line 581, in backward
    torch.autograd.backward(
  File "/home/alex/kohya_ss/venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 347, in backward
    _engine_run_backward(
  File "/home/alex/kohya_ss/venv/lib/python3.10/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/alex/kohya_ss/sd-scripts/flux_train.py", line 488, in grad_hook
    accelerator.clip_grad_norm_(tensor, args.max_grad_norm)
  File "/home/alex/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2303, in clip_grad_norm_
    self.unscale_gradients()
  File "/home/alex/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2253, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/home/alex/kohya_ss/venv/lib/python3.10/site-packages/torch/amp/grad_scaler.py", line 327, in unscale_
    raise RuntimeError(
RuntimeError: unscale_() has already been called on this optimizer since the last update().
steps: 0%| | 0/200000 [00:09<?, ?it/s]
Traceback (most recent call last):
  File "/home/alex/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/alex/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/alex/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "/home/alex/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/alex/kohya_ss/venv/bin/python3.10', '/home/alex/kohya_ss/sd-scripts/flux_train.py', '--config_file', '/home/alex/emver1rev1/model/config_dreambooth-20241205-082224.toml']' returned non-zero exit status 1.
08:22:57-324187 INFO Training has ended.
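For context on the error itself: torch.amp.GradScaler keeps per-optimizer bookkeeping and refuses a second unscale_() on the same optimizer before update() is called. In the trace above, clip_grad_norm_ (which unscales internally) runs inside a gradient hook, so it can fire more than once per step. The sketch below is a simplified pure-Python model of that state check, not the real GradScaler code; TinyScalerStateMachine and OptState are names invented here for illustration.

```python
from enum import Enum


class OptState(Enum):
    READY = 0      # unscale_() may be called
    UNSCALED = 1   # unscale_() already called; must update() first


class TinyScalerStateMachine:
    """Simplified model of GradScaler's per-optimizer state tracking.

    Only demonstrates why a second unscale_() between update() calls
    raises; the real scaler also rescales gradients, checks for infs, etc.
    """

    def __init__(self):
        self._per_optimizer_states = {}

    def unscale_(self, optimizer):
        state = self._per_optimizer_states.setdefault(id(optimizer), OptState.READY)
        if state is OptState.UNSCALED:
            raise RuntimeError(
                "unscale_() has already been called on this optimizer "
                "since the last update()."
            )
        # ...the real scaler would divide each grad by the scale factor here...
        self._per_optimizer_states[id(optimizer)] = OptState.UNSCALED

    def update(self):
        # update() resets every tracked optimizer for the next step
        self._per_optimizer_states = {
            k: OptState.READY for k in self._per_optimizer_states
        }


scaler = TinyScalerStateMachine()
opt = object()  # stand-in for a torch optimizer

scaler.unscale_(opt)       # first call in a step: fine
try:
    scaler.unscale_(opt)   # second call before update(): raises
except RuntimeError as e:
    print(e)
scaler.update()
scaler.unscale_(opt)       # fine again after update()
```

If this is what is happening here, the fix would be to ensure only one code path (the hook or the main loop) unscales/clips per optimizer step.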
