error in training #142
Comments
Have you solved this problem? I ran into it too.
Same problem.
Same problem.
How can this problem be solved?
Seems like NCCL isn't available on Windows.
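To illustrate the comment above: a minimal sketch of falling back to the `gloo` backend when the installed PyTorch build has no NCCL, which is the usual situation on Windows. The helper names `choose_backend` and `detect_backend` are hypothetical and not part of MedSegDiff; only `torch.distributed.is_available()`, `torch.distributed.is_nccl_available()` and `torch.cuda.is_available()` are real PyTorch calls.

```python
def choose_backend(nccl_available: bool, cuda_available: bool) -> str:
    """Pick "nccl" only when it is built in and a GPU is visible, else "gloo"."""
    return "nccl" if (nccl_available and cuda_available) else "gloo"


def detect_backend() -> str:
    """Query the installed PyTorch build (hypothetical helper, not repo code)."""
    # Imported lazily so choose_backend() stays usable without PyTorch installed.
    import torch
    import torch.distributed as dist

    # is_nccl_available() is only meaningful when the distributed package exists,
    # so is_available() is checked first and short-circuits otherwise.
    nccl = dist.is_available() and dist.is_nccl_available()
    return choose_backend(nccl, torch.cuda.is_available())
```

The traceback in this issue shows `dist.init_process_group(backend=backend, init_method="env://")` inside `setup_dist`, so something like `backend = detect_backend()` before that call might be enough. I have not checked how the repository currently sets `backend`, so treat this as a starting point rather than a confirmed fix.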
Hello, I ran into an error during training. What is the specific cause of this error? The command I ran:
python scripts/segmentation_train.py --data_name ISIC --data_dir F:\liuxiao\project\dataset\isbi_3b_medsegdiff --out_dir F:\liuxiao\project\MedSegDiff\outdir --image_size 256 --num_channels 128 --class_cond False --num_res_blocks 2 --num_heads 1 --learn_sigma True --use_scale_shift_norm False --attention_resolutions 16 --diffusion_steps 1000 --noise_schedule linear --rescale_learned_sigmas False --rescale_timesteps False --lr 1e-4 --batch_size 8
error:
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "F:\miniconda\envs\medsegdiff\lib\site-packages\requests\adapters.py", line 486, in send
resp = conn.urlopen(
File "F:\miniconda\envs\medsegdiff\lib\site-packages\urllib3\connectionpool.py", line 844, in urlopen
retries = retries.increment(
File "F:\miniconda\envs\medsegdiff\lib\site-packages\urllib3\util\retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8850): Max retries exceeded with url: /env/main (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000024EDE7FCC40>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it.'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "F:\miniconda\envs\medsegdiff\lib\site-packages\visdom\__init__.py", line 756, in _send
return self._handle_post(
File "F:\miniconda\envs\medsegdiff\lib\site-packages\visdom\__init__.py", line 720, in _handle_post
r = self.session.post(url, data=data)
File "F:\miniconda\envs\medsegdiff\lib\site-packages\requests\sessions.py", line 637, in post
return self.request("POST", url, data=data, json=json, **kwargs)
File "F:\miniconda\envs\medsegdiff\lib\site-packages\requests\sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "F:\miniconda\envs\medsegdiff\lib\site-packages\requests\sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "F:\miniconda\envs\medsegdiff\lib\site-packages\requests\adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8850): Max retries exceeded with url: /env/main (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000024EDE7FCC40>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it.'))
[WinError 10061] No connection could be made because the target machine actively refused it.
on_close() takes 1 positional argument but 3 were given
Visdom python client failed to establish socket to get messages from the server. This feature is optional and can be disabled by initializing Visdom with use_incoming_socket=False, which will prevent waiting for this request to timeout.
[W socket.cpp:663] [c10d] The client socket has failed to connect to [::ffff:127.0.1.1]:59878 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
File "scripts/segmentation_train.py", line 118, in <module>
main()
File "scripts/segmentation_train.py", line 26, in main
dist_util.setup_dist(args)
File "F:\liuxiao\project\MedSegDiff.\guided_diffusion\dist_util.py", line 46, in setup_dist
dist.init_process_group(backend=backend, init_method="env://")
File "F:\miniconda\envs\medsegdiff\lib\site-packages\torch\distributed\c10d_logger.py", line 74, in wrapper
func_return = func(*args, **kwargs)
File "F:\miniconda\envs\medsegdiff\lib\site-packages\torch\distributed\distributed_c10d.py", line 1148, in init_process_group
default_pg, _ = _new_process_group_helper(
File "F:\miniconda\envs\medsegdiff\lib\site-packages\torch\distributed\distributed_c10d.py", line 1268, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
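Separately from the NCCL error, the first part of the log (`WinError 10061` on `localhost:8850`) suggests that no Visdom server was listening when the script started. A small standard-library sketch for checking that before training; `port_is_open` is a hypothetical helper, and port 8850 is taken from the log above.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8850, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused (WinError 10061 on Windows) and timeouts.
        return False


if __name__ == "__main__":
    if not port_is_open():
        print("Nothing is listening on localhost:8850; the Visdom server may not be running.")
```

If the check fails, starting the server in a separate terminal with `python -m visdom.server -port 8850` should make the connection errors go away, assuming the script really expects Visdom on that port. The Visdom errors look like a side issue here; the run actually stops at the `RuntimeError` about NCCL.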