RTX 4090 Laptop results #18

@@ -0,0 +1,51 @@
/opt/conda/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
warnings.warn(
/opt/conda/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100%|██████████| 97.8M/97.8M [00:06<00:00, 16.9MB/s]
NOTE! Installing ujson may make loading annotations faster.
DLL 2023-07-12 20:08:14.147871 - PARAMETER dataset path : /data/object_detection epochs : 1 batch size : 96 eval batch size : 32 no cuda : False seed : None checkpoint path : None mode : benchmark-training eval on epochs : [21, 31, 37, 42, 48, 53, 59, 64] lr decay epochs : [43, 54] learning rate : 0.0 momentum : 0.9 weight decay : 0.0005 lr warmup : None backbone : resnet50 backbone path : None num workers : 4 AMP : True precision : amp
Using seed = 1235
loading annotations into memory...
Traceback (most recent call last):
  File "main.py", line 286, in <module>
    train(train_loop_func, logger, args)
  File "main.py", line 146, in train
    cocoGt = get_coco_ground_truth(args)
  File "/workspace/benchmark/Detection/SSD/ssd/data.py", line 73, in get_coco_ground_truth
    cocoGt = COCO(annotation_file=val_annotate, use_ext=True)
  File "/opt/conda/lib/python3.8/site-packages/pycocotools/coco.py", line 89, in __init__
    dataset = json.load(open(annotation_file, 'r'))
FileNotFoundError: [Errno 2] No such file or directory: '/data/object_detection/annotations/instances_val2017.json'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3160) of binary: /opt/conda/bin/python
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+d0d6b1f', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
main.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-07-12_20:08:16
  host      : 24a2da7181c1
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 3160)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
DONE!
2 changes: 2 additions & 0 deletions pytorch/results/4090laptop_v1/PyTorch_SSD_AMP/benchmark.para
@@ -0,0 +1,2 @@
GLOBAL_BATCH 96
GPU 1
@@ -0,0 +1,49 @@
/opt/conda/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
warnings.warn(
/opt/conda/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
NOTE! Installing ujson may make loading annotations faster.
DLL 2023-07-12 20:15:36.177470 - PARAMETER dataset path : /data/object_detection epochs : 1 batch size : 48 eval batch size : 32 no cuda : False seed : None checkpoint path : None mode : benchmark-training eval on epochs : [21, 31, 37, 42, 48, 53, 59, 64] lr decay epochs : [43, 54] learning rate : 0.0 momentum : 0.9 weight decay : 0.0005 lr warmup : None backbone : resnet50 backbone path : None num workers : 4 AMP : False precision : fp32
Using seed = 4198
loading annotations into memory...
Traceback (most recent call last):
  File "main.py", line 286, in <module>
    train(train_loop_func, logger, args)
  File "main.py", line 146, in train
    cocoGt = get_coco_ground_truth(args)
  File "/workspace/benchmark/Detection/SSD/ssd/data.py", line 73, in get_coco_ground_truth
    cocoGt = COCO(annotation_file=val_annotate, use_ext=True)
  File "/opt/conda/lib/python3.8/site-packages/pycocotools/coco.py", line 89, in __init__
    dataset = json.load(open(annotation_file, 'r'))
FileNotFoundError: [Errno 2] No such file or directory: '/data/object_detection/annotations/instances_val2017.json'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 5329) of binary: /opt/conda/bin/python
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+d0d6b1f', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
main.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-07-12_20:15:40
  host      : 24a2da7181c1
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 5329)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
DONE!
2 changes: 2 additions & 0 deletions pytorch/results/4090laptop_v1/PyTorch_SSD_FP32/benchmark.para
@@ -0,0 +1,2 @@
GLOBAL_BATCH 48
GPU 1
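
Both SSD logs above (AMP and FP32) stop at the same point: `pycocotools` raises `FileNotFoundError` for `/data/object_detection/annotations/instances_val2017.json` before any training step runs, so neither log contains a throughput figure. A pre-flight check along the following lines could catch this before `torchrun` is launched. This is only a sketch: the helper name `missing_coco_files` is hypothetical and not part of the benchmark scripts, and the only path it checks is the one named in the traceback.

```python
from pathlib import Path

# Path relative to the dataset root, taken from the FileNotFoundError in the logs.
VAL_ANNOTATIONS = Path("annotations") / "instances_val2017.json"


def missing_coco_files(dataset_root, required=(VAL_ANNOTATIONS,)):
    """Return the required files that do not exist under dataset_root.

    Hypothetical helper for illustration; `required` lists paths relative
    to the dataset root and defaults to the one file the logs report missing.
    """
    root = Path(dataset_root)
    return [root / rel for rel in required if not (root / rel).is_file()]
```

Calling `missing_coco_files("/data/object_detection")` inside the container would return an empty list once the validation annotations are in place, and the missing path otherwise.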