Why is the NCCL test rate of my 4090x8 card so low? #270

Open
gogoman010310 opened this issue Nov 20, 2024 · 0 comments

These are my test results. I disabled IOMMU and ACS, which improved the bandwidth slightly compared to before, but it is still very low. It is not even as good as the 3090x8 setup I used previously.

NCCL_ALGO=RING CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 numactl --cpunodebind=0,1,4,5 --membind=0,1,4,5 ./build/all_reduce_perf -b 256 -e 1G -w 20 -n 100 -f 2 -g 8

nThread 1 nGpus 8 minBytes 256 maxBytes 1073741824 step: 2(factor) warmup iters: 20 iters: 100 agg iters: 1 validation: 1 graph: 0
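For readers unfamiliar with the flags, the sweep implied by `-b 256 -e 1G -f 2` (minBytes 256, maxBytes 1073741824, factor 2 in the line above) can be sketched as follows. This is only an illustration of the size progression, not code from nccl-tests; the helper name `sweep_sizes` is made up for this example.

```python
def sweep_sizes(min_bytes: int, max_bytes: int, factor: int) -> list[int]:
    """Return the message sizes visited when the size is multiplied by `factor` each step.

    Hypothetical helper: it only mirrors the min/max/factor values reported above.
    """
    if min_bytes <= 0 or factor < 2:
        raise ValueError("min_bytes must be positive and factor must be at least 2")
    sizes = []
    size = min_bytes
    while size <= max_bytes:
        sizes.append(size)
        size *= factor
    return sizes


# Values taken from the reported run: 256 B up to 1 GiB, doubling each step.
sizes = sweep_sizes(256, 1 << 30, 2)
print(len(sizes), sizes[0], sizes[-1])  # 23 256 1073741824
```

The 23 sizes correspond to the 23 rows of the results table in this report.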

Using devices
Rank 0 Group 0 Pid 6151 on ubuntu device 0 [0x01] NVIDIA GeForce RTX 4090
Rank 1 Group 0 Pid 6151 on ubuntu device 1 [0x25] NVIDIA GeForce RTX 4090
Rank 2 Group 0 Pid 6151 on ubuntu device 2 [0x41] NVIDIA GeForce RTX 4090
Rank 3 Group 0 Pid 6151 on ubuntu device 3 [0x61] NVIDIA GeForce RTX 4090
Rank 4 Group 0 Pid 6151 on ubuntu device 4 [0x81] NVIDIA GeForce RTX 4090
Rank 5 Group 0 Pid 6151 on ubuntu device 5 [0xa1] NVIDIA GeForce RTX 4090
Rank 6 Group 0 Pid 6151 on ubuntu device 6 [0xc1] NVIDIA GeForce RTX 4090
Rank 7 Group 0 Pid 6151 on ubuntu device 7 [0xe1] NVIDIA GeForce RTX 4090

                                                            out-of-place                      in-place
      size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
       (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
       256            64     float     sum      -1    41.35    0.01    0.01      0    41.33    0.01    0.01      0
       512           128     float     sum      -1    41.18    0.01    0.02      0    41.78    0.01    0.02      0
      1024           256     float     sum      -1    41.67    0.02    0.04      0    41.59    0.02    0.04      0
      2048           512     float     sum      -1    41.67    0.05    0.09      0    41.59    0.05    0.09      0
      4096          1024     float     sum      -1    41.67    0.10    0.17      0    41.76    0.10    0.17      0
      8192          2048     float     sum      -1    41.50    0.20    0.35      0    41.09    0.20    0.35      0
     16384          4096     float     sum      -1    40.23    0.41    0.71      0    39.81    0.41    0.72      0
     32768          8192     float     sum      -1    40.70    0.81    1.41      0    40.81    0.80    1.41      0
     65536         16384     float     sum      -1    51.74    1.27    2.22      0    51.50    1.27    2.23      0
    131072         32768     float     sum      -1    95.67    1.37    2.40      0    95.23    1.38    2.41      0
    262144         65536     float     sum      -1    189.2    1.39    2.42      0    189.3    1.38    2.42      0
    524288        131072     float     sum      -1    380.4    1.38    2.41      0    380.0    1.38    2.41      0
   1048576        262144     float     sum      -1    374.1    2.80    4.91      0    374.7    2.80    4.90      0
   2097152        524288     float     sum      -1    712.2    2.94    5.15      0    712.8    2.94    5.15      0
   4194304       1048576     float     sum      -1   1398.3    3.00    5.25      0   1397.9    3.00    5.25      0
   8388608       2097152     float     sum      -1   2818.8    2.98    5.21      0   2819.1    2.98    5.21      0
  16777216       4194304     float     sum      -1   5636.9    2.98    5.21      0   5635.2    2.98    5.21      0
  33554432       8388608     float     sum      -1    11314    2.97    5.19      0    11314    2.97    5.19      0
  67108864      16777216     float     sum      -1    22786    2.95    5.15      0    22790    2.94    5.15      0
 134217728      33554432     float     sum      -1    45706    2.94    5.14      0    45696    2.94    5.14      0
 268435456      67108864     float     sum      -1    91494    2.93    5.13      0    91512    2.93    5.13      0
 536870912     134217728     float     sum      -1   183860    2.92    5.11      0   183851    2.92    5.11      0
1073741824     268435456     float     sum      -1   371272    2.89    5.06      0   371306    2.89    5.06      0

Out of bounds values : 0 OK

Avg bus bandwidth : 2.99014
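
As a sanity check on how the `algbw` and `busbw` columns relate, here is a minimal sketch. It assumes the convention described in the nccl-tests performance notes: algorithm bandwidth is size divided by time (with 1 GB = 1e9 bytes), and for all-reduce the bus bandwidth is `algbw * 2 * (n - 1) / n`, where `n` is the number of ranks. The function name `allreduce_bandwidth` is made up for this example.

```python
def allreduce_bandwidth(size_bytes: int, time_us: float, n_ranks: int) -> tuple[float, float]:
    """Return (algbw, busbw) in GB/s for one all-reduce measurement.

    Assumes algbw = bytes / seconds / 1e9 and the all-reduce
    correction factor 2 * (n - 1) / n for busbw.
    """
    algbw = size_bytes / (time_us * 1e-6) / 1e9
    busbw = algbw * 2 * (n_ranks - 1) / n_ranks
    return algbw, busbw


# Last out-of-place row of the table above: 1 GiB in 371272 us on 8 GPUs.
algbw, busbw = allreduce_bandwidth(1073741824, 371272, 8)
print(f"{algbw:.2f} {busbw:.2f}")  # 2.89 5.06
```

With 8 ranks the factor is 1.75, which matches the ratio between the two columns throughout the table (for example 2.94 and 5.14 to 5.15).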
