Dear developer,
Recently, I ran the NCCL tests on the following machine:
2× InfiniBand EDR (ConnectX-4)
4× NVIDIA V100 GPU, 16 GB HBM
To the best of my knowledge, the NCCL tests measure bandwidth per direction. Therefore, I expected a result of about 25 GB/s on the V100. However, I am getting 41.55 GB/s, which is significantly higher than the theoretical bandwidth (25 GB/s).
Here is the topology matrix:
      GPU0  GPU1  GPU2  GPU3  NIC0  NIC1  CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0  X     NV2   NV2   NV2   PIX   SYS   0-19,40-59    0              N/A
GPU1  NV2   X     NV2   NV2   PIX   SYS   0-19,40-59    0              N/A
GPU2  NV2   NV2   X     NV2   SYS   PIX   20-39,60-79   1              N/A
GPU3  NV2   NV2   NV2   X     SYS   PIX   20-39,60-79   1              N/A
NIC0  PIX   PIX   SYS   SYS   X     SYS
NIC1  SYS   SYS   PIX   PIX   SYS   X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
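As a rough sanity check on the matrix above, the per-pair NVLink bandwidth can be estimated from the NV# entries. The sketch below is only an illustration: the 25 GB/s per link per direction figure is an assumption (the commonly quoted NVLink 2.0 rate for V100), not something measured on this machine, and the helper names `nvlink_counts` and `pair_bandwidth` are hypothetical.

```python
# Assumption (not measured here): one NVLink 2.0 link on a V100 carries
# roughly 25 GB/s in each direction.
NVLINK_GBPS_PER_LINK_PER_DIR = 25.0

# GPU-to-GPU part of the topology matrix shown above.
TOPOLOGY = """\
GPU0 X NV2 NV2 NV2
GPU1 NV2 X NV2 NV2
GPU2 NV2 NV2 X NV2
GPU3 NV2 NV2 NV2 X
"""


def nvlink_counts(matrix_text):
    """Return {(src, dst): bonded_link_count} for every NV# cell."""
    rows = [line.split() for line in matrix_text.strip().splitlines()]
    names = [row[0] for row in rows]
    counts = {}
    for row in rows:
        src = row[0]
        for dst, cell in zip(names, row[1:]):
            if cell.startswith("NV"):
                counts[(src, dst)] = int(cell[2:])
    return counts


def pair_bandwidth(links):
    """Estimated per-direction bandwidth in GB/s for a bonded set of links."""
    return links * NVLINK_GBPS_PER_LINK_PER_DIR


counts = nvlink_counts(TOPOLOGY)
for (src, dst), links in sorted(counts.items()):
    if src < dst:  # print each GPU pair once
        print(f"{src}-{dst}: NV{links}, about {pair_bandwidth(links):.0f} GB/s per direction")
```

Under that assumption, every GPU pair connected by NV2 would have about 50 GB/s per direction rather than 25 GB/s. Whether this is the right figure to compare against 41.55 GB/s depends on which test and which bandwidth column the number comes from.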
Here is the SLURM job that I submitted:
Here is the output of this test:
I have also attached the topology of the machine:
V100_topology.txt
I would appreciate it if you could comment on my findings and help me understand this discrepancy.
Thanks