How do we comprehend the factor between algBw and busBw? #235

lianghao208 · 2024-07-20T06:42:24Z

AllGather, Alltoall, Gather, ReduceScatter, Scatter:

busBw = algBw * (n-1)/n

AllReduce:

busBw = algBw * 2*(n-1)/n

Broadcast, Reduce, Send/Recv:

busBw = algBw
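For reference, nccl-tests' PERFORMANCE.md writes these relations as busBw = algBw * factor, with algBw = S/t for a buffer of S bytes moved in time t. The Python sketch below only encodes the factors for the collectives listed above. `FACTORS`, `alg_bw` and `bus_bw` are hypothetical helper names, not nccl-tests code, and the Send/Recv entry simply follows the grouping in this question.

```python
# Hypothetical helpers (not part of nccl-tests) encoding the busBw factors.
# Convention (nccl-tests PERFORMANCE.md): algBw = S / t, busBw = algBw * factor(n).
FACTORS = {
    "AllReduce": lambda n: 2 * (n - 1) / n,
    "AllGather": lambda n: (n - 1) / n,
    "Alltoall": lambda n: (n - 1) / n,
    "Gather": lambda n: (n - 1) / n,
    "ReduceScatter": lambda n: (n - 1) / n,
    "Scatter": lambda n: (n - 1) / n,
    "Broadcast": lambda n: 1.0,
    "Reduce": lambda n: 1.0,
    "SendRecv": lambda n: 1.0,
}


def alg_bw(size_bytes: float, seconds: float) -> float:
    """Algorithm bandwidth in GB/s (1 GB = 1e9 bytes): buffer size over time."""
    return size_bytes / 1e9 / seconds


def bus_bw(collective: str, size_bytes: float, seconds: float, n: int) -> float:
    """Bus bandwidth in GB/s: algorithm bandwidth scaled by the collective's factor."""
    return alg_bw(size_bytes, seconds) * FACTORS[collective](n)


if __name__ == "__main__":
    # A 1 GB AllReduce across 8 ranks that finishes in 0.1 s.
    print(alg_bw(1e9, 0.1))                  # 10.0
    print(bus_bw("AllReduce", 1e9, 0.1, 8))  # 17.5
```

With 8 ranks the AllReduce factor is 2*7/8 = 1.75, so 10 GB/s of algorithm bandwidth is reported as 17.5 GB/s of bus bandwidth.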

How should we understand the factor between algBw and busBw?

In particular, I think the communication volume of Broadcast is the same as that of Scatter, so why are their factors different?

Also, Alltoall seems to communicate a lot more than AllGather, so why are their factors the same?
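One way to look at the AllGather/Alltoall question is to count the bytes each rank puts on the wire. The sketch below assumes that each rank's full buffer is S bytes (the gathered output for AllGather, the send buffer for Alltoall), split into n chunks of S/n, and it models a direct exchange where every rank sends straight to every peer; NCCL's real ring or pairwise schedules may differ. `traffic_matrix` is a hypothetical helper.

```python
from fractions import Fraction


def traffic_matrix(collective: str, S: int, n: int) -> list[list[Fraction]]:
    """Bytes that rank i sends to rank j (entry [i][j]).

    Assumptions (illustrative only): direct exchange, no ring forwarding;
    each rank's full buffer is S bytes, split into n chunks of S/n.
    """
    chunk = Fraction(S, n)
    m = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # the chunk a rank already owns never leaves it
            if collective == "AllGather":
                # Rank i's single chunk is copied to every peer.
                m[i][j] = chunk
            elif collective == "Alltoall":
                # Rank i sends peer j the distinct chunk of its buffer addressed to j.
                m[i][j] = chunk
            else:
                raise ValueError(f"unsupported collective: {collective}")
    return m


if __name__ == "__main__":
    S, n = 1024, 4
    ag = traffic_matrix("AllGather", S, n)
    a2a = traffic_matrix("Alltoall", S, n)
    print(ag == a2a)   # True: same bytes on every (src, dst) pair
    print(sum(ag[0]))  # 768, i.e. (n-1)/n * S sent per rank
```

Under this model both collectives move the same number of bytes per rank, (n-1)/n * S. Alltoall only looks heavier because every chunk it moves is distinct data, whereas AllGather moves copies of the same chunk.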


kiskra-nvidia · 2024-07-21T04:32:05Z

You can find the explanation in https://github.com/NVIDIA/nccl-tests/blob/master/doc/PERFORMANCE.md...

In particular, regarding the difference between Broadcast and Scatter, Broadcast always needs to send out a complete buffer, whereas Scatter doesn't need to send the part destined for the root process (since that data is already there). I.e., for n == 2, Scatter needs to send out only half of the data that Broadcast needs to send.
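The volume difference described here can be checked with a few lines of Python. The sketch assumes the root sends directly to each of the other n-1 ranks; `root_bytes_sent` is a hypothetical helper, not nccl-tests code.

```python
from fractions import Fraction


def root_bytes_sent(collective: str, S: int, n: int) -> Fraction:
    """Total bytes the root sends for an S-byte buffer across n ranks.

    Assumption: the root sends directly to each of the other n-1 ranks.
    """
    if collective == "Broadcast":
        # Every non-root rank needs the complete buffer.
        return Fraction((n - 1) * S)
    if collective == "Scatter":
        # Every non-root rank needs only its S/n share; the root's share stays local.
        return (n - 1) * Fraction(S, n)
    raise ValueError(f"unsupported collective: {collective}")


if __name__ == "__main__":
    S = 1000
    for n in (2, 4, 8):
        b = root_bytes_sent("Broadcast", S, n)
        s = root_bytes_sent("Scatter", S, n)
        print(n, b, s, s / b)  # Scatter sends 1/n of what Broadcast sends
```

For n == 2 this gives S for Broadcast and S/2 for Scatter, matching the "half of the data" example.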

lianghao208 · 2024-07-21T12:58:37Z


@kiskra-nvidia Thanks for the link, it helps. But I am still confused about the difference between Broadcast and Scatter. For Broadcast, do you mean that the root process (which already has the complete buffer) still needs to send a complete buffer to itself, whereas Scatter doesn't? Since the root process already has the complete data, shouldn't the number of transfers be n-1 as well?

kiskra-nvidia · 2024-07-21T20:02:25Z

Perhaps we misunderstood each other. I was answering your question about the communication amount, which I understood to be a question about the volume of data. Broadcast needs to send a complete buffer S to n-1 destinations. Scatter needs to send to n-1 destinations as well, but for each destination it needs to send just 1/n-th of the buffer S. So it's the same in terms of the number of messages but not in terms of the volume of data.

lianghao208 · 2024-07-22T02:08:38Z

@kiskra-nvidia
So if the volume of data sent in Broadcast is S*(n-1), then the volume of data sent in Scatter is S*(n-1)/n, so Broadcast sends n times as much data as Scatter.
If I understand correctly, the Broadcast factor between algBw and busBw should then be n times the Scatter factor of (n-1)/n, that is (n-1) instead of 1.

What am I missing?
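One way to reconcile these numbers, offered as a hedged sketch rather than an answer from the nccl-tests authors: the factors line up if busBw is read as the bytes sent by the busiest rank divided by t, and if Broadcast is pipelined along a chain (NCCL implements Broadcast with a ring algorithm) instead of the root sending n-1 full copies itself. `bytes_sent_per_rank` and `busiest_rank_factor` are hypothetical helpers, and the chain/direct models are assumptions for illustration.

```python
from fractions import Fraction


def bytes_sent_per_rank(collective: str, S: int, n: int) -> list[Fraction]:
    """Bytes each rank sends (rank 0 is the root) under a hypothetical model:
    Broadcast is a chain 0 -> 1 -> ... -> n-1 forwarding the whole buffer,
    Scatter has the root send each peer its S/n share directly."""
    sent = [Fraction(0)] * n
    if collective == "Broadcast":
        for r in range(n - 1):
            sent[r] = Fraction(S)  # each rank except the last forwards S once
    elif collective == "Scatter":
        sent[0] = (n - 1) * Fraction(S, n)  # root keeps its own share
    else:
        raise ValueError(f"unsupported collective: {collective}")
    return sent


def busiest_rank_factor(collective: str, n: int, S: int = 1 << 20) -> Fraction:
    """Max bytes sent by any single rank, relative to the buffer size S."""
    return max(bytes_sent_per_rank(collective, S, n)) / S


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, busiest_rank_factor("Broadcast", n), busiest_rank_factor("Scatter", n))
```

Under this model the total Broadcast volume is still (n-1)*S, matching the count above, but it is spread over n-1 senders, so no single rank sends more than S (factor 1), while the Scatter root sends (n-1)/n * S (factor (n-1)/n).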

LJjia · 2024-08-06T09:44:29Z

I have the same question.
