Collective operations are core to any distributed performance oriented system.
All the elements in such a distributed system participate together, hence the name collective operations. Distributed system is loosely used here so take it with a pinch of salt. What I mean by elements are the individual nodes.
I am not going to perform any scientific profiling of each of the algorithms. I am lazy.
Interested and not so lazy folks check this paper:
Optimization of Collective Communication Operations in MPICH
There will be a part two where I cover reduce-scatter and reduce.
manim
I used manim package to generate the vidoes.
It is an amazing software.
Broadcast
When one node has all the data and all the nodes should have all the data.
Binomial tree
To be honest, this should be called “binary tree”.
van Geijn’s algorithm
Kind of cool. Still amazes me.
If you have a super large chunk of data, sending the super large chunk to each and every node is going to cost a lot of bandwidth. Instead, break it into smaller chunks and give a chunk to each and every node. Now, tell all those nodes (including ourselves) to share among each other.
AllGather
When each of the nodes has some data and all the nodes should have all the data.
Bruck’s algorithm
The good ol’ distance doubling trick applied here.
All data transactions happen in one direction.
Recursive doubling
Pairwise swapping. Double the distance each step.
Ring
Nearest neighbour.