Collective operations are core to any performance-oriented distributed system.

All the elements of such a system take part in the operation together, hence the name. I am using "distributed system" loosely here, so take it with a pinch of salt. By elements I mean the individual nodes.

I am not going to do any scientific profiling of the algorithms. I am lazy.

Interested and not-so-lazy folks can check out this paper:

Optimization of Collective Communication Operations in MPICH

There will be a part two where I cover reduce-scatter and reduce.


I used the manim package to generate the videos.

It is an amazing piece of software.


Broadcast: one node has all the data, and every node should end up with all of it.

Binomial tree

To be honest, this should be called “binary tree”.
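To make the schedule concrete, here is a minimal Python sketch that simulates who sends to whom in each round. It assumes the distance-doubling ordering, which is only one of the ways to lay out a binomial tree, and `binomial_broadcast_rounds` is a name I made up for illustration, not something from an MPI library.

```python
def binomial_broadcast_rounds(num_nodes, root=0):
    """Simulate a binomial-tree broadcast.

    Returns one list of (sender, receiver) pairs per round.
    Hypothetical helper for illustration only.
    """
    rounds = []
    distance = 1  # also the number of nodes that already hold the data
    while distance < num_nodes:
        sends = []
        # Ranks are taken relative to the root, so the root is relative rank 0.
        for rel in range(distance):
            target = rel + distance
            if target < num_nodes:
                sends.append(((rel + root) % num_nodes,
                              (target + root) % num_nodes))
        rounds.append(sends)
        distance *= 2  # the set of informed nodes doubles every round
    return rounds


for step, sends in enumerate(binomial_broadcast_rounds(8)):
    print(f"round {step}: {sends}")
```

With 8 nodes this finishes in 3 rounds, because every node that already has the data helps out in the next round.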

van Geijn’s algorithm

Kind of cool. Still amazes me.

If you have a super large chunk of data, sending the whole thing to each and every node is going to cost a lot of bandwidth. Instead, break it into smaller chunks and give a different chunk to each node (a scatter). Then tell all the nodes, the root included, to share their chunks with each other (an allgather).
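The two phases can be sketched in a few lines of Python. This is a toy simulation under my own assumptions: the data is a plain list, the chunks are contiguous and roughly equal in size, and the allgather phase is modelled abstractly instead of with a real communication schedule. The function names are hypothetical.

```python
def scatter(data, num_nodes):
    """Root splits `data` into num_nodes contiguous chunks; node i keeps chunk i."""
    chunk_size = -(-len(data) // num_nodes)  # ceiling division
    return [data[i * chunk_size:(i + 1) * chunk_size] for i in range(num_nodes)]


def allgather(chunks):
    """Every node ends up with every chunk, in rank order.

    Modelled abstractly here: any allgather algorithm could fill this in.
    """
    everything = [item for chunk in chunks for item in chunk]
    return [list(everything) for _ in chunks]


def scatter_allgather_broadcast(data, num_nodes):
    """Broadcast as scatter followed by allgather; returns each node's final data."""
    return allgather(scatter(data, num_nodes))


per_node = scatter_allgather_broadcast(list(range(10)), num_nodes=4)
print(per_node[3])
```

The point of the split is that the root only pushes out each chunk once during the scatter, and the rest of the traffic is spread across all the nodes.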


Allgather: each node has some of the data, and every node should end up with all of it.

Bruck’s algorithm

The good ol’ distance-doubling trick, applied here.

All data transfers go in one direction.
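Here is a small Python simulation of the idea, as I understand it. Each node appends blocks received from the node `distance` ranks ahead of it, and a final local rotation puts the blocks in rank order. The name `bruck_allgather` and the list-of-lists representation are my own assumptions for the sketch.

```python
def bruck_allgather(blocks):
    """Simulate a Bruck-style allgather; blocks[i] is the block owned by node i.

    Returns the final buffer of every node. Hypothetical helper for illustration.
    """
    p = len(blocks)
    buffers = [[block] for block in blocks]  # each node starts with its own block
    distance = 1
    while distance < p:
        # Send the first `count` blocks; the last step may be a partial one.
        count = min(distance, p - distance)
        snapshot = [list(buf) for buf in buffers]  # state before this step
        for rank in range(p):
            source = (rank + distance) % p  # always receive from "ahead"
            buffers[rank].extend(snapshot[source][:count])
        distance *= 2
    # Node r now holds blocks r, r+1, ..., wrapping around; rotate into rank order.
    return [buf[p - rank:] + buf[:p - rank] for rank, buf in enumerate(buffers)]


print(bruck_allgather(["a", "b", "c", "d", "e"])[2])
```

Note that it works for any number of nodes, not just powers of two, because the last step simply sends fewer blocks.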

Recursive doubling

Nodes swap data in pairs, and the distance between partners doubles at each step.
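A minimal Python sketch of the pairing, assuming the number of nodes is a power of two (the simple version of the algorithm needs that). The partner at each step is found by flipping one bit of the rank. `recursive_doubling_allgather` is a made-up name for this simulation.

```python
def recursive_doubling_allgather(blocks):
    """Simulate a recursive-doubling allgather; blocks[i] belongs to node i.

    Assumes len(blocks) is a power of two. Hypothetical helper for illustration.
    """
    p = len(blocks)
    if p < 1 or p & (p - 1):
        raise ValueError("this sketch needs a power-of-two number of nodes")
    held = [{rank: block} for rank, block in enumerate(blocks)]
    distance = 1
    while distance < p:
        snapshot = [dict(h) for h in held]  # state before this step
        for rank in range(p):
            partner = rank ^ distance  # flip one bit to find the swap partner
            held[rank].update(snapshot[partner])  # both sides swap everything they hold
        distance *= 2
    return [[h[i] for i in range(p)] for h in held]


print(recursive_doubling_allgather(["a", "b", "c", "d"])[0])
```

Each node doubles the amount of data it holds at every step, so the whole thing takes log2(p) steps.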


Each node only talks to its nearest neighbours.
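I am assuming this refers to the usual ring allgather, where every node receives from its left neighbour and passes on to its right neighbour. A minimal Python simulation under that assumption (the name `ring_allgather` is mine):

```python
def ring_allgather(blocks):
    """Simulate a ring allgather; blocks[i] is the block owned by node i.

    Assumption: each node receives from its left neighbour only.
    Hypothetical helper for illustration.
    """
    p = len(blocks)
    held = [{rank: block} for rank, block in enumerate(blocks)]
    for step in range(p - 1):
        snapshot = [dict(h) for h in held]  # state before this step
        for rank in range(p):
            left = (rank - 1) % p
            # The left neighbour forwards the block it received in the previous
            # step (or its own block in the very first step).
            index = (left - step) % p
            held[rank][index] = snapshot[left][index]
    return [[h[i] for i in range(p)] for h in held]


print(ring_allgather(["a", "b", "c", "d", "e"])[0])
```

It takes p - 1 steps, which is more than the doubling tricks, but every message is a single block travelling over one hop.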