A case study on RFI excision algorithm

I always knew optimization options are a big deal, but, this case study has made me a preacher of GCC optmization. As if compilers themselves are not magically enough, compiler optimizations, in addition to that magic, add a whole new dimension to the alchemy.

Algorithm

Before I delve into the optimization options, it will be worthwhile to talk about the simple yet effective algorithm to excise (or simply filter) Radio Frequency Interferences (RFIs). Given a filterbank data which is essentially a nsamps X nchans matrix where element is the power corresponding to the time and frequency. We first compute time and frequency marginalized distributions. From these distributions, we compute a measure of standard deviation which we use to flag certain samples which don’t follow the same distribution.

Computationally speaking, there are the following steps:

  1. Full filterbank traversal – nsamps X nchans
  2. Measure of std. dev. – nsamps, nchans
  3. Flagging on the marginalized distributions – nsamps, nchans
  4. Filtering – nsamps X nchans

In a typical setting, nchans=4096 and nsamps=1280 for a second of filterbank data. So, enough to say, there are many array operations.

GCC Optimization options

Official thank you to this GNU GCC page

There are 5 optimization options I will be playing with. I will be passing -ggdb in all the builds. I am using the unity build design principle (checkout this link here) where I don’t compile individual objects and then link together. Instead, I put everything (by everything I mean all the class definitions I use) in one big file. I don’t care about compile times because I know my binaries won’t be that big.

Those five are:

  • O1
  • O2
  • O3
  • Os
  • Ofast

Results

Ofast is worthy of it’s name. It’s the fastest and it’s my new favorite.

Optimization reports

Let’s just focus on changes brought into code when going from O0 to Ofast. Because Ofast is the fastest.

Now, Ofast = O3 + -ffast-math, so let’s compare Ofast and fast-math.

Clearly, Ofast is still the undisputed winner. We also note that fastmath is almost performing like O0 which tells us all the important optmizations are happening with O3 and fastmath isn’t helping us much.

Despite this, let’s continue our focus on Ofast. A disclaimer that the algorithm at hand is nice in that it still works as expected while not having the luxury of IEEE/ANSI standards conformity. If that isn’t case, O3 is harmful.

When we look at the optimization report, we see things which we expect. I am only focusing on code related with this algorithm in the optimization report. This cuts short the entire report from 25523 lines to 1259 lines.

It seems like there are many failures at optimizations and yet the performance is greatly improved. I don’t understand everything about it but I can start to see the possible ways to optimize it.

Enough to let me sleep tonight.

Compiler optimizations literally turn my lead of code into gold, hence it’s no less than alchemy.