A case study on an RFI excision algorithm
I always knew optimization options were a big deal, but this case study has made me a preacher of GCC optimization. As if compilers themselves were not magical enough, compiler optimizations add a whole new dimension to the alchemy.
Before I delve into the optimization options, it is worthwhile to talk about the simple yet effective algorithm used to excise (or simply filter) Radio Frequency Interference (RFI).
The input is filterbank data, which is essentially an nsamps X nchans matrix where each element is the power at a given time and frequency. We first compute the time- and frequency-marginalized distributions. From these distributions, we compute a measure of standard deviation, which we use to flag samples that don’t follow the same distribution.
Computationally speaking, there are the following steps:
- Full filterbank traversal – nsamps X nchans
- Measure of std. dev. –
- Flagging on the marginalized distributions –
- Filtering – nsamps X nchans
In a typical setting, nsamps=1280 for one second of filterbank data. Suffice it to say, there are many array operations.
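To make the four steps above concrete, here is a minimal sketch in C++. It is not the actual implementation: the function names (compute_stats, flag_outliers, excise_rfi), the use of a plain mean and population standard deviation, the nsigma threshold, and replacing flagged values with zero are all assumptions made for illustration. The data is assumed to be stored row-major, one row per time sample.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type: mean and population standard deviation.
struct Stats {
    double mean;
    double stddev;
};

Stats compute_stats(const std::vector<double>& v) {
    assert(!v.empty());
    const double n = static_cast<double>(v.size());
    double sum = 0.0;
    for (double x : v) sum += x;
    const double mean = sum / n;
    double sq = 0.0;
    for (double x : v) sq += (x - mean) * (x - mean);
    return {mean, std::sqrt(sq / n)};
}

// Flag entries lying more than nsigma standard deviations from the mean.
// (Assumed flagging rule; the original measure may differ.)
std::vector<bool> flag_outliers(const std::vector<double>& v, double nsigma) {
    const Stats s = compute_stats(v);
    std::vector<bool> flags(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        flags[i] = std::fabs(v[i] - s.mean) > nsigma * s.stddev;
    return flags;
}

// Excise RFI in place from row-major filterbank data (nsamps X nchans).
void excise_rfi(std::vector<float>& fb, std::size_t nsamps,
                std::size_t nchans, double nsigma) {
    assert(fb.size() == nsamps * nchans);

    // Step 1: full traversal, accumulating both marginals.
    std::vector<double> time_marg(nsamps, 0.0);
    std::vector<double> freq_marg(nchans, 0.0);
    for (std::size_t i = 0; i < nsamps; ++i) {
        for (std::size_t j = 0; j < nchans; ++j) {
            const double p = fb[i * nchans + j];
            time_marg[i] += p;
            freq_marg[j] += p;
        }
    }

    // Steps 2 and 3: std. dev. and flagging on each marginal.
    const std::vector<bool> bad_samp = flag_outliers(time_marg, nsigma);
    const std::vector<bool> bad_chan = flag_outliers(freq_marg, nsigma);

    // Step 4: filtering, a second full traversal.
    // Zero replacement is an assumption of this sketch.
    for (std::size_t i = 0; i < nsamps; ++i)
        for (std::size_t j = 0; j < nchans; ++j)
            if (bad_samp[i] || bad_chan[j]) fb[i * nchans + j] = 0.0f;
}
```

Calling excise_rfi(fb, nsamps, nchans, 2.0) on a buffer of nsamps * nchans floats filters it in place. The two nested loops are the nsamps X nchans traversals from the list, and they are exactly the kind of simple array loops an optimizer gets to work on.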
GCC Optimization options
Official thank you to this GNU GCC page
There are five optimization options I will be playing with. I will be passing -ggdb in all the builds.
I am using the unity build design principle (check out this link here), where I don’t compile individual objects and then link them together. Instead, I put everything (by everything I mean all the class definitions I use) in one big file.
I don’t care about compile times because I know my binaries won’t be that big.
Those five are:
Ofast is worthy of its name. It’s the fastest and it’s my new favorite.
Let’s just focus on changes brought into code when going from
Ofast is the fastest.
Ofast = O3 + -ffast-math, so let’s compare
Ofast is still the undisputed winner. We also note that fastmath performs almost like O0, which tells us all the important optimizations are happening with
fastmath isn’t helping us much.
Despite this, let’s continue our focus on Ofast. A disclaimer: the algorithm at hand is nice in that it still works as expected without the luxury of IEEE/ANSI standards conformity. If that isn’t the case,
O3 is harmful.
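To see why IEEE conformity matters, recall that floating-point addition is not associative, and -ffast-math gives GCC permission to regroup sums as if it were. The small sketch below (the function names are hypothetical, and it is not taken from the algorithm above) evaluates the same three-term sum under two groupings. The assertions hold under strict IEEE semantics, that is, when built without -ffast-math.

```cpp
#include <cassert>

// Hypothetical helpers: the same three-term sum under two groupings.
double sum_left(double a, double b, double c) { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

// Expected behavior when compiled without -ffast-math.
void demo_grouping() {
    const double big = 1e20;
    // (big - big) + 1: the large terms cancel first, so the 1 survives.
    assert(sum_left(big, -big, 1.0) == 1.0);
    // big + (-big + 1): the 1 is absorbed by -big before the cancellation.
    assert(sum_right(big, -big, 1.0) == 0.0);
}
```

If a result depends on which grouping is used, a build that is allowed to reorder the arithmetic can silently change it. The RFI algorithm here tolerates that; code that relies on exact summation order would not.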
When we look at the optimization report, we see things which we expect.
I am only focusing on the code related to this algorithm in the optimization report. This cuts the report down from 25523 lines to
It seems like many optimizations fail, and yet the performance is greatly improved. I don’t understand everything in the report, but I can start to see possible ways to optimize the code further.
Enough to let me sleep tonight.
Compiler optimizations turn my lead of code into gold, which makes them no less than alchemy.