A case study on RFI excision algorithm
I always knew optimization options are a big deal, but, this case study has made me a preacher of GCC optmization. As if compilers themselves are not magically enough, compiler optimizations, in addition to that magic, add a whole new dimension to the alchemy.
Algorithm
Before I delve into the optimization options, it will be worthwhile to talk about the simple yet effective algorithm to excise (or simply filter) Radio Frequency Interferences (RFIs).
Given a filterbank data which is essentially a nsamps X nchans
matrix where element is the power corresponding to the time and frequency.
We first compute time and frequency marginalized distributions. From these distributions, we compute a measure of standard deviation which we use to flag certain samples which don’t follow the same distribution.
Computationally speaking, there are the following steps:
- Full filterbank traversal –
nsamps X nchans
- Measure of std. dev. –
nsamps
,nchans
- Flagging on the marginalized distributions –
nsamps
,nchans
- Filtering –
nsamps X nchans
In a typical setting, nchans=4096
and nsamps=1280
for a second of filterbank data. So, enough to say, there are many array operations.
GCC Optimization options
Official thank you to this GNU GCC page
There are 5 optimization options I will be playing with. I will be passing -ggdb
in all the builds.
I am using the unity build design principle (checkout this link here) where I don’t compile individual objects and then link together. Instead, I put everything (by everything I mean all the class definitions I use) in one big file.
I don’t care about compile times because I know my binaries won’t be that big.
Those five are:
- O1
- O2
- O3
- Os
- Ofast
Results
Ofast
is worthy of it’s name. It’s the fastest and it’s my new favorite.
Optimization reports
Let’s just focus on changes brought into code when going from O0
to Ofast
. Because Ofast
is the fastest.
Now, Ofast = O3 + -ffast-math
, so let’s compare Ofast
and fast-math
.
Clearly, Ofast
is still the undisputed winner. We also note that fastmath
is almost performing like O0
which tells us all the important optmizations are happening with O3
and fastmath
isn’t helping us much.
Despite this, let’s continue our focus on Ofast
. A disclaimer that the algorithm at hand is nice in that it still works as expected while not having the luxury of IEEE/ANSI standards conformity. If that isn’t case, O3
is harmful.
When we look at the optimization report, we see things which we expect.
I am only focusing on code related with this algorithm in the optimization report. This cuts short the entire report from 25523
lines to 1259
lines.
It seems like there are many failures at optimizations and yet the performance is greatly improved. I don’t understand everything about it but I can start to see the possible ways to optimize it.
Enough to let me sleep tonight.
Compiler optimizations literally turn my lead of code into gold, hence it’s no less than alchemy.