# A case study on RFI excision algorithm

I always knew optimization options are a big deal, but, this case study has made me a preacher of GCC optmization. As if compilers themselves are not magically enough, compiler optimizations, in addition to that magic, add a whole new dimension to the alchemy.

## Algorithm

Before I delve into the optimization options, it will be worthwhile to talk about the simple yet effective algorithm to excise (or simply filter) Radio Frequency Interferences (RFIs).
Given a filterbank data which is essentially a `nsamps X nchans`

matrix where element is the power corresponding to the time and frequency.
We first compute time and frequency marginalized distributions. From these distributions, we compute a measure of standard deviation which we use to flag certain samples which don’t follow the same distribution.

Computationally speaking, there are the following steps:

- Full filterbank traversal –
`nsamps X nchans`

- Measure of std. dev. –
`nsamps`

,`nchans`

- Flagging on the marginalized distributions –
`nsamps`

,`nchans`

- Filtering –
`nsamps X nchans`

In a typical setting, `nchans=4096`

and `nsamps=1280`

for a second of filterbank data. So, enough to say, there are many array operations.

## GCC Optimization options

Official thank you to this GNU GCC page

There are 5 optimization options I will be playing with. I will be passing `-ggdb`

in all the builds.
I am using the unity build design principle (checkout this link here) where I don’t compile individual objects and then link together. Instead, I put everything (by everything I mean all the class definitions I use) in one big file.
I don’t care about compile times because I know my binaries won’t be that big.

Those five are:

- O1
- O2
- O3
- Os
- Ofast

## Results

`Ofast`

is worthy of it’s name. It’s the fastest and it’s my new favorite.

## Optimization reports

Let’s just focus on changes brought into code when going from `O0`

to `Ofast`

. Because `Ofast`

is the fastest.

Now, `Ofast = O3 + -ffast-math`

, so let’s compare `Ofast`

and `fast-math`

.

Clearly, `Ofast`

is still the undisputed winner. We also note that `fastmath`

is almost performing like `O0`

which tells us all the important optmizations are happening with `O3`

and `fastmath`

isn’t helping us much.

Despite this, let’s continue our focus on `Ofast`

. A disclaimer that the algorithm at hand is nice in that it still works as expected while not having the luxury of IEEE/ANSI standards conformity. If that isn’t case, `O3`

is harmful.

When we look at the optimization report, we see things which we expect.
I am only focusing on code related with this algorithm in the optimization report. This cuts short the entire report from `25523`

lines to `1259`

lines.

It seems like there are many failures at optimizations and yet the performance is greatly improved. I don’t understand everything about it but I can start to see the possible ways to optimize it.

Enough to let me sleep tonight.

Compiler optimizations literally turn my lead of code into gold, hence it’s no less than alchemy.