You wrote:

However, for such a small transform, you're seeing a lot of overhead in fft(a) from setting up the FFT plan, setting up the trig tables, etcetera. To really get the full benefit of FFTW, you need to create a precomputed plan:
julia> p = plan_fft(a);
julia> @time b3 = p*a;
0.002879 seconds (1.39 k allocations: 97.771 KB)
julia> @time b3 = p*a;
0.000026 seconds (10 allocations: 32.375 KB)
On my machine, my_fft(a) is about 6 times slower than fft(a), but about 33 times slower than p*a.
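If you want to try this yourself, here is a minimal sketch of the plan-reuse pattern. It assumes a current Julia version, where fft and plan_fft come from the FFTW.jl package rather than Base, and it uses a made-up input a (the length 1024 is my assumption, not the size from your post). It also shows mul! from LinearAlgebra, which applies the plan into a preallocated output array so that repeated transforms don't allocate a new result each time.

```julia
using FFTW            # assumption: current Julia, where fft/plan_fft live in FFTW.jl
using LinearAlgebra   # provides mul!

a = rand(ComplexF64, 1024)   # hypothetical input; the size is an assumption

p = plan_fft(a)       # pay the planning cost once
b = p * a             # apply the plan; allocates a new output array

out = similar(a)
mul!(out, p, a)       # apply the same plan into a preallocated output

# Warm up once so compilation time is excluded, then time repeated application
p * a
t = @elapsed for _ in 1:1000
    mul!(out, p, a)
end
println("mean time per planned FFT: ", t / 1000, " s")
```

The plan is tied to the size and element type of the array it was created for, so this only pays off when you transform many arrays of the same shape.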