Skip to content

Optimize FFT on Complex<T> #99

@Shnatsel

Description

@Shnatsel

Parallel in-place deinterleaving from #95 only gets us about on par with RustFFT for large sizes.

Here are some ideas on how we could go faster.

Parallel out-of-place interleave/deinterleave

Instead of .zip()/.unzip() use Rayon's .unzip_to_vecs() or persistent scratch buffers for deinterleaving and plain old .par_iter() for interleaving. #100

Complexity: Needs more explicit SIMD instead of iterator autovectorization, but quite straightforward. Needs heuristics for switching over to the parallel impl.

Trade-offs: Out-of-place, higher memory use.

Fuse deinterleaving with BRAVO and interleaving with last step of FFT

We're loading/storing data in there anyway, so might as well also do the interleaving transformations while we're there.

Complexity: An additional variant for BRAVO and for one FFT pass. The FFT pass adaptation is kinda awkward for DiT because the final pass is the one that spans both halves. In DiF it would be the butterflies of size 2 instead, which are way easier to parallelize. We can probably make it work with the plan-and-execute trick like in #95/#97 with some modest overhead.

Trade-offs: Out-of-place, higher memory use. No more 2-thread BRAVO, which wouldn't matter if #97 pans out. Might require unsafe non-temporal stores to go fast on a single thread on x86.

Deinterleave/interleave in each FFT kernel

If we're in ALU-rich, bandwidth-poor regime, then extra math in each kernel might not be that bad.

Trade-offs: More pervasive compute cost. No more 2-thread BRAVO, which wouldn't matter if #97 pans out. Needs a dedicated BRAVO variant, which should be straightforward.

Specialized interleaved kernels

Instead of deinterleaving before and interleaving after in each kernel, adapt the math to operate on interleaved values directly. This is what other FFT implementations appear to do.

Complexity: writing an entire new set of kernels.

Trade-offs: same as the previous option, maybe slightly cheaper

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions