
Thanks for tracking that one down.

Summary: the FFTW people claim they weren’t trying to be malicious, and after DJB called them out publicly (when his private emails went nowhere) they eventually removed djbfft from their benchmark summary pages entirely.

* * *

For many use cases it is possible (and helpful) to stick to power-of-2 FFT sizes, because these are significantly faster than arbitrary sizes. In some cases it is also fine to skip re-ordering the terms at one end of the transform, because the order only needs to be consistent internally, not with some external spec; this can save a fair amount of time too.
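A minimal sketch of that skip-the-reordering trick (assuming a power-of-2 length and plain NumPy, with no claims of speed): a decimation-in-frequency FFT that leaves its output in bit-reversed order, paired with a decimation-in-time inverse that consumes that order directly, so neither direction ever pays for a bit-reversal pass. This is the usual shape of a convolution pipeline where only internal consistency matters.

```python
import numpy as np

def fft_dif(x):
    # Radix-2 decimation-in-frequency FFT, output in BIT-REVERSED order.
    # The usual final re-ordering pass is deliberately skipped.
    a = np.asarray(x, dtype=complex).copy()
    n = a.size
    assert n and n & (n - 1) == 0, "length must be a power of 2"
    span = n
    while span > 1:
        half = span // 2
        w = np.exp(-2j * np.pi * np.arange(half) / span)
        for s in range(0, n, span):
            top, bot = a[s:s + half], a[s + half:s + span]
            a[s:s + half], a[s + half:s + span] = top + bot, (top - bot) * w
        span = half
    return a

def ifft_dit(a):
    # Matching radix-2 decimation-in-time inverse: consumes bit-reversed
    # input and produces natural order, again with no re-ordering pass.
    y = np.asarray(a, dtype=complex).copy()
    n = y.size
    span = 2
    while span <= n:
        half = span // 2
        w = np.exp(2j * np.pi * np.arange(half) / span)
        for s in range(0, n, span):
            top, bot = y[s:s + half], y[s + half:s + span] * w
            y[s:s + half], y[s + half:s + span] = top + bot, top - bot
        span *= 2
    return y / n
```

Round-tripping fft_dif through ifft_dit recovers the input exactly, and permuting the bit-reversed output recovers the natural-order DFT, so the missing reorder pass costs nothing when both ends agree on the convention.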

But in other contexts it is of course important to handle arbitrary lengths, including those that aren’t products of small primes, and to keep the input/output terms in a canonical order. FFTW tries to be reasonably fast in every such context by doing fancy runtime code specialization.

I’m curious if anyone has a page with updated benchmarks for various sizes of FFTs on modern CPUs and maybe GPUs, testing whichever FFT libraries are still being actively maintained (including those that limit themselves to power-of-2 sizes). CPUs have changed some in the last 10–15 years (wider SIMD, caches matter more, etc.), and GPUs are a whole lot faster now.
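In lieu of such a page, this is the shape of micro-benchmark I mean, using NumPy’s bundled pocketfft as a stand-in (a real comparison would link against FFTW/MKL and use each library’s planner; the specific sizes here are just illustrative):

```python
import timeit
import numpy as np

def bench_fft(n, repeats=20):
    # Best-of-N wall time for one complex FFT of length n.
    x = np.random.default_rng(0).standard_normal(n) + 0j
    np.fft.fft(x)  # warm-up call so planning/caching isn't timed
    return min(timeit.repeat(lambda: np.fft.fft(x), number=1, repeat=repeats))

# A power of 2, a smooth composite, and a prime (primes typically
# fall back to Bluestein's algorithm and run noticeably slower).
for n in (4096, 3600, 4093):
    print(f"n={n:5d}: {bench_fft(n) * 1e6:8.1f} us")
```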

The FFTW benchmark site still doesn’t have anything more recent than ~2005 as far as I can tell.

* * *

One more thing: if anyone knows a simple and fast WebAssembly FFT implementation (ideally with benchmarks shown for common web browsers / CPUs) I would love to know about it. Just power-of-2 sizes would be fine.



Prime95 https://www.mersenne.org/download/ has what is probably the fastest CPU FFT implementation; specialized and carefully tuned to Intel & AMD CPUs, with AVX assembly.

My project GpuOwl https://github.com/preda/gpuowl/ has a GPU double-precision FFT convolution implementation which is faster (and simpler IMO) than clFFT.


Curious at what size the GPU becomes faster than the CPU? (taking into account memory transfer time)
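A back-of-envelope model of that crossover; every constant below is an assumption, not a measurement (effective PCIe bandwidth, sustained FFT flop rates, fixed launch/transfer latency), so plug in your own hardware’s numbers:

```python
import math

# All constants are illustrative assumptions -- replace with measured values.
PCIE_BW      = 12e9    # bytes/s, effective host<->device bandwidth
CPU_FLOPS    = 5e9     # flop/s a tuned CPU FFT sustains
GPU_FLOPS    = 200e9   # flop/s a tuned GPU FFT sustains
OVERHEAD     = 50e-6   # s, fixed kernel-launch + transfer latency
BYTES_PER_PT = 16      # complex double

def fft_flops(n):
    # Classic operation-count estimate for a radix-2 FFT.
    return 5 * n * math.log2(n)

def cpu_time(n):
    return fft_flops(n) / CPU_FLOPS

def gpu_time(n):
    # Compute time plus moving the data to the GPU and back.
    transfer = 2 * n * BYTES_PER_PT / PCIE_BW
    return fft_flops(n) / GPU_FLOPS + transfer + OVERHEAD

def crossover():
    # Smallest power-of-2 size where the GPU, transfers included, wins.
    n = 2
    while gpu_time(n) >= cpu_time(n):
        n *= 2
    return n

print(f"model crossover: n = {crossover()}")
```

Under these made-up constants the crossover lands in the thousands of points; the fixed launch/transfer latency dominates at small sizes, which is why batching many small FFTs into one GPU call changes the answer dramatically.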


>I’m curious if anyone has a page with updated benchmarks for various sizes of FFTs on modern CPUs and maybe GPUs, testing whichever FFT libraries are still being actively maintained (including those that limit themselves to power-of-2 sizes)

This is the most up-to-date one that I know of online (2014):

https://github.com/JuliaLang/julia/pull/6193#issuecomment-61...

If the array is large enough for a GPU, then yes, a GPU version would rip it to shreds (not shown, but known). You can see the MKL advantage in there, but FFTW still performs pretty admirably. Everyone pretty much uses MKL or FFTW these days. I don't think there's as much development in this space because these are all pretty well-optimized now. Multi-process parallelism for these is supposedly not difficult and can use a very similar strategy, but GPUs really require some different approaches, and that's where the development is these days. Also, a lot of mathematicians have more recently focused on advancing spectral discretizations beyond FFTs; for example see Chebfun and ApproxFun, and things like FastTransforms.jl, where the interesting thing is non-uniformly spaced FFTs and similar transforms.


> for example see Chebfun and [Julia port] ApproxFun

These are great tools that deserve to be widely known and adopted. They are designed for working with functions defined on an interval rather than on a periodic domain, and for doing a bunch more with them. http://www.chebfun.org
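To make the interval-vs-periodic point concrete, here is a tiny sketch of the core idea using only NumPy's Chebyshev module (Chebfun and ApproxFun additionally choose the degree adaptively and layer a whole calculus toolbox on top; the fixed degree 20 here is an arbitrary choice for the sketch):

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

# Approximate a smooth function on an interval by interpolating it at
# Chebyshev points -- the basic object these libraries are built on.
f = Chebyshev.interpolate(np.exp, deg=20, domain=[-1, 1])

# The interpolant supports fast evaluation anywhere on the interval.
xs = np.linspace(-1, 1, 7)
err = np.max(np.abs(f(xs) - np.exp(xs)))
print(f"max error at sample points: {err:.2e}")

# Calculus happens on the coefficients, not the samples, which is the
# "work with the expansion" idea Chebfun popularized.
df = f.deriv()
```

For a function this smooth, degree 20 already hits machine precision; the adaptive libraries essentially keep doubling the degree until the trailing Chebyshev coefficients fall below a tolerance.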

Ironically in the context of this discussion, these tools could get a whole heck of a lot faster with some dedicated effort from some low-level optimization experts.


Popular GPU FFT implementations are cuFFT [1] and clFFT [2]. IIRC cuFFT performs very well.

[1] https://developer.nvidia.com/cufft

[2] https://clmathlibraries.github.io/clFFT/



