The problem I've always had with FFTs is that it's incredibly difficult to write an optimal FFT for size X, direction Y, and variability Z (X = 2^4 to 2^64; Y = FWD, REV, BIDIR; Z = 2^4 to 2^(log2(X))).
Does the metaprogramming element of FFTW work well, or does it boil down to "If X=2^4, Y=FWD, Z=X: build24FWDX()"?
Based on another comment by someone who studied the source, it sounds like the OCaml code generates C code for the various optimized kernels. I wonder what the C kernel chooser code looks like.
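For what it's worth, here is a rough sketch of what a table-driven kernel chooser could look like in C. This is not FFTW's actual source or API: every name below (`fft_dir`, `kernel_entry`, `choose_kernel`, the `kernel_*` functions) is made up for illustration, and the "kernels" only return a label where a generated, unrolled transform would go. The point is just that the "if X=2^4, Y=FWD" chain can collapse into a lookup over generated entries with a generic fallback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is NOT FFTW's real API or source. */
typedef enum { DIR_FWD = 0, DIR_REV = 1 } fft_dir;

/* A stand-in for a generated kernel: it only reports which path was
   chosen, where real code would run an unrolled fixed-size transform. */
typedef const char *(*kernel_fn)(void);

static const char *kernel_16_fwd(void)  { return "unrolled n=16 forward"; }
static const char *kernel_16_rev(void)  { return "unrolled n=16 reverse"; }
static const char *kernel_32_fwd(void)  { return "unrolled n=32 forward"; }
static const char *kernel_generic(void) { return "generic fallback"; }

/* One table row per specialized kernel. A code generator could emit
   this table alongside the kernels themselves. */
typedef struct {
    size_t    n;
    fft_dir   dir;
    kernel_fn fn;
} kernel_entry;

static const kernel_entry kernel_table[] = {
    { 16, DIR_FWD, kernel_16_fwd },
    { 16, DIR_REV, kernel_16_rev },
    { 32, DIR_FWD, kernel_32_fwd },
};

/* Return the specialized kernel for (n, dir) if one exists,
   otherwise the generic routine. */
kernel_fn choose_kernel(size_t n, fft_dir dir)
{
    assert(n > 0);
    const size_t count = sizeof kernel_table / sizeof kernel_table[0];
    for (size_t i = 0; i < count; ++i) {
        if (kernel_table[i].n == n && kernel_table[i].dir == dir) {
            return kernel_table[i].fn;
        }
    }
    return kernel_generic;
}
```

Calling `choose_kernel(16, DIR_FWD)` hands back the specialized kernel, while a size with no table entry falls through to `kernel_generic`. As far as I understand it, FFTW's real planner goes well beyond a static lookup like this, since it can time candidate plans at runtime, so treat this only as the simplest possible shape.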
FWIW, proper hygienic macros and science-focused array types make this sort of meta-programming optimization in Julia kinda fun.
Though just using type dispatching / multi-methods might be all that’s needed. It all makes me want to have a reason to write that kind of optimized code... For example if you can lift the size, variable, and variability into types you could match on your config roughly like: “fft(X :: Pwr2_64, Y :: FWD, Dir :: REV, Var :: BIDIR, Z) = 2^4-2^(log2(X))”.
This was back before Julia could do auto-SIMD optimizations, and it got quite close to FFTW without SIMD. He mentioned somewhere that modern Julia should be able to get very close to FFTW with SIMD, given that Julia now has better inlining heuristics, interprocedural optimizations, automatic SIMD, etc. (2014 was the stone age for Julia). If you read the rest of the thread, most of the discussion was about a standard library system so FFTW could be moved out. With Julia v1.0, FFTW is now in the standard library, which gives room for a pure Julia FFT to be standard. So this should get revived soon and we will have a beautiful generic FFT algorithm. I can't wait :)
Yeah, I meant to say it's no longer in the standard library. It's not given special privileges anymore, so we are free to iterate alternatives to it. I'm not sure how to edit and fix that post (there's another typo in there :( )