The saxpy example is very strange. They clearly show that the left side (gcc -O3) doesn't generate any SIMD code and that their "STOKE" optimizer does. This isn't a fair comparison. All they need to do is add an -march flag (and apparently the restrict keyword):
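Concretely, the kind of kernel under discussion looks something like this (a minimal sketch; the function name and signature are mine, not taken from the paper):

```c
#include <stddef.h>

/* saxpy: y[i] = a * x[i] + y[i].
   With restrict-qualified pointers, the compiler may assume x and y
   don't alias, which is what permits auto-vectorization of the loop. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Build this with something like `gcc -O3 -march=native -S` and both gcc and clang emit packed SIMD instructions for the loop body, which is exactly the comparison the paper's left-hand column should have shown.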
Exactly. The STOKE results shown here are inferior to what both gcc and clang produce once you know how to use the compiler and restrict pointer aliasing.
Are there any actual examples of stochastic optimization techniques outperforming compilers, rather than merely outperforming programmers whose knowledge of compilers is woefully lacking?
> STOKE uses a sound procedure to validate the equality of loop-free code sequences... both sequences produce the same side effects on live outputs when executed from the same initial machine state.
Well, if in effect STOKE is doing dynamic alias analysis, then it's not really inferior, right? Maybe it could be a PGO-style compiler pass?
I think that some of us might overestimate the capability of most software teams, because it's pretty uncommon to find people who know what it means to add __restrict to function args.
While I guess it's true that not all that many people understand the restrict annotations in the various compilers, I don't think it's an unreasonable expectation from people writing saxpy-like computational kernels. It's certainly a prerequisite to doing research work on optimizations like this.
STOKE isn't really doing dynamic alias analysis per se, although if run for long enough its results should theoretically be equivalent to such analysis. As the example assembly shows, however, it's far from a sure thing. More generally, things like PGO exist because they specifically do things traditional compiler-level optimizers fundamentally can't. For STOKE to make a similar claim we should see much better code examples than the one shown here.
On a related note, as far as I can tell the STOKE code seems to be incorrect w.r.t. aliasing between y and x, i.e. if x == y + 1.
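To illustrate why overlap matters at all: which overlaps break depends on the exact schedule of loads and stores in the generated code, but here's one case (y == x + 1) where a version that reads x ahead of the stores, as vectorized or rescheduled code may, visibly diverges from sequential C semantics. A sketch with my own helper names:

```c
#include <stddef.h>
#include <string.h>

/* In-order saxpy, i.e. sequential C semantics: each y[i] is written
   before x[i+1] is read, so writes can feed forward through an overlap. */
void saxpy_scalar(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* What a no-alias-assuming version effectively does: all of x is read
   before any element of y is written. */
void saxpy_noalias(size_t n, float a, const float *x, float *y) {
    float tmp[16];                       /* assume n <= 16 for this sketch */
    memcpy(tmp, x, n * sizeof tmp[0]);   /* snapshot x up front */
    for (size_t i = 0; i < n; i++)
        y[i] = a * tmp[i] + y[i];
}
```

Call both with x = buf and y = buf + 1 on the same buffer and the results differ, because in the sequential version each store to y[i] lands on x[i+1] before it is read. A verifier whose memory model quietly assumes the arguments are disjoint will never see that case.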
Makes me wonder what SMT solver claims are worth in the literature.
I've worked with SMT solvers (Z3 in particular), and it's definitely doable to prove two pieces of code equivalent, but that depends on the implementation used. Unfortunately, even academics publishing at very influential conferences rarely make their code available for audit.
Checking whether your code is correct under aliasing has nothing to do with the SMT solver. The solver can only analyze the model of the program it is given, and it looks like the STOKE memory model assumes that the arguments don't alias.
Does anyone know what “fixed-point” in “fixed-point x86_64 assembly code sequence” in the article means? I can think of three things (function combinator, fixed point of a real-valued function, fixed point arithmetic), none of which make sense to me in the context.
https://goo.gl/7Ll4dp
Now the GCC results look pretty much exactly like the "STOKE" results...