The saxpy example is very strange. They clearly show that the left side (gcc -O3) doesn't generate any SIMD code and that their "STOKE" optimizer does. This isn't a fair comparison. All they need to do is add an -march flag (and apparently the restrict keyword):
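Concretely, the kind of kernel under discussion looks something like this (a minimal sketch; the function name and signature are mine, not taken from the paper):

```c
#include <stddef.h>

/* saxpy: y[i] = a * x[i] + y[i].
   With restrict-qualified pointers, the compiler may assume x and y
   don't alias, which is what permits auto-vectorization of the loop. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Build this with something like `gcc -O3 -march=native -S` and both gcc and clang emit packed SIMD instructions for the loop body, which is exactly the comparison the paper's left-hand column should have shown.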
Exactly. The STOKE results shown here are inferior to what both gcc and clang produce once you know how to use the compiler and restrict pointer aliasing.
Are there any actual examples of stochastic optimization techniques outperforming compilers, rather than merely outperforming programmers whose knowledge of compilers is woefully lacking?
> STOKE uses a sound procedure to validate the equality of loop-free code sequences... both sequences produce the same side effects on live outputs when executed from the same initial machine state.
Well, if in effect STOKE is doing dynamic alias analysis, then it's not really inferior, right? Maybe it could be a PGO-style compiler pass?
I think that some of us might overestimate the capability of most software teams, because it's pretty uncommon to find people who know what it means to add __restrict to function args.
While I guess it's true that not all that many people understand the restrict annotations in the various compilers, I don't think it's an unreasonable expectation from people writing saxpy-like computational kernels. It's certainly a prerequisite to doing research work on optimizations like this.
STOKE isn't really doing dynamic alias analysis per se, although if run for long enough its results should theoretically be equivalent to such analysis. As the example assembly shows, however, it's far from a sure thing. More generally, things like PGO exist because they specifically do things traditional compiler-level optimizers fundamentally can't. For STOKE to make a similar claim we should see much better code examples than the one shown here.
On a related note, as far as I can tell the STOKE code seems to be incorrect w.r.t. aliasing between y and x, i.e. if x == y + 1.
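To illustrate why overlap matters at all: which overlaps break depends on the exact schedule of loads and stores in the generated code, but here's one case (y == x + 1) where a version that reads x ahead of the stores, as vectorized or rescheduled code may, visibly diverges from sequential C semantics. A sketch with my own helper names:

```c
#include <stddef.h>
#include <string.h>

/* In-order saxpy, i.e. sequential C semantics: each y[i] is written
   before x[i+1] is read, so writes can feed forward through an overlap. */
void saxpy_scalar(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* What a no-alias-assuming version effectively does: all of x is read
   before any element of y is written. */
void saxpy_noalias(size_t n, float a, const float *x, float *y) {
    float tmp[16];                       /* assume n <= 16 for this sketch */
    memcpy(tmp, x, n * sizeof tmp[0]);   /* snapshot x up front */
    for (size_t i = 0; i < n; i++)
        y[i] = a * tmp[i] + y[i];
}
```

Call both with x = buf and y = buf + 1 on the same buffer and the results differ, because in the sequential version each store to y[i] lands on x[i+1] before it is read. A verifier whose memory model quietly assumes the arguments are disjoint will never see that case.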
Makes me wonder what SMT solver claims are worth in the literature.
I've worked with SMT solvers (Z3 in particular), and it's definitely doable to prove two pieces of code equivalent, but that depends on the implementation used. Unfortunately, even academics publishing at very influential conferences rarely make their code available for audit.
Checking whether your code is correct under aliasing has nothing to do with the SMT solver. The solver can only analyze the model of the program it is given, and it looks like the STOKE memory model assumes that the arguments don't alias.
Does anyone know what “fixed-point” in “fixed-point x86_64 assembly code sequence” in the article means? I can think of three things (function combinator, fixed point of a real-valued function, fixed point arithmetic), none of which make sense to me in the context.
https://goo.gl/7Ll4dp
Now the GCC results look pretty much exactly like the "STOKE" results...