Hacker News
Stochastic Program Optimization (acolyer.org)
89 points by mping on March 31, 2017 | hide | past | favorite | 18 comments


The saxpy example is very strange. They clearly show that the left side (gcc -O3) doesn't generate any SIMD code and that their "STOKE" optimizer does. This isn't a fair comparison. All they need to do is add an -march flag (and apparently the restrict keyword):

https://goo.gl/7Ll4dp

Now the GCC results look pretty much exactly like the "STOKE" results...


Exactly: the STOKE results shown here are actually inferior to those produced by both gcc and clang, once you know how to use the compiler and restrict pointer aliasing.

Are there any actual examples of stochastic optimization techniques outperforming compilers, rather than programmers' woefully lacking knowledge of compilers?


> STOKE uses a sound procedure to validate the equality of loop-free code sequences... both sequences produce the same side effects on live outputs when executed from the same initial machine state.

Well, if in effect STOKE is doing dynamic alias analysis, then it's not really inferior, right? Maybe it could be a PGO-style compiler pass?

I think that some of us might overestimate the capability of most software teams, because it's pretty uncommon to find people who know what it means to add __restrict to function args.


While I guess it's true that not all that many people understand the restrict annotations in the various compilers, I don't think it's an unreasonable expectation from people writing saxpy-like computational kernels. It's certainly a prerequisite to doing research work on optimizations like this.

STOKE isn't really doing dynamic alias analysis per se, although if run for long enough its results should theoretically be equivalent to such an analysis. As the example assembly shows, however, it's far from a sure thing. More generally, things like PGO exist because they exploit runtime information that traditional static compiler optimizers fundamentally can't. For STOKE to make a similar claim, we'd need much better code examples than the one shown here.


>> uncommon to find people who know what it means to add __restrict

I disagree; most people will add this keyword when optimizing code.


STOKE also replaced IMULL with PMULLW and ADDL with PADDW (32 vs 16 bit).


On a related note, as far as I can tell the STOKE code seems to be incorrect w.r.t. aliasing between y and x, i.e. if x == y + 1. Makes me wonder what SMT solver claims are worth in the literature.


I've worked with SMT solvers before, Z3 in particular, and it's definitely doable to prove two pieces of code equivalent, but that depends on the implementation used. Unfortunately, even academics publishing at very influential conferences rarely present their code for audit.


STOKE is on github here: https://github.com/StanfordPL/stoke.

Most CS conferences are making a big push towards "artifact evaluation," which means that many papers now come with peer-reviewed implementations.


Checking whether your code is correct under aliasing has nothing to do with the SMT solver. The solver can only analyze the model of the program it is given, and it looks like STOKE's memory model assumes the arguments don't alias.


-march=native shows interesting results too


I recall reading about this from the same group a few years ago (I'm not from this field, so I can't really tell what's actually different):

http://theory.stanford.edu/~aiken/publications/papers/asplos...

I thought it was really cool that they managed to find improvements on hand-optimized bit tricks from "Hacker's Delight" this way.


Does anyone know what “fixed-point” in “fixed-point x86_64 assembly code sequence” in the article means? I can think of three things (function combinator, fixed point of a real-valued function, fixed point arithmetic), none of which make sense to me in the context.


It means fixed-point arithmetic. Floating point in SMT doesn't work very well, so that's why they have this restriction.


A friend wondered how they can make their improvements to SAXPY and claim they're correct w/o using the restrict keyword.


This is a bit more low-level than I comprehend. Would someone mind explaining the significance and use of this?


This is extending Massalin's idea of superoptimization (https://www2.cs.arizona.edu/~collberg/Teaching/553/2011/Reso...), which is a practical application of Gregory Chaitin's work on applying information theory to algorithms (https://www.cs.auckland.ac.nz/~chaitin/acm75.pdf), with better search methods than exhaustive search.


Weird that they mention using MCMC and Metropolis-Hastings but don't mention simulated annealing, which would be the logical next step.



