Win32/64 C Compiler Benchmarks

DarkShikari · on April 2, 2012

If the author is reading this, x264 now officially supports the Intel compiler, which should make it much easier to benchmark.

Additionally, x264 should probably be categorized under "no significant floating point calculations".

nimrody · on April 2, 2012

Besides performance, Intel's compiler offers somewhat better error messages and warnings.

(Not affiliated with Intel. Just very satisfied with their performance on numeric-heavy workloads).

Ralith · on April 2, 2012

Disappointed to see no clang results, especially as he discusses compilation speed.

octopus · on April 2, 2012

While you can install Clang on Windows under MinGW and compile C code with it, it runs really slowly (this is only from my own experience).

Ralith · on April 2, 2012

Really? Any idea why, when it does so well speed-wise on Linux and OS X?

octopus · on April 2, 2012

I don't know why it is slower; I've just noticed that it is considerably slower than gcc and VC.

As a side note, on my Mac, Clang is really fast for compiling code.

justincormack · on April 2, 2012

If you follow the email thread, you can see that the big math differences are based on libraries, not part of the compiler per se. Not to say they do not need improving.

http://gcc.gnu.org/ml/gcc/2012-01/msg00215.html

berkut · on April 2, 2012

It's not just the math libraries (although these are indeed much better - especially under Linux); its loop unrolling and vectorization using intrinsics are also much better than other compilers' in my experience.

However, no compiler I've found can get close to hand-crafted intrinsics in non-trivial cases.

justincormack · on April 2, 2012

That may well still be so, but these benchmarks did not seem to pick that up. I don't think any of the code was significantly vectorizable anyway...

AshleysBrain · on April 2, 2012

I've heard Intel's compiler makes optimisations that best favour the specifics of Intel CPUs. All other compilers are likely to make CPU-neutral optimisations. Do you think the results would be much different if run on an AMD chip?

DarkShikari · on April 2, 2012

Last I heard (according to Agner), Intel's compiler still intentionally incorrectly detects AMD's CPUs and throws them to the CPU-generic code. x264 has a hacked loader function (borrowed from Agner) to avoid this, though I don't know if the test used it.

Now I'm not sure how much this would actually affect things. The autovectorization in Intel's compiler is weak and, at least on Win64, SSE2 is allowed in normal code without CPU dispatching. It does affect library functions like math and memcpy, but that'll only matter if your program spends a ton of time in them.

kiloaper · on April 2, 2012

Wasn't there some legal ruling about that? I found some discussions [1] about it. The section about CPU vendor string manipulation is very interesting.

[1] http://www.agner.org/optimize/blog/read.php?i=49

rogerbinns · on April 2, 2012

He should have disabled the "Turbo" setting on the processor, otherwise results will be somewhat randomized. It can take a while for the processor to switch into turbo mode, and the decision to do so can be based on other work being done. Additionally, prior thermal load can result in turbo being disabled until temperatures reduce to normal levels.

michaelhoffman · on April 2, 2012

These results are useless for me since they turned on compiler "optimizations" that may lead to incorrect results in floating point calculations.

richurd · on April 2, 2012

I found this pretty inane. He used MinGW instead of native gcc on GNU/Linux.

markokocic · on April 2, 2012

MinGW _is_ native gcc on Windows. Even more native than Visual C, since it doesn't need an additional C runtime redistributable like MSVC does.

Malus · on April 2, 2012

The point of the benchmarks is to compare compilers on the Windows platform, so there is no reason to bench other operating systems.