Clue is an experiment and probably not useful for real work --- go look at the memory model and you'll see why. (C89 allows for some really weird but standards-compliant architectures.) ints are about 56 bits wide and their value is undefined on rollover, for example. Plus it's not finished; varargs and switch are the big missing features.
Regarding the node version: because I forgot to update the version number on the website. It was actually 0.6.19. Updated. JS performance is heavily penalised due to the aforesaid goto issue, though. (Procedural languages without goto are toys, dammit.)
Regarding C optimisation: yes, precisely. However this does put LuaJIT at an unfair advantage since it can unroll loops to its heart's content, while gcc can't. It's probably worth rerunning at -O3 just to see what's different.
Incidentally, I have 2/3 of a Common Lisp backend (someone contributed a backend but not the run-time library). Anyone want to complete it?
Great work David, glad to see an update after almost 5 years!
I looked into Clue for compiling C to Common Lisp before writing https://github.com/vsedach/Vacietis because I wanted something that would interop better with CL types and be able to run self-contained.
Why is Lua 5.2 so much faster than Lua 5.1? Lua 5.2 supports a new goto keyword. This is incredibly useful when doing this kind of compilation as it allows me to pass execution directly from basic block to basic block. Lua 5.1 doesn't have this, which means I have to fake goto using what boils down to a switch statement. This is much less efficient.
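Clue's actual emitted code may differ, but the workaround being described is the classic dispatch-loop (trampoline) pattern. A minimal sketch in JavaScript, which has the same no-goto problem as Lua 5.1; the block numbers and the compiled C fragment are hypothetical:

```javascript
// Sketch: emulating goto between basic blocks with a dispatch loop.
// Hypothetical compiled form of:
//   int i = 0, sum = 0;
//   L1: if (i >= 5) goto L3;
//       sum += i; i++; goto L1;
//   L3: return sum;
function compiled() {
  let i = 0, sum = 0;
  let bb = 1;                    // "program counter": which basic block runs next
  while (true) {
    switch (bb) {
      case 1:                    // L1
        if (i >= 5) { bb = 3; break; }
        sum += i; i++; bb = 1; break;
      case 3:                    // L3
        return sum;
    }
  }
}
console.log(compiled());         // → 10
```

Every "jump" costs a trip back through the dispatcher, which is why it's so much slower than a real goto that transfers control directly.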
Looking at the source "boils down to a switch statement" appears to be a chain of ifs. It makes me wonder how fast it would have been to use tail calls.
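For illustration, the tail-call idea looks something like this: each basic block becomes a function whose jumps are calls in tail position. A sketch in JavaScript (caveat: most JS engines don't guarantee tail-call elimination, so this only pays off in a language like Lua, which does):

```javascript
// Sketch: basic blocks as functions, transferring control via tail calls.
// Same hypothetical control flow as a loop summing 0..4.
function L1(i, sum) {
  if (i >= 5) return L3(i, sum);   // goto L3
  return L1(i + 1, sum + i);       // goto L1
}
function L3(i, sum) {
  return sum;                      // "return" from the compiled function
}
console.log(L1(0, 0));             // → 10
```

With proper tail calls each jump is a direct transfer with no dispatcher in the middle, and no stack growth.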
Also possibly worth noting is that lua 5.1 bytecode has gotos, if you're willing to step down a level.
Clue 0.3 did precisely that. Unfortunately it required a patched Lua to produce the special bytecode. I decided that that was cheating --- plus it meant the Lua backend wouldn't generate code for LuaJIT.
Changing it to generate tail calls would be an excellent idea, but to do it right would require a major rewrite of the backend (most of the code generator is common code).
Keep in mind that the benchmark in question (whetstone) is at best a microbenchmark, and doesn't necessarily reflect the performance of real apps... Still, it demonstrates how well LuaJIT can nail this sort of tight inner-loop code.
Almost everyone (Linux kernel included, IIRC) uses -Os now, as the effects of the memory hierarchy are much more important than raw instructions per second. A saved stall is worth hundreds of instructions.
Linux defaults to -O2, -Os can be switched with CONFIG_CC_OPTIMIZE_FOR_SIZE. Arch Linux doesn't enable it, dunno about others.
-Os isn't a silver bullet: it enables some high-level instructions which may be implemented less efficiently on modern CPUs, and it reduces code alignment, possibly causing some short functions or loops to span multiple cache lines.
GOTO and weird optimization may be very useful in some circumstances but there's a case where they'll never be desirable: security.
I'm 100% sure that the future, security-wise, is stuff like seL4 (an L4 micro-kernel which has been formally verified to be free from a lot of the common mistakes that typically lead to security exploits).
So before criticizing things as "toys" because they don't have goto etc., you have to realize that there are ends (security) that do justify quite drastic means.
And don't get me wrong: I've done my fair share of 680x0 and 80x86 assembly coding and just loved being "in control" of everything. It gave me a sensation of power.
But now I much prefer to look far ahead and dream of the day when we'll be able to use provers not just on micro-kernels that are 7000 lines long (where even that trivial number of lines turned up hundreds of potential security exploits, all since fixed) but on much bigger programs too.
So saying: "I want to be able to modify a lookup table by accessing two bytes as if they were a 16-bit word so that on the next pass I'll automagically JMP to this place" (I'm just making that up) is, IMHO, a bit shortsighted when considering the real problems we face today.
Most people have more than enough power and totally underused computers (often with many cores idling). The problem is hardly CPU performance.
You don't have to verify the code with gotos, you have to verify the translator. After all, if you were serious about this, you'd have to stop using most computers, since they do actually run programs with JMPs in them.
Monads are basically continuations are basically GOTOs. If you have a useful type system (like Haskell, not like Java), you get the formal verification of correctness done for free by your type checker.
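The continuation-goto correspondence can be made concrete: in continuation-passing style, "returning" means calling a function that represents "what to do next", i.e. a jump target that carries its arguments with it. A toy sketch in JavaScript (the function names are made up for illustration):

```javascript
// CPS: each function takes an explicit continuation `k` ("where to jump next")
// instead of returning normally.
function addCps(a, b, k) { return k(a + b); }
function mulCps(a, b, k) { return k(a * b); }

// Compute (2 + 3) * 4: control flow is a chain of explicit jumps.
let result;
addCps(2, 3, sum => mulCps(sum, 4, product => { result = product; }));
console.log(result);  // → 20
```

Each continuation is, operationally, a labelled jump with arguments; a type checker over such programs constrains which jumps are legal.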