Had a brief look at asmjit; it seems it only supports x86 and x86_64 and is not really an abstraction (i.e. a platform-independent IR). I will try to find out why they didn't use e.g. LLVM or sljit.
LLVM is much too heavyweight for a JIT. It's slow at generating code, and makes it difficult to implement common JIT optimizations such as inline caches, which rely on code-patching.
If it used LuaJIT it would be faster than CPython for sure, maybe even as fast as PyPy. Here are some measurement results comparing the C++ SOM interpreter with a LuaJIT bytecode compiler: https://github.com/rochus-keller/Som#a-som-to-luajit-bytecod.... The bytecode version of the SOM benchmark running on LuaJIT is about a factor of 24 faster than the C++ based interpreter. But the Gral implementation of SOM is even faster.
(Which is, of course, ironic given that the entire original goal of LLVM was only to build a JIT; it was too slow, but made an OK enough compiler backend ;P.)
> the entire original goal of LLVM was only to build a JIT
That's not exactly true from what I recall of speaking with Vikram and Chris. The idea was to build a compiler framework that could support profile-guided optimization (and other then-advanced JIT techniques) for idle-time or install-time optimization of C/C++ codes. Chris's original thesis doesn't even mention JIT compilers, except as comparison point for techniques, and instead focuses on things like link-time optimization. (Indeed, Chris offered LLVM to replace gcc's then-broken LTO infrastructure, but this was turned down.)
Clearly, Chris's thought process has to be correct to some real extent and mostly must be deferred to in the end; but, I was a grad student interested in the Low-Level Virtual Machine at the time, and I remember it was absolutely considered by everyone to be a runtime execution system with a JIT ;P and what made it unique vs. "high-level" virtual machines was that it was going to allow unsafe code, trusted by default, but otherwise work like any other virtual machine, allowing for JIT-like benefits (profile-guided optimization being one, as you mention)... I am totally willing to believe that everyone misunderstood the project at the time and that the well over a decade since has corrupted what memory I have of the discussions.
There is an aarch64 branch in the AsmJit project that provides an experimental AArch64 backend. It's pretty complete.
Having a platform-independent IR in AsmJit is not planned. AsmJit is more about control and ISA completeness. An IR could of course be implemented on top of AsmJit, but it seems nobody has really needed that so far - users that need a complex IR can use LLVM, and users that want something really simple can use SLJIT. AsmJit is just in a different category, and it has a lot of useful features considering its size.
Also keep in mind that the Lua speedup is interpreted vs. compiled, and Erlang is already a compiled language that had a fairly performant intermediate bytecode.
I had a look at https://github.com/asmjit/asmjit/tree/master/src/asmjit and didn't find any indication that DynASM is used. Anyway, DynASM and LuaJIT are available for many different architectures, not only x86 and x86_64.
This quote from your reference gives the answer: "We also considered using dynasm, but found the tooling that comes with asmjit to be better."
EDIT: But it still gives no answer as to why they preferred asmjit over solutions other than DynASM, e.g. LLVM (DynASM being just an assembler that helps integrate with C code, not a platform abstraction like LLVM and others).
EDIT: according to this article (https://www.erlang-solutions.com/blog/performance-testing-th...) the speed-up factor from the JIT is about 1.3 to 2.3 (as a comparison, the speed-up between the PUC Lua 5.1 interpreter and LuaJIT 2.0 is about a factor of 15 in geometric mean over all benchmarks, see http://luajit.org/performance_x86.html).