I've heard the argument before that, when needed, one can use an FFI to optimize bottlenecks in high-level code, but I've never understood it.
Won't using a high-level language incur an omnipresent speed slump? And even if a bottleneck exists, how would using an FFI remedy crucial problems in the language, like the absence of unsigned types or the fact that all types are boxed? The values will have to be unboxed anyway, so whether that happens in foreign code or in the interpreter/JIT code won't matter.
At least in Haskell you have unboxed primitive types, memory mapped IO, bump-pointer allocation, and compilation to direct loops that are often identical to what GCC produces (or very close).
> Won't using a high-level language incur an omnipresent speed slump?
Yes, but most programs don't require high performance everywhere - in a library like JGit for instance, most operations are probably plenty fast written in Java even for very large projects; it's likely only a few are problematic.
> And even if a bottleneck exists, how would using a FFI remedy crucial problems in the language, like the absence of unsigned types or that all types are boxed.
That's maybe an argument for allowing more control over memory layout and machine representation in high-level languages - although there are ways around this, like defining your data types as a C++ class and then providing a high-level binding.
> Won't using a high-level language incur an omnipresent speed slump?
Not sure that is true. Just look at PyPy (http://pypy.org/), which claims that the run-time optimizations in its interpreter, itself written in a subset of Python, let it outperform the C interpreter (CPython), quite significantly in many cases. So I don't think it's true that high-level languages are always slower. It has a lot to do with the optimizations you can do at run-time. There is also an interesting paper on developing an OS based on run-time code synthesis for optimizing performance (http://valerieaurora.org/synthesis/SynthesisOS/). The major drawback of languages like C is that they can only be optimized at compile-time. I think as projects get larger and we move towards parallel structures and algorithms, the need for languages that support run-time optimizations will be greater.
You're right that the FFI can create significant friction, but once you're in C-land, you get C-level performance. So you need to move whole algorithms into C. In an O(n²) algorithm, the O(n) FFI friction will be negligible for a large enough value of n.
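To put rough numbers on that, here is a minimal sketch of the amortization argument. The cost figures are made up for illustration (100 time units to marshal one element across the boundary, 1 unit per inner step of the algorithm once it runs in C), and `ffi_overhead_fraction` is a hypothetical helper, not a measurement of any real FFI.

```python
def ffi_overhead_fraction(n, marshal_cost=100.0, step_cost=1.0):
    """Fraction of total run time spent crossing the FFI boundary.

    Assumed cost model (illustrative only):
      - friction: marshal_cost units per element, paid once per element -> O(n)
      - work:     step_cost units per inner step of the algorithm       -> O(n^2)
    """
    friction = marshal_cost * n
    work = step_cost * n * n
    return friction / (friction + work)


# The share of time lost to friction shrinks roughly like 1/n.
for n in (10, 1_000, 100_000):
    print(n, round(ffi_overhead_fraction(n), 4))
```

Under these assumptions the friction dominates for tiny inputs and fades as n grows. The same model also shows the flip side: if only an O(n) piece of work moves into C, the friction never becomes negligible, which is why whole algorithms have to move.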
> like the absence of unsigned types or that all types are boxed
It isn't always that straightforward. With Java, if you move your code into C you may also need to keep all of your data in C-land to avoid the overhead of copying it back and forth. Then the data is harder to access from Java, plus you can't rely on garbage collection to free that memory when you're done with it.
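The same trade-off can be sketched outside Java. The following is a rough Python analogue using the standard `ctypes` module, not JNI; the variable names are only illustrative. It shows the copy-in step, the buffer that foreign code would work on in place, and the boxing that happens whenever the high-level side reads the data back.

```python
import ctypes

py_values = [1.5, 2.5, 3.5, 4.5, 5.5]    # ordinary list of boxed Python floats
n = len(py_values)

# Copy-in: each float is unboxed and written into one contiguous C double[n].
c_values = (ctypes.c_double * n)(*py_values)

# Foreign code would receive this address and could work on the data in place,
# call after call, without copying it again.
address = ctypes.addressof(c_values)
size_in_bytes = ctypes.sizeof(c_values)

# Access from the high-level side is clumsier: every read boxes a fresh float.
total = sum(c_values[i] for i in range(n))

# Copy-out: boxing every element again to get an ordinary list back.
round_trip = list(c_values)

print(size_in_bytes, total, round_trip)
```

One caveat: this particular buffer is still owned by a Python object, so it is reclaimed automatically. Memory that the foreign library allocates itself (for example with `malloc`) is not, and that is the case where you have to free it by hand.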