This link doesn't quite give what you are after (it's mostly about static compilation in the Java HotSpot compiler), but I believe the lock elision features (http://www.ibm.com/developerworks/java/library/j-jtp10185/in...) have to be done at runtime in the JVM (because of late binding).
Obviously this doesn't totally invalidate your argument ("except in the cases where runtime code loading is used"), but it is worth noting that in many languages late binding is normal, so runtime code loading is the general case rather than the exception.
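To make the lock elision point concrete, here is a minimal sketch of the kind of code where it can apply. The class and method names are made up for illustration, and whether a given JVM actually elides these locks depends on its escape analysis and its flags, so treat this as an assumption rather than a guarantee.

```java
import java.util.List;

public class LockElisionSketch {
    // Hypothetical example: the StringBuffer never leaves this method,
    // so no other thread can ever contend for its monitor. A JIT with
    // escape analysis may drop the locking done inside each
    // synchronized append() call once it has proven that at runtime.
    static String joinNames(List<String> names) {
        StringBuffer sb = new StringBuffer(); // append() is synchronized
        for (String name : names) {
            sb.append(name).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(joinNames(List.of("Moe", "Larry", "Curly")));
    }
}
```

The result is the same with or without the locks; only the cost changes. The JIT makes this decision based on the code that is actually loaded and inlined at runtime.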
Also, HP's research Dynamo project "inadvertently" became practical. Programs "interpreted" by Dynamo are often faster than if they were run natively, sometimes by 20% or more. http://arstechnica.com/reviews/1q00/dynamo/dynamo-1.html
This is a common misinterpretation of the Dynamo paper: they compiled their C code at the _lowest_ optimization level and then ran the (suboptimal) machine code through Dynamo. So there was actually something left to optimize.
Think about it this way: a 20% difference isn't unrealistic if you compare -O1 vs. -O3.
But it's completely unrealistic to expect a 20% improvement if you tried this with the machine code generated by a modern C compiler at the highest optimization level.