Why would performance go up when switching from compiling to C to CPS-style compilation straight to assembly, with no compiler optimizations implemented? C compilers are packed with optimizations, so I would naively expect writing your own code generator to be a loss in both complexity and performance.