Glancing at the source, I think the whitespace parsing could be improved. Since whitespace will rarely if ever (?) be followed by more whitespace, and since most other things are likely to be followed by whitespace, just having both "parse_expecting_whitespace()" and "parse_probably_not_whitespace()" should improve the prediction.
Other tokens probably have similar but less strong "preferences". It's possible that a smart enough branch predictor will learn these on its own, but changing things so that each token ends with its own predicted branch would likely be a win.
Since the branch predictor uses the address of the branch as the 'key', the general goal is to have a separate branch instruction (if, switch) in each place where the probabilities are different enough to favor a different prediction. A wrong prediction costs about 15 cycles, but a correctly predicted branch costs less than 1, so you can put in quite a lot of these and still come out ahead.
Perhaps just ending every token handler with a best guess at what comes next?
if (T->ptr[0] == most_likely_next)
    handle_most_likely_next(T);
else
    scan(T);
I don't have the exact syntax in my head, but I think you can use 'perf' to record mispredicted indirect branches and then display them sorted by caller.
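Something along these lines should work, though the precise event names vary by CPU (run `perf list` to see what your machine exposes; some Intel cores have an indirect-branch-specific misprediction event). The binary and input names below are placeholders.

```shell
# Record branch mispredictions while running the tokenizer.
perf record -e branch-misses ./lexer input.txt

# Attribute the misses to symbols, then drill into the hot function
# to see which individual branch instructions are mispredicting.
perf report --sort symbol
perf annotate
```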
Great, that sounds like a fine opportunity for a "best guess": if newline, expect whitespace. Currently, I'd guess that 'whitespace except newline' is the default prediction for the switch() at line 451. I'd also guess that if not followed by a space or tab, a newline is frequently followed by another newline.
Maybe you could combine the case statements for space and newline, and use a branchless 'cmov' to increment loc.linenum when the match was a newline? This could be combined with a loop that grabs all the whitespace/newlines in one go, if you think whitespace occurs in clumps.