Glancing at the source, I think the whitespace parsing could be improved. Since whitespace will rarely if ever (?) be followed by more whitespace, and since most other things are likely to be followed by whitespace, just having both "parse_expecting_whitespace()" and "parse_probably_not_whitespace()" should improve the prediction.
Other tokens probably have similar but less strong "preferences". It's possible that a smart enough branch predictor will learn these on its own, but changing things so that each token ends with its own predicted branch would likely be a win.
Since the branch predictor uses the address of the branch as the 'key', the general goal is to have a separate branch instruction (if, switch) in each place where the probabilities are different enough to favor a different prediction. A wrong prediction costs about 15 cycles, but a correctly predicted branch costs less than 1, so you can put in quite a lot of these and still come out ahead.
Perhaps just ending every token handler with a best guess at what comes next?
if (T->ptr[0] == most_likely_next)
    handle_most_likely_next(T);
else
    scan(T);
I don't have the exact syntax in my head, but I think you can use 'perf' to record mispredicted indirect branches and then display them sorted by caller.
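Something along these lines should work, though the precise event names vary by CPU (run `perf list` to see what your machine exposes; some Intel cores have an indirect-branch-specific misprediction event). The binary and input names below are placeholders.

```shell
# Record branch mispredictions while running the tokenizer.
perf record -e branch-misses ./lexer input.txt

# Attribute the misses to symbols, then drill into the hot function
# to see which individual branch instructions are mispredicting.
perf report --sort symbol
perf annotate
```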
Great, that sounds like a fine opportunity for a "best guess": if newline, expect whitespace. Currently, I'd guess that 'whitespace except newline' is the default prediction for the switch() at line 451. I'd also guess that if not followed by a space or tab, a newline is frequently followed by another newline.
Maybe you could combine the case statements for space and newline, and use a branchless 'cmov' to increment loc.linenum when the match was a newline? This could be combined with a loop that grabs all the whitespace/newlines in one go, if you think whitespace occurs in clumps.