At one point I got really into the idea of ASTs for languages. I've used Pygments, the Python syntax highlighter, with the intention of parsing the meaning of code.
These syntax highlighters are great, but I think they underline the lack of a defined and accessible AST definition for most languages. Highlight.js and others kind of just rely on regexes: https://github.com/isagalaev/highlight.js/blob/master/src/la...
It'd be great if we could parse through programming languages to get their meaning. I want to get tuples back!
```
a = 23
```
would give
(variable(name=a), assignedTo, number(23))
This does exist for some languages, but mostly compiled ones. Having it everywhere would make syntax highlighting even more robust! ANTLR is the best tool for this around at the moment. http://www.antlr.org/
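For what it's worth, Python is one language where something close to this is available out of the box through the standard library `ast` module. Here is a rough sketch of getting a tuple like the one above. The helper name `assignment_tuples` and the tuple layout are made up for illustration, and it only handles the simplest case of a bare name assigned a literal.

```python
import ast

def assignment_tuples(source):
    """Return (variable, 'assignedTo', value) tuples for simple assignments.

    Hypothetical helper: only handles `name = literal`, nothing fancier.
    """
    tree = ast.parse(source)
    results = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            # Skip anything that is not a plain name bound to a constant.
            if isinstance(target, ast.Name) and isinstance(node.value, ast.Constant):
                value = node.value.value
                results.append(
                    (("variable", target.id), "assignedTo", (type(value).__name__, value))
                )
    return results

print(assignment_tuples("a = 23"))
# [(('variable', 'a'), 'assignedTo', ('int', 23))]
```

`ast.dump(ast.parse("a = 23"))` shows the full tree if you want more than this flattened view.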
I believe CodeMirror does something like this. It uses restartable parsers (or it used to) to make parsing fast while editing.
I experimented with editing code using jQuery.Syntax. I used the match tree it generated to figure out where to restart parsing, and it was pretty fast: in most cases it would only re-evaluate the current line.
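To make the "restart parsing at the edited line" idea concrete, here is a minimal sketch. It is not how jQuery.Syntax or CodeMirror actually implement it. It assumes a made-up toy language (words, numbers, and `/* ... */` comments that can span lines), and the names `tokenize_line` and `Highlighter` are hypothetical. The trick is to cache the tokenizer state at the start of every line, then after an edit re-tokenize forward only until the state matches the cache again.

```python
import re

# Toy grammar, purely for illustration.
TOKEN_RE = re.compile(r"(?P<number>\d+)|(?P<word>[A-Za-z_]\w*)|(?P<open>/\*)|(?P<other>\S)")

def tokenize_line(line, in_comment):
    """Tokenize one line; return (list of (style, text), in_comment at end of line)."""
    tokens = []
    pos = 0
    while pos < len(line):
        if not in_comment:
            m = TOKEN_RE.search(line, pos)
            if m is None:
                break  # only whitespace left
            if m.lastgroup != "open":
                tokens.append((m.lastgroup, m.group()))
                pos = m.end()
                continue
            start, search_from = m.start(), m.end()
        else:
            # The line began inside a comment carried over from above.
            start, search_from = pos, pos
        end = line.find("*/", search_from)
        if end == -1:
            tokens.append(("comment", line[start:]))
            return tokens, True
        tokens.append(("comment", line[start:end + 2]))
        pos = end + 2
        in_comment = False
    return tokens, in_comment

class Highlighter:
    def __init__(self, lines):
        self.lines = list(lines)
        self.start_states = [False] * len(self.lines)  # state at the start of each line
        self.tokens = [None] * len(self.lines)         # None means "not tokenized yet"
        self.retokenize_from(0)

    def retokenize_from(self, index):
        """Re-tokenize from `index`; return how many lines were processed."""
        state = self.start_states[index]
        count = 0
        for i in range(index, len(self.lines)):
            # Stop once a later line would start in the same state as before.
            if i > index and self.tokens[i] is not None and self.start_states[i] == state:
                break
            self.start_states[i] = state
            self.tokens[i], state = tokenize_line(self.lines[i], state)
            count += 1
        return count

    def edit_line(self, index, new_text):
        self.lines[index] = new_text
        return self.retokenize_from(index)

doc = Highlighter(["x = 1", "y = 2", "z = 3"])
print(doc.edit_line(1, "y = 42"))         # ordinary edit: only that line is redone
print(doc.edit_line(0, "x = 1 /* open"))  # opening a comment invalidates the lines below
```

An ordinary edit touches one line. Opening or closing a multi-line construct is the case that cascades. Note that the output is still a flat list of `(style, text)` pairs per line, not a tree.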
CodeMirror still uses the restartable parsers. But they produce a flat sequence of styles, not a hierarchical AST. E.g. `*foo **bar** baz*` in Markdown becomes `<span class="cm-em">*foo </span><span class="cm-strong cm-em">**bar**</span><span class="cm-em"> baz*</span>`.
Also, most parsers reuse a small set of style names (that are covered by themes) without much regard to semantic appropriateness. E.g. markdown lists cycle through 'variable-2', 'variable-3', 'keyword'.