Abstract Syntax Tree
drizzle started with a single-pass compiler. It took a stream of tokens and converted them into bytecode. There was no intermediate representation and to be honest: there was no need for it. A single-pass compiler is sufficient for a programming language. Just take a look at Lua: it's small, simple and has fast startup times.
But this is a hobby project and I wanted to explore the mystical abstract syntax tree. I was a little surprised to find no code examples. I get the idea behind it, but there seems to be no universal way of doing it. The design I ended up with uses a tagged union:
class Expression;
using Expr = std::unique_ptr<Expression>;
class Expression {
public:
enum class Type {
Group,
};
struct Group {
Expr expression;
};
Type type;
union {
Group group;
};
};
I did the same for statements. The program
statement is the root of the tree. It contains other statements, which might be expressions, loops or control flow constructs. With the IR in place, I was also able to split the parser and compiler into two entities. The parser spits out an AST and the compiler uses it to produce bytecode. Take the following program as an example:
var x = 0
while x < 100:
if x % 2 == 0:
print x
x = x + 1
Running drizzle --ast file.dz
prints the tree:
program
var x
literal 0
while
binary <
variable x
literal 100
if
binary ==
binary %
variable x
literal 2
literal 0
print
variable x
expression_statement
assign x
binary +
variable x
literal 1
I think it turned out pretty readable. Having this was a lifesaver for testing. I could throw anything at the parser and check if the produced AST matched my expectations. It will also be useful for compile-time optimizations. If we encounter a binary
expression with two literal
expressions as operands, we can fold them and emit a simplified literal
expression.