Harmonia

Blender and ASTDef: Language Definition Tools

Blender: A Lexer and Parser Generator

(Andrew Begel)

Blender is a lexer and parser generator that reads in lexical descriptions (written in a variant of Flex format) and grammars (written in a variant of Bison format) and combines them to produce lexer and parser tables. In addition, it also writes graph-based data structures in ASTDef format for the parser to refer to the grammar at runtime.

Blender has the ability to merge several lexical and grammar descriptions into a single set of lexer and parser tables. This enables Blender to support embedded languages, such as JavaScript embedded in HTML or JavaDoc embedded in Java. A grammar may include several lexical description files. For instance, a grammar for Java may include three: a description of the Java keywords, a description for comments, and another one for strings. A grammar nonterminal from one grammar may be included in another grammar description. When the parser intends to shift terminals that belong to that nonterminal, it first sets the lexer into the proper state to read lexemes from that langauge before invoking the lexer. Because Harmonia's grammar descriptions enable one to create ambiguous grammars, the use of embedded languages requires the support of both ambiguous parsing and ambiguous lexing.


ASTDef: A Tree Definition Translator

ASTDef is a language for defining syntax tree node classes. ASTDef specifications are processed by a tool of the same name to yield a C++ implementation of the AST node classes. One source of ASTDef specifications is the Bison2 parser generator. Additional AST definition code might be supplied by the language implementer, for hand-coded semantic analyses that traverse the syntax tree data structure. Automatically generated analyses might also produce AST definition code.

The first task of the ASTDef translator is to process all of the AST definition code. Since the ASTDef language is a derivative of C++, the specifications need to be parsed much like any other program. However, C++ is notoriously difficult to parse; thus, when we designed the syntax for ASTDef, we included some syntactic sugar that made parsing easier. Some modifications were to precede each method declaration with the method keyword, and each field declaration with the slot keyword. Additionally, method bodies are not parsed at all. Instead, lexical tricks are used to treat them as strings which are then stored within ASTDef's internal representation.

After processing AST definitions, ASTDef performs some simple validations such as checking that no method or field names clash (more rigorous error checking is left to the C++ compiler). It then carries out a number of transformations on the internal representation, and translates AST definitions to C++. ASTDef also generates all of the runtime support code.