Harmonia Language Definition Tools

The picture on the right (TODO) depicts the typical process for building a Harmonia language plug-in. The input consists of a lexical specification, a grammar for the programming language, and a small hand-coded file describing the language module configuration. Optionally, the input may include extra code to be included in the generated definitions of the AST classes.

The lexical specification is processed by the off-the-shelf Flex scanner generator to produce a batch lexer. This lexer can be used by the Harmonia language kernel to provide incremental lexical analysis. The syntactic specification is pre-processed by the Ladle tool, whose main job is to perform EBNF to BNF grammar transformation. The output of Ladle is a syntactic specification compatible with Bison. We use a modified variant of Bison, called Bison2, which outputs parse tables and AST class definitions rather than parser source code. AST definitions are subsequently checked, combined with any extra definitions provided by the language plug-in implementer, and translated into the C++ source code by the ASTDef tool. Finally, a C++ compiler is used to combine the C++ class definitions, parse tables, the batch lexer, and the language module interface implementation into a dynamically loadable library for the Harmonia language analysis kernel.

We are currently replacing Flex/Ladle/Bison2 portion of the toolchain with Blender, a more versatile lexer and parser generator implemented using the Harmonia framework.

Ladle: An EBNF Translator

The function of Ladle is to transform the high-level EBNF syntactic specification to a BNF notation amenable for processing by the Bison2 parser generator. The Ladle tool processes EBNF grammar into an internal representation and carries out the sequence of four operations:
  • Input Validation. In this stage the grammar is checked for errors that would prevent its further interpretation by the tool. Such errors include duplicate or missing names, syntax errors, etc. The input validation step happens while translating the grammar into internal representation.
  • EBNF to BNF Translation. During this step, Ladle transforms the internal representation of the grammar from EBNF to BNF. Because BNF is a subset of EBNF, the transformations take place within the same data model, i.e. the grammar is "rewritten" into the BNF form.
  • Grammar Verification. The expanded BNF grammar is checked for semantic problems, such as certain ambiguities that cannot be handled by our parser generator. If such errors are found, the grammar is rejected.
  • BNF Output. Following verification the grammar is emitted in standard Bison BNF format for further processing. Additional user annotations that are outside the scope of Bison BNF format are emitted in an auxiliary file that our modified version of Bison can read.

Bison2: An LALR(1) and GLR parser generator

Bison2 is a variant of the Bison parser generator modified from the original version to include features necessary for constructing Harmonia language plug-ins. Unlike its predecessor, Bison2 does not generate any parser code. Instead, its output consists of parse tables that can be used by the Harmonia parser driver and AST node class definitions. When processing a GLR grammar, Bison2 outputs a conflict table which includes additional actions to be taken in parser states with shift/reduce and reduce/reduce conflicts.

ASTDef: A Tree Definition Translator

ASTDef is a language for defining syntax tree node classes. ASTDef specifications are processed by a tool of the same name to yield a C++ implementation of the AST node classes. One source of ASTDef specifications is the Bison2 parser generator. Additional AST definition code might be supplied by the language implementer, for hand-coded semantic analyses that traverse the syntax tree data structure. Automatically generated analyses might also produce AST definition code.

The first task of the ASTDef translator is to process all of the AST definition code. Since the ASTDef language is a derivative of C++, the specifications need to be parsed much like any other program. However, C++ is notoriously difficult to parse; thus, when we designed the syntax for ASTDef, we included some syntactic sugar that made parsing easier. Some modifications were to precede each method declaration with the method keyword, and each field declaration with the slot keyword. Additionally, method bodies are not parsed at all. Instead, lexical tricks are used to treat them as strings which are then stored within ASTDef's internal representation.

After processing AST definitions, ASTDef performs some simple validations such as checking that no method or field names clash (more rigorous error checking is left to the C++ compiler). It then carries out a number of transformations on the internal representation, and translates AST definitions to C++. ASTDef also generates all of the runtime support code.


(Andrew Begel)

Blender is a lexer and parser generator that subsumes the roles of our previous tool chain of Flex, Ladle and Bison2. Blender reads in lexical descriptions (written in a variant of Flex format) and grammars (written in a variant of Ladle format) and combines them to produce Flex and Bison parser tables. In addition, it also writes graph-based data structures for the parser to refer to the grammar at runtime.

Blender has the ability to merge several lexical and grammar descriptions into a single set of lexer and parser tables. This enables Blender to support embedded languages, such as JavaScript embedded in HTML or JavaDoc embedded in Java. A grammar may include several lexical description files. For instance, a grammar for Java may include three: a description of the Java keywords, a description for comments, and another one for strings. A grammar nonterminal from one grammar may be included in another grammar description. When the parser intends to shift terminals that belong to that nonterminal, it first sets the lexer into the proper state to read lexemes from that langauge before invoking the lexer. Because Harmonia's grammar descriptions enable one to create ambiguous grammars, the use of embedded languages requires the support of both ambiguous parsing and ambiguous lexing.

Ladle to ViaVoice Grammar (SRCL) Translator

(Brian Chin)

One of our projects uses IBM ViaVoice to provide speech recognition services to XEmacs. Just as we describe a programming language grammar using Ladle, ViaVoice provides a grammar description language to support spoken command languages called SRCL. Ladle and ViaVoice's descriptions are similar but not compatible.

Brian Chin has written a tool to convert Ladle grammar descriptions into ViaVoice grammar descriptions and wrote the SRCL language module for Harmonia.