Harmonia Language Definition Tools

The picture on the right (TODO) depicts the typical process for building a Harmonia language plug-in. The input consists of a lexical specification, a grammar for the programming language, and a small hand-coded file describing the language module configuration. Optionally, the input may supply extra code for inclusion in the generated definitions of the AST classes.
The lexical specification is processed by the off-the-shelf Flex scanner generator to produce a batch lexer. The Harmonia language kernel can use this lexer to provide incremental lexical analysis. The syntactic specification is pre-processed by the Ladle tool, whose main job is to perform the EBNF-to-BNF grammar transformation. The output of Ladle is a syntactic specification compatible with Bison. We use a modified variant of Bison, called Bison2, which outputs parse tables and AST class definitions rather than parser source code. The AST definitions are subsequently checked, combined with any extra definitions provided by the language plug-in implementer, and translated into C++ source code by the ASTDef tool. Finally, a C++ compiler is used to combine the C++ class definitions, parse tables, the batch lexer, and the language module interface implementation into a dynamically loadable library for the Harmonia language analysis kernel.
We are currently replacing the Flex/Ladle/Bison2 portion of the toolchain with Blender, a more versatile lexer and parser generator implemented using the Harmonia framework.
Ladle processes a grammar in four stages:
- Input Validation. In this stage the grammar is checked for errors that would prevent its further interpretation by the tool, such as duplicate or missing names and syntax errors. Input validation happens while the grammar is being translated into the tool's internal representation.
- EBNF to BNF Translation. During this step, Ladle transforms the internal representation of the grammar from EBNF to BNF. Because BNF is a subset of EBNF, the transformations take place within the same data model, i.e. the grammar is "rewritten" into the BNF form.
- Grammar Verification. The expanded BNF grammar is checked for semantic problems, such as certain ambiguities that cannot be handled by our parser generator. If such errors are found, the grammar is rejected.
- BNF Output. Following verification, the grammar is emitted in the standard Bison BNF format for further processing. Additional user annotations that are outside the scope of the Bison BNF format are emitted in an auxiliary file that our modified version of Bison can read.
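The EBNF-to-BNF step can be illustrated with a minimal Python sketch. It is not Ladle's implementation: the grammar representation (a dictionary of alternatives), the operator tags "opt", "star" and "plus", and the generated helper names such as _star1 are all assumptions made for this example. It does show the idea of rewriting within a single data model, since the output uses the same structure as the input with the EBNF operators removed.

```python
from itertools import count


def ebnf_to_bnf(grammar):
    """Rewrite EBNF operators into plain BNF by adding helper nonterminals.

    grammar maps a nonterminal to a list of alternatives. Each alternative is
    a list of items; an item is either a symbol name (str) or a tuple
    (op, sub_items) with op in {"opt", "star", "plus"}.
    This layout is hypothetical, not Ladle's actual internal representation.
    """
    bnf = {}
    fresh = count(1)  # numbering for the generated helper nonterminals

    def lower(item):
        # Plain symbols are already valid BNF.
        if isinstance(item, str):
            return item
        op, sub = item
        body = [lower(s) for s in sub]  # lower nested operators first
        name = f"_{op}{next(fresh)}"
        if op == "opt":       # X?  becomes  helper: <empty> | X
            bnf[name] = [[], body]
        elif op == "star":    # X*  becomes  helper: <empty> | helper X
            bnf[name] = [[], [name] + body]
        elif op == "plus":    # X+  becomes  helper: X | helper X
            bnf[name] = [body, [name] + body]
        else:
            raise ValueError(f"unknown EBNF operator: {op}")
        return name

    for lhs, alternatives in grammar.items():
        bnf[lhs] = [[lower(item) for item in alt] for alt in alternatives]
    return bnf


# args: expr ("," expr)*
example = {"args": [["expr", ("star", [",", "expr"])]]}
print(ebnf_to_bnf(example))
```

For the example, args becomes the single alternative expr _star1, and the helper _star1 gets two alternatives: the empty one and _star1 "," expr. The helpers are left-recursive because LALR parser generators such as Bison handle left recursion with bounded stack depth.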
The first task of the ASTDef translator is to process all of the AST definition code. Since the ASTDef language is a derivative of C++, the specifications need to be parsed much like any other program. However, C++ is notoriously difficult to parse; thus, when we designed the syntax for ASTDef, we included some syntactic sugar to make parsing easier. For example, each method declaration is preceded by the method keyword, and each field declaration by the slot keyword. Additionally, method bodies are not parsed at all. Instead, lexical tricks are used to treat them as strings, which are then stored within ASTDef's internal representation.
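One way to treat a method body as an opaque string is to scan for the matching close brace while skipping comments and string or character literals. The Python sketch below illustrates that general technique only; the source does not say which lexical tricks ASTDef actually uses, and the function name capture_body is hypothetical. The sketch ignores C++ raw string literals and preprocessor lines.

```python
def capture_body(text, start):
    """Return (body, end) for the brace-delimited block opening at text[start].

    body is the text between the opening brace and its matching close brace;
    end is the index just past that close brace.
    """
    if text[start] != "{":
        raise ValueError("expected '{' at start")
    depth = 0
    i = start
    n = len(text)
    while i < n:
        ch = text[i]
        if text.startswith("//", i):          # line comment: skip to end of line
            j = text.find("\n", i)
            i = n if j == -1 else j + 1
            continue
        if text.startswith("/*", i):          # block comment: skip to "*/"
            j = text.find("*/", i + 2)
            if j == -1:
                raise ValueError("unterminated comment")
            i = j + 2
            continue
        if ch in "\"'":                       # string or character literal
            i += 1
            while i < n and text[i] != ch:
                i += 2 if text[i] == "\\" else 1   # step over escaped characters
            if i >= n:
                raise ValueError("unterminated literal")
            i += 1                            # step past the closing quote
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1:i], i + 1
        i += 1
    raise ValueError("unbalanced braces")


# Hypothetical input; the real ASTDef syntax may differ.
source = 'method int f() { if (x) { s = "}"; } return 1; } slot int y;'
body, end = capture_body(source, source.index("{"))
print(repr(body))
print(repr(source[end:]))
```

Because the body is never parsed, any error inside it surfaces only when the generated C++ is compiled.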
After processing AST definitions, ASTDef performs some simple validations such as checking that no method or field names clash (more rigorous error checking is left to the C++ compiler). It then carries out a number of transformations on the internal representation, and translates AST definitions to C++. ASTDef also generates all of the runtime support code.
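The name-clash check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ASTDef's code: the input layout (a dictionary with "slots" and "methods" lists per class) and the function name find_name_clashes are hypothetical, and the sketch assumes that a clash means any member name declared more than once in the same class, which would also flag overloaded methods.

```python
from collections import Counter


def find_name_clashes(classes):
    """Report member names declared more than once within a class.

    classes maps a class name to {"slots": [...], "methods": [...]}.
    Returns a sorted list of (class_name, member_name) pairs.
    """
    clashes = []
    for cls, members in classes.items():
        names = members.get("slots", []) + members.get("methods", [])
        counts = Counter(names)
        clashes.extend((cls, name) for name, c in counts.items() if c > 1)
    return sorted(clashes)


# Hypothetical AST classes: "body" is both a slot and a method of IfStmt.
definitions = {
    "IfStmt": {"slots": ["cond", "body"], "methods": ["body", "check"]},
    "Expr": {"slots": ["type"], "methods": ["eval"]},
}
print(find_name_clashes(definitions))
```

A check of this kind catches simple mistakes early and leaves type errors and similar problems to the C++ compiler.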
Blender is a lexer and parser generator that subsumes the roles of our previous toolchain of Flex, Ladle, and Bison2. Blender reads lexical descriptions (written in a variant of the Flex format) and grammars (written in a variant of the Ladle format) and combines them to produce Flex and Bison parser tables. It also writes graph-based data structures that let the parser refer to the grammar at runtime.
One of our projects uses IBM ViaVoice to provide speech recognition services to XEmacs. Just as we describe a programming language grammar using Ladle, ViaVoice provides a grammar description language, called SRCL, for describing spoken command languages. The Ladle and ViaVoice description formats are similar but not compatible.
Brian Chin has written a tool to convert Ladle grammar descriptions into ViaVoice grammar descriptions and has also written the SRCL language module for Harmonia.