bison
flex, and have a nice
set of tokens to play with. Unfortunately, there are some kinds of language
analysis that are too complex for lexers. One example is matching open
parentheses with their corresponding close parentheses: it turns out to be
impossible for a lexer (which only handles regular expressions) to match
arbitrarily nested sets of parentheses, which is something you would want in
a computer language. For this we need something more powerful, and that
thing is a parser. A parser takes a sequences of tokens (provided by
the lexer) and either
Writing parsers by hand is hard work (much harder than writing lexers). So,
by analogy with lexers, there are programs called parser generators
that take an input file representing the grammar of the language to be parsed
and generate C code that implements the parser. Also by analogy with lexers,
parser generators are written in a special language that is in some ways
similar to C, but that is specialized to the task of parsing. The standard
unix parser generator program is yacc, which stands for "yet
another compiler-compiler". Again, we will be using an improved version
called bison. The name bison is a pun on
yacc and on "GNU" (which in turn stands for "GNU's not unix").
GNU is the name of software from the Free
Software Foundation.
What the bison grammar description looks like is described in
great detail here.
You should read as much of this as you need to. Briefly, the main section of
the grammar describes productions (valid expressions in the language)
in terms of simpler expressions, finally bottoming out with tokens. When an
expression is matched, you can define an action to execute. This may involve
evaluating the expression (in our case), or generating part of an abstract
syntax tree (for a more complex language).
sqrt(2.0). There should be a built-in function called "print"
which will print an expression to stdout, followed by a newline e.g.
a = 2 + 2;
print(a);
prints "4" to stdout. You can also add other built-in functions e.g.
sqrt and exp. For the purposes of this lab, functions
should take a single argument only, which will be a number. (Extra credit:
extend this to functions that can take arbitrary numbers of arguments.)
bison to define the grammar. Your grammar should not have any
shift-reduce or reduce-reduce conflicts (these indicate ambiguities in the
grammar).
Your program should also be able to run scripts. This is done using the unix "sharp-bang" hack ("#!") as follows:
#! /usr/bin/env calculator a = 2 + 2; print(a);When you put this into a file (called e.g. "print_4") and do this:
% chmod +x print_4 % print_4it will print "4" to stdout.
calculator program.
lex and yacc (and by extension on
flex and bison) out there.
This book contains as a project a miniature computer language which is a superset of this week's assignment. Thus, if you want to cheat, this will be a good place to look ;-) I recommend you avoid using this book unless you're really stuck. Instead, look at their solution after you're done and compare it with yours.