Ocaml track: assignment 6: Implementing Scheme, part 2

Goals

This lab will complete the implementation of a miniature Scheme interpreter begun in lab 5. In this lab, we will take the output of parsing (the AST data structure) and interpret (evaluate) it to give a result. Lab 5 was all about the boring part of implementing a language; this lab is the fun part.

Concepts covered this week

Semantics
Environments
Evaluation
Lexical scoping

Overview

Basic sequence of interpretation

To recap the basic sequence of interpretation, we start off with a file of source code, represented as a big string. The steps involved in interpreting this string and computing results are:

convert the string to a sequence of tokens using a lexer (lab 5)
convert the sequence of tokens into one or more S-expressions using a parser (lab 5)
convert the S-expressions into abstract syntax trees (ASTs) (lab 5)
evaluate the ASTs and compute a result for each AST that gets evaluated (this lab).

Goal of this lab

The goal of this lab is to extend the code you wrote for lab 5 and add in the final piece, which is the code that takes an AST and evaluates it to give a result. In order to write this lab you must have working code for lab 5, so if your lab 5 code isn't working yet, fix that before you start this lab.

Semantics of the bogoscheme language

In this section, we'll describe the semantics of the bogoscheme language in detail. Up until now, all we've shown you is the form of the language, but in order to write an evaluator for a language, you need to know what the meaning of all legal program expressions are, and what values can be produced by these expressions. That's what we'll cover here.

Values

The result of evaluating a bogoscheme expression is a bogoscheme value. Here is the data type of bogoscheme values (from the file env.mli):

(** Identifiers. *)
type id = string

(** Type of legal values i.e. results of evaluation. *)
type value = 
   | Val_unit
   | Val_bool    of bool
   | Val_int     of int
   | Val_prim    of (value list -> value)      (* primitive functions *)
   | Val_lambda  of env * id list * Ast.expr list

(** Type of environments. *)
and env = { parent: env option; bindings: (id, value) Hashtbl.t }

We'll discuss environments below. The first three value types (Val_unit, Val_bool, and Val_int) represent unit, boolean, and integer values respectively. Val_prim values represent primitive (built-in) functions. These are functions like print which are not lambda expressions; the evaluator simply has to know how to evaluate them directly. Finally, Val_lambda values represent functions that are defined within the language i.e. lambda expressions. lambda expressions contain a list of identifiers (the formal arguments of the lambda expression), a list of expressions (the body of the lambda expression), and (perhaps surprisingly) an environment in which names are looked up. The relationship of lambda expressions to their environments will be discussed below.

Evaluation of any bogoscheme expression either results in one of the values described above, or it results in an error.

Evaluating expressions

Expressions in the language are represented by the type Ast.expr:

type id = string

type expr =
   | Expr_unit
   | Expr_bool   of bool
   | Expr_int    of int
   | Expr_id     of id
   | Expr_define of id * expr
   | Expr_if     of expr * expr * expr
   | Expr_lambda of id list * expr list
   | Expr_apply  of expr * expr list

There are eight different kinds of expressions, which evaluate as follows:

The first three kinds of expressions represent literal constants: the unit value, boolean values, and integers. To evaluate them, you just create the corresponding value (from the value type described above) and return it as the value of the expression as a whole.
Expr_id represents identifiers. To evaluate an identifier, you look it up in the current environment (see below). The value of the expression is the value the identifier is bound to in the environment.
Expr_define represents definitions, which when evaluated associate an identifier with a value in the current environment (see below). This expression's value is the unit value.
Expr_if represents if expressions i.e. conditionals. An if expression contains three subexpressions. The first subexpression is evaluated, and it should evaluate to a boolean value (if not, it's an error and a Type_error exception should be raised). If it evaluates to a "true" (#t) value, the second subexpression is evaluated, and its value is the value of the if expression as a whole (in this case, the third expression is not evaluated). Alternatively, if the first subexpression evaluates to a "false" (#f) value, the third subexpression is evaluated, and its value is the value of the if expression as a whole (in this case, the second expression is not evaluated).
Expr_lambda represents lambda expressions i.e. functions. Evaluating a lambda expression is easy: you just convert it into the corresponding value (from the value type described above).
Expr_apply represents a function being applied to an argument. We'll talk about how to evaluate function applications below.

Environments

The notion of an environment is a key concept in implementations of programming languages. An environment is a mapping between identifiers in the language and their values. In addition, an environment has a pointer to its "enclosing environment", unless it is a special environment called the "global environment" or "top-level environment". Here's how we represent environments in the evaluator:

(** Type of environments. *)
and env = { parent: env option; bindings: (id, value) Hashtbl.t }

The and is because this piece of code is part of the code shown above when we described the value type; the value type and the env type are actually mutually-recursive types (each depends on the other). Environments are represented as an ocaml record type. The identifier/value mappings are represented by an ocaml hash table, which is a value of type Hashtbl.t in the ocaml standard library (you should browse the ocaml documentation here to find out more about ocaml hash tables). This forms the bindings field of the record. The enclosing environment is represented by a value of type env option because there may or may not be an enclosing environment (there will be for all environments except for the global environment).

Note that a hash table is not a purely functional data structure, so you can mutate the contents of a hash table in-place. In fact, this is the normal way to use a hash table, and you should do that where appropriate in this lab.

Environments are used to record the values of identifiers in bogoscheme expressions. It is important to understand that evaluation of expressions can only be done in the context of a particular environment. Without environments, the only expressions that could be evaluated would be ones without any identifiers, which would make the language nothing more than a glorified calculator. Since most expressions have one or more identifiers, and we need environments to give meaning to these identifiers, we simply require that evaluation of any expression must always be done in the context of some environment. Put differently, an environment is the representation of the context in which an expression is evaluated.

To emphasize this last point, let's look at the type of the evaluator function, from eval.mli:

val eval : Ast.expr -> Env.env -> Env.value

What this says is that the evaluator takes an AST expression and an environment, and produces a value. Without an environment, the function cannot be called and so no value will be produced.

Evaluation of certain kinds of expressions can create, access or modify bindings in environments, as follows:

Evaluating an identifier expression (Expr_id) is done by looking up the identifier in the environment. In our implementation, this is done by looking it up in the bindings hash table. If it isn't found there, then the parent environment (if it exists) is searched. If there is no parent environment (which means that you searched all the way up to the global environment and still couldn't find a binding for the identifier), a Not_found exception is raised.
Evaluating a define expression (Expr_define) is done by first evaluating the subexpression to get a value. Then the identifer and the value are associated with each other in the bindings hash table of the current environment. Note that you do not create a new environment while evaluating a define expression; you merely add a binding to an existing environment.
Evaluating function applications also involves environments; this is discussed in the next section.

Note that all of the hash table manipulations are kept inside the env.ml file; the rest of the code only sees the abstract interface to environments as defined in the env.mli file. We'll let you figure out which hash table functions are most appropriate for implementing the environment interface. The good news is that the environment code is extremely simple to implement; all the hard work has already been done for you in the hash table implementation.

Evaluating function applications

Evaluating function applications (Expr_apply in the expression type) is a bit involved, so we'll discuss it in detail here. It consists of the following steps:

Evaluate all of the arguments to the function (the expr list in Eval_apply of expr * expr list) in any order. All of these expressions are evaluated in the current environment. The result of this is a list of values.
Evaluate the function itself (the expr in Eval_apply of expr * expr list). This should either evaluate to a primitive function (Val_prim in the value type) or a lambda value (Val_lambda in the value type). Anything else is an error, and a Type_error exception should be raised.
If the function evaluates to a Val_prim value (primitive function), then just apply that function to the evaluated argument list. The result is the result of the function application as a whole.
If the function evaluates to a Val_lambda value (lambda value), then things become a bit more complicated. In this case, you need to do the following:
1. Create a brand-new environment which maps the argument names of the lambda expression (the id list in Expr_lambda of id list * expr list) to the list of values you computed in step 1 above. So if the argument names were x, y, and z and the computed values were 1, 2, and 3, the environment would map x to 1, y to 2, and z to 3.
2. The parent environment of this new environment is the environment from the lambda expression (the env component). This is the environment in which the lambda expression was originally defined. Note: the environment in which the function application is being evaluated is not used to create the new environment.
3. Now that you have the new environment, you simply evaluate the code in the body of the lambda expression (the expr list in Expr_lambda of id list * expr list) in the context of this new environment.

And that's it. This way of handling environments and creating new environments every time you apply a lambda expression to its arguments is known as lexical scoping. If you've taken CS 1, this should seem familiar to you -- you have just implemented the environment model of Scheme (aside from set!, which is not part of our langauge..

Program to write

Your job in this lab is to complete the bogoscheme interpreter you began in lab 5.

Just like lab 5, there are a lot of supporting files for this lab, and we've again prepared a tarball of all the files, which is located here. To unpack it, do:

% tar xvf lab6.tar

Again, just like lab 5, a lot of the code has already been written for you and should not be changed. The only files that will need changing are the following:

the files lexer.mll, parser.mly, and ast.ml should be copied over from your completed lab 5 code.
the files env.ml and eval.ml should be completed at the places marked "TODO". Of course, the "TODO" comments should be removed.

In particular, note that we provide interface files (.mli files) for all the modules in the lab, which you should read, as the comments in those files will prove valuable to you.

We provide the files main.ml and primitives.ml, which do not need to be changed, but which you should read anyway. main.ml is the entry point to the interpreter, while primitives.ml contains the implementations of primitive functions. These include arithmetic operators (+, -, *, /), comparison operators (=, !=, <, >, <=, >=), and the print function.

We also provide a Makefile. Typing make will make the bogoscheme interpreter, which is called bs. Typing make test will run the test script (see below). Typing make clean will remove all the files created during compilation.

The file env.ml contains the implementation of environments, while eval.ml contains the evaluator itself. Again, these are the only files in this lab that you need to edit, aside from copying over your parsing code from lab 5. You need to add less than 10 lines of code to env.ml, and less than 30 lines of code to eval.ml. This lab is heavy on thinking and very light on actual coding, so take your time and make sure you understand what you're doing.

Testing the code

The bogoscheme interpreter will be an executable called bs, and it will be invoked as follows:

% bs filename.bs

for some Scheme source code file called filename.bs. We have created a number of Scheme source code files in the tests/ subdirectory of the code. You can run them separately e.g.

% bs tests/test_factorial.bs

or you can just type make test, which will run the test script run_test. This will run all the test programs. If all is well, the test programs will print one or more #t values to the terminal. If anything else is printed (aside from lines indicating which test script is being run and how many #ts should be printed), it means that an error occurred, which in turn means that your interpreter has a bug which you should fix.

To hand in

The files env.ml and eval.ml.

References

Documentation for the ocaml hash table module.