Ocaml track: assignment 6: Implementing Scheme, part 2


Goals

This lab will complete the implementation of a miniature Scheme interpreter begun in lab 5. In this lab, we will take the output of parsing (the AST data structure) and interpret (evaluate) it to give a result. Lab 5 was all about the boring part of implementing a language; this lab is the fun part.


Concepts covered this week


Overview

Basic sequence of interpretation

To recap the basic sequence of interpretation, we start off with a file of source code, represented as a big string. The steps involved in interpreting this string and computing results are:

Goal of this lab

The goal of this lab is to extend the code you wrote for lab 5 and add in the final piece, which is the code that takes an AST and evaluates it to give a result. In order to write this lab you must have working code for lab 5, so if your lab 5 code isn't working yet, fix that before you start this lab.


Semantics of the bogoscheme language

In this section, we'll describe the semantics of the bogoscheme language in detail. Up until now, all we've shown you is the form of the language, but in order to write an evaluator for a language, you need to know what the meaning of all legal program expressions are, and what values can be produced by these expressions. That's what we'll cover here.

Values

The result of evaluating a bogoscheme expression is a bogoscheme value. Here is the data type of bogoscheme values (from the file env.mli):

(** Identifiers. *)
type id = string

(** Type of legal values i.e. results of evaluation. *)
type value = 
   | Val_unit
   | Val_bool    of bool
   | Val_int     of int
   | Val_prim    of (value list -> value)      (* primitive functions *)
   | Val_lambda  of env * id list * Ast.expr list

(** Type of environments. *)
and env = { parent: env option; bindings: (id, value) Hashtbl.t }

We'll discuss environments below. The first three value types (Val_unit, Val_bool, and Val_int) represent unit, boolean, and integer values respectively. Val_prim values represent primitive (built-in) functions. These are functions like print which are not lambda expressions; the evaluator simply has to know how to evaluate them directly. Finally, Val_lambda values represent functions that are defined within the language i.e. lambda expressions. lambda expressions contain a list of identifiers (the formal arguments of the lambda expression), a list of expressions (the body of the lambda expression), and (perhaps surprisingly) an environment in which names are looked up. The relationship of lambda expressions to their environments will be discussed below.

Evaluation of any bogoscheme expression either results in one of the values described above, or it results in an error.

Evaluating expressions

Expressions in the language are represented by the type   Ast.expr:

type id = string

type expr =
   | Expr_unit
   | Expr_bool   of bool
   | Expr_int    of int
   | Expr_id     of id
   | Expr_define of id * expr
   | Expr_if     of expr * expr * expr
   | Expr_lambda of id list * expr list
   | Expr_apply  of expr * expr list

There are eight different kinds of expressions, which evaluate as follows:

Environments

The notion of an environment is a key concept in implementations of programming languages. An environment is a mapping between identifiers in the language and their values. In addition, an environment has a pointer to its "enclosing environment", unless it is a special environment called the "global environment" or "top-level environment". Here's how we represent environments in the evaluator:

(** Type of environments. *)
and env = { parent: env option; bindings: (id, value) Hashtbl.t }

The and is because this piece of code is part of the code shown above when we described the value type; the value type and the env type are actually mutually-recursive types (each depends on the other). Environments are represented as an ocaml record type. The identifier/value mappings are represented by an ocaml hash table, which is a value of type Hashtbl.t in the ocaml standard library (you should browse the ocaml documentation here to find out more about ocaml hash tables). This forms the bindings field of the record. The enclosing environment is represented by a value of type env option because there may or may not be an enclosing environment (there will be for all environments except for the global environment).

Note that a hash table is not a purely functional data structure, so you can mutate the contents of a hash table in-place. In fact, this is the normal way to use a hash table, and you should do that where appropriate in this lab.

Environments are used to record the values of identifiers in bogoscheme expressions. It is important to understand that evaluation of expressions can only be done in the context of a particular environment. Without environments, the only expressions that could be evaluated would be ones without any identifiers, which would make the language nothing more than a glorified calculator. Since most expressions have one or more identifiers, and we need environments to give meaning to these identifiers, we simply require that evaluation of any expression must always be done in the context of some environment. Put differently, an environment is the representation of the context in which an expression is evaluated.

To emphasize this last point, let's look at the type of the evaluator function, from eval.mli:

val eval : Ast.expr -> Env.env -> Env.value

What this says is that the evaluator takes an AST expression and an environment, and produces a value. Without an environment, the function cannot be called and so no value will be produced.

Evaluation of certain kinds of expressions can create, access or modify bindings in environments, as follows:

Note that all of the hash table manipulations are kept inside the env.ml file; the rest of the code only sees the abstract interface to environments as defined in the env.mli file. We'll let you figure out which hash table functions are most appropriate for implementing the environment interface. The good news is that the environment code is extremely simple to implement; all the hard work has already been done for you in the hash table implementation.

Evaluating function applications

Evaluating function applications (Expr_apply in the expression type) is a bit involved, so we'll discuss it in detail here. It consists of the following steps:

  1. Evaluate all of the arguments to the function (the expr list in Eval_apply of expr * expr list) in any order. All of these expressions are evaluated in the current environment. The result of this is a list of values.

  2. Evaluate the function itself (the expr in Eval_apply of expr * expr list). This should either evaluate to a primitive function (Val_prim in the value type) or a lambda value (Val_lambda in the value type). Anything else is an error, and a Type_error exception should be raised.

  3. If the function evaluates to a Val_prim value (primitive function), then just apply that function to the evaluated argument list. The result is the result of the function application as a whole.

  4. If the function evaluates to a Val_lambda value (lambda value), then things become a bit more complicated. In this case, you need to do the following:

    1. Create a brand-new environment which maps the argument names of the lambda expression (the id list in Expr_lambda of id list * expr list) to the list of values you computed in step 1 above. So if the argument names were x, y, and z and the computed values were 1, 2, and 3, the environment would map x to 1, y to 2, and z to 3.

    2. The parent environment of this new environment is the environment from the lambda expression (the env component). This is the environment in which the lambda expression was originally defined. Note: the environment in which the function application is being evaluated is not used to create the new environment.

    3. Now that you have the new environment, you simply evaluate the code in the body of the lambda expression (the expr list in Expr_lambda of id list * expr list) in the context of this new environment.

And that's it. This way of handling environments and creating new environments every time you apply a lambda expression to its arguments is known as lexical scoping. If you've taken CS 1, this should seem familiar to you -- you have just implemented the environment model of Scheme (aside from set!, which is not part of our langauge..


Program to write

Your job in this lab is to complete the bogoscheme interpreter you began in lab 5.

Just like lab 5, there are a lot of supporting files for this lab, and we've again prepared a tarball of all the files, which is located here. To unpack it, do:

% tar xvf lab6.tar

Again, just like lab 5, a lot of the code has already been written for you and should not be changed. The only files that will need changing are the following:

In particular, note that we provide interface files (.mli files) for all the modules in the lab, which you should read, as the comments in those files will prove valuable to you.

We provide the files main.ml and primitives.ml, which do not need to be changed, but which you should read anyway. main.ml is the entry point to the interpreter, while primitives.ml contains the implementations of primitive functions. These include arithmetic operators (+, -, *, /), comparison operators (=, !=, <, >, <=, >=), and the print function.

We also provide a Makefile. Typing   make   will make the bogoscheme interpreter, which is called bs. Typing   make test   will run the test script (see below). Typing   make clean   will remove all the files created during compilation.

The file env.ml contains the implementation of environments, while eval.ml contains the evaluator itself. Again, these are the only files in this lab that you need to edit, aside from copying over your parsing code from lab 5. You need to add less than 10 lines of code to env.ml, and less than 30 lines of code to eval.ml. This lab is heavy on thinking and very light on actual coding, so take your time and make sure you understand what you're doing.


Testing the code

The bogoscheme interpreter will be an executable called bs, and it will be invoked as follows:

% bs filename.bs

for some Scheme source code file called filename.bs. We have created a number of Scheme source code files in the tests/ subdirectory of the code. You can run them separately e.g.

% bs tests/test_factorial.bs

or you can just type make test, which will run the test script run_test. This will run all the test programs. If all is well, the test programs will print one or more #t values to the terminal. If anything else is printed (aside from lines indicating which test script is being run and how many #ts should be printed), it means that an error occurred, which in turn means that your interpreter has a bug which you should fix.


To hand in

The files env.ml and eval.ml.


References