Ocaml track: assignment 4: Data structures, modules, and functors


Goals

To learn how to write complex data structures, and then package them in modules and functors for better reusability.

It has been said that the advantage of using a language like ocaml with a sophisticated type system only really becomes apparent when the data structures you use in your program become significantly complex. This lab will allow you to get familiar with how to write code for data structures that are more complex than simple lists or trees.

Language concepts covered this week

Reading

Chapters 12 and 13 of the textbook.

Program to write

This week, you will be implementing a data structure called a priority queue. The specific implementation you will create is called a leftist heap. So let's talk about these things now.

Priority queues

NOTE: There is a priority queue implementation given in the ocaml manual as part of the module documentation. Please don't consult that for this assignment (the implementations are different anyway).

A priority queue is a data structure that is ordered so that it provides fast access to its minimum element, as defined by some ordering relationship. Conceptually, you can think of that element as "the next thing to be processed". It also has to have an operation which deletes the minimum element and returns another priority queue, as well as several other operations, which are summarized here as an ocaml module signature:

module type PriorityQueueSig =
  sig
    exception Empty

    type t         (* Abstract type of elements of queue. *)
    type queue     (* Abstract type of queue. *)

    val empty      : queue                 (* Return empty queue.      *)
    val is_empty   : queue  -> bool        (* Check if queue is empty. *)
    val insert     : queue  -> t -> queue  (* Insert item into queue.  *)
    val find_min   : queue  -> t           (* Return minimum element.  *)
    val delete_min : queue  -> queue       (* Delete minimum element.  *)
    val from_list  : t list -> queue       (* Convert list to queue.   *)
  end

Of these operations, note that the operation from_list is not fundamental to the definition of a priority queue, but it's extremely useful.

Also note that we are being loose about the words "insert" and "delete". These operations are purely functional, so that when we say we "insert" an element into a queue, we really mean that we create a new queue which has the extra element in it; the original queue is unchanged. Similarly, when we "delete" an element from a queue, we really mean that we create a new queue without the offending element; the original queue is once again unchanged.

Leftist heaps

Now that we know how our data structure is supposed to behave, the next question is: how do we implement it? Naturally, there are lots of choices, each of which will have different trade-offs. In this lab you're going to implement priority queues as leftist heaps. This is a data structure that has the following attributes:

The interesting feature of leftist heaps is that they support very efficient merging of two heaps to form a combined heap. (Of course, since we're using purely functional code, you don't alter the original heaps.) Leftist heaps can be merged in O(log N) time, where N is the total number of elements in the resulting heap. Furthermore, once the merge operation (which, you'll note, is not a required operation for priority queues) is implemented, you can define most of the rest of the operations very easily in terms of it. Specifically, you can define the insert and delete_min operations in terms of merging, and the other operations are then trivial to define, except for the list-to-heap conversion routine. That can be done easily in O(N * log(N)) time, and with more difficulty in O(N) time. You're not required to find the most efficient solution, but it's a good exercise.

Merging leftist heaps

OK, so now we need to figure out how to merge two leftist heaps to create a new leftist heap. The basic algorithm is quite simple:

(*) Here is how to make a new heap from a minimum element and two heaps: the resulting heap must have:

This algorithm will preserve the leftist heap property in the merged heap.

Making a module

Your implementation should be written as follows (assuming that the priority queue will have integers as elements):

module PriorityQueue : (PriorityQueueSig with type t = int) =
  struct
  (* Your code goes here. *)
  end

Note that since the type t is abstract, if you want to use a real type as the elements of your priority queue (and it would be pretty useless otherwise), you have to specify which type you want the PriorityQueueSig's t value to represent. This is kind of clunky syntax. Also, you might ask why you can't just make it parametric, like a type 'a list. You actually can in some cases, but here we need a type with an ordering relation, and there is no way to guarantee that any arbitrary type will (a) be orderable at all -- what if it's a function type? or (b) will be orderable using the same function (e.g. the built-in compare function). So it's better to write the code as it's written above. Also, this will make it easy to transform this into a functor (see below).

Here's an interesting point to ponder: why don't we also have to specify what the abstract queue type represents in the code above?

Use your priority queue implementation to write a heap sort function. This will take a list as its argument and will

Only use your module's exported functions (those in the signature) in your solution. Test it by using the heap sort to sort a list of integers.

This code should go into a file called priority_queue.ml.

Making a functor

As written, the code is dependent on the built-in comparison functions. To make this more generic, let's define some types and module signatures:

(* Type for ordered comparisons. *)
type comparison = LessThan | Equal | GreaterThan


(* Signature for ordered objects. *)
module type OrderedSig =
  sig
    type t
    val compare: t -> t -> comparison
  end


(* Signature for priority queues. *)
module type PriorityQueueSig =
  sig
    exception Empty

    type t
    type queue

    val empty      : queue
    val is_empty   : queue  -> bool
    val insert     : queue  -> t -> queue
    val find_min   : queue  -> t
    val delete_min : queue  -> queue
    val from_list  : t list -> queue
  end

What you have to do now is generalize your priority queue implementation into a functor that takes a module matching the OrderedSig signature as its argument, and produces a priority queue (implemented using a leftist heap) which is specialized for that particular type of data. For instance, you can define a module of ordered strings like this:

module OrderedString =
  struct
    type t = string
    let compare x y = 
      if x = y then Equal else if x < y then LessThan else GreaterThan
  end

and then define your string priority queue like this:

  
module StringPQ = MakePriorityQueue(OrderedString)

Once you've done this, redefine your heap sort function using a StringPQ as the heap. Note that this heap sort will only work on strings. Use it to sort a list of strings.

To get you started, here is a skeleton of the code you should use for the functor definition:

module MakePriorityQueue (Elt : OrderedSig) 
  : (PriorityQueueSig with type t = Elt.t) =
  struct
    (* Your code goes here... *)
  end

Note that again you have to specify what the type t in the PriorityQueueSig represents. Here, it better be the same type as the type t in the OrderedSig argument (which we have called Elt).

Your functor code should go into a file called make_priority_queue.ml.

To hand in

The files priority_queue.ml and make_priority_queue.ml. Add some test code to both files to show that your heap sort works correctly.

References