To learn how to write complex data structures, and then package them in modules and functors for better reusability.
It has been said that the advantage of using a language like ocaml with a sophisticated type system only really becomes apparent when the data structures you use in your program become significantly complex. This lab will allow you to get familiar with how to write code for data structures that are more complex than simple lists or trees.
Chapters 12 and 13 of the textbook.
This week, you will be implementing a data structure called a priority queue. The specific implementation you will create is called a leftist heap. So let's talk about these things now.
NOTE: There is a priority queue implementation given in the ocaml manual as part of the module documentation. Please don't consult that for this assignment (the implementations are different anyway).
A priority queue is a data structure that is ordered so that it provides fast access to its minimum element, as defined by some ordering relationship. Conceptually, you can think of that element as "the next thing to be processed". It also has to have an operation which deletes the minimum element and returns another priority queue, as well as several other operations, which are summarized here as an ocaml module signature:
module type PriorityQueueSig = sig exception Empty type t (* Abstract type of elements of queue. *) type queue (* Abstract type of queue. *) val empty : queue (* Return empty queue. *) val is_empty : queue -> bool (* Check if queue is empty. *) val insert : queue -> t -> queue (* Insert item into queue. *) val find_min : queue -> t (* Return minimum element. *) val delete_min : queue -> queue (* Delete minimum element. *) val from_list : t list -> queue (* Convert list to queue. *) end
Of these operations, note that the operation from_list
is not
fundamental to the definition of a priority queue, but it's extremely
useful.
Also note that we are being loose about the words "insert" and "delete". These operations are purely functional, so that when we say we "insert" an element into a queue, we really mean that we create a new queue which has the extra element in it; the original queue is unchanged. Similarly, when we "delete" an element from a queue, we really mean that we create a new queue without the offending element; the original queue is once again unchanged.
Now that we know how our data structure is supposed to behave, the next question is: how do we implement it? Naturally, there are lots of choices, each of which will have different trade-offs. In this lab you're going to implement priority queues as leftist heaps. This is a data structure that has the following attributes:
It can either be a leaf (representing an empty heap) or a node which contains:
Thus, a leftist heap is a binary tree with additional rank information stored in the nodes. The tree will in general not be balanced; it will usually have more elements in the left subtree than in the right. We say that the tree "skews to the left".
The element stored at the top of the tree is smaller (more precisely, is no larger) than any of the elements stored beneath it in either subtree.
As mentioned, each node has an integer value associated with it, which is called its rank. The rank of a node is equal to the length of its right spine, which is the number of nodes in the rightmost path from the node in question to an empty (leaf) node (this will also turn out to be the shortest path to a leaf node). Thus, a leaf node has rank 0, and a heap with a left and/or right subheap which is a leaf has rank 1. However, ranks are only stored in nodes, not in leaves. Having the rank stored in the nodes makes many operations much faster.
The rank of any left child is required to be at least as large as the rank of its right sibling. This is why it's called a "leftist" heap.
insert
and
delete_min
operations in terms of merging, and the other
operations are then trivial to define, except for the list-to-heap conversion
routine. That can be done easily in O(N * log(N)) time, and with more
difficulty in O(N) time. You're not required to find the most efficient
solution, but it's a good exercise.
If either heap is empty, return the other heap.
If the first heap's minimum element is smaller than the second heap's minimum element, make a new heap (*) from the first heap's minimum element, the first heap's left subheap, and the result of merging the first heap's right subheap with the second heap.
Otherwise, make a new heap (*) from the second heap's minimum element, the second heap's left subheap, and the result of merging the first heap with the second heap's right subheap.
(*) Here is how to make a new heap from a minimum element and two heaps: the resulting heap must have:
This algorithm will preserve the leftist heap property in the merged heap.
Your implementation should be written as follows (assuming that the priority queue will have integers as elements):
module PriorityQueue : (PriorityQueueSig with type t = int) = struct (* Your code goes here. *) end
Note that since the type t
is abstract, if you want to use a
real type as the elements of your priority queue (and it would be pretty
useless otherwise), you have to specify which type you want the
PriorityQueueSig
's t
value to represent. This is
kind of clunky syntax. Also, you might ask why you can't just make it
parametric, like a type 'a list
. You actually can in some
cases, but here we need a type with an ordering relation, and there is no way
to guarantee that any arbitrary type will (a) be orderable at all -- what if
it's a function type? or (b) will be orderable using the same function
(e.g. the built-in compare
function). So it's better to
write the code as it's written above. Also, this will make it easy to
transform this into a functor (see below).
Here's an interesting point to ponder: why don't we also have to specify
what the abstract queue
type represents in the code above?
Use your priority queue implementation to write a heap sort function. This will take a list as its argument and will
from_list
function of the moduleOnly use your module's exported functions (those in the signature) in your solution. Test it by using the heap sort to sort a list of integers.
This code should go into a file called priority_queue.ml
.
As written, the code is dependent on the built-in comparison functions. To make this more generic, let's define some types and module signatures:
(* Type for ordered comparisons. *) type comparison = LessThan | Equal | GreaterThan (* Signature for ordered objects. *) module type OrderedSig = sig type t val compare: t -> t -> comparison end (* Signature for priority queues. *) module type PriorityQueueSig = sig exception Empty type t type queue val empty : queue val is_empty : queue -> bool val insert : queue -> t -> queue val find_min : queue -> t val delete_min : queue -> queue val from_list : t list -> queue end
What you have to do now is generalize your priority queue implementation into a functor that takes a module matching the OrderedSig signature as its argument, and produces a priority queue (implemented using a leftist heap) which is specialized for that particular type of data. For instance, you can define a module of ordered strings like this:
module OrderedString = struct type t = string let compare x y = if x = y then Equal else if x < y then LessThan else GreaterThan end
and then define your string priority queue like this:
module StringPQ = MakePriorityQueue(OrderedString)
Once you've done this, redefine your heap sort function using a StringPQ as the heap. Note that this heap sort will only work on strings. Use it to sort a list of strings.
To get you started, here is a skeleton of the code you should use for the functor definition:
module MakePriorityQueue (Elt : OrderedSig) : (PriorityQueueSig with type t = Elt.t) = struct (* Your code goes here... *) end
Note that again you have to specify what the type t
in the
PriorityQueueSig
represents. Here, it better be the same type
as the type t
in the OrderedSig
argument (which we
have called Elt
).
Your functor code should go into a file called
make_priority_queue.ml
.
The files priority_queue.ml
and
make_priority_queue.ml
. Add some test code to both files to
show that your heap sort works correctly.