Tree-Set, Part 1

In the next two assignments you will implement an ordered-set data type that internally uses a binary search tree to store and retrieve its values. A binary search tree is a binary tree where each node stores a value, and a node's left subtree stores values less than the node's value, and the right subtree stores values greater than the node's value. This is an invariant on the data-structure that must always hold.

Because we are implementing an ordered set using a tree, we will call it the TreeSet, and declare it in a file treeset.h. If you want to separate the function definitions from the declaration then you can do this in treeset.cpp. For now, our TreeSet will only store int values. We will make the TreeSet much more generalized in a future assignment.

You do not need to maintain a balanced tree in this assignment. Balanced binary trees impose additional constraints on the distances between the root and the leaves of the tree, so that searching for a particular value is always O(log(N)) worst case. Our tree won't be quite as good, but it will definitely be simpler to implement.

The TreeSet may only expose operations that are relevant to ordered sets; it must not expose any internal implementation details, like the fact that the set is implemented with a tree. Therefore, you will need to create some internal "tree-node" type that is only visible within your tree-set class, which can hold a value, and pointers to a left and right subtree.

In this implementation you must use C++ smart pointers. Smart pointers implement the "Resource Acquisition Is Initialization" (RAII) pattern for heap-allocated memory, and they make it much easier to avoid leaking memory, because they completely eliminate the need to delete heap allocations. For this assignment you will want to use the std::shared_ptr<T> type, declared in the <memory> C++ standard header.

Note: Your tree nodes should not need to have a pointer to their parent node. C++ smart pointers work through reference-counting, and if nodes point to both their children and their parent then the reference graph will have cycles in it. You can break such cycles by using the std::weak_ptr<T>, but you actually won't need for nodes to know their parents for the functionality you will implement in this data structure.

Public Interface

Here are the public operations you must provide on your TreeSet class:

Implementation Requirements

As stated earlier, it should not be visible outside of your class that you are building the set with a tree, so the tree-node type should be declared inside the private section of the class. To reduce typing, you may also want to declare a smart-pointer type with the using syntax; if you do so, do this inside the private section of your class as well.

You must also provide a private sanity_check operation that traverses the set's internal tree and verifies that the tree's invariant is maintained: each node's left subtree should only contain values less than the node's value, and the right subtree should only contain values greater than the node's value. It is up to you how you decide to implement this; the reference implementation has a function like this:

bool sanity_check(sp_node n, int minval, int maxval)

The function verifies that the node n holds a value between minval and maxval, and then it recursively checks the children of n with the same function, updating minval and/or maxval appropriately. The function prints all identified issues to cerr (not stopping after the first issue is found), and it returns false if any issues are identified along the way. This way the function can be used with assert() everywhere the tree is changed.

Once you have completed a sanity-check function, call it every time you mutate the contents of the tree, so that you can catch any bugs immediately.

Because you are using smart pointers in your implementation, your tree set implementation should not leak any memory. You can use tools like Valgrind to check your implementation.

Finally, to repeat the earlier requirement, your tree nodes may not have a pointer to their parent node. This will make adding and deleting nodes a bit more complex, but it should not create a substantial increase in complexity if you do it correctly.

Implementation Suggestions

The most difficult part of this assignment is implementing deletion of values, because the tree must maintain the binary-search-tree invariant stated earlier. Wikipedia has a helpful article on binary search trees, and the deletion section is particularly helpful.

Recall that in C++, you can create member functions, constructors, destructors, etc. on structs as well as on classes.

Finally, your del(int value) implementation may find it useful to recursively invoke itself in certain cases, but unnecessarily slow implementations will be penalized.

Testing

A test suite for your class is provided here. This program does some simple testing, and then it does an exhaustive test of tree-sets up to 6 elements, adding and deleting values in all possible orders. This is not a very large collection, but the act of inserting and deleting in all possible orders should exhaustively exercise your add/delete code. If your code uses your sanity-check function before and after every modification, you can have very high confidence that your code is correct after you have passed the entire suite.

Note that the N=5 and N=6 tests may take some time; the reference implementation requires about 10 seconds for the N=6 test.

Make sure to fix any test failures you encounter.

Build Automation

As usual, you will need to write a Makefile for your project that builds and tests your TreeSet code.

Submitting Your Work

Once you have completed the above tasks, submit your work through csman. Make sure to submit these files:

You do not need to submit the test code; we will test your program with a fresh copy of these files.

Assignment Feedback Survey

Please also complete and submit a feedback survey with your submission, telling us about your experience with this assignment. Doing so will help us to improve CS11 Advanced C++ in the future.


Copyright © 2019-2020 by California Institute of Technology. All rights reserved. Generated from treeset-1.md.