In this assignment you will get to explore an implementation of the B+ tree index structure. The tasks to complete are as follows:

A description of these tasks, as well as a number of extra-credit tasks, is given below.

Overview

In the edu.caltech.nanodb.storage.btreefile package you will find a basic implementation of a B+ tree tuple file. This implementation follows the description given in class, with a few important differences. Possibly the most important one is that the B+ tree implementation can be used to store tables as well as indexes. This is an important design choice, because it becomes easy to perform file-scans over indexes without having to make any significant changes to the file-scan code paths.

NanoDB's CREATE TABLE command allows you to specify the storage format to use for each table. For example, this will create a table using the B+ tree format:

CREATE TABLE bt (
    a INTEGER,
    b VARCHAR(20)
) PROPERTIES (storage = 'btree');

INSERT INTO bt VALUES (1, 'abc');
INSERT INTO bt VALUES (2, 'xyz');
INSERT INTO bt VALUES (1, 'def');
...

Since the B+ tree format keeps its records in sequential order, you should notice when you "SELECT * FROM bt;" that the records come back sorted on all columns. (Note that the above will not work until you have completed the first step of Part 1.)

Given this design, an index is simply a table that is built against another table, containing a subset of the referenced table's columns, and including an extra column that holds file-pointers to tuples in the referenced table. Rows in the index are populated from the referenced table. Finally, the index uses some file organization that facilitates equality-based and/or range-based lookups against the index, such as a sequential or a hashed file format. Of course, for us this will be a B+ tree.

To illustrate the approach, we can create a table and an index as follows:

CREATE TABLE t (
    a INTEGER,
    b VARCHAR(30),
    c FLOAT
);

CREATE INDEX i ON t (a);

Under the hood, we will end up with a tuple file "t.tbl" that has a schema (t.a : INTEGER, t.b : VARCHAR(30), t.c : FLOAT). Additionally, we will have an index file called "t_i.idx" that has the schema (t.a : INTEGER, t.#TUPLE_PTR : file-pointer), with one row for every row in table t. (Recall that index names are unique on a per-table basis, but two different tables can have indexes with the same name. Thus, we must use the table's name as part of the index's filename.)

Here are some additional notes on NanoDB's B+ tree implementation:

Other than that, the implementation follows the description in class almost exactly.

Important Implementation Classes

Here are the major components in this B+ tree implementation. You will notice that it is mostly similar to the heap file implementation, with a few obvious differences due to the implementation details.

The BTreeTupleFile class provides most of the operations for accessing or modifying tuples in a B+ tree file. It delegates many file-manipulation tasks to two classes, InnerPageOperations and LeafPageOperations, but it does perform some of the most basic operations such as looking up a leaf-entry in the file based on a search-key. A third class, FileOperations, is responsible for finding new empty pages in the file when more data needs to be stored, or recording that a formerly-used page is now empty.

The InnerPageOperations and LeafPageOperations classes handle larger-scale tasks like inserting entries into B+ tree nodes, splitting nodes, and relocating entries between nodes. If you review this code, you will note that the implementations are very similar, but just different enough to force two separate implementations. (Oh well.)

These two classes also use the InnerPage and LeafPage wrapper-classes to manipulate individual B+ tree pages. Each of these classes is used to wrap a DBPage object, allowing the contents of the node to be manipulated more easily.

Finally, the BTreeTupleFileManager class provides file-level operations such as creating a new B+ tree file, storing the metadata, and so forth.

Keys and Indexes

NanoDB can be configured to automatically create indexes in certain situations. For example, if a table is declared with a primary key, or one or more UNIQUE constraints, the database can automatically create unique indexes on these keys. Therefore, you can also create indexes by issuing commands like:

CREATE TABLE t (
    a INTEGER PRIMARY KEY,
    b VARCHAR(30) UNIQUE,
    c FLOAT
);

Foreign key constraints can also cause NanoDB to create indexes on both the referencing table and on the referenced table, to expedite the enforcement of referential integrity constraints.

That said, all of this functionality is initially turned off, since your B+ tree implementation isn't yet functional. After you have gotten B+ trees working, you can enable this functionality by modifying the PropertyRegistry.initProperties() (server.properties package) method to turn on these features:

(These properties cannot be set at just anytime; they must be set at the beginning of NanoDB execution, since they affect the files that are stored. Also, you may want to erase the contents of your datafiles directory if you wish to turn these on going forward.)

Note that indexes will not make queries or constraint checks any faster because your planner doesn't know how to use them. This is an extra-credit option if you get through the main parts of the assignment.

Index Management Mechanism

Indexes must be updated anytime a table is changed. To facilitate this, NanoDB fires events before and after commands are executed, and also before/after any row is inserted, updated, or deleted. The implementation for this mechanism is in the edu.caltech.nanodb.server package, in the EventDispatcher class. Components can implement the CommandEventListener interface to receive before/after command events, or the RowEventListener interface to receive before/after insert/update/delete events. Such listeners are then registered on the EventDispatcher singleton to receive notifications when these events occur.

At the end of StorageManager.initialize(), the Storage Manager registers a row-event listener called the IndexUpdater (in package edu.caltech.nanodb.indexes), which handles all index updates. Anytime a table is modified, the index-updater goes through the table's indexes, applying the appropriate updates.

Important Final Notes!!!

The B+ tree and index code in NanoDB is still a bit buggy. There are a number of index and B+ tree update/delete tests, which all pass against the solution implementation, but don’t be surprised if you run into a little difficulty.

Also, the B+ tree implementation doesn't completely implement pinning and unpinning. Therefore, you may want to turn off unpin-error messages if they become annoying. (There is an extra-credit task if you wish to get the pinning and unpinning working properly, but you are not required to do this!)

Finally, as mentioned earlier, none of your planners know how to use indexes, so just implementing indexes won't make queries faster; your planner must also know how to take advantage of available indexes.

Part 1: Complete Missing B+ Tree Operations

The B+ tree implementation you have been given is missing several important pieces, which you must implement. Those pieces are outlined in this section. You can exercise your implementation manually by creating tables of the btree format, as indicated earlier. As you get your implementation working, you should be able to create a table, insert tuples in any order, and when you select from it you should see that the tuples are always scanned in sorted order. Note that tuples are sorted by all columns.

You can also use the VERIFY command to check your table for structural issues.

VERIFY bt;

This command will check the table for any issues, along with any indexes built against the table. All problems that are encountered will be printed out to the console.

Here are the tasks for you to complete:

  1. All operations - adding a tuple, removing a tuple, or searching for tuples - require the B+ tree structure to be navigated from root to leaf. This operation is partially implemented in the navigateToLeafPage() method of the BTreeTupleFile. You will need to complete this implementation.

    Note that this method only navigates the inner-page structure of the index until it reaches a leaf, and then the leaf page is returned to the caller. What happens after that depends on the specific operation being performed.

    Also, all key-comparisons should be performed with the comparePartialTuples() method of the TupleComparator class (edu.caltech.nanodb.expressions package). This method allows tuples of different lengths to be compared, which allows us to search on any prefix of the tuple file’s columns, not just the full set of columns. (It will also allow us to find tuples in indexes without specifying the file-pointer at the end of the search-key.) Note that the comparePartialTuples() method has several comparison modes; use SHORTER_IS_LESS for this operation.

    Once this function is finished, you should be able to create a table using the B+ tree storage format, insert records into it, and see that the contents of the table always appear in order. However, if your table gets large enough to require two leaf pages, the implementation will fail. The reason is that NanoDB doesn't yet know how to split a leaf page into two leaves. Continue to the next step...

  2. To support B+ tree files larger than one leaf page, the implementation must be able to split a leaf into two leaves, and then update the parent of the leaf with the new leaf-pointer. This operation is handled by the splitLeafAndAddTuple() method of the LeafPageOperations class. You will need to complete this implementation.

    As always, there are many helper functions to help you with the implementation, on both the LeafPage and InnerPage classes. Probably the most complicated part will be updating the parent of the leaf properly, but you can use the InnerPageOperations class to help you with this task.

    Note that the pagePath argument must always be the path to the specific page being manipulated by a given function. Thus, when calling InnerPageOperations functions, you must remove the last element from the pagePath list. This is simple to do, and fast too, even though we are using an ArrayList for the collection: since we are removing the last element in the array-list, this will be a constant-time operation.

    Once you are done with this task, you should be able to create B+ tree files with many leaf pages. There is one more problem, though - the index implementation still can’t support multiple inner pages. To fix this issue, continue on to the final step.

  3. The last functionality to complete for this B+ tree implementation is the code that allows inner-page pointers to be moved to a left- or right-sibling page. This is required for splitting an inner page into two, and also for relocating pointers between two sibling inner pages. This functionality is provided by the movePointersLeft() and movePointersRight() methods of the InnerPage class.

    These methods are a bit tricky to implement, mainly because of the requirement that every tuple in an inner page must be sandwiched between two pointers. Given an inner page containing N pointers and N-1 tuples, if you move M pointers (and the M-1 tuples between these pointers) from the node to its right sibling (M < N), this will expose a tuple in the node without a pointer on its right. Similarly, if you move M pointers from the node to its left sibling, this will expose a tuple without a pointer on its left.

    Additionally, the sibling node receiving the M pointers and M-1 tuples will already have pointers on both sides of all its tuples.

    This is where you must figure out how the parent node's tuple fits into the puzzle. In the slides we discussed what happens when a single pointer is moved to a sibling inner-node, but in this implementation it is possible to move M pointers, not just one. You will have to figure out where to store the parent's old tuple, if provided, and what to return as the parent's new tuple.

    (You will always return a new key in your implementation. You may not receive an old tuple if the top-level inner page is being split, since there will not yet be a parent of the node being split. The tuple you return will be used in initializing the new top-level inner page.)

    The other complexity is that when moving M pointers to the left sibling, these pointers are taken from the start of the node's sequence, whereas when moving the pointers to the right sibling, they are taken from the end of the node's sequence. When moving pointers right, the implementation must make room in the target node for the new entries. When moving pointers left, the implementation must slide the remaining entries in the source node left.

    For these kinds of operations, the DBPage.moveDataRange() method will be very helpful.

    You must never write to the DBPage's internal byte-array directly! Doing this will break the DBPage's ability to track whether the page is dirty. Always use the methods provided on the DBPage to write to its data. (You may find it helpful to read directly from the underlying byte-array, however, when moving data back and forth.)

    Once you have successfully completed this task, your B+ tree implementation should be complete (ignoring any bugs in the supplied code, of course).

Part 2: Support for B+ Tree Indexes

Once you have B+ tree tuple files working, you can complete the mechanism that keeps indexes in sync with their corresponding tables. As explained earlier, there is a row-event handler that will update a table's indexes based on the changes made to the table. Note that this functionality will not be enabled until you have turned on the "nanodb.enableIndexes" property, as specified earlier.

The IndexUpdater class (in the indexes package) handles adding and removing tuples on a table's indexes. (Updates to a tuple are currently modeled as removing the old version and then adding the new version, which is not optimal, but it works well enough.)

There are two methods that you must complete on this class:

  1. The addRowToIndexes() method is called when a row is inserted or updated on a table. This method must iterate through the table's indexes, construct a suitable index-tuple for each index (based on the columns in the index), and then add this index-tuple to the index's tuple file.

  2. The removeRowFromIndexes() method is called when a row is updated or deleted on a table. As before, this method must iterate through the table's indexes, removing the corresponding index-tuple from each index.

    This method should throw an IllegalStateException if an index doesn't actually contain the tuple being removed. It would be a big problem if the database expects a given tuple to appear in the index, when it isn't actually present!

The IndexUtils class has a number of methods that will be helpful for completing these implementations:

Part 3: Analysis of Implementation

The design document for this assignment has a sizable number of questions for your team to answer about the implementation of B+ trees. Complete all questions in this section. You will need to be familiar with the details of NanoDB's B+ tree implementation for most of these questions, although your team can start working on them before your entire B+ tree implementation is complete.

Suggested Approach

This year you will have approximately two weeks to complete the assignment. You should treat the assignment as a 1-week assignment, with extra time for fixing bugs (or doing extra credit). In other words, get started right away, but know that you have some padding if things don't go smoothly.

Parts 1 and 2 can be completed in parallel, but at least the first step of part 1 must be working before you can start testing anything in part 2. Within each part, each step requires the previous steps to work before it will work, but you can still work on them in parallel.

The analysis in part 3 can be started once your team has a clear understanding of the NanoDB B+ tree file format, and how the various operations on the tree structure work. The analysis is involved enough that your team should probably begin working on it once you have enough knowledge to start answering questions. While it can be completed by one teammate, a better approach is to discuss the questions as a team and try to understand the concepts and issues that the questions are focusing on. After this, one or more teammates can write up the answers.

Submitting Your Assignment

Once you are finished with the above tasks, your team needs to complete the design document for this assignment, available in your repository at "doc/lab6design.txt".

When you are ready to submit your code, and your design document includes your team's answers to the design questions, you need to tag the version of your code that you want us to grade. Only one member of your team should perform these steps. If multiple people on your team do this, it will mess up your team's repository.

The instructions are the same as for the previous assignment, although you will use a hw6 tag this week. Again, note that if you decide to resubmit your work (e.g. because you found a bug that you needed to fix), you cannot reuse the same tag name again! Therefore, use an updated tag name "hw6-2", "hw6-3", etc. so that we can still find the submission. Also, don't forget to push the new tag to your submission repository.

Finally, one team member should submit this short document and submit it on the course website. Only one teammate should submit this document so that we don't get confused! You can list the Git repository hashes of your tags with this command:

git show-ref --tags

Extra Credit

Pinning and unpinning may or may not be working for the B+ tree format. If you determine that pinning is not correct, you are welcome to fix it. Otherwise, you may simply ignore pin/unpin issues. (up to +15 points)

The OPTIMIZE command can be used to optimize the storage of a table, both to compact the disk space used by the table, and to ensure that any logical structuring of the table matches the physical layout of the file on disk. Implement this operation for the B+ tree storage format. (+10 points)

If you wish to "close the loop" and incorporate indexes into your query planning, you can examine the IndexScanNode included with the B+ tree code. This plan-node works with other code in the indexes package to allow a query to start at a specific point within an index, and look up tuples in the indexed table based on a query's predicate. You can incorporate this functionality into the SimplePlanner or the CostBasedJoinPlanner to use indexes where available.