In this assignment you will complete the implementation of a write-ahead logging system for NanoDB that will allow the database to provide transaction atomicity and durability. The specific components to implement are:

Toward the end of the assignment, you will again have to work with some low-level file structures. You are encouraged to use logging statements (e.g. logger.debug(...)) generously as you write your code; it will make debugging much easier for you in the long run.

NanoDB Transaction Processing

This section will give you an overview of how transaction processing works in NanoDB. It is long, but your team should read it carefully since it touches on all key details.

NanoDB includes a very basic implementation of transaction processing and write-ahead logging that captures the basic concepts of the ARIES write-ahead logging mechanism. Transaction management is provided by the code in the edu.caltech.nanodb.transactions package, and write-ahead logging is provided by the code in the edu.caltech.nanodb.storage.writeahead package.

Transaction Demarcation

The transaction manager leverages the same event-handling framework that index management uses, but it uses the before/after command events rather than the before/after row events. Transaction management is straightforward:

The user can issue a "BEGIN [ WORK ]" or "START TRANSACTION" command to manually start a transaction. Subsequent DML statements will be part of the same transaction until either a "COMMIT [ WORK ]" or "ROLLBACK [ WORK ]" command is issued, ending the transaction.

The user can also issue DDL/DML commands without using the above operations, and the transaction manager will automatically enclose each command within its own transaction. (Note that DDL is not transacted and cannot be rolled back.)

The entry points for starting, committing, and rolling back transactions are the following methods on the TransactionManager class:

The TransactionStateUpdater class implements the logic described above for starting and ending transactions, and the [ Begin | Commit | Rollback ] TransactionCommand classes in the commands package also call the above methods on the transaction manager.

Write-Ahead Logging

The write-ahead logger is managed as a component of the transaction manager; no code outside of the transaction manager really should need to interact with it. The main class to be aware of is the WALManager class. It exposes writeXXXX() operations to write various log records to the WAL, a rollbackTransaction() method that uses the WAL to rollback the current transaction, and a doRecovery() entry-point that performs recovery processing when NanoDB starts.

The write-ahead logger implements an extremely simplified version of the ARIES logging and recovery mechanism. The logger does not support checkpointing, although it does support all record types discussed in class: begin/commit/rollback, update, and redo-only update. Other relevant details are as follows:

The WALManager class updates the pageLSN value on DBPage objects when the writeUpdatePageRecord() and writeRedoOnlyUpdatePageRecord() methods are called. You shouldn't have to manage these values yourself.

Note that some dirty DBPage objects will not have a pageLSN value! Specifically, pages in files that do not support transactions will not have this value set; WAL pages in particular will not specify this value. In general, data pages relating to the write-ahead log are not transacted. (It wouldn't make sense to log changes to write-ahead log pages in the write-ahead log.)

Dirty Page Management

Managing data pages in the context of write-ahead logging is somewhat involved. For example, when a page is modified, the original version of the page must be kept so that the write-ahead logger can generate the "old value" and "new value" to store in the log. These kinds of capabilities are provided on the DBPage class - you will notice that when a clean page is made dirty (by DBPage.setDirty(true)), it also makes a copy of the clean version of the page.

Similarly, when a change to a dirty data page has been logged to the WAL, it is no longer necessary to record the original version of the page because the WAL now reflects the changes. Thus, the WAL Manager calls DBPage.syncOldPageData() when it writes an "update" record, so that the "old value" of the page matches the "new value" of the page. To see why this is important, imagine the following sequence of events against a specific page:

This is why the logger calls DBPage.syncOldPageData() after writing an "update" record, so that each time an update is recorded, it only reflects the changes since the previous "update" record.

Logging Writes to Table Files

Logging the changes to data pages is very dependent on the kind of data file being managed. For example, in heap files, adding or removing a tuple involves changing the page's slot array, as well as changing the tuple-data region of the page. Thus, any code modifying a data page page must inform the write-ahead logger when it is time to log changes to the WAL. This is done via the DBPage.logDBPageWrite() method. (If the page is not dirty, this is a no-op.)

Currently, the heap tuple file implementation doesn't include any of these calls, so changes will not appear in the write-ahead log; therefore, they will not be durable or atomic. This will be one of the first tasks to complete, calling logDBPageWrite() at appropriate times so that operations against tables will be properly logged in the WAL and governed by transactions.

Page Debugging

Unfortunately, debugging the write-ahead logger tends to be difficult. You will need to look at logging output as the database performs transaction rollback, redo processing, undo processing, and so forth. Be prepared.

Sometimes you will need to see the contents of data pages that have been modified. There are a few helper functions on DBPage to help you with this:

The CRASH Command

In order to simplify basic testing, NanoDB has a CRASH command that will cause the database to terminate immediately without saving any buffered data. It is very easy to use:

CMD> CRASH;
Goodbye, cruel world!  I'm taking your data with me!!!
$ [your shell prompt]

This command is useful for basic testing, but it will not allow exhaustive testing of your transaction processing system. The reason is simple: databases can also crash at far more inopportune times than just at the start of a command. To get us a little closer to that, you can also give an argument to the CRASH command, the number of seconds to wait before crashing:

CMD> CRASH 5;
CMD> INSERT INTO t VALUES (1, 'a', 1.1);
CMD> INSERT INTO t VALU
Goodbye, cruel world!  I'm taking your data with me!!!
$ [your shell prompt]

This version of the command starts a background thread that will crash the database after the specified number of seconds has passed. This allows a test script to go on to other things, so that the database crash can interrupt more serious operations, like writing data out to disk, etc. There is still an element of chance in it, so it'll be like playing Russian Roulette, at least until you have transaction atomicity and durability working properly. After that, crashes shouldn't matter.

The FLUSH Command

NanoDB also has a FLUSH command that will cause the database to flush any buffered data in the Buffer Manager to disk.

CMD> FLUSH;
Flushing all unwritten data to disk.
CMD>

You can use this command to make sure that the buffer manager and write-ahead logger work together properly. This can be particularly useful when testing crashes, since only commits force the WAL out to disk. For example, you can rollback a transaction (or just leave it incomplete), flush the database to disk, and then crash the database, like this:

CMD> ROLLBACK;
CMD> FLUSH;
Flushing all unwritten data to disk.
CMD> CRASH;
Goodbye, cruel world!  I'm taking your data with me!!!

Next, you can try to restart NanoDB and see if it does the proper thing during recovery processing.

PLEASE READ: Important Limitations!

Currently, NanoDB does not support transacted DDL, for a variety of reasons. Don't even try it. The same goes for indexes; indexes are not transacted.

The write-ahead logger supports logs that span multiple files, but this functionality has not been heavily tested. You are expected to support multiple logs, but if you run into strange bugs that don't seem to be in your code, please talk to Donnie and/or the TAs right away.

Step 1: Enable Transaction Processing in NanoDB

Before you can work on the write-ahead logger implementation, you need to enable transaction processing in NanoDB. Just like with indexes, this subsystem is controlled via a system property that must be enabled at startup; it cannot be turned on while the database is in normal operation.

Go to the PropertyRegistry.initProperties() (server.properties package) method, and configure these features as follow:

These changes will become active after you have recompiled NanoDB.

NOTE: After doing this, you need to delete the contents of your datafiles directory, since this change does affect what files are created and managed in this directory.

You may also want to disable the index settings from Assignment 6, if you ran into issues with your B+ tree implementation. None of this lab depends on the index implementation working properly.

Step 2: Add Logging to Heap Tuple Files

As stated earlier, heap tuple files currently don't log any of their changes. Fortunately, this is not terribly difficult to add, especially since the number of pages that change for any given operation will tend to be small.

You need to update the heap tuple file code to properly log all changes to the write-ahead log. Go through both the HeapTupleFile and HeapTupleFileManager classes, looking for any place that a DBPage is written to. For example, adding/updating/removing a tuple, storing statistics, updating non-full-page lists, would all be places where changes are made to the tuple file. Wherever you finish modifying the contents of a DBPage, add in a call to the Storage Manager's logDBPageWrite(DBPage) method as the last step of that operation. This will ensure that the changes to the page are written to the write-ahead log.

Try to avoid calling logDBPageWrite() on a page multiple times for a single operation. For example, it would be wasteful to call logDBPageWrite() once after updating a page's slots, and then again after updating the page's tuple data. Rather, it should be called once, after all changes have been made to the page.

Step 3: Implement an Atomic Force-WAL Operation

The TransactionManager class provides a forceWAL(LogSequenceNumber) method that is critical for atomic and durable transaction processing. This method must ensure that the write-ahead log is written out to the specified LSN, and sync'd to disk, before it returns. This method is called every time a transaction is committed to ensure that the committed transaction is reflected in the WAL, and it is also called anytime dirty pages must be written to disk during a transaction.

The only problem is, our write-ahead log can span multiple files. And, there is a second issue that the buffer manager could output WAL pages in any order it wants. We have no guarantees that the database won't crash during this process, which could leave our WAL completely corrupted. (You could say it was "WALloped," ha ha haaaaaa!) Therefore, careful thought must be given to how the WAL is written to disk, to ensure it is updated atomically.

The transaction manager also manages a separate file called txnstate.dat, which it uses to record various critical values related to transaction management:

Note that the WALManager also maintains a nextLSN value in memory, but the WALManager's value will be larger than this value if there are unsaved WAL records/pages in the buffer manager. The txnstate.dat file's "next LSN" value should always reflect the end of the write-ahead log that is stored on disk.

This file is intentionally small; it should always be atomically writable. This means that it should fit within one disk sector (as small as 512 bytes, but possibly up to 4KiB in size with modern disks), so that write-tearing will not be likely to occur (i.e. when a buffer that spans an atomic-write boundary is written or sync'd to the disk, and the system crashes as that boundary is being crossed).

Implement the transaction manager's forceWAL(LogSequenceNumber) method to atomically and durably write the WAL, as far as the specified LSN. Specifically, this method must ensure that buffered WAL pages containing records up to the specified value are written out of the buffer manager, and that the involved WAL files are also sync'd to disk. This method will need to look at the current transaction state to tell what range of WAL records must be written out, and will use the buffer manager to ensure that pages containing those records are indeed written and sync'd to disk. It will also need to modify the current transaction state.

Of course, you will also need to make sure that you update the WAL in an atomic and durable way, particularly in the context of the issues outlined earlier, namely having multiple WAL files and buffer pages being written out in no particular order.

Here are additional requirements and suggestions:

Once this operation is available, you can use it to make sure that the database will follow the write-ahead logging rule. This is the focus of our next task...

Step 4: Enforce the Write-Ahead Logging Rule

When the Buffer Manager needs more memory, it evicts some of the pages loaded into memory, taking care to write dirty pages back to disk. This is all fine and good, except that the Buffer Manager doesn't know anything about transactions or write-ahead logging. If the Buffer Manager's page-writes are not coordinated with the Transaction Manager's updates to the write-ahead log, then there is no way we can enforce durability and atomicity in the database.

The solution to this problem is that the Buffer Manager can notify other components when dirty pages are about to be written to disk. The Transaction Manager registers a handler on the Buffer Manager to receive notifications when dirty pages are about to be evicted; this is where we can ensure that the write-ahead logging rule is enforced. The method that the Buffer Manager will call is TransactionManager.beforeWriteDirtyPages().

Implement this method on the Transaction Manager so that the database will follow the WAL rule when any dirty pages are written back to disk. This is a pretty straightforward operation to complete, especially after the forceWAL() method from the last step has been implemented. In fact, this method will call forceWAL(); the only complexity is figuring out what LSN to pass to the method. There are more details in the comments for beforeWriteDirtyPages().

Once you complete this operation, the database will follow the write-ahead logging rule. Additionally, you should see the database updating the write-ahead log with details of committed transactions, which would make them durable.

Except... we don't have any recovery processing yet. Not only that, but if we end up needing to roll back a transaction, NanoDB can't do that either. So really, we don't have much durability at the moment, and our atomicity story is also weak to nonexistent.

Step 5: Implement Transaction Rollback

For the final two tasks, you will need to become familiar with the format of the write-ahead log. The Javadocs for the edu.caltech.nanodb.storage.writeahead package describe the log format in detail; you should refer to this documentation as you complete the last two steps.

The WALManager class provides a rollbackTransaction() method used during normal operation to rollback the current transaction a client is participating in. You will need to complete the implementation of this method. Note that this method is not used during recovery processing; transactions are rolled back in a different way during recovery.

The rollbackTransaction() method operates as described in class. NanoDB manages session state for each client, and part of that session state is the LSN of the last WAL record issued for the current transaction. This is where rollback begins. Additionally, all WAL records except for the "start transaction" record include a "previous LSN" value, allowing the sequence of operations performed in the transaction to be rolled back in reverse order.

Of course, this is a relatively straightforward task, because rollback only cares about two records: the "start" record which means rollback is done, and the "update page" record which specifies a change to roll back. Any other record encountered during rollback would indicate a serious error, worthy of throwing a WALFileException to report a corrupt write-ahead log.

The WALManager provides a helper method getWALFileReader(LogSequenceNumber), that opens (or retrieves from the buffer manager) the WAL file corresponding to the specified LSN, and seeks to the file-offset specified by the LSN. The returned object is a DBFileReader, which allows our page-based database files to be read and written in a more stream-like format. You should always use this getWALFileReader() method to get a reader for a given LSN, to avoid unnecessary complexity in your implementation.

As the transaction is rolled back, new records must be written to the write-ahead log. Use the writeTxnRecord(WALRecordType) and writeRedoOnlyUpdatePageRecord(DBPage,int, byte[]) methods to write these records as you rollback. Note that there are two versions of these methods; use the versions that do not require transaction info to be explicitly specified! Otherwise, the client's transaction state will not be updated properly during rollback.

The last important detail is how to undo the changes to a particular data page. The WAL record will specify the file and page number that was modified; load this page using the storage manager and then use the applyUndoAndGenRedoOnlyData() helper to undo the changes on the page and generate the redo-only data at the same time. Note that this method has very specific requirements on the DBFileReader position when the method is invoked; read the method's Javadoc for details!

Once you have completed the rollbackTransaction() method, you should be able to successfully start and rollback transactions in your database. Give it a try:

  1. Create a simple table and add a few rows to it.

  2. Then, start a new transaction with BEGIN, and try either changing some rows and/or adding new rows to the table. Query the table to verify that your changes are all visible!

  3. Finally, rollback the transaction by typing ROLLBACK. If your code is working properly, you should be able to query the table after rollback and see the original contents. Pretty cool!

If you have any issues during your testing, you will probably want to delete all files in your datafiles directory. Corrupt or invalid write-ahead logs will leave NanoDB in a corrupted state, and then you will be on a wild-goose chase.

Once this step is finished, you will have atomic transactions in NanoDB. However, they are still not entirely durable until recovery processing is also completed. Onward!

Step 6: Implement Redo and Undo Processing

This is probably the most complicated part of the write-ahead logging implementation, because it involves moving forward and backward through a log containing variable-size records. Just like with the index implementation, this low-level stuff can become a little grungy. Such is life.

The WALManager provides an entry-point for recovery processing called doRecovery(). The TransactionManager invokes this method at database startup using the values loaded from the txnstate.dat file. We already discussed the nextLSN value stored in this file; it is just past the last known-good record in the WAL. The firstLSN value is important for recovery: it is the lower bound on the set of records necessary for recovery processing. The firstLSN value provides the following guarantees:

Unfortunately, without checkpoints, we really cannot easily move firstLSN forward until the completion of recovery processing. Thus, this is the only time that NanoDB moves this value forward.

You will notice this problem if you work with a large data-set, such as the TPC-H data-set. If you create the TPC-H tables, then load the data, and then shut down NanoDB, you will likely observe a long pause the next time you start the database. This is because NanoDB is replaying the entire write-ahead log for the entire data-load. It would be much nicer to have a way to move forward firstLSN during normal operation. You will have an opportunity to explore this topic when you complete the design-document.

Redo Processing

Redo processing is implemented in performRedo(). This method takes a RecoveryInfo object, which keeps track of what transactions are incomplete, along with the last LSN seen for each transaction. This second detail is important: when incomplete transactions are rolled back, we must properly chain new WAL records into the earlier records for each transaction.

You must complete the implementation of this method. The implementation should be mostly straightforward; update the RecoveryInfo object as appropriate for the WAL records you encounter, and be aware that for this method you must make sure the walReader is always moved past each WAL record.

In situations where a change must be redone, you can use the applyRedo() method to perform the task. As with the applyUndoAndGenRedoOnlyData() method, the walReader must be positioned just after the segment-count when the method is invoked. (In this case, applyRedo() could read the number of segments itself, but it seems beneficial to make both methods behave similarly.)

No WAL records should be emitted during redo processing!

Undo Processing

Undo processing is implemented in performUndo(). This method takes the same RecoveryInfo object populated by performRedo(), since now the set of incomplete transactions should be known. This method behaves similarly to the rollbackTransaction() method, except that it is rolling back all incomplete transactions at the same time. As before, it only cares about "start" records and "update" records; in particular, "redo-only update" records are ignored during this phase. (Of course, a "commit" or "rollback" record should never be seen for an incomplete transaction.)

You must complete the implementation of this method as well. Note that the part of this method that you have to implement is relatively small compared to the overall size of the method; the bulk of the code is devoted to navigating backward through the write-ahead logs, a rather tedious task that doesn't lend itself to factoring into another method.

As stated before, you will find this implementation very similar to the rollbackTransaction() implementation, but there will be some important changes. You will need to emit additional WAL records as you rollback incomplete transactions, but this time you must use the information in the RecoveryInfo object to manage each transaction's "previous LSN" value. You cannot use the writeXXXX() methods that pull these details from the session state; your logs will be corrupt!

Important Note: Using the debugger to verify your redo/undo processing code is an excellent idea, but this recovery code runs right when the database starts up, which will be before you get a chance to connect your debugger to the database. Therefore you will need to change your nanodb script to cause the JVM to suspend itself when it is started up, giving you a chance to connect your debugger to the JVM. In your nanodb script you will have a line like this:

JAVA_OPTS="$JAVA_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5009"

Change the "suspend=n" to "suspend=y", and then the JVM will pause at startup until a debugger connects.

Step 7: Testing

Obviously, with a system as complex as write-ahead logging, there can be a lot of subtle errors in your implementation. Therefore, you should test your code as extensively as you can to make sure it works. You should test from simple to complicated, so that you can isolate issues as quickly as possible. Also, putting generous logging statements in your code will give you a lot of help in identifying issues.

Finally, don't forget that you had to add write-ahead logging to your heap-file implementation, so it would be a very good idea to test inserts, updates and deletes, involving one page and multiple pages, and so forth. That way you can ensure that your database logs all changes properly.

Here are some ideas for testing. In between tests, you may want to delete the contents of your datafiles directory so that corrupt files don't send you on a wild-goose chase.

  1. Start the database, then shut it down, then start it again. See if this works without error.

  2. Start the database, create a table, insert a few rows (relying on autocommit to commit each operation separately). Shut it down.

    Start it again, and make sure the table is the same, and that recovery processing ran successfully.

    Start it again, and see if recovery processing is now a no-op, since the firstLSN value in txnstate.dat will have been moved forward by the previous startup.

  3. Start the database, create a table. Explicitly start a transaction, and make multiple changes (e.g. insert multiple rows) within that single transaction. Then commit it.

    Try the same steps as in the previous test, to see if the data stays the same after restarting the database. Make sure recovery processing runs without errors.

  4. Start the database, create a table, insert a few rows (relying on autocommit again).

    Explicitly start a transaction, perform a few changes (e.g. change values, insert rows), then rollback the transaction. Make sure the original table data is restored.

    Shut down the database, then restart it. Make sure recovery processing ran successfully. Verify that the data is unchanged.

    Shut down and restart the database again, and check the data again. (Remember, for the second restart, recovery processing should be a no-op.)

  5. Repeat the same steps as for 4, but use CRASH instead of ROLLBACK to see if things will work properly.

    The previous test tends to work with no problems, because the WAL isn't forced out until a commit or a database shutdown. Therefore, you should also try issuing a FLUSH before the CRASH; this will ensure that your recovery processing can handle an incomplete transaction in the WAL.

  6. Start the database, create a table. Perform some modifications and commit them. Then perform other modifications and roll them back. Finally, perform more modifications and commit them. Make sure the data is correct.

    Shut down and restart the database, and verify that recovery works correctly. Verify the data is valid.

    Shut down and restart the database again, and check the data again.

  7. Start the database, create a table. Add some data to the table.

    Start a transaction, perform some operations within the transaction, then abort the transaction.

    Start another transaction, perform some more operations, then abort that transaction. Make sure the original table data is still there!

    Shut down (or flush and crash) the database, then restart it and make sure the contents of the table are still correct.

If you are reasonably rigorous about your testing, you should have pretty good confidence that your write-ahead logging is working correctly.

You will also find a file in schemas/writeahead/basic-tests.sql. You can't run this file directly against NanoDB; it requires manual steps that aren't yet scriptable. But, you can use it to do some of the testing specified above.

Suggested Approach

One teammate should enable the transaction-processing properties (step 1) and commit this work right away, so that all teammates are able to work on the transaction code.

Once transactions are enabled, steps 2 through 6 can be completed in parallel. However, steps 5 and 6 probably shouldn't be started until your code is reliably generating write-ahead logs, because you will have nothing to test your code with.

Of course, the entire team should work on step 7 (testing), as much as possible and as early as possible.

Submitting Your Assignment

Once you are finished with the above tasks, your team needs to complete the design document for this assignment, available in your repository at "doc/lab7design.txt".

When you are ready to submit your code, and your design document includes your team's answers to the design questions, you need to tag the version of your code that you want us to grade. Only one member of your team should perform these steps. If multiple people on your team do this, it will mess up your team's repository.

The instructions are the same as for the previous assignment, although you will use a hw7 tag this week. Again, note that if you decide to resubmit your work (e.g. because you found a bug that you needed to fix), you cannot reuse the same tag name again! Therefore, use an updated tag name "hw7-2", "hw7-3", etc. so that we can still find the submission. Also, don't forget to push the new tag to your submission repository.

Finally, one team member should submit this short document and submit it on the course website. Only one teammate should submit this document so that we don't get confused! You can list the Git repository hashes of your tags with this command:

git show-ref --tags