Package edu.caltech.nanodb.storage.writeahead

This package contains the implementation for the write-ahead log, which allows us to provide transaction atomicity, consistency and durability. The write-ahead log format is designed with several goals in mind:

  • Log records provide ARIES-like features: the individual records corresponding to a given transaction are chained together via "previous LSN" values stored in each record, allowing fast rollback of transaction modifications.
  • The entire log file can be efficiently traversed both forward and backward, which is necessary during recovery processing.

To implement all of these features, the log record format is somewhat complex. The details are outlined below.

For records that store the Previous LSN value, the Previous LSN is stored as a two-byte log file number (in the range [0..65535]), and then a four-byte file-offset (signed integer), relative to the start of the file.

In the descriptions below, a "B" suffix means "bytes". For example, "6B" means six bytes, and "Ssi B" means Ssi bytes.

<Ti start> (6 bytes)
Start-transaction records don't require a "previous LSN" value, because there is no previous LSN for the transaction. Thus, the format is as follows:
SizeDescription
1BWALRecordType.START_TXN
4BTransaction ID
1BWALRecordType.START_TXN
<Ti update PP' >
Update records store modifications to data pages. We record changes on the page-level for a variety of reasons. First, in some situations the old value or the new value is actually missing, e.g. in the cases of an insert or a delete. Also, we don't want the WAL to be tied to specific data-file formats; we would like to use the WAL for logging index changes as well as table changes. The format is as follows:
SizeDescription
1BWALRecordType.UPDATE_PAGE
4BTransaction ID
6BPrevLSN
1-256BFilename of the modified file, written as a VARCHAR(255). This value can be read with a function like DBFileReader.readVarString255().
2BPage number of modified page, written as an unsigned short
?B Description of the old page P, and the new page P'. The differences are stored as a series of segments, where each segment is a range of bytes that is different between the old and new versions of the page. Each segment is described by a starting index, a size, and then two sequences of bytes.
  • 2B - number of segments Ns (unsigned short)
  • Ns repetitions of:
    • 2B - starting index of the segment in the page (unsigned short)
    • 2B - size of the segment in bytes, Ssi (unsigned short)
    • Ssi B - old version of the data (i.e. undo data)
    • Ssi B - new version of the data (i.e. redo data)
4BFile-offset of the start of this update record, relative to the start of the file.
1BWALRecordType.UPDATE_PAGE
<Ti update (redo-only) P' >
Redo-only update records are similar to standard update records, except that they contain no undo information. The format is as follows:
SizeDescription
1BWALRecordType.UPDATE_PAGE_REDO_ONLY
4BTransaction ID
6BPrevLSN
1-256BFilename of the modified file, written as a VARCHAR(255). This value can be read with a function like DBFileReader.readVarString255().
2BPage number of modified page, written as an unsigned short
?B Description of the new page P'. The differences are stored as a series of segments, where each segment is a range of bytes that is different between the old and new versions of the page. Since this is a redo-only record, only the new version of the data is stored. Each segment is described by a starting index, a size, and then two sequences of bytes.
  • 2B - number of segments Ns (unsigned short)
  • Ns repetitions of:
    • 2B - starting index of the segment in the page (unsigned short)
    • 2B - size of the segment in bytes, Ssi (unsigned short)
    • Ssi B - new version of the data (i.e. redo data)
4BFile-offset of the start of this update record, relative to the start of the file.
1BWALRecordType.UPDATE_PAGE_REDO_ONLY
<Ti commit>
Commit records are 12 bytes:
SizeDescription
1BWALRecordType.COMMIT_TXN
4BTransaction ID
6BPrevLSN
1BWALRecordType.COMMIT_TXN
<Ti abort>
Abort records are 12 bytes:
SizeDescription
1BWALRecordType.ABORT_TXN
4BTransaction ID
6BPrevLSN
1BWALRecordType.ABORT_TXN