Package edu.caltech.nanodb.storage.writeahead
This package contains the implementation for the write-ahead log, which allows us to provide transaction atomicity, consistency and durability. The write-ahead log format is designed with several goals in mind:
- Log records provide ARIES-like features: the individual records corresponding to a given transaction are chained together via "previous LSN" values stored in each record, allowing fast rollback of transaction modifications.
- The entire log file can be efficiently traversed both forward and backward, which is necessary during recovery processing.
To implement all of these features, the log record format is somewhat complex. The details are outlined below.
For records that store the Previous LSN value, the Previous LSN is stored as a two-byte log file number (in the range [0..65535]), and then a four-byte file-offset (signed integer), relative to the start of the file.
In the descriptions below, a "B" suffix means "bytes". For example, "6B" means six bytes, and "Ssi B" means Ssi bytes.
- <Ti start> (6 bytes)
-
Start-transaction records don't require a "previous LSN" value, because
there is no previous LSN for the transaction. Thus, the format is as
follows:
Size Description 1B WALRecordType.START_TXN
4B Transaction ID 1B WALRecordType.START_TXN
- <Ti update P → P' >
-
Update records store modifications to data pages. We record changes on
the page-level for a variety of reasons. First, in some situations the
old value or the new value is actually missing, e.g. in the cases of an
insert or a delete. Also, we don't want the WAL to be tied to specific
data-file formats; we would like to use the WAL for logging index
changes as well as table changes. The format is as follows:
Size Description 1B WALRecordType.UPDATE_PAGE
4B Transaction ID 6B PrevLSN 1-256B Filename of the modified file, written as a VARCHAR(255)
. This value can be read with a function likeDBFileReader.readVarString255()
.2B Page number of modified page, written as an unsigned short ?B Description of the old page P, and the new page P'. The differences are stored as a series of segments, where each segment is a range of bytes that is different between the old and new versions of the page. Each segment is described by a starting index, a size, and then two sequences of bytes. - 2B - number of segments Ns (unsigned short)
-
Ns repetitions of:
- 2B - starting index of the segment in the page (unsigned short)
- 2B - size of the segment in bytes, Ssi (unsigned short)
- Ssi B - old version of the data (i.e. undo data)
- Ssi B - new version of the data (i.e. redo data)
4B File-offset of the start of this update record, relative to the start of the file. 1B WALRecordType.UPDATE_PAGE
- <Ti update (redo-only) P' >
-
Redo-only update records are similar to standard update records, except
that they contain no undo information. The format is as follows:
Size Description 1B WALRecordType.UPDATE_PAGE_REDO_ONLY
4B Transaction ID 6B PrevLSN 1-256B Filename of the modified file, written as a VARCHAR(255)
. This value can be read with a function likeDBFileReader.readVarString255()
.2B Page number of modified page, written as an unsigned short ?B Description of the new page P'. The differences are stored as a series of segments, where each segment is a range of bytes that is different between the old and new versions of the page. Since this is a redo-only record, only the new version of the data is stored. Each segment is described by a starting index, a size, and then two sequences of bytes. - 2B - number of segments Ns (unsigned short)
-
Ns repetitions of:
- 2B - starting index of the segment in the page (unsigned short)
- 2B - size of the segment in bytes, Ssi (unsigned short)
- Ssi B - new version of the data (i.e. redo data)
4B File-offset of the start of this update record, relative to the start of the file. 1B WALRecordType.UPDATE_PAGE_REDO_ONLY
- <Ti commit>
-
Commit records are 12 bytes:
Size Description 1B WALRecordType.COMMIT_TXN
4B Transaction ID 6B PrevLSN 1B WALRecordType.COMMIT_TXN
- <Ti abort>
-
Abort records are 12 bytes:
Size Description 1B WALRecordType.ABORT_TXN
4B Transaction ID 6B PrevLSN 1B WALRecordType.ABORT_TXN
-
Class Summary Class Description LogSequenceNumber This class represents a Log Sequence Number (LSN) in the write-ahead log.RecoveryInfo This class holds information necessary for theWALManager
to perform recovery processing, such as the LSNs to scan for redo/undo processing, and the set of incomplete transactions.WALManager This class manages the write-ahead logs of the database. -
Enum Summary Enum Description WALRecordType This enumeration specifies the various types of records that can appear in the write-ahead log, along with their numeric values that actually appear within the write-ahead log. -
Exception Summary Exception Description WALFileException This exception class represents issues with write-ahead log files, when they contain corrupt or invalid information in some way.