CS122 Winter 2017-2018 - Lecture Slides and Videos

Slides are made available as PDF files. All lectures are recorded in .mp4 format using the H.264 codec. Each recording is around 45MB in size.


Lecture 1: 2018-01-08 [slides] [recording]
Introduction, course overview. Relational DB requirements. Database architecture and components. Data dictionary. Storage hierarchy: primary storage, secondary (online) storage, tertiary (nearline/offline) storage. Magnetic hard disks: access time, seek time, rotational latency time. Disk access optimizations: buffering, read-ahead, I/O scheduling, nonvolatile write buffers (NV-RAM).
Lecture 2: 2018-01-10 [slides] [recording]
Solid State Drives (SSD): erase blocks, write amplification, wear leveling. Filesystem operations: open, read, write, seek, sync. Block-level access. Record-level access: common operations, block-level structure, non-full lists. Record storage format: null bitmaps, slotted-page structure. Record pointers.
Lecture 3: 2018-01-12 [slides] [recording]
Record-level file organization: heap file organization, sequential file organization and search keys, hashing file organization and static hashing, multitable clustering file organization. External memory algorithms and data structures. Query evaluation pipeline: SQL parsing, abstract syntax tree (AST), plan optimization, query evaluator. L-values and R-values in the database.

Lecture 4: 2018-01-17 [slides] [recording]
Plan nodes, evaluation primitives (annotated plan nodes). Materialized evaluation. Pipelined evaluation: demand-driven pipelines, producer-driven pipelines. Blocking operations. SQL query translation to relational algebra: select, project, Cartesian product, grouping/aggregation, theta join.
Lecture 5: 2018-01-19 [slides] [recording]
SQL query translation: SQL-92 join syntax. Nested subqueries in the SELECT clause: scalar subqueries, correlated evaluation. Nested subqueries in the FROM clause, views. Nested subqueries in the WHERE clause: IN/NOT IN and EXISTS/NOT EXISTS. Semijoin and antijoin relational-algebra operators. Decorrelation.

Lecture 6: 2018-01-22 [slides] [recording]
Plan-node implementations: select, project, hash-based and sort-based grouping and aggregation, external memory sorting. Nested-loop join: support for inner join, left-outer join, semijoin, antijoin. Cost of evaluating nested-loop join: best case, worst case. Block nested-loop join.
Lecture 7: 2018-01-24 [slides] [recording]
Sort-merge join. Marking tuple-position in plan nodes. Materialize plan-node. External sorting algorithm. Hash joins: partition, build, probe. Overflow resolution, overflow avoidance. Recursive partitioning.
Lecture 8: 2018-01-26 [slides] [no recording available due to technical issues]
Plan costing goals. Plan costing and table statistics. Table/tuple-level statistics: number of tuples/blocks in a table, average size of tuple in a table. Column-level statistics: number of distinct values in a column, min/max value in a column. Selectivity of selection predicates: column-op-value, column-op-column. Complex selection predicates. Join costs, outer join costs. Project costing, grouping/aggregation costing.

Lecture 9: 2018-01-29 [slides] [no recording available due to technical issues]
Equivalence rules: selects, theta joins, grouping, outer joins. Heuristic plan optimization. Cost-based plan optimization. Exhaustive plan enumeration, guided plan enumeration. Bottom-up planning, top-down planning (branch-and-bound), Selinger-style plan optimization. Optimization and join ordering: bottom-up join optimizer, top-down join optimizer. Left-deep join orders. System-R join optimizer.

Lecture 10: 2018-02-05 [slides] [recording]
Indexes: ordered indexes, hashed indexes. Search keys. Primary indexes (aka clustering indexes), index-sequential files. Secondary indexes. Dense vs. sparse indexes. B+ tree index structure: internal and leaf nodes. Querying B+ trees. Inserting into B+ trees: splitting nodes. Sketch of B+ tree insertion algorithm.
Lecture 11: 2018-02-07 [slides] [recording]
Deleting from B+ trees: coalescing siblings, redistributing values. Coalescing leaf nodes, redistributing between sibling leaf nodes, coalescing non-leaf nodes, redistributing between sibling non-leaf nodes. Sketch of B+ tree deletion algorithm. Deletion and uniquifier attributes. Prefix compression for large string keys.
Lecture 12: 2018-02-09 [slides] [recording]
Hash index issues: bucket skew, overflow pages. Static hashing (recap). Dynamic hashing: extendable hashing, linear hashing. Extendable hashing: bucket address table, splitting adjacent buckets. Linear hashing: levels, level-based hash functions. Hash on current- and next-level hash functions. Splitting a bucket. Lookups.

Lecture 13: 2018-02-12 [slides] [recording]
Index optimizations, alternate access paths. Index scans: B+ tree costing for key/nonkey lookups. Equality lookups, range lookups. Indexes and complex selections. Join optimizations: indexed nested loop join, hybrid merge-join. Bitmap indexes.
Lecture 14: 2018-02-14 [slides] [recording]
Transaction processing. Commit and rollback/abort of a transaction. ACID properties: atomicity, consistency, isolation, durability. Serial transaction execution schedules, serializable execution schedules. Transaction states. Volatile storage, nonvolatile storage, stable storage. File-sync operation, other atomic filesystem operations. Shadow-copy strategy, shadow-paging strategy.
Lecture 15: 2018-02-16 [slides] [recording]
Deferred vs. immediate modification. Write-ahead logging. Records: start, commit, abort, update, read-only update (aka "compensation log record"). Force vs. no-force policy. Steal vs. no-steal policy. Recovery: redo phase, undo phase. Crashes during recovery, idempotence of recovery processing. Read-only transactions.