CS122 Winter 2017-2018 - Lecture Slides and Videos
Slides are made available as PDF files. All lectures are recorded in
.mp4 format using the H.264 codec. Each recording is around 45MB in size.
- Lecture 1: 2018-01-08
[slides]
[recording]
-
Introduction, course overview.
Relational DB requirements.
Database architecture and components. Data dictionary.
Storage hierarchy: primary storage, secondary (online) storage, tertiary
(nearline/offline) storage.
Magnetic hard disks: access time, seek time, rotational latency time.
Disk access optimizations: buffering, read-ahead, I/O scheduling, nonvolatile
write buffers (NV-RAM).
- Lecture 2: 2018-01-10
[slides]
[recording]
-
Solid State Drives (SSD): erase blocks, write amplification, wear leveling.
Filesystem operations: open, read, write, seek, sync.
Block-level access. Record-level access: common operations, block-level
structure, non-full lists.
Record storage format: null bitmaps, slotted-page structure. Record pointers.
- Lecture 3: 2018-01-12
[slides]
[recording]
-
Record-level file organization: heap file organization, sequential file organization and search keys, hashing file organization and static hashing, multitable clustering file organization.
External memory algorithms and data structures.
Query evaluation pipeline: SQL parsing, abstract syntax tree (AST), plan optimization, query evaluator.
L-values and R-values in the database.
- Lecture 4: 2018-01-17
[slides]
[recording]
-
-
Plan nodes, evaluation primitives (annotated plan nodes).
Materialized evaluation. Pipelined evaluation: demand-driven pipelines,
producer-driven pipelines.
Blocking operations.
SQL query translation to relational algebra: select, project, Cartesian
product, grouping/aggregation, theta join.
- Lecture 5: 2018-01-19
[slides]
[recording]
-
-
SQL query translation: SQL-92 join syntax.
Nested subqueries in the SELECT clause: scalar subqueries, correlated
evaluation.
Nested subqueries in the FROM clause, views.
Nested subqueries in the WHERE clause: IN/NOT IN and EXISTS/NOT EXISTS.
Semijoin and antijoin relational-algebra operators. Decorrelation.
- Lecture 6: 2018-01-22
[slides]
[recording]
-
Plan-node implementations: select, project, hash-based and sort-based grouping
and aggregation, external memory sorting.
Nested-loop join: support for inner join, left-outer join, semijoin, antijoin.
Cost of evaluating nested-loop join: best case, worst case.
Block nested-loop join.
- Lecture 7: 2018-01-24
[slides]
[recording]
-
Sort-merge join. Marking tuple-position in plan nodes. Materialize plan-node.
External sorting algorithm.
Hash joins: partition, build, probe. Overflow resolution, overflow avoidance.
Recursive partitioning.
- Lecture 8: 2018-01-26
[slides]
[no recording available due to technical issues]
-
Plan costing goals.
Plan costing and table statistics.
Table/tuple-level statistics: number of tuples/blocks in a table,
average size of tuple in a table.
Column-level statistics: number of distinct values in a column, min/max value
in a column.
Selectivity of selection predicates: column-op-value,
column-op-column. Complex selection predicates. Join costs, outer join costs.
Project costing, grouping/aggregation costing.
- Lecture 9: 2018-01-29
[slides]
[no recording available due to technical issues]
-
Equivalence rules: selects, theta joins, grouping, outer joins.
Heuristic plan optimization.
Cost-based plan optimization. Exhaustive plan enumeration, guided plan
enumeration.
Bottom-up planning, top-down planning (branch-and-bound),
Selinger-style plan optimization.
Optimization and join ordering: bottom-up join optimizer, top-down join
optimizer. Left-deep join orders. System-R join optimizer.
- Lecture 10: 2018-02-05
[slides]
[recording]
-
Indexes: ordered indexes, hashed indexes. Search keys. Primary indexes
(aka clustering indexes), index-sequential files. Secondary indexes.
Dense vs. sparse indexes. B+ tree index structure: internal and leaf nodes.
Querying B+ trees. Inserting into B+ trees: splitting nodes. Sketch of
B+ tree insertion algorithm.
- Lecture 11: 2018-02-07
[slides]
[recording]
-
Deleting from B+ trees: coalescing siblings, redistributing values.
Coalescing leaf nodes, redistributing between sibling leaf nodes, coalescing
non-leaf nodes, redistributing between sibling non-leaf nodes.
Sketch of B+ tree deletion algorithm.
Deletion and uniquifier attributes.
Prefix compression for large string keys.
- Lecture 12: 2018-02-09
[slides]
[recording]
-
Hash index issues: bucket skew, overflow pages. Static hashing (recap).
Dynamic hashing: extendable hashing, linear hashing.
Extendable hashing: bucket address table, splitting adjacent buckets.
Linear hashing: levels, level-based hash functions. Hash on current- and
next-level hash functions. Splitting a bucket. Lookups.
- Lecture 13: 2018-02-12
[slides]
[recording]
-
Index optimizations, alternate access paths.
Index scans: B+ tree costing for key/nonkey lookups. Equality lookups,
range lookups. Indexes and complex selections.
Join optimizations: indexed nested loop join, hybrid merge-join.
Bitmap indexes.
- Lecture 14: 2018-02-14
[slides]
[recording]
-
Transaction processing. Commit and rollback/abort of a transaction.
ACID properties: atomicity, consistency, isolation, durability.
Serial transaction execution schedules, serializable execution schedules.
Transaction states.
Volatile storage, nonvolatile storage, stable storage.
File-sync operation, other atomic filesystem operations.
Shadow-copy strategy, shadow-paging strategy.
- Lecture 15: 2018-02-16
[slides]
[recording]
-
Deferred vs. immediate modification. Write-ahead logging. Records: start,
commit, abort, update, read-only update (aka "compensation log record").
Force vs. no-force policy. Steal vs. no-steal policy.
Recovery: redo phase, undo phase. Crashes during recovery, idempotence of
recovery processing. Read-only transactions.