CS122 Winter 2018-2019 - Lecture Slides and Videos
Slides are made available as PDF files. All lectures are recorded in
.mp4 format using the H.264 codec. Each recording is around 45MB in size.
The previous year's lectures
are available here.
- Lecture 1: 2019-01-07
[slides]
[recording]
-
Introduction, course overview.
Relational DB requirements.
Database architecture and components. Data dictionary.
Storage hierarchy: primary storage, secondary (online) storage, tertiary
(nearline/offline) storage.
- Lecture 2: 2019-01-09
[slides]
[recording]
-
Magnetic hard disks: access time, seek time, rotational latency time.
Disk access optimizations: buffering, read-ahead, I/O scheduling, nonvolatile
write buffers (NV-RAM).
Solid State Drives (SSD): erase blocks, write amplification, wear leveling.
Filesystem operations: open, read, write, seek, sync.
Block-level access. Record-level access: common operations, block-level
structure, non-full lists, free-space bitmaps.
- Lecture 3: 2019-01-11
[slides]
[recording]
-
Record storage format: null bitmaps, slotted-page structure. Record pointers.
Record-level file organization: heap file organization, sequential file organization and search keys, hashing file organization and static hashing, multitable clustering file organization.
External memory algorithms and data structures.
Query evaluation pipeline: SQL parsing, abstract syntax tree (AST), plan optimization, query evaluator.
- Lecture 4: 2019-01-14
[slides]
[recording]
-
-
L-values and R-values in the database.
Plan nodes, evaluation primitives (annotated plan nodes).
Materialized evaluation. Pipelined evaluation: demand-driven pipelines,
producer-driven pipelines.
Blocking operations.
SQL query translation to relational algebra: select, project, Cartesian
product, grouping/aggregation.
- Lecture 5: 2019-01-18
[slides]
-
-
SQL query translation to relational algebra: grouping/aggregation, theta join.
SQL query translation: SQL-92 join syntax.
Nested subqueries in the SELECT clause: scalar subqueries, correlated
evaluation.
Nested subqueries in the FROM clause, views.
- Lecture 6: 2019-01-23
[slides]
-
Nested subqueries in the WHERE clause: IN/NOT IN and EXISTS/NOT EXISTS.
Semijoin and antijoin relational-algebra operators. Decorrelation.
Plan-node implementations: select, project, hash-based and sort-based grouping
and aggregation.
- Lecture 7: 2019-01-25
[slides]
-
Plan-node implementations: external memory sorting.
Nested-loop join: support for inner join, left-outer join, semijoin, antijoin.
Cost of evaluating nested-loop join: best case, worst case.
Block nested-loop join.
Sort-merge join. Marking tuple-position in plan nodes. Materialize plan-node.
- Lecture 8: 2019-01-28
[slides]
-
Hash joins: partition, build, probe. Overflow resolution, overflow avoidance.
Recursive partitioning.
Plan costing goals.
Plan costing and table statistics.
Table/tuple-level statistics: number of tuples/blocks in a table,
average size of tuple in a table.
Column-level statistics: number of distinct values in a column, min/max value
in a column.
- Lecture 9: 2019-01-30
[slides]
-
Selectivity of selection predicates: column-op-value,
column-op-column. Complex selection predicates. Join costs, outer join costs.
Project costing, grouping/aggregation costing.
Equivalence rules: selects, theta joins, grouping, outer joins.
- Lecture 10: 2019-02-01
[slides]
-
Heuristic plan optimization.
Cost-based plan optimization. Exhaustive plan enumeration, guided plan
enumeration.
Bottom-up planning, top-down planning (branch-and-bound),
Selinger-style plan optimization.
Optimization and join ordering: bottom-up join optimizer, top-down join
optimizer. Left-deep join orders. System-R join optimizer.