Overview
How can we gain insights from massive data sets?
Many scientific and commercial applications require us to obtain insights from massive, high-dimensional data sets. In particular, in this course we will study:
- Online learning: How can we learn when we cannot fit the training data into memory? We will cover no-regret online algorithms; bandit algorithms; sketching and dimension reduction (a short Hedge sketch follows this list).
- Active learning: How should we choose a few expensive labels to make the best use of massive amounts of unlabeled data? We will cover active learning algorithms, learning theory and label complexity.
- Nonparametric learning on large data: How can we let the complexity of classifiers grow in a principled manner with the size of the data set? We will cover large-scale kernel methods; Gaussian process regression, classification, optimization and active set methods.
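For a flavor of the online-learning material, here is a minimal Python sketch of the Hedge (multiplicative-weights) update covered in the first lectures and in [L.1], [L.3]. It is illustrative only and not part of the course materials; the learning rate `eta` and the toy losses are assumptions.

```python
import numpy as np

def hedge(losses, eta=0.5):
    """Hedge / multiplicative-weights over a set of experts.

    losses: (T, n) array with losses[t, i] in [0, 1] for expert i at round t.
    eta:    learning rate; theory suggests eta ~ sqrt(2 ln(n) / T)
            when the horizon T is known (illustrative default here).
    Returns the (T, n) array of probability distributions played.
    """
    T, n = losses.shape
    w = np.ones(n)                        # one weight per expert
    plays = np.empty((T, n))
    for t in range(T):
        plays[t] = w / w.sum()            # play the normalized weights
        w *= np.exp(-eta * losses[t])     # exponentially down-weight lossy experts
    return plays

# Toy usage: expert 0 incurs smaller losses, so its weight should dominate.
rng = np.random.default_rng(0)
L = rng.random((200, 3)) * np.array([0.2, 1.0, 1.0])
print(hedge(L)[-1])
```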
News
- Homework 3 online! Due Mar 8
- Poster session: March 17 2:30pm-4pm, Annenberg 2nd floor atrium
- No class on Feb 15 (President's Day)
- Homework 2 online! Due Feb 19
- Project ideas online!
- If you're looking for a project partner, please email Deb Ray.
- We will be meeting in 213 Annenberg.
Details
- Instructors:
- Andreas Krause (krausea [at] caltech.edu)
- Daniel Golovin (dgolovin [at] caltech.edu)
- Teaching Assistants:
- Deb Ray (dray [at] caltech.edu)
- Time and Location: Winter ’09/’10, Mon & Wed 2:30pm-4pm in 213 Annenberg
- Prerequisites: Learning Systems (CS/CNS/EE 156a) or permission by instructor
- 9 Units (3-0-6): Grading based on
- Homework assignments (50 %)
- Class project (50 %)
- Office hours:
- Andreas Krause: Monday 4pm-5:30pm in 300 Annenberg
- Daniel Golovin: Wednesday 4pm-5pm in 114 Annenberg, or by appointment
- Deb Ray: Wednesday 4pm-5:30pm in 243 Annenberg
- Collaboration policy:
- Homeworks: Discussion of the problems is allowed; however, everybody must turn in their own solutions.
- Project: Groups of 2-3 students (exceptions possible with instructor's permission)
- Late day policy: Everybody has 3 late days that can be used on any homework assignment (e.g., you can turn in HW2 two days late and HW1 one day late). No credit is given for submissions after all late days have been used. Late days cannot be used for the project proposal and reports. Start early!
Homeworks
Homeworks submitted by email should be sent to your friendly TA, Deb Ray.
- Homework 1 [pdf] [zip file with starter code]. Due Feb 1.
- Homework 2 [pdf]. Due Feb 19.
- Homework 3 [pdf]. Due Mar 8.
Project
- Proposals (1-2 pages) due Jan 22. Please clearly specify:
- What is the idea of this project?
- Who will you be collaborating with?
- What data will you use? Will you need time to "clean up" the data?
- What code will you need to write? What existing code are you planning to use?
- What references are relevant? Mention 1-3 related papers.
- What are you planning to accomplish by the milestone?
- See suggestions
- Project milestone due Feb 15
- Roughly half the work should be completed.
- What has already been accomplished?
- What do you plan to do between now and the end of the term?
- 4 pages in NIPS format
- Final project writeup due March 19
- 8 pages in NIPS format
Lecture notes
Scribing? Get the template.
- Jan 4 - Introduction [PDF]
- Jan 6 - Online Learning/Optimization #1: Weighted Majority, Hedge [PDF]. Also see [L.1]
- Jan 11 - Online Learning/Optimization #2: Online Convex Programming [PDF]. Also see [L.6]
- Jan 13 - Online Learning/Optimization #3: Online SVM [PDF]
- Jan 20 - Online Learning/Optimization #4: More data => faster learning [PDF]. Also see [L.9]
- Jan 25 - Dimensionality Reduction Techniques [PDF]
- Jan 27 - Dealing with Partial Observability/Feedback #1: epsilon-Greedy and UCB1 [PDF]. Also see [A.9]
- Feb 1 - Dealing with Partial Observability/Feedback #2: Unbiased estimators for fun and profit [PDF]
- Feb 3 - Active Learning #1: Background and Pool-based learning [PDF]
- Feb 8 - Active Learning #2: the myopic version-space shrinking strategy [PDF] Also see [B.1]
- Feb 10 - Active Learning #3: When precisely does active learning help? [PDF] Also see [B.6]
- Feb 17 - Nonparametric learning #1: Kernelization [PDF] Also see [N.1]
- Feb 22 - Nonparametric learning #2: RKHS [PDF] Also see [N.1]
- Feb 24 - Nonparametric learning #3: Gaussian Processes [PDF] Also see [N.2]
- Mar 1 - Nonparametric learning #4: Gaussian Process Regression [PDF] Also see [N.2] (a short code sketch follows this schedule)
- Mar 3 - Nonparametric learning #5: Hyperparameter Optimization and Classification [PDF] Also see [N.2]
- Mar 8 - Nonparametric learning #6: Active Learning in GPs [PDF] [N.2]
- Mar 10 - Nonparametric learning #7: GP Optimization; Active Set methods (coming soon) [N.2]
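To complement the Gaussian process lectures, the following is a minimal numpy sketch of standard GP regression (posterior mean and covariance under a zero-mean prior), following the equations in [N.2], Chapter 2. The squared-exponential kernel, noise level, and toy data are illustrative assumptions, not choices made in the course.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential (RBF) kernel matrix between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(X, y, X_star, noise=0.1):
    """GP regression posterior at test inputs X_star (zero prior mean)."""
    K = rbf(X, X) + noise ** 2 * np.eye(len(X))   # noisy training covariance
    K_s = rbf(X, X_star)                          # train/test cross-covariance
    K_ss = rbf(X_star, X_star)                    # test covariance
    L = np.linalg.cholesky(K)                     # K = L L^T, for stable solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s.T @ alpha                          # posterior mean
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v                          # posterior covariance
    return mean, cov

# Toy usage: regress a noisy sine curve.
rng = np.random.default_rng(0)
X = np.linspace(0, 5, 20)[:, None]
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(20)
mu, cov = gp_posterior(X, y, np.linspace(0, 5, 50)[:, None])
```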
Relevant Readings
Online Learning and Optimization (in the full information model)
- [L.1] N. Littlestone, M. Warmuth. The weighted majority algorithm. Information and Computation 108(2):212-261, 1994.
- [L.2] Cesa-Bianchi et al. How to Use Expert Advice, JACM 1997. More detail on binary classification in the experts paradigm.
- [L.3] Y. Freund, R. Schapire. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, JCSS 1997. The paper that introduced the Hedge algorithm.
- [L.4] Freund et al. Using and combining predictors that specialize, STOC 1997. Addresses the sleeping experts problem.
- [L.5] A. Blum, Y. Mansour. From External to Internal Regret, JMLR 2007. Generalizes from sleeping experts to time-selection functions, among other results.
- [L.6] M. Zinkevich. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, ICML 2003.
- [L.7] E. Hazan et al. Logarithmic Regret Algorithms for Online Convex Optimization. Improved analysis of Zinkevich's algorithm and two other algorithms, showing logarithmic (in T) regret for strictly convex cost functions.
- [L.8] A. Kalai, S. Vempala. Efficient Algorithms for Online Decision Problems, JCSS 2005. Another powerful approach to online decision problems, follow-the-perturbed-leader, is presented here.
- [L.9] S. Shalev-Shwartz, N. Srebro. SVM Optimization: Inverse Dependence on Training Set Size, ICML 2008.
Online Learning and Optimization (in the bandit feedback model)
- [A.1] S. Pandey, D. Chakrabarti, D. Agrawal. "Multi-armed Bandit Problems with Dependent Arms" ICML 2006
- [A.2] S. Bubeck, R. Munos, G. Stoltz, C. Szepesvari. "Online Optimization in X-Armed Bandits" NIPS 2008.
- [A.3] P. Auer et al. The non-stochastic multi-armed bandit problem. The paper that introduced the Exp3 family of algorithms.
- [A.4] J. Langford and T. Zhang. "The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits" NIPS 2007.
- [A.5] R. Kleinberg, A. Niculescu-Mizil, and Y. Sharma. "Regret bounds for sleeping experts and bandits" COLT 2008.
- [A.6] D. Lizotte, T. Wang, M. Bowling, D. Schuurmans. "Automatic Gait Optimization with Gaussian Process Regression" IJCAI 2007
- [A.7] R. Martinez-Cantin, N. de Freitas, A. Doucet and J. Castellanos. "Active Policy Learning for Robot Planning and Exploration under Uncertainty" RSS 2007.
- [A.8] B. McMahan, A. Blum. "Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary", COLT 2004.
- [A.9] P. Auer, N. Cesa-Bianchi, P. Fischer. "Finite-time Analysis of the Multiarmed Bandit Problem", Machine Learning, 2002. The paper that introduced the UCB family of algorithms.
- [A.10] P. Auer, R. Ortner. "Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning", NIPS 2006.
- [A.11] A. Flaxman, A. Kalai, H. B. McMahan. "Online convex optimization in the bandit setting: gradient descent without a gradient", SODA 2005. Presents an algorithm similar to Zinkevich's, but for the partial information (bandit) feedback model. Also see [A.8].
Dimension Reduction
- [D.1] L. Saul, K. Weinberger, F. Sha, J. Ham, D. Lee. "Spectral Methods for Dimensionality Reduction"
- [D.2] J. Tenenbaum, V. de Silva, J. Langford. "A Global Geometric Framework for Nonlinear Dimensionality Reduction", Science 290, 2000.
- [D.3] S. Roweis, L. Saul. "Nonlinear Dimensionality Reduction by Locally Linear Embedding", Science 290, 2000.
Active Learning
- [B.1] S. Dasgupta. "Analysis of a greedy active learning strategy" NIPS 2004
- [B.2] M. Balcan, S. Hanneke, J. Wortmann. "The true sample complexity of Active Learning", COLT 2008.
- [B.3] S. Dasgupta, D.J. Hsu. "Hierarchical sampling for active learning", ICML 2008.
- [B.4] R. Castro, C. Kalish, R. Nowak, R. Qian, T. Rogers, X. Zhu. "Human Active Learning", NIPS 2008.
- [B.5] R. Castro, R. Nowak. "Minimax Bounds for Active Learning", COLT 2007
- [B.6] S. Dasgupta. "Coarse Sample complexity bounds for active learning", NIPS 2005.
- [B.7] S. Tong, D. Koller. "Support Vector Machine Active Learning with Applications to Text Classification." JMLR 2001.
- [B.8] S. Dasgupta, D. Hsu, C. Monteleoni. "A General Agnostic Active Learning Algorithm", NIPS 2007.
- [C.1] M. Streeter, D. Golovin "An Online Algorithm for Maximizing Submodular Functions" NIPS 2008
- [C.2] F. Radlinski, R. Kleinberg, T. Joachims. "Learning diverse rankings with multi-armed bandits" ICML 2008.
- [C.3] S. Hoi, R. Jin, J. Zhu, M. Lyu "Batch mode active learning and its application to medical image classification", ICML 2006.
- [C.7] M. Seeger, H. Nickisch "Compressed Sensing and Bayesian Experimental Design" ICML 2008.
- [C.9] A. Krause, C. Guestrin, A. Gupta, J. Kleinberg. "Near-optimal sensor placements in Gaussian Processes: Maximizing Information while Minimizing Communication Cost", IPSN 2006
- [C.11] J. Williams, J. Fisher III, A. Willsky. "Performance guarantees for information-theoretic active inference", AISTATS '07.
Nonparametric Learning
- [N.1] B. Schölkopf, A. Smola. "Learning with Kernels", MIT Press 2002. Chapters 2 and 4.
- [N.2] C. Rasmussen, C. Williams. "Gaussian Processes for Machine Learning", MIT Press 2006. Available online, free of charge.
Some other courses with overlapping content
- Avrim Blum's introductory graduate level and advanced machine learning courses.
- Robert Kleinberg's course on Learning, Games, and Electronic Markets.