How can we gain insights from massive data sets?

Many scientific and commercial applications require us to obtain insights from massive, high-dimensional data sets. In particular, in this course we will study:

  • Online learning: How can we learn when we cannot fit the training data into memory? We will cover no-regret online algorithms; bandit algorithms; sketching and dimension reduction.
  • Active learning: How should we choose a few expensive labels to best utilize massive unlabeled data? We will cover active learning algorithms, learning theory and label complexity.
  • Nonparametric learning on large data: How can we let the complexity of classifiers grow in a principled manner with the data set size? We will cover large-scale kernel methods; Gaussian process regression, classification, optimization and active set methods.
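As a taste of the no-regret algorithms in the first unit, here is a minimal sketch of the Hedge (multiplicative weights) update for learning from expert advice. The learning rate and the loss encoding here are illustrative choices, not the exact setup used in lecture.

```python
import numpy as np

def hedge(losses, eta=0.5):
    """Minimal Hedge (multiplicative weights) on a known loss matrix.

    losses: array of shape (T, n) -- loss of each of n experts in each
            of T rounds, with entries in [0, 1].
    eta:    learning rate.
    Returns the learner's expected loss in each round.
    """
    T, n = losses.shape
    w = np.ones(n)                      # start with uniform weights
    learner_loss = []
    for t in range(T):
        p = w / w.sum()                 # play the normalized weight distribution
        learner_loss.append(p @ losses[t])
        w *= np.exp(-eta * losses[t])   # penalize each expert by its loss
    return np.array(learner_loss)
```

The point of the analysis covered in class is that the learner's cumulative loss stays close to that of the best single expert in hindsight, with regret growing only sublinearly in T.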

Announcements

  • Homework 3 online! Due Mar 8
  • Poster session: March 17 2:30pm-4pm, Annenberg 2nd floor atrium
  • No class on Feb 15 (President's Day)
  • Homework 2 online! Due Feb 19
  • Project ideas online!
  • If you're looking for a project partner, please email Deb Ray.
  • We will be meeting in 213 Annenberg.


Course Information

    • Andreas Krause (krausea [at] caltech.edu)
    • Daniel Golovin (dgolovin [at] caltech.edu)
  • Teaching Assistants:
    • Deb Ray (dray [at] caltech.edu)
  • Time and Location: Winter ’09/’10, Mon & Wed 2:30pm-4pm in 213 Annenberg
  • Prerequisites: Learning Systems (CS/CNS/EE 156a) or permission by instructor
  • 9 Units (3-0-6): Grading is based on
    • Homework assignments (50%)
    • Class project (50%)
  • Office hours:
    • Andreas Krause: Monday 4pm-5:30pm in 300 Annenberg
    • Daniel Golovin: Wednesday 4pm-5pm in 114 Annenberg, or by appointment
    • Deb Ray: Wednesday 4pm-5:30pm in 243 Annenberg
  • Collaboration policy:
    • Homeworks: Discussion of the problems is allowed; however, everybody must turn in their own solutions.
    • Project: Groups of 2-3 students (exceptions possible with instructor's permission)
  • Late day policy: Everybody has 3 late days that can be used for any homework assignment (e.g., you can be 2 days late for HW2 and 1 day late for HW1). No credit is given for homework turned in after the late days have been used. Late days cannot be used for the project proposal and reports. Start early!

Homeworks

Homeworks submitted by email should be sent to your friendly TA, Deb Ray.
  • Homework 1 [pdf] [zip file with starter code]  Due Feb 1.
  • Homework 2 [pdf]. Due Feb 19.
  • Homework 3 [pdf]. Due Mar 8.

Project

  • Proposals (1-2 pages) due Jan 22. Please clearly specify:
    • What is the idea of this project?
    • Who will you be collaborating with?
    • What data will you use? Will you need time "cleaning up" the data?
    • What code will you need to write? What existing code are you planning to use?
    • What references are relevant?  Mention 1-3 related papers.
    • What are you planning to accomplish by the milestone?
    • See suggestions
  • Project milestone due Feb 15
    • Roughly half the work should be completed.
    • What has already been accomplished?
    • What are you planning to do in the remainder of the term?
    • 4 pages in NIPS format
  • Final project writeup due March 19

Lecture notes

Scribing? Get the template.
  • Jan 4 - Introduction [PDF]
  • Jan 6 - Online Learning/Optimization #1: Weighted Majority, Hedge [PDF]. Also see [L.1]
  • Jan 11 - Online Learning/Optimization #2: Online Convex Programming [PDF]. Also see [L.6]
  • Jan 13 - Online Learning/Optimization #3: Online SVM [PDF]
  • Jan 20 - Online Learning/Optimization #4: More data => faster learning [PDF]. Also see [L.9]
  • Jan 25 - Dimensionality Reduction Techniques [PDF]
  • Jan 27 - Dealing with Partial Observability/Feedback #1: epsilon-Greedy and UCB1 [PDF]. Also see [A.9]
  • Feb 1 - Dealing with Partial Observability/Feedback #2: unbiased estimators for fun and profit [PDF]
  • Feb 3 - Active Learning #1: Background and Pool-based learning [PDF]
  • Feb 8 - Active Learning #2: the myopic version-space shrinking strategy [PDF] Also see [B.1]
  • Feb 10 - Active Learning #3: When precisely does active learning help? [PDF] Also see [B.6]
  • Feb 17 - Nonparametric learning #1: Kernelization [PDF] Also see [N.1]
  • Feb 22 - Nonparametric learning #2: RKHS [PDF] Also see [N.1]
  • Feb 24 - Nonparametric learning #3: Gaussian Processes [PDF] Also see [N.2]
  • Mar 1 - Nonparametric learning #4: Gaussian Process Regression [PDF] Also see [N.2]
  • Mar 3 - Nonparametric learning #5: Hyperparameter Optimization and Classification [PDF] Also see [N.2]
  • Mar 8 - Nonparametric learning #6: Active Learning in GPs [PDF] Also see [N.2]
  • Mar 10 - Nonparametric learning #7: GP Optimization; Active Set methods (coming soon) [N.2]
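The online convex programming lectures cover Zinkevich-style projected online gradient descent. The following is a minimal sketch under illustrative choices (a 1/sqrt(t) step-size schedule and a simple clipping projection); it is not the exact algorithm statement from the notes.

```python
import numpy as np

def online_gradient_descent(grads, project, x0, eta=0.1):
    """Sketch of projected online gradient descent (Zinkevich-style).

    grads:   one gradient function per round, revealed online
    project: projection back onto the feasible convex set
    x0:      starting point
    Returns the sequence of iterates played.
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for t, grad in enumerate(grads, start=1):
        # gradient step with decaying step size, then project
        x = project(x - (eta / np.sqrt(t)) * grad(x))
        iterates.append(x.copy())
    return iterates
```

For a fixed convex loss repeated every round, the iterates drift toward its minimizer; the regret analysis in lecture handles the general case where the losses change adversarially.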

Relevant Readings 

Online Learning and Optimization (in the full information model)

Online Learning and Optimization (in the bandit feedback model)

  • [A.1] S. Pandey, D. Chakrabarti, D. Agrawal. "Multi-armed Bandit Problems with Dependent Arms" ICML 2006
  • [A.2] S. Bubeck, R. Munos, G. Stoltz, C. Szepesvari. "Online Optimization in X-Armed Bandits" NIPS 2008. 
  • [A.3] P. Auer et al. "The non-stochastic multi-armed bandit problem", SIAM Journal on Computing, 2002. The paper that introduced the Exp3 family of algorithms.
  • [A.4] J. Langford and T. Zhang. "The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits" NIPS 2007. 
  • [A.5] R. Kleinberg, A. Niculescu-Mizil, and Y. Sharma. "Regret bounds for sleeping experts and bandits" COLT 2008. 
  • [A.6] D. Lizotte, T. Wang, M. Bowling, D. Schuurmans. "Automatic Gait Optimization with Gaussian Process Regression" IJCAI 2007
  • [A.7] R. Martinez-Cantin, N. de Freitas, A. Doucet and J. Castellanos. "Active Policy Learning for Robot Planning and Exploration under Uncertainty" RSS 2007. 
  • [A.8] B. McMahan, A. Blum. "Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary", COLT 2004. 
  • [A.9] P. Auer, N. Cesa-Bianchi, P. Fischer. "Finite-time Analysis of the Multiarmed Bandit Problem", Machine Learning, 2002. The paper that introduced the UCB family of algorithms.
  • [A.10] P. Auer, R. Ortner. "Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning", NIPS 2006
  • [A.11] A. Flaxman, A. Kalai, H. B. McMahan, "Online convex optimization in the bandit setting: gradient descent without a gradient", SODA 2005. Presents an algorithm similar to Zinkevich's but for the partial information (bandit) feedback model. Also see [A.8].
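For readers who want to experiment with the UCB family introduced in [A.9], here is a minimal sketch of UCB1 run on simulated Bernoulli arms. The simulation setup (arm means, reward model) is purely illustrative.

```python
import math
import random

def ucb1(arm_means, T, seed=0):
    """Minimal UCB1 on simulated Bernoulli bandit arms.

    arm_means: true success probabilities (unknown to the algorithm)
    T:         number of rounds
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n = len(arm_means)
    counts = [0] * n        # pulls per arm
    sums = [0.0] * n        # total reward per arm
    for t in range(T):
        if t < n:
            arm = t         # pull each arm once to initialize
        else:
            # arm maximizing empirical mean plus confidence bonus
            arm = max(range(n), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t + 1) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

As the analysis in [A.9] shows, the number of pulls of each suboptimal arm grows only logarithmically in T, so almost all pulls concentrate on the best arm.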

Dimension Reduction

Active Learning

  • [B.1] S. Dasgupta. "Analysis of a greedy active learning strategy" NIPS 2004
  • [B.2] M. Balcan, S. Hanneke, J. Wortmann. "The true sample complexity of Active Learning", COLT 2008. 
  • [B.3] S. Dasgupta, D.J. Hsu. "Hierarchical sampling for active learning", ICML 2008. 
  • [B.4] R. Castro, C. Kalish, R. Nowak, R. Qian, T. Rogers, X. Zhu. "Human Active Learning", NIPS 2008. 
  • [B.5] R. Castro, R. Nowak. "Minimax Bounds for Active Learning", COLT 2007
  • [B.6] S. Dasgupta. "Coarse Sample complexity bounds for active learning", NIPS 2005. 
  • [B.7] S. Tong, D. Koller. "Support Vector Machine Active Learning with Applications to Text Classification." JMLR 2001. 
  • [B.8] S. Dasgupta, D. Hsu, C. Monteleoni. "A General Agnostic Active Learning Algorithm", NIPS 2007. 
  • [C.1] M. Streeter, D. Golovin "An Online Algorithm for Maximizing Submodular Functions" NIPS 2008
  • [C.2] F. Radlinski, R. Kleinberg, T. Joachims. "Learning diverse rankings with multi-armed bandits" ICML 2008. 
  • [C.3]  S. Hoi, R. Jin, J. Zhu, M. Lyu "Batch mode active learning and its application to medical image classification", ICML 2006. 
  • [C.7] M. Seeger, H. Nickisch "Compressed Sensing and Bayesian Experimental Design" ICML 2008. 
  • [C.9] A. Krause, C. Guestrin, A. Gupta, J. Kleinberg. "Near-optimal sensor placements in Gaussian Processes: Maximizing Information while Minimizing Communication Cost", IPSN 2006
  • [C.11] J. Williams, J. Fisher III,  A. Willsky.  "Performance guarantees for information-theoretic active inference", AISTATS '07. 

Nonparametric Learning

  • [N.1] B. Schölkopf, A. Smola. "Learning with Kernels" MIT Press 2002. Chapters 2 and 4
  • [N.2] C. Rasmussen, C. Williams. "Gaussian Processes for Machine Learning" MIT Press 2006. Available online, free of charge

Some other courses with overlapping content