Overview

This page lists possible data sets, project ideas, and code that you can use for your project. Since this is a research-oriented class, you are strongly encouraged to pick a project related to your own research; you are not limited to the options on this page.

Ideally, groups should have 2-3 people; exceptions are possible with the instructor's permission. Please feel free to contact Andreas, Hongchao, or Pete about project ideas.

Data sets

Caltech data sets

  • Urban Challenge Datasets (contact Pete Trautman):
    • High-fidelity GPS vehicle trajectories.
    • Ladar scans.
    • Possibly also stereo and video, although these are probably somewhat difficult to recover.
  • Exercise physiology data (contact Pete Trautman)
    • Generated by John Doyle's group. Athletes ride a stationary bike under various conditions while heart rate, wattage output, breathing rate, and gas exchange data are recorded.
  • Fly data (contact Pete Trautman)
    • High-resolution data of fly activities. Using background subtraction, fly positions are recorded over a fixed time interval. Various positive and negative attractions are placed in the fly arena to encourage certain types of behavior.
  • JPL data sets (contact Pete Trautman)
    • Orbital remote sensing imagery of Mars, for predicting areas of high danger to rovers. Some of the data is truthed, that is, it records how much slippage actually occurred during real rover trajectories.
    • Use rover slip data to estimate parameters of soil mechanics models
    • Video truthed people tracks
    • UAV flyover data, with annotated lakes and buildings.
    • Data for visual SLAM
    • Person segmentation: a data set of people walking, annotated with bounding boxes around each person.
  • LDPC data sets (contact Hongchao Zhou)
    • Parity-check matrix for an LDPC code.
    • Received signal.
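
As a rough illustration of how the two LDPC items fit together, the sketch below hard-decides a received signal and checks it against a parity-check matrix by computing the syndrome. Everything concrete here is an assumption made for illustration: the small (7,4) Hamming matrix stands in for the actual (much larger and sparser) LDPC matrix, the BPSK mapping 0 -> +1, 1 -> -1 is assumed, and the received values are made up.

```python
# Hypothetical toy example: the (7,4) Hamming parity-check matrix stands in
# for a real LDPC parity-check matrix, which would be much larger and sparser.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def hard_decision(received):
    """Map real-valued samples to bits, assuming BPSK with 0 -> +1 and 1 -> -1."""
    return [0 if r >= 0 else 1 for r in received]


def syndrome(H, bits):
    """Return H * bits (mod 2); all zeros means every parity check is satisfied."""
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]


# Made-up received signal: the all-zero codeword sent as +1s, with noise
# flipping the sign of the sample at index 4.
received = [0.9, 1.1, 0.8, 1.2, -0.3, 1.0, 0.7]
bits = hard_decision(received)
s = syndrome(H, bits)
print("hard-decision bits:", bits)
print("syndrome:", s)
```

A nonzero syndrome indicates that the hard-decision word is not a codeword. A soft decoder such as loopy BP would work from the real-valued samples rather than the hard decisions, but would typically use the same syndrome test as its stopping criterion.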

Image & Video data

Neuroscience & Physiology data

Collaborative prediction data

Sensor network data

NLP & Text data

Network data

Other sources of data

Project ideas

Caltech data related ideas

  • Use structured prediction to predict slip in the rover data (e.g., using conditional random fields). Compare the predictions with parametric models estimated from the slip data.
  • Activity recognition on the fly data (e.g., using hierarchical conditional random fields). Related work:
    • Location-Based Activity Recognition. L. Liao, D. Fox, and H. Kautz. NIPS-05.
    • Learning and Inferring Transportation Routines. L. Liao, D. Fox, and H. Kautz. AAAI-04.
  • Clustering/segmentation of Ladar scans, video, GPS trajectories using graphical methods.
  • Compare graphical model methods with classical model ID methods to analyze the physiology data.
  • Apply approximate inference methods to fly data, SLAM data, or visual SLAM data to do tracking, data association, multitarget tracking, etc.
  • Compare different approximate inference techniques (loopy BP, variational inference, ...) for coding theory (LDPC codes)

Learning and Modeling

  • Compare constraint-based (e.g., using independence tests) and score-based algorithms for structure learning.
  • Implement algorithms for structure learning of undirected graphical models. E.g., based on L1 regularization (e.g., Ravikumar et al, NIPS '07, NIPS '08)
  • Experiment with Bayesian model averaging (e.g., using sampling)
  • Compare conditional random fields with generative models (directed or undirected) on some learning task
  • Compare Max-margin Markov Nets [Taskar et al NIPS '03] with Conditional Random Fields [Lafferty et al ICML '01]

Inference

  • Compare different techniques for exact inference (in terms of complexity, ...)
    • Junction tree inference
    • Bucket elimination
    • Recursive conditioning
    • Arithmetic circuits
  • Compare different techniques for approximate inference
    • Variational inference (structured mean field, etc.)
    • Generalized belief propagation
    • Sampling (MCMC / Gibbs /...)
    • Preconditioning based inference (Ravikumar et al NIPS '05)
  • MAP inference
    • Compare exact techniques (e.g., using graph cuts; junction trees with low-treewidth models) with approximate techniques (max-product, LP relaxations, ...)
  • Compare different algorithms for Bayesian filtering in dynamical models.
    • Assumed density filtering
    • Particle filtering. Rao-Blackwellization for data association, for the "two streams" hypothesis, and for SLAM benchmarks
    • Ensemble Kalman filters
  • Compare arithmetic circuits and Bayes nets in their ability to represent different data sets (e.g., how much compression do we get by representing a Bayes net as an arithmetic circuit?)
  • For spatial data, compare Gaussian graphical models that use an inferred sparse precision matrix with Gaussian processes.
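
To give a flavor of what an exact-versus-approximate inference comparison might look like at the smallest scale, the sketch below computes a marginal by brute-force enumeration and by Gibbs sampling. The model is purely an assumption for illustration: a hypothetical 3-node binary chain with made-up `COUPLING` and `BIAS` parameters, not one of the data sets above.

```python
import itertools
import math
import random

# Hypothetical toy model: a binary chain x0 - x1 - x2 with unnormalized
# log-probability BIAS * x0 + COUPLING * ([x0 == x1] + [x1 == x2]).
COUPLING = 1.0
BIAS = 0.5
N = 3


def log_score(x):
    """Unnormalized log-probability of a full assignment x."""
    s = BIAS * x[0]
    for i in range(N - 1):
        s += COUPLING * (x[i] == x[i + 1])
    return s


def exact_marginal(node):
    """P(x_node = 1) by enumerating all 2^N states (exact, exponential cost)."""
    total = 0.0
    on = 0.0
    for x in itertools.product([0, 1], repeat=N):
        w = math.exp(log_score(x))
        total += w
        if x[node] == 1:
            on += w
    return on / total


def gibbs_marginal(node, sweeps=20000, burn_in=1000, seed=0):
    """Estimate P(x_node = 1) by Gibbs sampling (approximate)."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(N)]
    count = 0
    for t in range(sweeps):
        for i in range(N):
            # Resample x[i] from its conditional given all other variables.
            x[i] = 1
            s1 = log_score(x)
            x[i] = 0
            s0 = log_score(x)
            p1 = 1.0 / (1.0 + math.exp(s0 - s1))
            x[i] = 1 if rng.random() < p1 else 0
        if t >= burn_in:
            count += x[node]
    return count / (sweeps - burn_in)


exact = exact_marginal(2)
approx = gibbs_marginal(2)
print("exact P(x2 = 1):", exact)
print("Gibbs P(x2 = 1):", approx)
```

On a model this small, enumeration is trivially cheap. A project would presumably scale the comparison to models where exact inference becomes expensive and measure accuracy against running time.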

Applications

  • Fault detection in sensor networks (e.g., automatic data cleaning based on detecting outliers exploiting correlation)
  • Experiment with different models for image segmentation / foreground-background classification
  • Compare probabilistic context-free grammars with hidden Markov models for parsing
  • Model identification for physiological data (e.g., using Gaussian processes)

Structured models

  • Learn a simplified class of relational models (e.g., no existence uncertainty)
  • Link prediction / collaborative filtering. (1) E.g., compare matrix factorization techniques with loopy BP in a factor graph. (2) Given data about part of a graph, predict the presence of edges
  • Experiment with topic models (e.g., LDA) on some interesting data set relevant to your research
  • Apply non-parametric Bayesian clustering (e.g., using Hierarchical Dirichlet Processes) on some data set
    • In particular, on the fly data, the GPS Alice data, or the pedestrian data
    • For data association
    • For the motion segmentation problem / layered generative modeling
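
For readers new to non-parametric Bayesian clustering, the sketch below samples a partition from the Chinese restaurant process, the prior over clusterings induced by a single Dirichlet process. It is a building block only, not a Hierarchical Dirichlet Process and not a full clustering method, since no data likelihood is involved. The function name `sample_crp` and the parameter values are assumptions chosen for illustration.

```python
import random


def sample_crp(n_points, alpha, seed=0):
    """Sample cluster assignments from a Chinese restaurant process.

    Point n (0-indexed) joins existing cluster k with probability
    counts[k] / (n + alpha) and opens a new cluster with probability
    alpha / (n + alpha).
    """
    rng = random.Random(seed)
    assignments = []
    counts = []
    for n in range(n_points):
        r = rng.random() * (n + alpha)
        cumulative = 0.0
        chosen = len(counts)  # default: open a new cluster
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        assignments.append(chosen)
    return assignments, counts


assignments, counts = sample_crp(n_points=100, alpha=1.0)
print("number of clusters:", len(counts))
print("cluster sizes:", counts)
```

The number of clusters is not fixed in advance and tends to grow slowly with the number of points; larger values of `alpha` favor more clusters.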

Other

Code