Suggestions for the project
- Possible project ideas (these are just suggestions; your own ideas are strongly encouraged):
- Implement and evaluate some nonstandard active learning algorithm (e.g., [B.3], [B.4], [B.8])
- Experiment with heuristics for active learning (e.g., SVMs / logistic regression) on some interesting data set (e.g., image classification) (e.g., [B.7])
- Implement and evaluate some nonstandard bandit algorithm (X-armed bandit [A.2], contextual bandits [A.4], [A.1] ...)
- Use submodular function optimization for Bayesian experimental design on some data set (e.g., [C.1], [C.3], [C.9])
- Try out reinforcement learning (Rmax, Optimistic Q-learning, ...) on some simple problem
- Compare different bandit algorithms (e.g., low-regret algorithms like UCB against the optimal solution using MDPs / Gittins indices)
- Experiment with Gaussian process optimization (e.g., [A.6], [A.7]), compare different selection heuristics
- Implement active learning for sensor placement / management ([C.11], [C.12])
- Implement and evaluate algorithms for online linear optimization (e.g., online shortest paths, [A.3], [A.8])
- Possible data sets:
- Caltech 101 Vision data set for image classification:
http://www.vision.caltech.edu/Image_Datasets/Caltech101/
- Web page latencies (benchmark for bandit problems):
https://sourceforge.net/project/showfiles.php?group_id=111175
- fMRI data ("predict cognitive state given fMRI data"):
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-81/www/
- Twenty Newsgroups data (classify 20,000 articles into 20 newsgroups):
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
- A collection of data sets by Sam Roweis (Matlab format):
http://www.cs.toronto.edu/~roweis/data.html
- Video data from a multimedia image retrieval competition:
http://www-nlpir.nist.gov/projects/trecvid/trecvid.data.html
- Intel Berkeley sensor network data set (temperature in buildings):
http://www-2.cs.cmu.edu/%7Eguestrin/Research/Data/
- KDD Cup data sets:
http://www.kdnuggets.com/datasets/kddcup.html
- Netflix prize (predict movie ratings and win $1M):
http://www.netflixprize.com/
- Tried and true UCI Repository of machine learning data sets:
http://archive.ics.uci.edu/ml/
- INRIA Pedestrian Dataset:
http://pascal.inrialpes.fr/data/human/
- Visual Object Challenge:
http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2008/
- Code:
- Inference with Gaussian processes:
http://www.gaussianprocess.org/gpml/code/matlab/doc/
- Submodular function optimization:
http://www.mathworks.com/matlabcentral/fileexchange/20504
- Bandit algorithms:
http://bandit.sourceforge.net/
- Support vector machines:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
- Graphical models / Bayesian networks:
http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html
http://research.microsoft.com/en-us/um/cambridge/projects/infernet/
- MDPs and Reinforcement learning:
http://www.cs.ubc.ca/~murphyk/Software/MDP/mdp.html
- Some code for image classification (using "Bag-of-words" representation):
http://vision.ucla.edu/~vedaldi/code/bag/bag.html
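
As a possible starting point for the bandit comparison idea in the project list, here is a minimal sketch of UCB1 on simulated Bernoulli arms. The function name `ucb1`, the arm means, and the horizon are illustrative assumptions; the code is independent of the packages linked above and uses only the Python standard library.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms.

    Returns the per-arm pull counts and the cumulative pseudo-regret,
    i.e. the sum over rounds of (best mean - mean of the pulled arm).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of pulls per arm
    sums = [0.0] * k    # total observed reward per arm
    best = max(arm_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Pick the arm maximizing empirical mean + sqrt(2 ln t / n_i).
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - arm_means[arm]
    return counts, regret


# Hypothetical three-armed problem; the means are made up for illustration.
counts, regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, round(regret, 1))
```

Replacing the arm-selection rule (for example with epsilon-greedy or Thompson sampling) while keeping the same simulation loop gives a simple harness for comparing regret curves across algorithms.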