Data Mining: Learning from Large Data Sets
Many scientific and commercial applications require us to obtain insights from massive, high-dimensional data sets. In this graduate-level course, students will learn to apply, analyze and evaluate principled, state-of-the-art techniques from statistics, algorithms and discrete and convex optimization for learning from such large data sets. VVZ information is available here
||HG D 3.2
||HG D 3.2
- No classes / tutorials on 8.12.
- Videos of recorded lectures available here.
- We ask students with last names that start with letters A–M to attend the first tutorial session, while the second one is for N–Z. In case of a conflict with your schedule you may attend the other tutorial.
- To obtain the password please click here and login using your nethz account or login using the ETH VPN client and click here.
- Self-assessment questions [pdf]
- Dealing with large data (Data centers; Map-Reduce/Hadoop; Amazon Mechanical Turk)
- Fast nearest neighbor methods (Shingling, locality sensitive hashing)
- Online learning (Online optimization and regret minimization, online convex programming, applications to large-scale Support Vector Machines)
- Multi-armed bandits (exploration-exploitation tradeoffs, applications to online advertising and relevance feedback)
- Active learning (uncertainty sampling, pool-based methods, label complexity)
- Dimension reduction (random projections, nonlinear methods)
- Data streams (Sketches, coresets, applications to online clustering)
- Recommender systems
The exercise problems will include theoretical and programming problems. Please note that it is not mandatory to submit solutions. We will publish exercise solutions after one week. If you choose to submit:
- Send a soft copy of the exercise from your ethz.ch address to
firstname.lastname@example.org. This can be latex, but also a simple scan or even a picture of a hand-written solution.
- Please do not submit hard copies of your solutions.
Part of the coursework will be a project, carried out in groups of up to 3 students. The goal of this project is to get hands-on experience in machine learning tasks. The project grade will constitute 30% of the total grade. To access the project, log into the ETH network (e.g. using VPN) and then click here
The mode of examination is written, 120 minutes length. The language of examination is English. As written aids, you can bring two A4 pages (i.e. one A4 sheet of paper), either handwritten or 11 point minimum font size. The written exam will constitute 70% of the total grade.
- J. Leskovec, A. Rajaraman, J. Ullman. Mining of Massive Data Sets. Available here.
- Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI’04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004.
- Jure Leskovec, Eric Horvitz. Planetary-Scale Views on a Large
Instant-Messaging Network. In Proceedings WWW 2008 [pdf]
- Manuel Gomez Rodriguez, Jure Leskovec, Andreas Krause. Inferring
Networks of Diffusion and Influence, In Proc. ACM Conference on
Knowledge Discovery in Databases (KDD), 2010 [pdf]
- James Hays, Alexei A. Efros. Scene Completion Using Millions of Photographs. ACM Transactions on Graphics (SIGGRAPH 2007). August 2007, vol. 26, No. 3.
- Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan and Uri Shaft. When is “Nearest Neighbor” Meaningful?. Proc. 7th International Conference on Database Theory (ICDT 1999) LNCS 1540: 217-235.
- Aristides Gionis, Piotr Indyk, Rajeev Motwani. Similarity Search
in High Dimensions via Hashing [ps]. 25th
International Conference on Very Large Databases (VLDB) , 1999
- Martin Zinkevich. Online Convex Programming and Generalized
Infinitesimal Gradient Ascent. Proc. International Conference on
Machine Learning 2003 [pdf]
- Elad Hazan, Amit Agarwal, Satyen Kale. Logarithmic Regret Algorithms for Online Convex Optimization.
Machine Learning Journal Volume 69 , Issue 2-3 Pages: 169 – 192 [pdf]
- John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research 12 (2011): 2121-2159. [pdf]
- Diederik Kingma, and Jimmy Ba. Adam: A Method for Stochastic Optimization. 3rd International Conference for Learning Representations (2015). [pdf]
- Martin Zinkevich, Markus Weimer, Alex Smola, Lihong
Li. Parallelized Stochastic Gradient Descent. Advances in Neural
Information Processing Systems 23, 2010 [pdf]
- Feng Niu, Benjamin Recht, Christopher Re, and Stephen
J. Wright. HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic
Gradient Descent. Advances in Neural
Information Processing Systems 24, 2011 [pdf]
- Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, 2012. [pdf]
- Ali Rahimi, Ben Recht. Random Features for Large-Scale Kernel
Machines. Advances in Neural
Information Processing Systems 20, 2007 [pdf]
- Tianbao Yang, Yu-Feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua
Zhou. Nyström Method vs Random Fourier Features: A Theoretical
and Empirical Comparison. Advances in Neural
Information Processing Systems 25, 2012 [pdf]
- Ji Zhu, Saharon Rosset, Trevor Hastie, Rob Tibshirani. L1 norm
support vector machines. Advances in Neural Information Processing
Systems 2003 [html]
- John Duchi, Shai Shalev-Shwartz, Yoram Singer, Tushar
Chandra. Efficient Projections onto the l1-Ball for Learning in
High Dimensions. ICML 2008
- Nathan Ratliff, J. Andrew (Drew) Bagnell, and Martin
Zinkevich. (Online) Subgradient Methods for Structured
Prediction. AISTATS 2007 [pdf]
- Prateek Jain, Sudheendra Vijayanarasimhan, Kristen Grauman. Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning. Advances in Neural Information Processing Systems 2003 [pdf]
- Simon Tong, Daphne Koller. Support Vector Machine Active Learning with Applications to Text Classification. Proceedings of the Seventeenth International Conference on Machine Learning 2000 [ps]
- Dan Feldman, Morteza Monemizadeh, Christian Sohler. A PTAS for
k-Means Clustering Based on Weak Coresets. Proc. 23th Annu. ACM Symposium on Computational Geometry (SoCG) 2007 [pdf]
- Chris Bishop. Pattern Recognition and Machine Learning. Springer, 2006. Chapter 9
- Percy Liang, Dan Klein. Online EM for Unsupervised Models. Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) 2009 [pdf]
- Dan Feldman, Matthew Faulkner, Andreas Krause. Scalable Training of Mixture Models via Coresets. In Proc. Neural Information Processing Systems, 2011 [pdf]
- Mario Lucic, Olivier Bachem, Andreas Krause. Strong Coresets for Hard and Soft Bregman Clustering with Applications to Exponential Family Mixtures, 2015 [pdf]
- Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning Journal, 2002.
- Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article
recommendation. In WWW-10, Raleigh, NC, April, 2010.
- Khalid El-Arini, Gaurav Veda, Dafna Shahaf and Carlos Guestrin. Turning Down the Noise in the Blogosphere. In Proc. of 15th International Conference on Knowledge Discovery and Data Mining (KDD 2009) [pdf]
- Andreas Krause, Daniel Golovin. Submodular Function
Maximization. Survey [pdf]
- Matthew Streeter, Daniel Golovin. An Online Algorithm for
Maximizing Submodular Functions. In Proc. 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2008 [pdf]
- Matthew Streeter, Daniel Golovin, Andreas Krause. Online Learning of Assignments. In Proc. 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 [pdf]
- Yisong Yue, Carlos Guestrin. Linear Submodular Bandits and their
Application to Diversified Retrieval. Neural Information
Processing Systems (NIPS), December, 2011 [pdf]