Data Mining: Learning from Large Data Sets

Many scientific and commercial applications require us to obtain insights from massive, high-dimensional data sets. In this graduate-level course, students will learn to apply, analyze and evaluate principled, state-of-the-art techniques from statistics, algorithms and discrete and convex optimization for learning from such large data sets. VVZ information is available here.
Instructor Prof. Andreas Krause
Head TA Josip Djolonga
Assistants Olivier Bachem, Carlos Cotrini Jimenez, Johannes Kirschner, Kfir Levy, Mario Lučić
Mailing List If you have any questions please send them to from your address.
Thu 13-15 HG F 3
Thu 15-16 HG D 3.2
Thu 16-17 HG D 3.2

  • No classes / tutorials on 8.12.
  • Videos of recorded lectures available here.
  • We ask students with last names that start with letters A–M to attend the first tutorial session, while the second one is for N–Z. In case of a conflict with your schedule you may attend the other tutorial.
  • To obtain the password please click here and login using your nethz account or login using the ETH VPN client and click here.
  • Self-assessment questions [pdf]
  • Dealing with large data (Data centers; Map-Reduce/Hadoop; Amazon Mechanical Turk)
  • Fast nearest neighbor methods (Shingling, locality sensitive hashing)
  • Online learning (Online optimization and regret minimization, online convex programming, applications to large-scale Support Vector Machines)
  • Multi-armed bandits (exploration-exploitation tradeoffs, applications to online advertising and relevance feedback)
  • Active learning (uncertainty sampling, pool-based methods, label complexity)
  • Dimension reduction (random projections, nonlinear methods)
  • Data streams (Sketches, coresets, applications to online clustering)
  • Recommender systems
The exercise problems will include theoretical and programming problems. Please note that it is not mandatory to submit solutions. We will publish exercise solutions after one week. If you choose to submit:
  • Send a soft copy of the exercise from your address to This can be latex, but also a simple scan or even a picture of a hand-written solution.
  • Please do not submit hard copies of your solutions.
Part of the coursework will be a project, carried out in groups of up to 3 students. The goal of this project is to get hands-on experience in machine learning tasks. The project grade will constitute 30% of the total grade. To access the project, log into the ETH network (e.g. using VPN) and then click here.
The mode of examination is written, 120 minutes length. The language of examination is English. As written aids, you can bring two A4 pages (i.e. one A4 sheet of paper), either handwritten or 11 point minimum font size. The written exam will constitute 70% of the total grade.
Old Exams
Text Book
  • J. Leskovec, A. Rajaraman, J. Ullman. Mining of Massive Data Sets. Available here.
  • Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI’04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004.
  • Jure Leskovec, Eric Horvitz. Planetary-Scale Views on a Large Instant-Messaging Network. In Proceedings WWW 2008 [pdf]
  • Manuel Gomez Rodriguez, Jure Leskovec, Andreas Krause. Inferring Networks of Diffusion and Influence, In Proc. ACM Conference on Knowledge Discovery in Databases (KDD), 2010 [pdf]
  • James Hays, Alexei A. Efros. Scene Completion Using Millions of Photographs. ACM Transactions on Graphics (SIGGRAPH 2007). August 2007, vol. 26, No. 3.
  • Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan and Uri Shaft. When is “Nearest Neighbor” Meaningful?. Proc. 7th International Conference on Database Theory (ICDT 1999) LNCS 1540: 217-235.
  • Aristides Gionis, Piotr Indyk, Rajeev Motwani. Similarity Search in High Dimensions via Hashing [ps]. 25th International Conference on Very Large Databases (VLDB) , 1999
  • Martin Zinkevich. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. Proc. International Conference on Machine Learning 2003 [pdf]
  • Elad Hazan, Amit Agarwal, Satyen Kale. Logarithmic Regret Algorithms for Online Convex Optimization. Machine Learning Journal Volume 69 , Issue 2-3 Pages: 169 – 192 [pdf]
  • John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research 12 (2011): 2121-2159. [pdf]
  • Diederik Kingma, and Jimmy Ba. Adam: A Method for Stochastic Optimization. 3rd International Conference for Learning Representations (2015). [pdf]
  • Martin Zinkevich, Markus Weimer, Alex Smola, Lihong Li. Parallelized Stochastic Gradient Descent. Advances in Neural Information Processing Systems 23, 2010 [pdf]
  • Feng Niu, Benjamin Recht, Christopher Re, and Stephen J. Wright. HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. Advances in Neural Information Processing Systems 24, 2011 [pdf]
  • Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, 2012. [pdf]
  • Ali Rahimi, Ben Recht. Random Features for Large-Scale Kernel Machines. Advances in Neural Information Processing Systems 20, 2007 [pdf]
  • Tianbao Yang, Yu-Feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua Zhou. Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison. Advances in Neural Information Processing Systems 25, 2012 [pdf]
  • Ji Zhu, Saharon Rosset, Trevor Hastie, Rob Tibshirani. L1 norm support vector machines. Advances in Neural Information Processing Systems 2003 [html]
  • John Duchi, Shai Shalev-Shwartz, Yoram Singer, Tushar Chandra. Efficient Projections onto the l1-Ball for Learning in High Dimensions. ICML 2008 [pdf]
  • Nathan Ratliff, J. Andrew (Drew) Bagnell, and Martin Zinkevich. (Online) Subgradient Methods for Structured Prediction. AISTATS 2007 [pdf]
  • Prateek Jain, Sudheendra Vijayanarasimhan, Kristen Grauman. Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning. Advances in Neural Information Processing Systems 2003 [pdf]
  • Simon Tong, Daphne Koller. Support Vector Machine Active Learning with Applications to Text Classification. Proceedings of the Seventeenth International Conference on Machine Learning 2000 [ps]
  • Dan Feldman, Morteza Monemizadeh, Christian Sohler. A PTAS for k-Means Clustering Based on Weak Coresets. Proc. 23th Annu. ACM Symposium on Computational Geometry (SoCG) 2007 [pdf]
  • Chris Bishop. Pattern Recognition and Machine Learning. Springer, 2006. Chapter 9
  • Percy Liang, Dan Klein. Online EM for Unsupervised Models. Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) 2009 [pdf]
  • Dan Feldman, Matthew Faulkner, Andreas Krause. Scalable Training of Mixture Models via Coresets. In Proc. Neural Information Processing Systems, 2011 [pdf]
  • Mario Lucic, Olivier Bachem, Andreas Krause. Strong Coresets for Hard and Soft Bregman Clustering with Applications to Exponential Family Mixtures, 2015 [pdf]
  • Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning Journal, 2002. [pdf]
  • Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In WWW-10, Raleigh, NC, April, 2010. [pdf]
  • Khalid El-Arini, Gaurav Veda, Dafna Shahaf and Carlos Guestrin. Turning Down the Noise in the Blogosphere. In Proc. of 15th International Conference on Knowledge Discovery and Data Mining (KDD 2009) [pdf]
  • Andreas Krause, Daniel Golovin. Submodular Function Maximization. Survey [pdf]
  • Matthew Streeter, Daniel Golovin. An Online Algorithm for Maximizing Submodular Functions. In Proc. 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2008 [pdf]
  • Matthew Streeter, Daniel Golovin, Andreas Krause. Online Learning of Assignments. In Proc. 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 [pdf]
  • Yisong Yue, Carlos Guestrin. Linear Submodular Bandits and their Application to Diversified Retrieval. Neural Information Processing Systems (NIPS), December, 2011 [pdf]