Data Mining: Learning from Large Data Sets

Many scientific and commercial applications require us to obtain insights from massive, high-dimensional data sets. In this graduate-level course, students will learn to apply, analyze and evaluate principled, state-of-the-art techniques from statistics, algorithms and discrete and convex optimization for learning from such large data sets. VVZ information is available here.
Syllabus
News
  • Our Piazza course page is now active. See details below.
  • Video recordings of the lectures are available here.
  • First class on Sep 21. First tutorial on Sep 28.
  • We ask students whose last names begin with A–M to attend the first tutorial session; the second session is for N–Z. If this conflicts with your schedule, you may attend the other tutorial.
  • The course material is protected with a username and password. To obtain them, please connect to the ETHZ network with the VPN client and click here.
  • Self-assessment questions [pdf]
Lectures
Thu 10-12 HG F 1
Tutorials
Thu 15-16 HG D 3.2
Thu 16-17 HG D 3.2

Questions and Discussion Forum
This year we are using Piazza to answer questions about the lecture, the exercises and the project. When posting a question, please choose the appropriate folder (e.g. if you are looking for project teammates, post in the project_teams folder). Before posting, please check the relevant folder to avoid duplicates. You can enroll in the Piazza course here.

Additionally, we will keep our mailing list dm2017@lists.inf.ethz.ch active. Questions sent to it from ethz.ch addresses will also be answered, but we encourage you to ask on Piazza instead.

Topics
  • Dealing with large data (data centers, Map-Reduce/Hadoop, Amazon Mechanical Turk)
  • Fast nearest-neighbor methods (shingling, locality-sensitive hashing)
  • Online learning (online optimization and regret minimization, online convex programming, applications to large-scale Support Vector Machines)
  • Multi-armed bandits (exploration–exploitation tradeoffs, applications to online advertising and relevance feedback)
  • Active learning (uncertainty sampling, pool-based methods, label complexity)
  • Dimension reduction (random projections, nonlinear methods)
  • Data streams (sketches, coresets, applications to online clustering)
  • Recommender systems
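To give a concrete taste of the fast nearest-neighbor topic above, here is a minimal MinHash sketch in Python. It is an illustrative toy with our own function names and a simple crc32-based hash family, not part of the course material: it estimates the Jaccard similarity of two shingle sets from short signatures, the building block behind locality-sensitive hashing.

```python
import random
import zlib

def shingles(text, k=3):
    """Set of k-character shingles of a string."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def make_hash(salt):
    """A cheap salted string hash (illustrative; crc32 is not cryptographic)."""
    return lambda s: zlib.crc32((str(salt) + s).encode())

def minhash_signature(shingle_set, hash_funcs):
    """For each hash function, keep the minimum hash value over the set."""
    return [min(h(s) for s in shingle_set) for h in hash_funcs]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature positions estimates Jaccard similarity."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)

random.seed(0)
hash_funcs = [make_hash(random.getrandbits(32)) for _ in range(200)]

a = shingles("locality sensitive hashing")
b = shingles("locality sensitive caching")
true_jaccard = len(a & b) / len(a | b)
est = estimated_jaccard(minhash_signature(a, hash_funcs),
                        minhash_signature(b, hash_funcs))
```

With more hash functions the estimate concentrates around the true Jaccard similarity; in a full LSH index the signatures would additionally be split into bands so that similar items collide in at least one band with high probability.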
Exercises
The exercise sheets will include both theoretical and programming problems. Please note that submitting solutions is not mandatory. We will publish exercise solutions after one week. If you choose to submit, please send your solution from your ethz.ch address to the person specified on the exercise sheet. This can be a LaTeX document, but also a simple scan or even a photo of a handwritten solution.
Project
The coursework entails a project, carried out in groups of up to 3 students. The goal of the project is to get hands-on experience in large scale machine learning tasks. The project grade will constitute 30% of the total grade. To access the project, log in to the ETH network (e.g. using VPN) and then click here.

If you are retaking the course, your previous project grades will not be transferred to this year.

Exam
The exam is written, 120 minutes long, and in English. As a written aid, you may bring two A4 pages (i.e. one A4 sheet of paper), either handwritten or printed in a font size of at least 11 points. The written exam will constitute 70% of the total grade.
Resources
Text Book
  • J. Leskovec, A. Rajaraman, J. Ullman. Mining of Massive Data Sets. Available here.
References
  • Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI’04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004.
  • Jure Leskovec, Eric Horvitz. Planetary-Scale Views on a Large Instant-Messaging Network. In Proceedings WWW 2008 [pdf]
  • Manuel Gomez Rodriguez, Jure Leskovec, Andreas Krause. Inferring Networks of Diffusion and Influence, In Proc. ACM Conference on Knowledge Discovery in Databases (KDD), 2010 [pdf]
  • James Hays, Alexei A. Efros. Scene Completion Using Millions of Photographs. ACM Transactions on Graphics (SIGGRAPH 2007). August 2007, vol. 26, No. 3.
  • Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan and Uri Shaft. When is “Nearest Neighbor” Meaningful?. Proc. 7th International Conference on Database Theory (ICDT 1999) LNCS 1540: 217-235.
  • Aristides Gionis, Piotr Indyk, Rajeev Motwani. Similarity Search in High Dimensions via Hashing [ps]. 25th International Conference on Very Large Databases (VLDB), 1999
  • Martin Zinkevich. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. Proc. International Conference on Machine Learning 2003 [pdf]
  • Elad Hazan, Amit Agarwal, Satyen Kale. Logarithmic Regret Algorithms for Online Convex Optimization. Machine Learning Journal, Volume 69, Issue 2-3, Pages 169–192 [pdf]
  • John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research 12 (2011): 2121-2159. [pdf]
  • Diederik Kingma, and Jimmy Ba. Adam: A Method for Stochastic Optimization. 3rd International Conference for Learning Representations (2015). [pdf]
  • Martin Zinkevich, Markus Weimer, Alex Smola, Lihong Li. Parallelized Stochastic Gradient Descent. Advances in Neural Information Processing Systems 23, 2010 [pdf]
  • Feng Niu, Benjamin Recht, Christopher Re, and Stephen J. Wright. HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. Advances in Neural Information Processing Systems 24, 2011 [pdf]
  • Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, 2012. [pdf]
  • Ali Rahimi, Ben Recht. Random Features for Large-Scale Kernel Machines. Advances in Neural Information Processing Systems 20, 2007 [pdf]
  • Tianbao Yang, Yu-Feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua Zhou. Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison. Advances in Neural Information Processing Systems 25, 2012 [pdf]
  • Ji Zhu, Saharon Rosset, Trevor Hastie, Rob Tibshirani. L1 norm support vector machines. Advances in Neural Information Processing Systems 2003 [html]
  • John Duchi, Shai Shalev-Shwartz, Yoram Singer, Tushar Chandra. Efficient Projections onto the l1-Ball for Learning in High Dimensions. ICML 2008 [pdf]
  • Nathan Ratliff, J. Andrew (Drew) Bagnell, and Martin Zinkevich. (Online) Subgradient Methods for Structured Prediction. AISTATS 2007 [pdf]
  • Prateek Jain, Sudheendra Vijayanarasimhan, Kristen Grauman. Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning. Advances in Neural Information Processing Systems 2010 [pdf]
  • Simon Tong, Daphne Koller. Support Vector Machine Active Learning with Applications to Text Classification. Proceedings of the Seventeenth International Conference on Machine Learning 2000 [ps]
  • Dan Feldman, Morteza Monemizadeh, Christian Sohler. A PTAS for k-Means Clustering Based on Weak Coresets. Proc. 23rd Annu. ACM Symposium on Computational Geometry (SoCG) 2007 [pdf]
  • Chris Bishop. Pattern Recognition and Machine Learning. Springer, 2006. Chapter 9
  • Percy Liang, Dan Klein. Online EM for Unsupervised Models. Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) 2009
  • Dan Feldman, Matthew Faulkner, Andreas Krause. Scalable Training of Mixture Models via Coresets. In Proc. Neural Information Processing Systems, 2011 [pdf]
  • Mario Lucic, Olivier Bachem, Andreas Krause. Strong Coresets for Hard and Soft Bregman Clustering with Applications to Exponential Family Mixtures, 2015 [pdf]
  • Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning Journal, 2002. [pdf]
  • Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In WWW-10, Raleigh, NC, April, 2010. [pdf]
  • Khalid El-Arini, Gaurav Veda, Dafna Shahaf and Carlos Guestrin. Turning Down the Noise in the Blogosphere. In Proc. of 15th International Conference on Knowledge Discovery and Data Mining (KDD 2009) [pdf]
  • Andreas Krause, Daniel Golovin. Submodular Function Maximization. Survey [pdf]
  • Matthew Streeter, Daniel Golovin. An Online Algorithm for Maximizing Submodular Functions. In Proc. 22nd Annual Conference on Neural Information Processing Systems (NIPS), 2008 [pdf]
  • Matthew Streeter, Daniel Golovin, Andreas Krause. Online Learning of Assignments. In Proc. 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 [pdf]
  • Yisong Yue, Carlos Guestrin. Linear Submodular Bandits and their Application to Diversified Retrieval. Neural Information Processing Systems (NIPS), December, 2011 [pdf]