Seminar – Advanced Topics in Machine Learning

In this seminar, recent papers from the pattern recognition and machine learning literature are presented and discussed. Possible topics cover statistical models in computer vision, graphical models and machine learning.

The seminar “Advanced Topics in Pattern Recognition” familiarizes students with recent developments in pattern recognition and machine learning. Original articles have to be presented and critically reviewed. Students will learn how to structure a scientific presentation in English that covers the key ideas of a scientific paper. An important goal of the seminar presentation is to summarize the essential ideas of the paper in sufficient depth while omitting details that are not essential for understanding the work. Presentation style will play an important role and should reach the level of professional scientific presentations.

The seminar will cover a number of recent papers that have emerged as important contributions to the pattern recognition and machine learning literature. The topics vary from year to year, but they center on methodological issues in machine learning such as new learning algorithms, ensemble methods or new statistical models for machine learning applications. Papers are frequently selected from computer vision or bioinformatics, two fields that rely more and more on machine learning methodology and statistical models. The papers will be presented in the first session of the seminar. VVZ information is available here.
Lectures
Tue 16-18 CAB H 52
Thu 16-18 CHN G 22

Tuesday Schedule
Date Presenter Topic
20 Oct Melis Distributed Stochastic Variance Reduced Gradient Methods (2015)
20 Oct Jagerman Communication Efficient Distributed Machine Learning with the Parameter Server (2014)
27 Oct Pilgerstorfer Adaptive subgradient methods for online learning and stochastic optimization (2011)
27 Oct Mutny Stochastic dual coordinate ascent methods for regularized loss (2013)
03 Nov Karaivanov Beyond Convexity: Stochastic Quasi-Convex Optimization (2015)
03 Nov Wang Non-convex Robust PCA (2014)
10 Nov Holmer A stochastic PCA and SVD algorithm with an exponential convergence rate (2015)
10 Nov Herbst Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods (2015)
17 Nov Lianos Training Highly Multiclass Classifiers
17 Nov Helminger User Conditional Hashtag Prediction for Images
17 Nov Bahman Sequence to Sequence Learning with Neural Networks
24 Nov Spurr Show and tell: A neural image caption generator
24 Nov Marti A Critical Review of Recurrent Neural Networks for Sequence Learning
01 Dec Ciganovic Neural variational inference and learning in belief networks
01 Dec Ihnatov Neural Turing Machines
15 Dec Kan Zero-shot learning by convex combination of semantic embeddings
15 Dec Ghosh Giraffe: Using Deep Reinforcement Learning to Play Chess
Thursday Schedule
Date Presenter Topic
15 Oct Van der Goten Adaptively Learning the Crowd Kernel (2011)
15 Oct Vollprecht Tuned Models of Peer Assessment in MOOCs (2013)
22 Oct Calderara Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing (2014)
22 Oct Hamas Probabilistic Programming (ICSE 2014)
29 Oct Greuter A New Approach to Probabilistic Programming Inference (2014)
29 Oct Porvaznik Learning Probabilistic Programs (2014)
05 Nov Nikolov On the convexity of latent social network inference (2010)
05 Nov Minhaz Scalable Influence Estimation in Continuous-Time Diffusion Networks (2013)
12 Nov Koleva Uncovering the Temporal Dynamics of Diffusion Networks (2011)
12 Nov Carion A Tutorial on Bayesian Optimization of Expensive Cost Functions (2010)
19 Nov Song Practical Bayesian Optimization of Machine Learning Algorithms (2012)
19 Nov Wu Input Warping for Bayesian Optimization of Non-Stationary Functions (2014)
26 Nov Ma High Dimensional Bayesian Optimisation and Bandits via Additive Models (2015)
26 Nov Nishant Introduction to causal inference (2010)
03 Dec Abdelmessih Detecting the direction of causal time series (2009)
03 Dec Wang Jingyi Nonlinear causal discovery with additive noise models (2008)
17 Dec Demitri Probabilistic latent variable models for distinguishing between cause and effect (2010)
17 Dec Raiskin Towards a learning theory of cause-effect inference (2015)
Tuesday Topics and Papers
Convex and Non-Convex Optimization
  • Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 12, 2121-2159.
  • Shalev-Shwartz, Shai, and Tong Zhang. Stochastic dual coordinate ascent methods for regularized loss. The Journal of Machine Learning Research 14.1 (2013): 567-599.
  • Defazio, Aaron, Francis Bach, and Simon Lacoste-Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. NIPS 2014.
  • Christopher De Sa, Christopher Re, Kunle Olukotun, Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems, ICML 2015
  • Shamir, Ohad. A stochastic PCA and SVD algorithm with an exponential convergence rate. ICML 2015.
  • Bhojanapalli, S., Kyrillidis, A., & Sanghavi, S. (2015). Dropping Convexity for Faster Semi-definite Optimization. arXiv preprint
  • Praneeth Netrapalli, U N Niranjan, Sujay Sanghavi (2014). Non-convex Robust PCA, NIPS 2014
  • Elad Hazan, Kfir Levy, Shai Shalev-Shwartz (2015). Beyond Convexity: Stochastic Quasi-Convex Optimization NIPS 2015
Deep Learning, Embeddings, Multiclass Classification
  • Denton, E., Weston, J., Paluri, M., Bourdev, L., & Fergus, R. (2015). User Conditional Hashtag Prediction for Images. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1731-1740). ACM.
  • Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2014). Show and tell: A neural image caption generator. arXiv preprint
  • Maya R. Gupta, Samy Bengio, Jason Weston; Training Highly Multiclass Classifiers. The Journal of Machine Learning Research 15(Apr):1461−1492, 2014.
  • Norouzi, M., Mikolov, T., Bengio, S., Singer, Y., Shlens, J., Frome, A., … & Dean, J. (2013). Zero-shot learning by convex combination of semantic embeddings. arXiv preprint
  • Lipton, Z. A Critical Review of Recurrent Neural Networks for Sequence Learning arXiv preprint
  • Majid Janzamin, Hanie Sedghi, Anima Anandkumar (2015). Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods arXiv preprint
  • Mnih, A., & Gregor, K. (2014). Neural variational inference and learning in belief networks. ICML
  • Lai, M. (2015). Giraffe: Using Deep Reinforcement Learning to Play Chess. MSc Thesis
Variational Inference
  • Hoffman, M. D., Blei, D. M., Wang, C., & Paisley, J. (2013). Stochastic variational inference. The Journal of Machine Learning Research, 14(1), 1303-1347.
  • Salimans, T. (2014). Markov chain Monte Carlo and variational inference: Bridging the gap. arXiv preprint.
  • Paisley, J., Blei, D., & Jordan, M. (2012). Variational Bayesian inference with stochastic search. arXiv preprint arXiv:1206.6430.
  • Mnih, A., & Gregor, K. (2014). Neural variational inference and learning in belief networks. ICML
  • Djolonga, J., & Krause, A. (2014). From map to marginals: Variational inference in bayesian submodular models. In Advances in Neural Information Processing Systems (pp. 244-252).
  • Variational Message Passing. John Winn, Christopher M. Bishop. JMLR 2005
Distributed Optimization
  • Lee, J., Ma, T., & Lin, Q. (2015). Distributed Stochastic Variance Reduced Gradient Methods arXiv
  • H Mania, X Pan, D Papailiopoulos, B Recht, K Ramchandran, M I. Jordan (2015) Perturbed Iterate Analysis for Asynchronous Stochastic Optimization
  • Li, M., Andersen, D. G., Smola, A., & Yu, K. (2014). Communication Efficient Distributed Machine Learning with the Parameter Server. NIPS 2014
  • Lee, C.-P., & Roth, D. (2015). Distributed Box-Constrained Quadratic Optimization for Dual Linear SVM. ICML 2015.
Thursday Topics and Papers
Network Inference
  • M. Gomez-Rodriguez, D. Balduzzi, B. Schölkopf. Uncovering the Temporal Dynamics of Diffusion Networks. The 28th International Conference on Machine Learning (ICML), 2011.
  • Seth Myers and Jure Leskovec. On the convexity of latent social network inference. NIPS’10.
  • Scalable Influence Estimation in Continuous-Time Diffusion Networks. Nan Du, Le Song, Manuel Gomez Rodriguez, and Hongyuan Zha. NIPS 2013.
Learning meets optimization
  • G. Papandreou and A. Yuille, Perturb-and-MAP Random Fields: Using Discrete Optimization to Learn and Sample from Energy Models, ICCV 2011
  • M. J. Wainwright, T. Jaakkola and A. S. Willsky (2005). A new class of upper bounds on the log partition function. IEEE Trans. on Information Theory
  • Submodular meets Structured: Finding Diverse Subsets in Exponentially-Large Structured Item Sets, Adarsh Prasad, Stefanie Jegelka, Dhruv Batra, NIPS 2014.
  • Active Learning as Non-Convex Optimization, Andrew Guillory, Erick Chastain, Jeff Bilmes, AISTATS 2009.
Bayesian optimization
  • A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. Eric Brochu, Vlad M. Cora and Nando de Freitas. eprint arXiv:1012.2599
  • Input Warping for Bayesian Optimization of Non-Stationary Functions, Jasper Snoek, Kevin Swersky, Richard S. Zemel, Ryan P. Adams. ICML’14
  • Bayesian Optimization with Unknown Constraints, Michael A. Gelbart, Jasper Snoek, Ryan P. Adams. UAI’14
  • Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. NIPS, 2012.
  • Bayesian Active Learning for Posterior Estimation. Kirthevasan Kandasamy, Jeff Schneider, and Barnabas Poczos. IJCAI, 2015.
  • High Dimensional Bayesian Optimisation and Bandits via Additive Models, Kirthevasan Kandasamy, Jeff Schneider, Barnabas Poczos, ICML 2015
Probabilistic Programming
  • Andrew Gordon, Thomas Henzinger, Aditya Nori, and Sriram Rajamani. Probabilistic Programming. ICSE, 2014.
  • A New Approach to Probabilistic Programming Inference, Wood, F., van de Meent, J. W., & Mansinghka, V., AISTATS 2014
  • Learning Probabilistic Programs. Perov, Y., & Wood, F. arXiv preprint arXiv:1407.2646.
Learning and Economics / Game Theory
  • Adaptively Learning the Crowd Kernel. Omer Tamuz, Ce Liu, Serge Belongie, Ohad Shamir, Adam Tauman Kalai. ICML’11.
  • Tuned Models of Peer Assessment in MOOCs. C. Piech, J. Huang, Z. Chen, C. Do, A. Ng, D. Koller. EDM’13.
  • Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing. Y. Zhang, X. Chen, D. Zhou, M. Jordan. NIPS’14.
Causality
  • Spirtes, Peter. “Introduction to causal inference.” The Journal of Machine Learning Research 11 (2010): 1643-1662 with additional examples from Guyon, Isabelle. “Practical feature selection: from correlation to causality.” NATO Science for Peace and Security 19 (2008): 27-43.
  • Patrik O. Hoyer et al. “Nonlinear causal discovery with additive noise models” in Advances in Neural Information Processing Systems 21 (NIPS 2008)
  • Stegle, Oliver, et al. “Probabilistic latent variable models for distinguishing between cause and effect.” Advances in Neural Information Processing Systems. 2010.
  • Peters, Janzing, Gretton and Schölkopf. “Detecting the Direction of Causal Time Series.” ICML 2009.
  • Lopez-Paz, David, et al. “Towards a learning theory of cause-effect inference.” Proceedings of the 32nd International Conference on Machine Learning, JMLR: W&CP, Lille, France. 2015