Seminar – Advanced Topics in Machine Learning
In this seminar, recent papers from the pattern recognition and machine learning literature are presented and discussed. Possible topics cover statistical models in computer vision, graphical models and machine learning. The seminar “Advanced Topics in Pattern Recognition” familiarizes students with recent developments in pattern recognition and machine learning. Original articles have to be presented and critically reviewed. The students will learn how to structure a scientific presentation in English that covers the key ideas of a scientific paper. An important goal of the seminar presentation is to summarize the essential ideas of the paper in sufficient depth while omitting details that are not essential for understanding the work. The presentation style will play an important role and should reach the level of professional scientific presentations.
The seminar will cover a number of recent papers which have emerged as important contributions to the pattern recognition and machine learning literature. The topics will vary from year to year, but they are centered on methodological issues in machine learning such as new learning algorithms, ensemble methods or new statistical models for machine learning applications. Frequently, papers are selected from computer vision or bioinformatics – two fields that rely more and more on machine learning methodology and statistical models.
The papers will be presented in the first session of the seminar. VVZ information is available here
|| CAB H 52
|| CHN G 22
|| Distributed Stochastic Variance Reduced Gradient Methods (2015)
|| Communication Efficient Distributed Machine Learning with the Parameter Server (2014)
|| Adaptive subgradient methods for online learning and stochastic optimization (2011)
|| Stochastic dual coordinate ascent methods for regularized loss minimization (2013)
|| Beyond Convexity: Stochastic Quasi-Convex Optimization (2015)
|| Non-convex Robust PCA (2014)
|| A stochastic PCA and SVD algorithm with an exponential convergence rate (2015)
|| Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods (2015)
|| Training Highly Multiclass Classifiers
|| User Conditional Hashtag Prediction for Images
|| Sequence to Sequence Learning with Neural Networks
|| Show and tell: A neural image caption generator
|| A Critical Review of Recurrent Neural Networks for Sequence Learning
|| Neural variational inference and learning in belief networks
|| Neural Turing Machines
|| Zero-shot learning by convex combination of semantic embeddings
|| Giraffe: Using Deep Reinforcement Learning to Play Chess
|| Van der Goten
|| Adaptively Learning the Crowd Kernel (2011)
|| Tuned Models of Peer Assessment in MOOCs (2013)
|| Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing (2014)
|| Probabilistic Programming (ICSE 2014)
|| A New Approach to Probabilistic Programming Inference (2014)
|| Learning Probabilistic Programs (2014)
|| On the convexity of latent social network inference (2010)
|| Scalable Influence Estimation in Continuous-Time Diffusion Networks (2013)
|| Uncovering the Temporal Dynamics of Diffusion Networks (2011)
|| A Tutorial on Bayesian Optimization of Expensive Cost Functions (2010)
|| Practical Bayesian Optimization of Machine Learning Algorithms (2012)
|| Input Warping for Bayesian Optimization of Non-Stationary Functions (2014)
|| High Dimensional Bayesian Optimisation and Bandits via Additive Models (2015)
|| Introduction to causal inference (2010)
|| Detecting the direction of causal time series (2009)
|| Wang Jingyi
|| Nonlinear causal discovery with additive noise models (2008)
|| Probabilistic latent variable models for distinguishing between cause and effect (2010)
|| Towards a learning theory of cause-effect inference (2015)
Tuesday Topics and Papers
Convex and Non-Convex Optimization
- Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 12, 2121-2159.
- Shalev-Shwartz, Shai, and Tong Zhang. Stochastic dual coordinate ascent methods for regularized loss minimization. The Journal of Machine Learning Research 14.1 (2013): 567-599.
- Defazio, Aaron, Francis Bach, and Simon Lacoste-Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. NIPS 2014.
- Christopher De Sa, Christopher Re, Kunle Olukotun, Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems, ICML 2015
- Shamir, Ohad. A stochastic PCA and SVD algorithm with an exponential convergence rate. ICML 2015.
- Bhojanapalli, S., Kyrillidis, A., & Sanghavi, S. (2015). Dropping Convexity for Faster Semi-definite Optimization. arXiv preprint
- Praneeth Netrapalli, U N Niranjan, Sujay Sanghavi (2014). Non-convex Robust PCA, NIPS 2014
- Elad Hazan, Kfir Levy, Shai Shalev-Shwartz (2015). Beyond Convexity: Stochastic Quasi-Convex Optimization. NIPS 2015
Deep Learning, Embeddings, Multiclass Classification
- Denton, E., Weston, J., Paluri, M., Bourdev, L., & Fergus, R. (2015). User Conditional Hashtag Prediction for Images. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1731-1740). ACM.
- Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2014). Show and tell: A neural image caption generator. arXiv preprint
- Maya R. Gupta, Samy Bengio, Jason Weston; Training Highly Multiclass Classifiers. The Journal of Machine Learning Research 15(Apr):1461−1492, 2014.
- Norouzi, M., Mikolov, T., Bengio, S., Singer, Y., Shlens, J., Frome, A., … & Dean, J. (2013). Zero-shot learning by convex combination of semantic embeddings. arXiv preprint
- Lipton, Z. A Critical Review of Recurrent Neural Networks for Sequence Learning. arXiv preprint
- Majid Janzamin, Hanie Sedghi, Anima Anandkumar (2015). Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods. arXiv preprint
- Mnih, A., & Gregor, K. (2014). Neural variational inference and learning in belief networks. ICML
- Lai, M. (2015). Giraffe: Using Deep Reinforcement Learning to Play Chess. MSc Thesis
- Hoffman, M. D., Blei, D. M., Wang, C., & Paisley, J. (2013). Stochastic variational inference. The Journal of Machine Learning Research, 14(1), 1303-1347.
- Salimans, T. (2014). Markov chain Monte Carlo and variational inference: Bridging the gap. arXiv preprint.
- Paisley, J., Blei, D., & Jordan, M. (2012). Variational Bayesian inference with stochastic search. arXiv preprint arXiv:1206.6430.
- Mnih, A., & Gregor, K. (2014). Neural variational inference and learning in belief networks. ICML
- Djolonga, J., & Krause, A. (2014). From map to marginals: Variational inference in bayesian submodular models. In Advances in Neural Information Processing Systems (pp. 244-252).
- Variational Message Passing. John Winn, Christopher M. Bishop. JMLR 2005
- Lee, J., Ma, T., & Lin, Q. (2015). Distributed Stochastic Variance Reduced Gradient Methods. arXiv
- H. Mania, X. Pan, D. Papailiopoulos, B. Recht, K. Ramchandran, M. I. Jordan (2015). Perturbed Iterate Analysis for Asynchronous Stochastic Optimization.
- Li, M., Andersen, D. G., Smola, A., & Yu, K. (2014). Communication Efficient Distributed Machine Learning with the Parameter Server. NIPS 2014
- Lee, C.-P., & Roth, D. (2015). Distributed Box-Constrained Quadratic Optimization for Dual Linear SVM. ICML 2015.
Thursday Topics and Papers
- M. Gomez-Rodriguez, D. Balduzzi, B. Schölkopf. Uncovering the Temporal Dynamics of Diffusion Networks. The 28th International Conference on Machine Learning (ICML), 2011.
- Seth Myers and Jure Leskovec. On the convexity of latent social network inference. NIPS’10.
- Scalable Influence Estimation in Continuous-Time Diffusion Networks. Nan Du, Le Song, Manuel Gomez Rodriguez, and Hongyuan Zha. NIPS 2013.
Learning meets optimization
- G. Papandreou and A. Yuille, Perturb-and-MAP Random Fields: Using Discrete Optimization to Learn and Sample from Energy Models, ICCV 2011
- M. J. Wainwright, T. Jaakkola and A. S. Willsky (2005). A new class of upper bounds on the log partition function. IEEE Trans. on Information Theory
- Submodular meets Structured: Finding Diverse Subsets in Exponentially-Large Structured Item Sets, Adarsh Prasad, Stefanie Jegelka, Dhruv Batra, NIPS 2014.
- Active Learning as Non-Convex Optimization, Andrew Guillory, Erick Chastain, Jeff Bilmes, AISTATS 2009.
- A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. Eric Brochu, Vlad M. Cora and Nando de Freitas. eprint arXiv:1012.2599
- Input Warping for Bayesian Optimization of Non-Stationary Functions, Jasper Snoek, Kevin Swersky, Richard S. Zemel, Ryan P. Adams. ICML’14
- Bayesian Optimization with Unknown Constraints, Michael A. Gelbart, Jasper Snoek, Ryan P. Adams. UAI’14
- Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. NIPS, 2012.
- Bayesian Active Learning for Posterior Estimation. Kirthevasan Kandasamy, Jeff Schneider, and Barnabas Poczos. IJCAI, 2015.
- High Dimensional Bayesian Optimisation and Bandits via Additive Models, Kirthevasan Kandasamy, Jeff Schneider, Barnabas Poczos, ICML 2015
- Andrew Gordon, Thomas Henzinger, Aditya Nori, and Sriram Rajamani. Probabilistic Programming. ICSE, 2014.
- A New Approach to Probabilistic Programming Inference, Wood, F., van de Meent, J. W., & Mansinghka, V., AISTATS 2014
- Learning Probabilistic Programs. Perov, Y., & Wood, F. arXiv preprint arXiv:1407.2646.
Learning and Economics / Game Theory
- Adaptively Learning the Crowd Kernel. Omer Tamuz, Ce Liu, Serge Belongie, Ohad Shamir, Adam Tauman Kalai. ICML’11.
- Tuned Models of Peer Assessment in MOOCs. C. Piech, J. Huang, Z. Chen, C. Do, A. Ng, D. Koller. EDM’13.
- Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing. Y. Zhang, X. Chen, D. Zhou, M. Jordan. NIPS’14.
- Spirtes, Peter. “Introduction to causal inference.” The Journal of Machine Learning Research 11 (2010): 1643-1662 with additional examples from Guyon, Isabelle. “Practical feature selection: from correlation to causality.” NATO Science for Peace and Security 19 (2008): 27-43.
- Patrik O. Hoyer et al. “Nonlinear causal discovery with additive noise models” in Advances in Neural Information Processing Systems 21 (NIPS 2008)
- Stegle, Oliver, et al. “Probabilistic latent variable models for distinguishing between cause and effect.” Advances in Neural Information Processing Systems. 2010.
- Peters, Janzing, Gretton and Schölkopf. Detecting the Direction of Causal Time Series. ICML 2009
- Lopez-Paz, David, et al. “Towards a learning theory of cause-effect inference.” Proceedings of the 32nd International Conference on Machine Learning, JMLR: W&CP, Lille, France. 2015