|8.30-9.00||COFFEE BREAK & Posters|
|9.00-10.30||Kazuo Murota: Discrete Convex Analysis: Basics, DC Programming, and Submodular Welfare Algorithm|
|10.45-15.15||SKI & POSTER BREAK|
|15.15-15.30||Spotlights III||15.30-16.15||Yisong Yue: Balancing the Explore/Exploit Tradeoff in Interactive Structured Prediction|
|16.20-17.05||Jeff Bilmes: Summarizing Big Data: A Practical Use for Submodular Functions|
|17.05-17.30||COFFEE BREAK & Posters|
|17.30-18.15||Michael I. Jordan: MAD-Bayes: MAP-based Asymptotic Derivations from Bayes|
Discrete optimization problems and combinatorial structures are becoming increasingly important in machine learning. In the endeavour to process more and more complex data, one may quickly find oneself working with graphs, relations, partitions, or sparsity structure. In addition, we may want to predict structured, sparse estimators, or combinatorial objects such as permutations, trees or other graphs, group structure and so on.
While complex structures enable much richer applications, they often come with the caveat that the related learning and inference problems become computationally very hard. When scaling to large data, combinatorial problems also add some further challenges to compact representations, streaming and distributed computation.
Despite discouraging theoretical worst-case results, many practically interesting problems can be much more well behaved (when modeled appropriately), or are well modeled by assuming properties that make them so. Such properties are most often structural and include symmetry, exchangeability, sparsity, or submodularity. Still, not all of these structures and their application ranges are well understood. The DISCML workshop revolves around such structures in machine learning, their applications and theory.
Kazuo Murota (University of Tokyo)
Discrete Convex Analysis:
Basics, DC Programming, and Submodular Welfare Algorithm
Discrete convex analysis is a theory that aims at a discrete analogue of convex analysis for nonlinear discrete optimization. Technically it is a nonlinear generalization of matroid/submodular function theory; matroids are generalized to M-convex functions and submodular set functions to L-convex function.
The first part of the talk is an introduction to fundamental concepts and theorems in discrete convex analysis: definition of M-convex and L-convex functions, relation between submodularity and convexity/concavity, M-L conjugacy theorem and duality theorems.
The second part of the talk presents two recent topics
(both joint work with T. Maehara):
(i) Discrete DC (difference of convex) programming. This is a discrete analogue of DC (difference of convex) programming, including DS (difference of submodular functions) programming as a special case. The mathematical basis for this framework is the conjugacy results in discrete convex analysis.
(ii) Valuated-matroid based algorithm for submodular welfare problem. A subclass of the submodular welfare problem, with matroid valuations (M-natural concave functions), is solvable to optimality via valuated matroid intersection algorithms. On the basis of this fact, an algorithm for (general) submodular welfare problems is designed with some heuristics. The mathematical basis for this algorithm is the duality results in discrete convex analysis. Computational results are also reported.
K. Murota (2000): Matrices and Matroids for Systems Analysis, Springer.
K. Murota (2003): Discrete Convex Analysis, SIAM.
Yisong Yue (Disney Research)
Balancing the Explore/Exploit Tradeoff in Interactive Structured Prediction
Many prediction domains, ranging from content recommendation in a digital system to motion planning in a physical system, require making structured predictions. Broadly speaking, structured prediction refers to any type of prediction performed jointly over multiple input instances, and has been a topic of active research in the machine learning community over the past 10-15 years. However, what has been less studied is how to model structured prediction problems for an interactive system. For example, a recommender system necessarily interacts with users when recommending content, and can learn from the subsequent user feedback on those recommendations. In general, each "prediction" is an interaction where the system not only predicts a structured action to take, but also receives feedback corresponding to the utility of that action.
In this talk, I will describe methods for balancing the tradeoff between exploration (collecting informative feedback) versus exploitation (maximizing system utility) when making structured predictions in an interactive environment. Exploitation corresponds to the standard prediction goal in non-interactive settings, where one predicts the best possible action given the current model. Exploration refers to taking actions that maximize the informativeness of the subsequent feedback, so that one can exploit more reliably in future interactions. I will show how to model and optimize for this tradeoff in two settings: diversified news recommendation (where the feedback comes from users) and adaptive vehicle routing (where the feedback comes from measuring congestion).
This is joint work with Carlos Guestrin, Sue Ann Hong, Ramayya Krishnan and Siyuan Liu.
Michael Jordan (UC Berkeley)
One of the virtues of the Bayesian nonparametric approach to statistical
modeling is that is naturally combinatorial---the underlying prior distributions
are generally taken from a class of stochastic processes known as "combinatorial
stochastic processes." Appealing to Bayes' theorem, the posterior distribution
is also a combinatorial stochastic process and inferences have a combinatorial
flavor. The difficulty is computational---while MCMC and variational methods
exist for these models their scaling is generally unfavorable or unknown.
Bowing to the computational imperative, we recall that K-means can be obtained
from a finite mixture of Gaussians model via "small-variance asymptotics," where
the covariances of the Gaussian components are taken analytically to zero, and
we ask what happens when similar asymptotics are applied to BNP models, specifically
Dirichlet process mixtures, and then hierarchical Dirichlet processes and other
BNP model such as the beta process and hierarchical beta processes. The answer
is that interesting new procedures are obtained; non-Bayesian procedures to be
sure, but fast, effective procedures that can be applied to large-scale problems.
MAD-Bayes: MAP-based Asymptotic Derivations from Bayes
[With Tamara Broderick and Brian Kulis.]
Jeff Bilmes (University of Washington)
The explosion of available data is both a blessing and a curse for the
field of machine learning. While bigger data is different data, and
can lead to improved statistical accuracy, bigger data may also be
plagued with redundancy. In this talk, we will discuss how simple
submodular function optimization can address such problems. In
particular, we will review how submodular functions can practically be
used for big data summarization, and will do so in four different
domains. Starting with our earlier work on document summarization, we
will next review recent work on summarization large speech corpus
training data, summarizing training data for the sake of a given test
data set in statistical machine translation, and lastly some recent
work on image summarization.
Summarizing Big Data: A Practical Use for Submodular Functions
[Joint work with Hui Lin, Kai Wei, Yisong Song, Sebastian Tschiatschek, and Rishabh Iyer]
- Baharan Mirzasoleiman, Amin Karbasi, Andreas Krause, Rik Sarkar.
Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
- Adarsh Prasad, Stefanie Jegelka, Dhruv Batra.
Submodular Maximization and Diversity in Structured Output Spaces (Supplement)
- Junkyu Lee, William Lam, Rina Dechter.
Benchmark on DAOOPT and GUROBI with the PASCAL2 Inference Challenge Problems
- Alon Milchgrub, Rina Dechter. On Minimal Tree-Inducing Cycle-Cutsets and Their Use in a Cutset-Driven Local Search
- Kui Tang, Tony Jebara.
Network Ranking With Bethe Pseudomarginals
- Hidekazu Oiwa, Issei Sato, Hiroshi Nakagawa
Novel Sparse Modeling by L2 + L0 Regularization
- Sarah R. Allen, Lisa Hellerstein.
Approximation algorithms for reducing classification cost in ensembles of classifiers
- Yudong Chen, Jiaming Xu.
Statistical-Computational Tradeoffs in Planted Models: The High-Dimensional Setting
- K. S. Sesh Kumar, Francis Bach.
Maximizing submodular functions using probabilistic graphical models
- Vitaly Feldman, Jan Vondrak
Approximation of Submodular and XOS Functions by Juntas with Applications to Learning
- Stefanie Jegelka, UC Berkeley
- Andreas Krause, ETH Zurich
- Pradeep Ravikumar,
University of Texas, Austin
A. Bilmes, University of Washington
The workshop is a continuation of DISCML 2009, 2010, and 2011 and 2012. Other related workshops are the OPT workshop.