8.00-8.30 Spotlights I
8.30-9.00 COFFEE BREAK & Posters
9.00-10.30 Kazuo Murota: Discrete Convex Analysis: Basics, DC Programming, and Submodular Welfare Algorithm
10.30-10.45 Spotlights II
10.45-15.15 SKI & POSTER BREAK
15.15-15.30 Spotlights III
15.30-16.15 Yisong Yue: Balancing the Explore/Exploit Tradeoff in Interactive Structured Prediction
16.20-17.05 Jeff Bilmes: Summarizing Big Data: A Practical Use for Submodular Functions
17.05-17.30 COFFEE BREAK & Posters
17.30-18.15 Michael I. Jordan: MAD-Bayes: MAP-based Asymptotic Derivations from Bayes


Discrete optimization problems and combinatorial structures are becoming increasingly important in machine learning. In the endeavour to process more and more complex data, one may quickly find oneself working with graphs, relations, partitions, or sparsity structure. In addition, we may want to predict structured, sparse estimators, or combinatorial objects such as permutations, trees or other graphs, group structure and so on.

While complex structures enable much richer applications, they often come with the caveat that the related learning and inference problems become computationally very hard. When scaling to large data, combinatorial problems also add some further challenges to compact representations, streaming and distributed computation.

Despite discouraging theoretical worst-case results, many practically interesting problems can be much more well behaved (when modeled appropriately), or are well modeled by assuming properties that make them so. Such properties are most often structural and include symmetry, exchangeability, sparsity, or submodularity. Still, not all of these structures and their application ranges are well understood. The DISCML workshop revolves around such structures in machine learning, their applications and theory.

Invited Talks

Kazuo Murota (University of Tokyo)
Discrete Convex Analysis: Basics, DC Programming, and Submodular Welfare Algorithm

Discrete convex analysis is a theory that aims at a discrete analogue of convex analysis for nonlinear discrete optimization. Technically it is a nonlinear generalization of matroid/submodular function theory; matroids are generalized to M-convex functions and submodular set functions to L-convex function.

The first part of the talk is an introduction to fundamental concepts and theorems in discrete convex analysis: definition of M-convex and L-convex functions, relation between submodularity and convexity/concavity, M-L conjugacy theorem and duality theorems.

The second part of the talk presents two recent topics (both joint work with T. Maehara):
(i) Discrete DC (difference of convex) programming. This is a discrete analogue of DC (difference of convex) programming, including DS (difference of submodular functions) programming as a special case. The mathematical basis for this framework is the conjugacy results in discrete convex analysis.
(ii) Valuated-matroid based algorithm for submodular welfare problem. A subclass of the submodular welfare problem, with matroid valuations (M-natural concave functions), is solvable to optimality via valuated matroid intersection algorithms. On the basis of this fact, an algorithm for (general) submodular welfare problems is designed with some heuristics. The mathematical basis for this algorithm is the duality results in discrete convex analysis. Computational results are also reported.

K. Murota (2000): Matrices and Matroids for Systems Analysis, Springer.
K. Murota (2003): Discrete Convex Analysis, SIAM.

Yisong Yue (Disney Research)
Balancing the Explore/Exploit Tradeoff in Interactive Structured Prediction

Many prediction domains, ranging from content recommendation in a digital system to motion planning in a physical system, require making structured predictions. Broadly speaking, structured prediction refers to any type of prediction performed jointly over multiple input instances, and has been a topic of active research in the machine learning community over the past 10-15 years. However, what has been less studied is how to model structured prediction problems for an interactive system. For example, a recommender system necessarily interacts with users when recommending content, and can learn from the subsequent user feedback on those recommendations. In general, each "prediction" is an interaction where the system not only predicts a structured action to take, but also receives feedback corresponding to the utility of that action.

In this talk, I will describe methods for balancing the tradeoff between exploration (collecting informative feedback) versus exploitation (maximizing system utility) when making structured predictions in an interactive environment. Exploitation corresponds to the standard prediction goal in non-interactive settings, where one predicts the best possible action given the current model. Exploration refers to taking actions that maximize the informativeness of the subsequent feedback, so that one can exploit more reliably in future interactions. I will show how to model and optimize for this tradeoff in two settings: diversified news recommendation (where the feedback comes from users) and adaptive vehicle routing (where the feedback comes from measuring congestion).

This is joint work with Carlos Guestrin, Sue Ann Hong, Ramayya Krishnan and Siyuan Liu.

Michael Jordan (UC Berkeley)
MAD-Bayes: MAP-based Asymptotic Derivations from Bayes

One of the virtues of the Bayesian nonparametric approach to statistical modeling is that is naturally combinatorial---the underlying prior distributions are generally taken from a class of stochastic processes known as "combinatorial stochastic processes." Appealing to Bayes' theorem, the posterior distribution is also a combinatorial stochastic process and inferences have a combinatorial flavor. The difficulty is computational---while MCMC and variational methods exist for these models their scaling is generally unfavorable or unknown. Bowing to the computational imperative, we recall that K-means can be obtained from a finite mixture of Gaussians model via "small-variance asymptotics," where the covariances of the Gaussian components are taken analytically to zero, and we ask what happens when similar asymptotics are applied to BNP models, specifically Dirichlet process mixtures, and then hierarchical Dirichlet processes and other BNP model such as the beta process and hierarchical beta processes. The answer is that interesting new procedures are obtained; non-Bayesian procedures to be sure, but fast, effective procedures that can be applied to large-scale problems.

[With Tamara Broderick and Brian Kulis.]

Jeff Bilmes (University of Washington)
Summarizing Big Data: A Practical Use for Submodular Functions

The explosion of available data is both a blessing and a curse for the field of machine learning. While bigger data is different data, and can lead to improved statistical accuracy, bigger data may also be plagued with redundancy. In this talk, we will discuss how simple submodular function optimization can address such problems. In particular, we will review how submodular functions can practically be used for big data summarization, and will do so in four different domains. Starting with our earlier work on document summarization, we will next review recent work on summarization large speech corpus training data, summarizing training data for the sake of a given test data set in statistical machine translation, and lastly some recent work on image summarization.

[Joint work with Hui Lin, Kai Wei, Yisong Song, Sebastian Tschiatschek, and Rishabh Iyer]

Accepted Papers

Spotlights I

Spotlights II

Spotlights III


Previous workshops

The workshop is a continuation of DISCML 2009, 2010, and 2011 and 2012. Other related workshops are the OPT workshop.