# Schedule

7.30-7.50 |
Opening remarks |

7.50-8.40 | Pushmeet Kohli: Exploiting Problem Structure for Efficient Discrete Optimization [video] |

8.40-9.00 | Spotlight presentations |

9.00-9.15 | BREAK (Posters) |

9.15-10.05 | Francis Bach: Learning with Submodular Functions: A Convex Optimization Perspective [video] |

10.05-10.30 | Spotlight presentations |

10.30-16.00 |
BREAK
(Posters) |

16.00-16.30 | Spotlight presentations |

16.30-17.50 | Jack Edmonds: Polymatroids and
Submodularity [introduction] [video] |

17.50-18.20 | BREAK (Posters) |

18.20-19.10 | Nicolò Cesa-Bianchi: Combinatorial prediction games [video] |

**Jack Edmonds: Polymatroids and Submodularity**

Many polytime algorithms have now been presented for minimizing a submodular function f(S) over the subsets S of a finite set E. We provide a tutorial in (somewhat hidden) theoretical foundations of them all. In particular, f can be easily massaged to a set function g(S) which is submodular, non-decreasing, and zero on the empty set, so that minimizing f(S) is equivalent to repeatedly determining whether a point x is in the polymatroid, P(g) = {x : x >= 0 and, for every S, sum of x(j) over j in S is at most g(S)}. A fundamental theorem says that, assuming g(S) is integer, the 0,1 vectors x in P(g) are the incidence vectors of the independent sets of a matroid M(P(g)). Another gives an easy description of the vertices of P(g). We will show how these ideas provide beautiful, but complicated, polytime algorithms for the possibly useful optimum branching system problem.

Jack Edmonds is considered by many to be one of the greatest mathematicians of the 20th century. Edmonds
introduced the concept and class we now call NP (that is, existential polynomial time) and good characterizations
(NP \ coNP) in his classic 1965 paper ''Minimum Partition of a Matroid Into Independent Subsets.'' In this paper, he
also gave the first polynomial time algorithm for a special case of submodular function minimization. |

*His paper ''Paths, Trees and Flowers'' also in 1965 was a first paper to suggest that it might be possible to establish a formal theory of efficient computation. His paper in 1969 ''Submodular Functions, Matroids, and Certain Polyhedra'' was a groundbreaking paper that established many of the critical theorems about submodular functions, including a good characterization of the minimum of a submodular functions, and polyhedral generalizations of matroids. Quoting Lex Schrijver's three volume book ''Combinatorial Optimization'':*

Pioneered by the work of Jack Edmonds, polyhedral combinatorics has proved to be a most powerful, coherent, and unifying tool throughout combinatorial optimization. Not only has it led to efficient (that is, polynomial-time) algorithms, but also, conversely, efficient algorithms often imply polyhedral characterizations and related min-max relations. It makes the two sides closely intertwined. In the 1960s, Edmonds advocated calling an algorithm efficient [or ''good''] if its running time is bounded by a polynomial in the size of its representation. Since then, this criterion has won broad acceptance, also because Edmonds found polynomial-time algorithms for several important combinatorial optimization problems. Edmonds conjectured that there is no polynomial-time algorithm for the travelling salesman problem. In language that was developed later, this is equivalent to NP != P

*Quoting Richard Karp:*

Jack Edmonds papers and a few key discussions with him drew my attention to the crucial distinction between polynomial-time and superpolynomial-time solvability. I was also influenced by Jacks emphasis on min-max theorems as a tool for fast verification of optimal solutions, which foreshadowed Steve Cooks definition of the complexity class NP.

**Pushmeet Kohli: Exploiting Problem Structure for Efficient Discrete Optimization**

Many problems in computer vision and machine learning require inferring the most probable states of certain hidden or unobserved variables. This inference problem can be formulated in terms of minimizing a function of discrete variables. The scale and form of computer vision problems raise many challenges in this optimization task. For instance, functions encountered in vision may involve millions or sometimes even billions of variables. Furthermore, the functions may contain terms that encode very high-order interaction between variables. These properties ensure that the minimization of such functions using conventional algorithms is extremely computationally expensive. In this talk, I will discuss how many of these challenges can be overcome by exploiting the sparse and heterogeneous nature of discrete optimization problems encountered in real world computer vision problems. Such problem-aware approaches to optimization can lead to substantial improvements in running time and allow us to produce good solutions to many important problems.

*Pushmeet Kohli is a research scientist in the Machine Learning and Perception group at Microsoft Research Cambridge, and an associate of the Psychometric Centre, University of Cambridge. Pushmeet’s research revolves around Intelligent Systems and Computational Sciences with a particular emphasis on algorithms and models for scene understanding, and the use of new sensors such as KINECT for the problems of human pose estimation and robotics. Pushmeet's papers have appeared in SIGGRAPH, NIPS, ICCV, AAAI, CVPR, PAMI, IJCV, CVIU, ICML, AISTATS, AAMAS, UAI, ECCV, and ICVGIP and have won best paper awards in ICVGIP 2006, 2010 and ECCV 2010. His PhD thesis, titled ''Minimizing Dynamic and Higher Order Energy Functions using Graph Cuts'', was the winner of the British Machine Vision Association’s ''Sullivan Doctoral Thesis Award'', and was a runner-up for the British Computer Society’s ''Distinguished Dissertation Award''.*

**Francis Bach: Learning with Submodular Functions: A Convex Optimization Perspective**

Submodular functions are relevant to machine learning for mainly two reasons: (1) some problems may be expressed directly as the optimization of submodular functions and (2) the Lovasz extension of submodular functions provides a useful set of regularization functions for supervised and unsupervised learning. In this talk, I will present the theory of submodular functions from a convex analysis perspective, presenting tight links between certain polyhedra, combinatorial optimization and convex optimization problems. In particular, I will show how submodular function minimization is equivalent to solving a wide variety of convex optimization problems. This allows the derivation of new efficient algorithms for approximate submodular function minimization with theoretical guarantees and good practical performance. By listing examples of submodular functions, I will also review various applications to machine learning, such as clustering or subset selection, as well as a family of structured sparsity-inducing norms that can be derived and used from submodular functions.

*Francis Bach is a researcher at INRIA and Ecole Normale Superieure, leading the SIERRA project-team. A former student of Ecole Polytechnique, he completed his Ph.D. in Computer Science at U.C. Berkeley, working with Professor Michael Jordan. He is interested in statistical machine learning, and especially in graphical models, sparse methods, kernel-based learning, vision, signal processing, convex and combinatorial optimization.*

**Nicolò Cesa-Bianchi: Combinatorial prediction games**

Combinatorial prediction games are problems of online linear optimization in which the action space is a combinatorial space. These games can be studied under different feedback models: full, semi-bandit, and bandit. In first part of the talk we will describe the main known facts about these models and mention some of the open problems. In the second part we will focus on the bandit feedback and describe some recent results which strengthen the link between bandit optimization and convex geometry.

*Nicolò Cesa-Bianchi is professor of Computer Science at the University of Milano, Italy. He was President of the Association for Computational Learning (2006-2009). He is member of the steering committee of the EC-funded Network of Excellence PASCAL2, action editor for the IEEE Transactions on Information Theory and for the Journal of Machine Learning Research. His main research interests include statistical learning theory, game-theoretic learning, and pattern analysis. He is co-author with Gabor Lugosi of the monography ''Prediction, Learning, and Games'' (Cambridge University Press, 2006). He is recipient of a Google Research Award (2010) and a Xerox Foundation UAC award (2011).*

# Workshop Overview

Solving optimization problems with ultimately discretely solutions is becoming increasingly important in machine learning: At the core of statistical machine learning is to infer conclusions from data, and when the variables underlying the data are discrete, both the tasks of inferring the model from data, as well as performing predictions using the estimated model are discrete optimization problems. Many of the resulting optimization problems are NP-hard, and typically, as the problem size increases, standard off-the-shelf optimization procedures become intractable.Fortunately, most discrete optimization problems that arise in machine learning have specific structure, which can be leveraged in order to develop tractable exact or approximate optimization procedures. For example, consider the case of a discrete graphical model over a set of random variables. For the task of prediction, a key structural object is the "marginal polytope," a convex bounded set characterized by the underlying graph of the graphical model. Properties of this polytope, as well as its approximations, have been successfully used to develop efficient algorithms for inference. For the task of model selection, a key structural object is the discrete graph itself. Another problem structure is

*sparsity*: While estimating a high-dimensional model for regression from a limited amount of data is typically an ill-posed problem, it becomes solvable if it is known that many of the coefficients are zero. Another problem structure,

*submodularity*, a discrete analog of convexity, has been shown to arise in many machine learning problems, including structure learning of probabilistic models, variable selection and clustering. One of the primary goals of this workshop is to investigate how to leverage such structures.

The focus of this yearÃ¢ÂÂs workshop is on the

*interplay between discrete optimization and machine learning*: How can we solve inference problems arising in machine learning using discrete optimization? How can one solve discrete optimization problems involving parameters that themselves are estimated from training data? How can we solve challenging sequential and adaptive discrete optimization problems where we have the opportunity to incorporate feedback (online and active learning with combinatorial decision spaces)? We will also explore applications of such approaches in computer vision, NLP, information retrieval etc.

## Invited speakers (confirmed)

## Format

Broadly, this one-day workshop aims at exploring the current challenges in discrete optimization in machine learning. It will explore these topics in tutorials and invited talks. In addition, we will have a poster session with spotlight presentations to provide a platform for presenting new contributions.## Organizers

- Andreas Krause, ETH Zurich
- Pradeep Ravikumar,
University of Texas, Austin

- Jeff
A. Bilmes, University of Washington

- Stefanie Jegelka, Max Planck Institute for Intelligent Systems, Tuebingen

## Other related workshops

The proposed workshop is a continuation of DISCML 2009 and 2010. This year's workshop will have a particular focus on the interplay between learning and discrete optimization. The workshop is also related to the workshop series Optimization in Machine Learning.