# Schedule

7.45-7.55 |
Opening remarks |

7.55-8.40 | Venkat Chandrasekaran: Computational and Statistical Tradeoffs via Convex Relaxation [video] |

8.40-9.00 | Spotlights I |

9.00-9.30 | COFFEE BREAK & Posters |

9.30-10.30 | Satoru Fujishige: Submodularity and Discrete Convexity [video] |

10.30-15.30 |
BREAK
(Posters) |

15.30-15.55 | Spotlights II |

15.55-16.40 | Amir Globerson: What cannot be learned with Bethe Approximations [video] |

16.40-17.00 | Spotlights III |

17.00-17.30 | COFFEE BREAK & Posters |

17.30-18.15 | Alex Smola: The Parameter Server [video] |

18.15-18.30 | Posters |

## Invited Talks

**Venkat Chandrasekaran: Computational and Statistical Tradeoffs via Convex Relaxation**

In modern data analysis, one is frequently faced with statistical inference problems involving massive datasets. Processing such large datasets is usually viewed as a substantial computational challenge. However, if data are a statistician's main resource then access to more data should be viewed as an asset rather than as a burden. We describe a computational framework based on convex relaxation to reduce the computational complexity of an inference procedure when one has access to increasingly larger datasets. Convex relaxation techniques have been widely used in theoretical computer science as they give tractable approximation algorithms to many computationally intractable tasks. We demonstrate the efficacy of this methodology in statistical estimation in providing concrete time-data tradeoffs in a class of denoising problems. Thus, convex relaxation offers a principled approach to exploit the statistical gains from larger datasets to reduce the runtime of inference algorithms. (Joint work with Michael Jordan)

**Satoru Fujishige: Submodularity and Discrete Convexity**

The present talk is a tutorial one and gives some essence of
submodular functions and discrete convexity.

The contents of my talk will be the following:

- Submodular and Supermodular Functions

1.1 Submodularity

1.2 Distributive Lattices and Posets (Birkhoff-Iri)

- Submodular Systems and Supermodular Systems

2.1 Submodular polyhedron and Base Polyhedron

2.2 Duality

2.3 Polymatroids and Matroids

2.4 Generalized Polymatroids

2.5 Characterization by Edge-vectors

2.6 Crossing- and Intersecting-submodular Functions

- Intersection Theorem and Its Equivalents

3.1 Intersection Theorem

3.2 Discrete Separation Theorem

3.3 Fenchel Duality Theorem

3.4 Minkowski Sum Theorem

- Minimum-Norm Base and Submodular Function Minimization

4.1 Lexicographically Optimal Base

4.2 Minimum-Norm Base

4.3 Submodular Function Minimization

- Maximum Weight Base Problem

5.1 Greedy Algorithm

5.2 Lovasz extension

5.3 Subgradients and Subdifferentials of Submodular Functions

- Discrete Convex Analysis

6.1 L♮-convex Functions

6.2 M♮-convex Functions

6.3 Discrete Separation Theorem

6.4 Fenchel Duality Theorem

**Amir Globerson: What cannot be learned with Bethe Approximations**

We address the problem of learning the parameters in graphical models when inference is intractable. A common strategy in this case is to replace the partition function with its Bethe approximation. However not much is known about the theoretical properties of such approximations. Here we show that there exists a regime of empirical marginals where such "Bethe learning" will fail. By failure we mean that moment matching will not be achieved. We provide several conditions on empirical marginals that yield outer and inner bounds on the set of Bethe learnable marginals. An interesting implication of our results is that there exists a large class of marginals that cannot be obtained as stable fixed points of belief propagation. Taken together our results provide a novel approach to analyzing learning with Bethe approximations and highlight when it can be expected to work or fail.

**Alex Smola: The Parameter Server**

In this talk I will discuss a number of vignettes on scaling optimization and inference. Despite arising from very different contexts (graphical models inference, convex optimization, neural networks), they all share a common design pattern - a synchronization mechanism in the form of a parameter server. It formalizes the notion of decomposing optimization problems into subsets and reconciling partial solutions. I will discuss some of the systems and distribution issues involved in building such a system.

# Workshop Overview

Optimization problems with discrete solutions (e.g., combinatorial optimization) are becoming increasingly important in machine learning. The core of statistical machine learning is to infer conclusions from data, and when the variables underlying the data are discrete, both the tasks of inferring the model from data, as well as performing predictions using the estimated model are discrete optimization problems. Two factors complicate matters: first, many discrete problems are in general computationally hard, and second, machine learning applications often demand solving such problems at very large scales.

The focus of this year's workshop lies on structures that enable scalability. Which properties of the problem make it possible to still efficiently obtain exact or decent approximate solutions? What are the challenges posed by parallel and distributed processing? Which discrete problems in machine learning are in need of more scalable algorithms? How can we make discrete algorithms scalable while retaining quality? Some heuristics perform well but as of yet are devoid of a theoretical foundation; what explains such good behavior?

## Call for papers

We would like to encourage high quality submissions of short papers relevant to the workshop topics. Accepted papers will be presented as spotlight talks and posters. Of particular interest are new algorithms with theoretical guarantees, as well as applications of discrete optimization to machine learning problems, especially large scale ones.

Areas of interest include:

- Optimization:
- Combinatorial algorithms
- Submodular / supermodular optimization
- Discrete Convex Analysis
- Pseudo-boolean optimization
- Parallel & distributed discrete optimization
- Continuous relaxations:
- Sparse approximation & compressive sensing
- Regularization techniques
- Structured sparsity models
- Learning in discrete domains:
- Online learning / bandit optimization
- Generalization in discrete learning problems
- Adaptive / stochastic optimization
- Applications:
- Graphical model inference & structure learning
- Clustering
- Feature selection, active learning & experimental design
- Structured prediction
- Novel discrete optimization problems in ML, Computer Vision, Natural Language Processing, Speech processing, Computational Biology.

## Format

Broadly, this one-day workshop aims at exploring the current challenges in discrete optimization in machine learning. It will explore these topics in tutorials and invited talks. In addition, we will have a poster session with spotlight presentations to provide a platform for presenting new contributions.## Organizers

- Andreas Krause, ETH Zurich
- Pradeep Ravikumar,
University of Texas, Austin

- Jeff
A. Bilmes, University of Washington

- Stefanie Jegelka, UC Berkeley

## Other related workshops

The proposed workshop is a continuation of DISCML 2009, 2010 and 2011. This year's workshop will have a particular focus on scalability in discrete optimization. The workshop is also related to the workshop series Optimization in Machine Learning.