Direction and Extrapolation in Active Learning

Yarden As Active Learning, Bayesian Optimization, Experimental Design, Safe Exploration

Learning as an Iterative Process

One key challenge in numerous learning tasks is deciding what to learn next. For example, imagine you are trying to learn a new hobby. One way to learn it quickly is to keep a reading list of books that let you learn from experts. As you work through the list, you’ll need to decide which books are worth the time it takes to cover their material. For example, if your goal is to learn chess, there’s no need to read a beginner’s book twice when you can already progress to new opening tactics.

How can you intelligently choose the next book? Active learning algorithms can methodically guide you in your learning endeavor.

Let’s take our chess learning problem and model it as an Active Learning problem. Suppose that each book can be represented as a set of features that uniquely identify it.… Read more
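To make the book example concrete, here is a minimal sketch of one common active learning heuristic, uncertainty sampling. It is not necessarily the method discussed in the post. It assumes each book is a small feature vector and that what you learn from a book follows a Bayesian linear model; the function name `pick_next_book`, the two features, and the variance parameters are all hypothetical.

```python
import numpy as np

def pick_next_book(read_features, candidate_features, noise_var=1.0, prior_var=1.0):
    """Pick the candidate whose outcome is most uncertain under a
    Bayesian linear model (uncertainty sampling). Illustrative only."""
    d = candidate_features.shape[1]
    # Posterior precision of the weights after the books read so far.
    precision = np.eye(d) / prior_var + read_features.T @ read_features / noise_var
    covariance = np.linalg.inv(precision)
    # Predictive variance x^T Sigma x for every candidate book.
    variances = np.einsum("ij,jk,ik->i", candidate_features, covariance, candidate_features)
    return int(np.argmax(variances)), variances

# Hypothetical features: [beginner material, opening theory]
read = np.array([[1.0, 0.0]])        # one beginner book already read
candidates = np.array([[1.0, 0.0],   # another beginner book
                       [0.0, 1.0]])  # a book on openings
best, variances = pick_next_book(read, candidates)
print(best, variances)
```

Under these assumptions the second beginner book has lower predictive variance than the openings book, so the sketch picks the openings book, matching the intuition that rereading beginner material teaches little.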

Learning Constraints to Represent Human Preferences

David Lindner Active Learning, Preference Learning, Reinforcement Learning

Human preferences often naturally decompose into rewards and constraints. For example, imagine you have an autonomous car and tell it to drive you to the grocery store. This task has a goal that is naturally described by a reward function, such as the distance to the grocery store. However, other implicit parts of the task, such as following driving rules or making sure the drive is comfortable, can be naturally described as constraints.

A key challenge for deploying AI systems in the real world is to ensure that they act “in accordance with their users’ intentions” – that they do what we want them to do. The most common way of communicating to an AI system what we want it to do is through designing a reward (or cost) function. However, in practice, it is challenging to specify good reward functions, and misspecified reward functions can lead to all kinds of undesired behavior.… Read more

Towards Bayesian Causal Structure Learning in Real-World Applications

Lars Lorch Active Learning

Figure 1: A workflow for a Bayesian approach to causal structure discovery. In two recent works from our lab, we study the inference and experiment design components of the pipeline.

Causal Inference and Machine Learning

Causality and machine learning have been studied largely independently, but recently there has been significant excitement at the intersection of the two fields. One hope is that by inferring causal rather than statistical dependencies, we might be able to design systems that can perform robustly outside of the training environment.

Causal discovery refers to the task of inferring the explicit causal relations among a set of random variables. Currently, many causal discovery algorithms are designed under strong assumptions that may not hold in the complex domains where causal discovery is most frequently deployed, e.g., economics, health care, and biology. These limitations range from assumptions about the underlying causal system to assumptions about the data collection pipeline.

A Real-World Application: Gene Regulatory Networks in Systems Biology

Let’s look at the task of causal discovery by way of the following example: gene regulatory networks (GRNs).… Read more

Sample efficient reward learning for reinforcement learning

David Lindner Active Learning, Preference Learning, Reinforcement Learning

Figure 1: The robot needs to learn the user’s food preferences to decide what to collect. We propose a method that can significantly reduce the number of queries necessary by focusing on queries that are informative about which policy is optimal.

Recently, reinforcement learning (RL) has shown impressive performance on tasks with a well-specified reward function, such as Atari games. Unfortunately, a reward function is often not available in the real world. Say you want to train an RL agent to drive a car. What is a good reward function for driving? Often researchers hand-craft complicated reward functions for such tasks, but this is cumbersome and prone to error. More generally, misspecified rewards can lead to unintended and unsafe behavior due to specification gaming.

A promising alternative is to learn a model of the reward from human feedback. By, for example, asking humans to compare trajectories and judge which one solves a task better, we can learn a reward function for tasks that are difficult for humans to specify manually.… Read more
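As a rough illustration of learning a reward from trajectory comparisons, the sketch below fits a linear reward with a Bradley-Terry (logistic) preference model. This is a standard baseline, not the query-selection method proposed in the post. It assumes each trajectory is summarized by a feature vector; the function name, the synthetic data, and the simulated "human" labels are all hypothetical.

```python
import numpy as np

def fit_reward_from_comparisons(features_a, features_b, a_preferred, lr=0.5, steps=2000):
    """Fit linear reward weights w so that sigmoid(w . (phi_a - phi_b))
    matches the observed preferences (gradient ascent on the log-likelihood)."""
    diff = features_a - features_b
    w = np.zeros(diff.shape[1])
    for _ in range(steps):
        p_a = 1.0 / (1.0 + np.exp(-diff @ w))  # P(human prefers trajectory A)
        w += lr * diff.T @ (a_preferred - p_a) / len(diff)
    return w

# Synthetic stand-in for human feedback: labels come from hidden weights.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -1.0])
feats_a = rng.normal(size=(200, 2))
feats_b = rng.normal(size=(200, 2))
labels = ((feats_a - feats_b) @ true_w > 0).astype(float)

w = fit_reward_from_comparisons(feats_a, feats_b, labels)
# Fraction of comparisons where the learned reward agrees with the labels.
accuracy = np.mean((((feats_a - feats_b) @ w) > 0) == (labels == 1.0))
print(w, accuracy)
```

Only the direction of `w` is meaningful here: comparisons identify a reward up to scale, and on noiseless labels the magnitude keeps growing with more gradient steps.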