by T. Desautels, A. Krause, J. Burdick

Abstract:

Can one parallelize complex exploration–exploitation tradeoffs? As an example, consider the problem of optimal high-throughput experimental design, where we wish to sequentially design batches of experiments in order to simultaneously learn a surrogate function mapping stimulus to response, as well as identifying the maximum of the function. We formalize the task as a multi-armed bandit problem, where the unknown payoff function is sampled from a Gaussian process (GP), and instead of a single arm, in each round we pull a batch of several arms in parallel. We develop GP-BUCB, a principled algorithm for choosing batches, based on the GP-UCB algorithm for sequential GP optimization. We prove that, perhaps surprisingly, the cumulative regret of the parallel algorithm, compared to the sequential approach, only increases by a constant factor independent of the batch size B, for any fixed B. Our results provide rigorous theoretical support for exploiting parallelism in Bayesian global optimization. We demonstrate the effectiveness of our approach on two real-world applications.

Reference:

Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization T. Desautels, A. Krause, J. BurdickIn Proc. International Conference on Machine Learning (ICML), 2012

Bibtex Entry:

@inproceedings{desautels12parallelizing,
	author = {Thomas Desautels and Andreas Krause and Joel Burdick},
	booktitle = {Proc. International Conference on Machine Learning (ICML)},
	title = {Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization},
	year = {2012}}