by , , ,
Abstract:
Many applications require optimizing an un- known, noisy function that is expensive to evaluate. We formalize this task as a multi- armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the impor- tant open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental de- sign. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covari- ance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristical GP optimization approaches.
Reference:
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design N. Srinivas, A. Krause, S. Kakade, M. SeegerIn Proc. International Conference on Machine Learning (ICML), 2010
Bibtex Entry:
@inproceedings{srinivas10gaussian,
	Author = {Niranjan Srinivas and Andreas Krause and Sham Kakade and Matthias Seeger},
	Booktitle = {Proc. International Conference on Machine Learning (ICML)},
	Title = {Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design},
	Year = {2010}}