by , , ,
Abstract:
Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. The action set satisfies a given structure such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop a CombGame meta-algorithm whose instances are asymptotically optimal algorithms with finite time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
Reference:
Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback M. Jourdan, M. Mutný, J. Kirschner, A. KrauseIn Proc. International Conference on Algorithmic Learning Theory (ALT), 2021
Bibtex Entry:
@inproceedings{jourdan21efficient,
	author = {Jourdan, Marc and Mutn\'y, Mojm\'ir and Kirschner, Johannes and Krause, Andreas},
	booktitle = {Proc. International Conference on Algorithmic Learning Theory (ALT)},
	month = {March},
	title = {Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback},
	year = {2021}}