by L. Chen, A. Krause, A. Karbasi

Abstract:

In many machine learning applications, submodular functions have been used as a model for evaluating the utility or payoff of a set, e.g., news items to recommend, sensors to deploy in a terrain, or nodes to influence in a social network. At the heart of all these applications is the assumption that the underlying utility/payoff function is known a priori, hence maximizing it is in principle possible. In many real-life situations, however, the utility function is not fully known in advance and can only be estimated via interactions. For instance, whether a user likes a movie or not can be reliably evaluated only after it has been shown to her. Likewise, the range of influence of a user in a social network can be estimated only after she is selected to advertise the product. We model such problems as interactive submodular bandit optimization, where in each round we receive a context (e.g., previously selected movies) and have to choose an action (e.g., propose a new movie). We then receive noisy feedback about the utility of the action (e.g., ratings), which we model as a submodular function over the context-action space. We develop SM-UCB, which efficiently trades off exploration (collecting more data) and exploitation (proposing a good action given gathered data) and achieves an $O(T^{1/2})$ regret bound after $T$ rounds of interaction. More specifically, given a bounded-RKHS norm kernel over the context-action-payoff space that governs the smoothness of the utility function, SM-UCB keeps an upper-confidence bound on the payoff function that allows it to asymptotically achieve no regret. Finally, we evaluate our results on four concrete applications, including movie recommendation (on the MovieLens dataset), news recommendation (on the Yahoo! Webscope dataset), interactive influence maximization (on a subset of the Facebook network), and personalized data summarization (on the Reuters Corpus). We observe that SM-UCB consistently outperforms the prior art.
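The round structure described in the abstract (observe a context, pick an action by an upper-confidence bound, receive noisy feedback) can be illustrated with a small sketch. This is not the authors' SM-UCB implementation. It is a generic Gaussian-process UCB loop on a toy coverage utility, and every name in it (`coverage`, `features`, `ucb_scores`, `run`), the indicator encoding of context-action pairs, the RBF kernel, and the constants `beta` and `noise` are assumptions made for illustration only.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def coverage(selected, covers):
    # Toy submodular utility: number of distinct elements covered by the set.
    return len(set().union(*(covers[v] for v in selected)))

def features(selected, action, n_items):
    # Hypothetical encoding of a (context, action) pair:
    # indicator of the already selected items, then a one-hot for the action.
    x = np.zeros(2 * n_items)
    for u in selected:
        x[u] = 1.0
    x[n_items + action] = 1.0
    return x

def ucb_scores(X_hist, y_hist, X_cand, beta=2.0, noise=0.1):
    # GP posterior mean plus beta times posterior std (prior mean 0, variance 1).
    if len(y_hist) == 0:
        return np.full(len(X_cand), beta)
    K = rbf_kernel(X_hist, X_hist) + noise**2 * np.eye(len(y_hist))
    k_star = rbf_kernel(X_cand, X_hist)
    mean = k_star @ np.linalg.solve(K, y_hist)
    var = 1.0 - np.einsum("ij,ji->i", k_star, np.linalg.solve(K, k_star.T))
    return mean + beta * np.sqrt(np.clip(var, 0.0, None))

def run(covers, budget, rounds, seed=0, noise=0.1):
    # Each round builds a set of size `budget`; the context is the set so far,
    # and the feedback is a noisy observation of the marginal gain.
    rng = np.random.default_rng(seed)
    n = len(covers)
    X_hist, y_hist, totals = [], [], []
    for _ in range(rounds):
        selected = []
        for _ in range(budget):
            cand = [v for v in range(n) if v not in selected]
            X_cand = np.array([features(selected, v, n) for v in cand])
            X_arr = np.array(X_hist).reshape(-1, 2 * n)
            scores = ucb_scores(X_arr, np.array(y_hist), X_cand, noise=noise)
            i = int(np.argmax(scores))
            v = cand[i]
            gain = coverage(selected + [v], covers) - coverage(selected, covers)
            X_hist.append(X_cand[i])
            y_hist.append(gain + noise * rng.standard_normal())
            selected.append(v)
        totals.append(coverage(selected, covers))
    return totals

if __name__ == "__main__":
    toy_covers = [{0, 1}, {1, 2}, {3}, {0, 1, 2}]
    print(run(toy_covers, budget=2, rounds=10))
```

The sketch learns marginal gains rather than the set utility directly, which is one plausible way to treat the previously selected items as context. The kernel, the confidence parameter, and the regret analysis in the paper may differ from what is shown here.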

Reference:

Interactive Submodular Bandit. L. Chen, A. Krause, A. Karbasi. In Neural Information Processing Systems (NeurIPS), 2017.

Bibtex Entry:

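@inproceedings{chen17interactive,
	abstract = {In many machine learning applications, submodular functions have been used as a model for evaluating the utility or payoff of a set, e.g., news items to recommend, sensors to deploy in a terrain, or nodes to influence in a social network. At the heart of all these applications is the assumption that the underlying utility/payoff function is known a priori, hence maximizing it is in principle possible. In many real-life situations, however, the utility function is not fully known in advance and can only be estimated via interactions. For instance, whether a user likes a movie or not can be reliably evaluated only after it has been shown to her. Likewise, the range of influence of a user in a social network can be estimated only after she is selected to advertise the product. We model such problems as interactive submodular bandit optimization, where in each round we receive a context (e.g., previously selected movies) and have to choose an action (e.g., propose a new movie). We then receive noisy feedback about the utility of the action (e.g., ratings), which we model as a submodular function over the context-action space. We develop SM-UCB, which efficiently trades off exploration (collecting more data) and exploitation (proposing a good action given gathered data) and achieves an $O(T^{1/2})$ regret bound after $T$ rounds of interaction. More specifically, given a bounded-RKHS norm kernel over the context-action-payoff space that governs the smoothness of the utility function, SM-UCB keeps an upper-confidence bound on the payoff function that allows it to asymptotically achieve no regret. Finally, we evaluate our results on four concrete applications, including movie recommendation (on the MovieLens dataset), news recommendation (on the Yahoo! Webscope dataset), interactive influence maximization (on a subset of the Facebook network), and personalized data summarization (on the Reuters Corpus). We observe that SM-UCB consistently outperforms the prior art.},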
	author = {Lin Chen and Andreas Krause and Amin Karbasi},
	booktitle = {Neural Information Processing Systems (NeurIPS)},
	month = {December},
	title = {Interactive Submodular Bandit},
	year = {2017}}