by Adish Singla, Hamed Hassani, Andreas Krause
AI and machine learning methods are increasingly interacting with and seeking information from people, robots, and other learning agents. Consequently, the learning dynamics of these agents create fundamentally new challenges for existing methods. Motivated by the application of learning to offer personalized deals to users, we highlight these challenges by studying a variant of the framework of “online learning using expert advice with bandit feedback”. In our setting, we model each expert as a learning agent, which more accurately reflects real-world applications. The bandit feedback leads to additional challenges in this setting: at time t, only the expert i^t that has been selected by the central algorithm (forecaster) receives feedback from the environment and gets to learn at this time. A natural question to ask is whether it is possible to be competitive with the best expert j* had it seen all the feedback, i.e., competitive with the policy of always selecting expert j*. We prove the following hardness result: without any coordination between the forecaster and the experts, it is impossible to design a forecaster achieving no-regret guarantees. We then consider a practical assumption that allows the forecaster to guide the learning process of the experts by blocking some of the feedback they observe from the environment, i.e., preventing the selected expert i^t from learning at time t for some time steps. With this additional coordination power, we design our forecaster LIL, which achieves no-regret guarantees, and we provide regret bounds that depend on the learning dynamics of the best expert j*.
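The interaction protocol described in the abstract can be illustrated with a small simulation. The sketch below is not the LIL forecaster from the paper: it assumes a standard EXP3-style forecaster over experts, and the names `LearningExpert`, `run_protocol`, `reward_fn`, and `block` are hypothetical, introduced only for illustration. It shows the two features of the setting: only the selected expert i^t receives feedback, and the forecaster may block that feedback so the expert does not learn at time t.

```python
import math
import random


class LearningExpert:
    """Hypothetical expert: an epsilon-greedy learner over a few actions."""

    def __init__(self, n_actions, epsilon=0.1, rng=None):
        self.rng = rng or random.Random()
        self.epsilon = epsilon
        self.counts = [0] * n_actions
        self.means = [0.0] * n_actions
        self.updates = 0  # how many feedback signals this expert has learned from

    def recommend(self):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.means))
        return max(range(len(self.means)), key=lambda a: self.means[a])

    def learn(self, action, reward):
        # Incremental running-mean update of the chosen action's estimate.
        self.counts[action] += 1
        self.means[action] += (reward - self.means[action]) / self.counts[action]
        self.updates += 1


def run_protocol(experts, reward_fn, horizon, gamma=0.1,
                 block=lambda t, i: False, seed=0):
    """Simulate the protocol with an EXP3-style forecaster (an assumption,
    not the paper's LIL). Rewards are assumed to lie in [0, 1]."""
    rng = random.Random(seed)
    k = len(experts)
    weights = [1.0] * k
    total = 0.0
    for t in range(horizon):
        wsum = sum(weights)
        probs = [(1 - gamma) * w / wsum + gamma / k for w in weights]
        i_t = rng.choices(range(k), weights=probs)[0]  # forecaster selects i^t
        action = experts[i_t].recommend()
        reward = reward_fn(i_t, action, rng)
        total += reward
        # Importance-weighted exponential update for the selected expert only.
        weights[i_t] *= math.exp(gamma * (reward / probs[i_t]) / k)
        top = max(weights)
        weights = [w / top for w in weights]  # rescale for numerical stability
        # Bandit feedback: only i^t can learn, and only if not blocked.
        if not block(t, i_t):
            experts[i_t].learn(action, reward)
    return total
```

Passing `block=lambda t, i: True` freezes every expert, while the default lets the selected expert learn at every step; a coordination rule of the kind the abstract describes would sit between these two extremes.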
Learning to Interact with Learning Agents. A. Singla, H. Hassani, A. Krause. In Proc. Conference on Artificial Intelligence (AAAI), 2018
Bibtex Entry:
	Author = {Adish Singla and Hamed Hassani and Andreas Krause},
	Booktitle = {Proc. Conference on Artificial Intelligence (AAAI)},
	Title = {Learning to Interact with Learning Agents},
	Month = {February},
	Year = {2018}}