by B. Sukhija, M. Turchetta, D. Lindner, A. Krause, S. Trimpe, D. Baumann
Abstract:
Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage. Most existing learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. A notable exception is the GoSafe algorithm, which, unfortunately, cannot handle high-dimensional systems and hence cannot be applied to most real-world dynamical systems. This work proposes GoSafeOpt as the first algorithm that can safely discover globally optimal policies for complex systems while giving safety and optimality guarantees. Our experiments on a robot arm that would be prohibitive for GoSafe demonstrate that GoSafeOpt safely finds remarkably better policies than competing safe learning methods for high-dimensional domains.
Reference:
Scalable Safe Exploration for Global Optimization of Dynamical Systems. B. Sukhija, M. Turchetta, D. Lindner, A. Krause, S. Trimpe, D. Baumann. In arXiv preprint arXiv:2201.09562, 2022.
Bibtex Entry:
@inproceedings{sukhija2022scalable,
	Author = {Bhavya Sukhija and Matteo Turchetta and David Lindner and Andreas Krause and Sebastian Trimpe and Dominik Baumann},
	Booktitle = {arXiv preprint arXiv:2201.09562},
	Month = {January},
	Title = {Scalable Safe Exploration for Global Optimization of Dynamical Systems},
	Year = {2022}}