Figure 1: We propose an algorithm that learns a policy for continuous control tasks even when a worst-case adversary is present.
This is a post for the work Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning (Curi et al., 2021), jointly with Ilija Bogunovic and Andreas Krause, that appeared in ICML 2021. It is a follow-up on Efficient Model-Based reinforcement Learning Through Optimistic Policy Search and Planning (Curi et al., 2020), that appeared in NeuRIPS 2020 (See blog post)
In this work, we address the challenge of finding a robust policy in continuous control tasks. As a motivating example, consider designing a braking system on an autonomous car. As this is a highly complex task, we want to learn a policy that performs this maneuver. One can imagine many real-world conditions and simulate them during training time e.g., road conditions, brightness, tire pressure, laden weight, or actuator wear.… Read more