Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning

Anonymous authors



Abstract

Driven by inherent uncertainty and the sim-to-real gap, robust reinforcement learning (RL) seeks to improve resilience against the complexity and variability in agent-environment sequential interactions. Despite the existence of a large number of RL benchmarks, there is a lack of standardized benchmarks for robust RL. Current robust RL policies often focus on a specific type of uncertainty and are evaluated in distinct, one-off environments. In this work, we introduce Robust-Gymnasium, a unified modular benchmark designed for robust RL that supports a wide variety of disruptions across all key RL components—agents' observed state and reward, agents' actions, and the environment. It offers over sixty diverse task environments spanning control and robotics, safe RL, and multi-agent RL. Robust-Gymnasium provides an open-source and user-friendly tool for the community to assess current methods and foster the development of robust RL algorithms. In addition, we benchmark existing standard and robust RL algorithms in Robust-Gymnasium, uncovering significant deficiencies in each and offering new insights.


Benchmark Features

This benchmark aims to advance robust reinforcement learning (RL) for real-world applications and domain adaptation. It provides a comprehensive set of tasks that cover a variety of robustness requirements under uncertainty in states, actions, rewards, and environmental dynamics, and it spans diverse applications including control, robot manipulation, dexterous-hand manipulation, and more. (This repository is under active development; we appreciate any constructive comments and suggestions.)
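As a concrete illustration of uncertainty in environmental dynamics, the sketch below rescales the body masses of a standard Gymnasium MuJoCo task at every reset. It is a minimal sketch written against the plain Gymnasium/MuJoCo API rather than this benchmark's own interface; the environment id and the scaling range are illustrative assumptions.

```python
import numpy as np
import gymnasium as gym


class RandomMassWrapper(gym.Wrapper):
    """Illustrative dynamics perturbation: rescale body masses at each reset."""

    def __init__(self, env, scale_range=(0.8, 1.2)):
        super().__init__(env)
        self.scale_range = scale_range
        # Keep a copy of the nominal masses so perturbations do not accumulate.
        self._nominal_mass = env.unwrapped.model.body_mass.copy()

    def reset(self, **kwargs):
        low, high = self.scale_range
        scales = np.random.uniform(low, high, size=self._nominal_mass.shape)
        self.env.unwrapped.model.body_mass[:] = self._nominal_mass * scales
        return self.env.reset(**kwargs)


# Usage: train or evaluate a policy under randomized dynamics.
env = RandomMassWrapper(gym.make("HalfCheetah-v4"), scale_range=(0.7, 1.3))
obs, info = env.reset()
```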


🔥 Benchmark Features:


🔥 Benchmark Tasks:

Each benchmark task incorporates configurable disruptions to observations, actions, reward signals, and dynamics in order to evaluate the robustness of RL algorithms, as illustrated in the sketch below.
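The snippet below is a minimal, hypothetical sketch of how such disruptions can be layered onto a Gymnasium-style task. It uses only the plain Gymnasium wrapper API, not Robust-Gymnasium's own interface, and the Gaussian noise model and parameter names are assumptions made for illustration.

```python
import numpy as np
import gymnasium as gym


class DisruptionWrapper(gym.Wrapper):
    """Inject Gaussian noise into observations, actions, and rewards (illustrative)."""

    def __init__(self, env, obs_std=0.05, act_std=0.05, rew_std=0.1):
        super().__init__(env)
        self.obs_std, self.act_std, self.rew_std = obs_std, act_std, rew_std

    def step(self, action):
        # Disrupt the action the agent intended to take.
        noisy_action = np.clip(
            action + np.random.normal(0.0, self.act_std, size=np.shape(action)),
            self.action_space.low, self.action_space.high,
        )
        obs, reward, terminated, truncated, info = self.env.step(noisy_action)
        # Disrupt what the agent observes and the reward it receives.
        obs = obs + np.random.normal(0.0, self.obs_std, size=np.shape(obs))
        reward = reward + np.random.normal(0.0, self.rew_std)
        return obs, reward, terminated, truncated, info


env = DisruptionWrapper(gym.make("Ant-v4"))
```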


🔥 Our Vision:

We hope this benchmark serves as a useful platform for pushing the boundaries of RL in real-world problems, promoting robustness and the ability to adapt across domains!



Robust RL Framework

Robust RL problems typically consist of three modules: an agent, an environment, and a disruption source that perturbs their interaction through observations, actions, rewards, or dynamics. A minimal interaction loop is sketched below.


(Framework overview figures)
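To make the three-module interaction concrete, the loop below sketches one episode in which a disruptor sits between the agent and the environment, corrupting both what the agent observes and what the environment executes. The Gaussian noise disruptor and the placeholder policy are illustrative stand-ins under assumed parameters, not the benchmark's actual components.

```python
import numpy as np
import gymnasium as gym

env = gym.make("Hopper-v4")


def policy(observation):
    # Placeholder agent: a real agent would map the observation to an action.
    return env.action_space.sample()


obs, info = env.reset(seed=0)
done = False
while not done:
    # Disruptor: the agent sees a corrupted version of the true state.
    perturbed_obs = obs + np.random.normal(0.0, 0.05, size=obs.shape)

    # Agent: chooses an action based on the disrupted observation.
    action = policy(perturbed_obs)

    # Disruptor: the environment executes a perturbed version of that action.
    perturbed_action = np.clip(
        action + np.random.normal(0.0, 0.05, size=action.shape),
        env.action_space.low, env.action_space.high,
    )

    # Environment: advances the (possibly shifted) dynamics.
    obs, reward, terminated, truncated, info = env.step(perturbed_action)
    done = terminated or truncated
```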


Opportunities for Future Research!

By leveraging this benchmark, we can evaluate the robustness of RL algorithms and develop new ones that perform reliably under real-world uncertainties and adversarial conditions. This means building agents that maintain performance despite distributional shifts, noisy data, and unforeseen perturbations. The benchmark therefore opens up many opportunities for future research, such as:

  1. Integrating techniques from robust optimization, adversarial training, LLMs, and safe learning to enhance generalization and adaptability.
  2. Improving sample efficiency and safety in robust settings, which is crucial for deploying these systems in high-stakes applications like healthcare, finance, and autonomous vehicles.

In conclusion, this benchmark lets us test and refine the robustness of RL algorithms before deploying them in diverse, real-world scenarios; a minimal evaluation protocol is sketched below.
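As a simple starting point, the sketch below evaluates a policy across increasing observation-noise levels and reports its worst-case return. The `policy` function is a placeholder for a trained agent, and the environment id, noise model, and sweep values are illustrative assumptions rather than a prescribed protocol.

```python
import numpy as np
import gymnasium as gym


def evaluate(env_id, policy, noise_std, episodes=10):
    """Average return of `policy` when observations are corrupted by Gaussian noise."""
    env = gym.make(env_id)
    returns = []
    for _ in range(episodes):
        obs, info = env.reset()
        done, total = False, 0.0
        while not done:
            noisy_obs = obs + np.random.normal(0.0, noise_std, size=obs.shape)
            obs, reward, terminated, truncated, info = env.step(policy(noisy_obs))
            total += reward
            done = terminated or truncated
        returns.append(total)
    return float(np.mean(returns))


# Placeholder policy: a fixed zero action; swap in a trained agent in practice.
action_shape = gym.make("Walker2d-v4").action_space.shape
policy = lambda obs: np.zeros(action_shape)

# Sweep disruption magnitudes and report the robustness profile of the policy.
scores = {std: evaluate("Walker2d-v4", policy, std) for std in (0.0, 0.05, 0.1, 0.2)}
print("worst-case return:", min(scores.values()))
```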