stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
Stable Baselines3 is a Python library that gives you reliable, ready-to-run reinforcement learning algorithms built on PyTorch, so you can train AI agents to make decisions by trial and error without coding the algorithms from scratch.
Stable Baselines3 (SB3) is a Python library that provides clean, tested implementations of reinforcement learning algorithms. Reinforcement learning is a branch of machine learning where a software agent learns by trial and error: it takes actions in an environment, receives a score based on how well it did, and gradually learns to make better decisions. SB3 is built on PyTorch and is intended for researchers and practitioners who want reliable starting points for their own experiments.
The library is developed by the German Aerospace Center (DLR) Robotics and Mechatronics Center. It is the third generation of the Stable Baselines project. The goal is to make it easier to reproduce published research results and to give people a solid foundation to build new ideas on top of, rather than reimplementing the same algorithms from scratch each time.
SB3 provides a consistent interface across all its algorithms, in a style similar to scikit-learn, the machine learning library that many Python developers already know. You create a model, call learn() to train it, and then call predict() to get actions from it. Training progress can be tracked with TensorBoard. The library supports custom environments, custom policies, and custom callbacks, and works in Jupyter notebooks.
The README notes that SB3 itself is now in a stable maintenance phase, focused on bug fixes. Newer experimental algorithms are released in a companion package called SB3 Contrib. A JAX-based variant called SBX offers much faster training at the cost of fewer features. A training framework called RL Baselines3 Zoo adds hyperparameter tuning, pre-trained agents, and experiment management on top of SB3.
The library requires Python 3.10 or newer and PyTorch 2.3 or newer. It can be installed with pip (pip install stable-baselines3). Integrations are also available with Weights & Biases for experiment tracking and with Hugging Face for sharing trained models.
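As a rough sketch, the version requirements above can be checked before installing. The helper names parse_major_minor and check_requirements are hypothetical and not part of SB3; the minimum versions are taken from the text above, and the check assumes PyTorch is distributed under the package name torch.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions as stated in the text above.
MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 3)


def parse_major_minor(version_string):
    """Return (major, minor) from a string such as '2.3.1+cu121'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def check_requirements():
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.10 or newer is required")
    try:
        torch_version = version("torch")
    except PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if parse_major_minor(torch_version) < MIN_TORCH:
            problems.append(f"PyTorch 2.3 or newer is required, found {torch_version}")
    return problems


for problem in check_requirements():
    print(problem)
```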
Where it fits
- Train a game-playing agent using a standard reinforcement learning algorithm like PPO or SAC in under 20 lines of code
- Reproduce published reinforcement learning research results using well-tested algorithm implementations
- Build a custom robotic control policy by training an agent in a simulation environment, tracking training progress with TensorBoard, and evaluating the result
- Use a pre-trained SB3 model from Hugging Face as a starting point for a new robotics or game AI project