openpi

Python ★ 12k updated 5d ago

Open-source AI models from Physical Intelligence that control robot arms using camera images and plain-language instructions, pre-trained on over 10,000 hours of robot demonstrations.

Python · PyTorch · CUDA · Docker · uv · setup: hard · complexity 5/5

openpi is a Python repository from Physical Intelligence that publishes open-source AI models for controlling robots. These are vision-language-action (VLA) models: they take camera images and text instructions as input and produce movement commands as output. The goal is to let a robot arm perform physical tasks described in plain language, such as folding a towel or unpacking a container.

The repository provides three model variants. The first, pi0, generates actions with flow matching, a technique that turns random noise into an action by following a learned velocity field over several small steps. The second, pi0-FAST, is an autoregressive model built on the FAST action tokenizer, which converts action sequences into discrete tokens that the model predicts one at a time. The third, pi0.5, is an updated version of pi0 that generalizes better to environments it was not specifically trained on. All three come with base checkpoints pre-trained on more than 10,000 hours of recorded robot demonstrations.
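To make the flow matching idea concrete, here is a minimal, generic sketch of the sampling loop: start from Gaussian noise and take Euler steps along a velocity field until an action emerges. It is not openpi's code. The names `toy_velocity` and `sample_action` are hypothetical, the velocity field is a closed-form stand-in for a trained network (which would be conditioned on images and text), and the time convention (noise at t=0, action at t=1) is an assumption that real implementations may reverse.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For the straight-line path x_t = (1 - t) * noise + t * target, the
    velocity that carries x_t to the target is (target - x_t) / (1 - t).
    A real model would predict this from camera images and instructions.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_action(target, num_steps=10, seed=0):
    """Integrate the velocity field from noise (t=0) toward an action (t=1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t never reaches 1.0, so the division above is safe
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # one Euler step
    return x


if __name__ == "__main__":
    # Three joint commands used purely as an illustrative target.
    print(sample_action([0.5, -0.2, 1.0]))
```

Because the toy field points straight at the target, the loop lands on it after the final step. With a trained network the endpoint instead depends on the observation and instruction the model is given.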

Beyond the base models, the repository also includes fine-tuned checkpoints for specific robot platforms and tasks, such as performing table-top manipulation on a DROID-platform robot arm or folding towels on an ALOHA robot. These fine-tuned models can be run directly for inference without further training, though the authors note that results will vary depending on how closely your robot setup matches the one used during training.

Running inference requires an NVIDIA GPU with at least 8 GB of memory. Fine-tuning on your own data requires considerably more: at least 22.5 GB for the parameter-efficient LoRA approach, or 70 GB or more for full fine-tuning. The repository has been tested on Ubuntu 22.04. Dependencies are managed with a tool called uv, and Docker instructions are also provided for those who prefer a containerized setup.
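The memory thresholds above can be turned into a quick planning check. The sketch below is illustrative only: the `REQUIRED_GB` table and `supported_modes` helper are hypothetical names, not part of openpi, and the figures are the minimums quoted above, so real usage may need headroom beyond them.

```python
# Minimum GPU memory in GB per workload, taken from the figures above.
# The mode names are made up for this example.
REQUIRED_GB = {
    "inference": 8.0,
    "lora_finetune": 22.5,
    "full_finetune": 70.0,
}


def supported_modes(gpu_memory_gb):
    """Return the workloads whose stated minimum fits in the given memory."""
    return [mode for mode, need in REQUIRED_GB.items() if gpu_memory_gb >= need]


if __name__ == "__main__":
    for memory in (6, 24, 80):
        print(memory, "GB:", supported_modes(memory))
```

For example, a 24 GB card clears the inference and LoRA thresholds but not full fine-tuning.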

The project is framed as an experiment: the models were developed for Physical Intelligence's own robots, and adapting them to other hardware may or may not produce useful results.

Where it fits

Run a pre-trained pi0 or pi0.5 model for inference on a DROID or ALOHA robot arm without additional training.
Fine-tune a base model on your own robot demonstration data using parameter-efficient LoRA to adapt it to new hardware.
Use pi0-FAST's autoregressive action tokenization for tasks where discrete action planning is preferred over flow matching.
Evaluate generalist manipulation performance of pi0.5 across environments the model was not specifically trained on.
