molmoact2
Official Repository for MolmoAct2
MolmoAct2 is an open family of AI models from the Allen Institute for AI (Ai2) designed to control robots in the real world. Think of it as an AI brain that can look at what a robot sees through its camera, reason about what actions to take, and then command the robot's arms to move. It combines a vision-and-language understanding model with a specialized action-generation system called a flow-matching continuous action expert, which together allow a robot to follow language instructions like "pick up the cup" and actually carry them out.
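To give a feel for what a flow-matching action expert does at inference time, here is a minimal sketch, not MolmoAct2's actual implementation. Flow matching generates a continuous action by starting from Gaussian noise and integrating a velocity field from t = 0 to t = 1. In a real model the velocity would come from a learned network conditioned on camera images and the language instruction. Here a closed-form toy field that points straight at a fixed target action stands in for it. The names `sample_action` and `make_toy_velocity`, the step count, and the 3-dimensional action are all hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, List

Vector = List[float]


def sample_action(
    velocity_fn: Callable[[Vector, float], Vector],
    action_dim: int,
    num_steps: int = 10,
    seed: int = 0,
) -> Vector:
    """Euler-integrate a velocity field from noise (t=0) towards an action (t=1)."""
    rng = random.Random(seed)
    # Start from a standard Gaussian noise sample.
    x = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t takes values 0, dt, ..., 1 - dt (never exactly 1)
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> Callable[[Vector, float], Vector]:
    """Toy stand-in for a learned network (assumption, for illustration only).

    For the straight-line path x_t = (1 - t) * noise + t * target, the
    velocity at (x_t, t) is (target - x_t) / (1 - t).
    """
    def velocity(x: Vector, t: float) -> Vector:
        return [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]
    return velocity


# Hypothetical 3-dimensional target action.
target = [0.1, -0.2, 0.3]
action = sample_action(make_toy_velocity(target), action_dim=3)
```

With this toy field the integration ends at the target regardless of the starting noise. A learned field would instead map different noise samples, observations, and instructions to different plausible actions.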
The repository provides several levels of ready-to-use models. Base checkpoints are starting points for researchers who want to train a robot for a specific task. Fine-tuned checkpoints are already specialized for particular robot platforms (including Franka arms, SO-100 and SO-101 arms, and bimanual YAM robots) and can be deployed more directly. There is also a "Think" variant that reasons using depth information before deciding on actions.
MolmoAct2 integrates with LeRobot, a widely used robotics training framework, so users can train, evaluate, and deploy the models using standard tools and datasets. The release also includes the datasets used to build these models, covering bimanual manipulation, DROID-style robot data, and embodied reasoning tasks. All datasets are in LeRobot v3.0 format with added language annotations.