agent-browser

Rust · ★ 36k · updated 3d ago

Browser automation CLI for AI agents

A command-line tool that lets AI agents control a web browser (clicking buttons, filling forms, taking screenshots) through simple text commands.

Rust · Chrome for Testing · Node.js · npm · Cargo · setup: moderate · complexity 3/5

Agent-browser is a command-line tool that lets AI agents control a web browser programmatically — opening pages, clicking buttons, filling in forms, taking screenshots, and extracting information — all from simple text commands in a terminal. It was built by Vercel Labs specifically to power automated browser tasks inside AI-driven workflows.

AI agents often need to interact with the web as a human would: navigating to a URL, reading the page content, clicking a link, or submitting a form. Most existing browser automation tools are designed for software testing and can be heavy or slow. Agent-browser is designed to be fast and lightweight, which suits AI pipelines where the agent issues many browser commands in sequence.

It works by launching a Chrome browser in the background (using Google's official "Chrome for Testing" channel) and exposing a set of clean command-line instructions to control it: things like "click this element", "fill this input field", "take a screenshot", or "get the text of this element". The tool can identify elements by reference IDs from an accessibility tree snapshot — a structured representation of everything visible on the page — which is particularly useful for AI agents that reason about page structure rather than pixel positions. It also supports natural-language commands through a built-in AI chat mode.
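To make the reference-ID idea concrete, here is a minimal Python sketch of how a snapshot could label interactive elements so that an agent can refer to them by ID instead of by pixel position. It is a toy model only: the Node class, the INTERACTIVE_ROLES set, the snapshot function, and the "e1"-style ref format are all hypothetical names chosen for illustration, not agent-browser's actual implementation or output format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node in a toy accessibility tree (hypothetical structure)."""
    role: str
    name: str = ""
    children: list[Node] = field(default_factory=list)


# Roles treated as interactive in this sketch; an assumption, not the tool's list.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox"}


def snapshot(root: Node) -> tuple[list[str], dict[str, Node]]:
    """Walk the tree depth-first, giving each interactive node a short ref ID.

    Returns the text lines an agent would read and a ref -> node lookup table.
    """
    lines: list[str] = []
    refs: dict[str, Node] = {}

    def walk(node: Node, depth: int) -> None:
        label = f'{node.role} "{node.name}"' if node.name else node.role
        if node.role in INTERACTIVE_ROLES:
            ref = f"e{len(refs) + 1}"  # hypothetical ref format
            refs[ref] = node
            label += f" [ref={ref}]"
        lines.append("  " * depth + "- " + label)
        for child in node.children:
            walk(child, depth + 1)

    walk(root, 0)
    return lines, refs


# A made-up login page used only to exercise the sketch.
page = Node("document", "Sign in", [
    Node("heading", "Welcome back"),
    Node("textbox", "Email"),
    Node("textbox", "Password"),
    Node("button", "Log in"),
])

lines, refs = snapshot(page)
print("\n".join(lines))
```

In this model the agent reads the printed outline, picks a ref such as the one attached to the "Log in" button, and a later "click" command would resolve that ref through the lookup table. Refs stay meaningful only for the snapshot that produced them, so a real tool would need a fresh snapshot after the page changes.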

You would use this tool when building an AI agent that needs to browse the web, fill out forms, scrape content, or automate repetitive web tasks. The core binary is written in Rust for performance, and it is distributed via npm, Homebrew, or Cargo (Rust's package manager).

Where it fits

Build an AI agent that autonomously fills out web forms and submits them without human intervention.
Automate repetitive web tasks like logging in, navigating pages, and extracting data from multiple websites.
Create a chatbot that can browse the web, read page content, and answer questions about what it finds.
Scrape dynamic web content by having an agent click through pages and capture screenshots or text.
