openbrowser

TypeScript ★ 9.5k updated 2mo ago

Let AI agents browse the web. An autonomous toolkit for browser-based AI agents.

Open Browser is a TypeScript framework that gives an AI model control of a real web browser: describe what you want in plain English, and the agent clicks buttons, fills forms, navigates pages, and extracts information automatically.

TypeScript, Playwright, Bun, OpenAI, Anthropic, Google AI · Setup: moderate · Complexity: 3/5

Open Browser is a TypeScript framework that lets you give an AI language model control of a web browser. You describe what you want accomplished in plain text, and the AI agent figures out the steps: clicking buttons, filling in forms, navigating between pages, and pulling out information. The underlying browser control is handled by Playwright, a standard tool for automating web browsers, and the AI reasoning can use models from OpenAI, Anthropic, or Google.

The way it works is straightforward: you provide a task description, and on each step the agent takes a screenshot and reads the page structure, sends that to an AI model to decide what action to take next, then carries out that action in the browser. This cycle repeats until the task is finished or a step limit is reached. The README includes a diagram illustrating this loop.
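The observe, decide, act loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Open Browser's actual API: `BrowserLike`, `DecideFn`, `runAgent`, and the `Action` shapes are hypothetical names. A scripted fake browser and a scripted "model" stand in for Playwright and a real LLM so the sketch runs offline.

```typescript
// Hypothetical shapes for illustration only; not Open Browser's real types.
interface Observation {
  url: string;
  screenshot: Uint8Array; // raw image bytes captured from the page
  pageStructure: string;  // e.g. a serialized DOM or accessibility snapshot
}

type Action =
  | { kind: "click"; selector: string }
  | { kind: "type"; selector: string; text: string }
  | { kind: "navigate"; url: string }
  | { kind: "done"; result: string };

interface BrowserLike {
  observe(): Promise<Observation>;
  perform(action: Action): Promise<void>;
}

// The model receives the task, the current page, and prior actions, and returns one next action.
type DecideFn = (task: string, obs: Observation, history: Action[]) => Promise<Action>;

interface RunResult {
  finished: boolean;
  result?: string;
  steps: number;
  history: Action[];
}

async function runAgent(task: string, browser: BrowserLike, decide: DecideFn, maxSteps: number): Promise<RunResult> {
  const history: Action[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const obs = await browser.observe();             // screenshot + page structure
    const action = await decide(task, obs, history); // ask the model what to do next
    history.push(action);
    if (action.kind === "done") {
      return { finished: true, result: action.result, steps: step, history };
    }
    await browser.perform(action);                   // carry the action out in the browser
  }
  return { finished: false, steps: maxSteps, history }; // step limit reached
}

// Scripted stand-ins so the loop can run without a browser or API key.
const visited: string[] = [];
const fakeBrowser: BrowserLike = {
  async observe() {
    return { url: visited[visited.length - 1] ?? "about:blank", screenshot: new Uint8Array(), pageStructure: "" };
  },
  async perform(action) {
    if (action.kind === "navigate") visited.push(action.url);
  },
};

const script: Action[] = [
  { kind: "navigate", url: "https://example.com" },
  { kind: "click", selector: "#search" },
  { kind: "done", result: "found it" },
];
// Returns the next scripted action based on how many actions have already been taken.
const scriptedDecide: DecideFn = async (_task, _obs, history) => script[history.length];
```

Keeping `decide` as an injected function is one way to make the loop provider-agnostic: an OpenAI, Anthropic, or Google client would each sit behind the same signature.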

The project comes with three pieces. The core library handles the agent logic, browser interaction, and AI model integration. A command-line tool lets you run agents or issue individual browser commands (open a URL, click an element, take a screenshot, extract content as markdown) directly from a terminal. A sandboxed execution environment lets you run agents with memory limits, timeouts, domain restrictions, and CPU monitoring, which is useful when running agents in production or in automated pipelines where you need predictable resource usage.
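Two of the sandbox controls mentioned above, domain restrictions and timeouts, can be sketched with standard JavaScript APIs (`URL`, `Promise.race`, `setTimeout`). This is an assumption-laden illustration, not the project's sandbox: `SandboxOptions`, `isDomainAllowed`, `withTimeout`, and `runSandboxed` are hypothetical names, and memory limits and CPU monitoring are not shown.

```typescript
// Hypothetical sandbox guards; names and options are illustrative, not Open Browser's API.
interface SandboxOptions {
  allowedDomains: string[]; // hosts (and their subdomains) the agent may visit
  timeoutMs: number;        // wall-clock budget for the whole run
}

// True if the URL's host equals an allowed domain or is a subdomain of one.
function isDomainAllowed(rawUrl: string, allowed: string[]): boolean {
  let host: string;
  try {
    host = new URL(rawUrl).hostname;
  } catch {
    return false; // unparseable URLs are rejected
  }
  return allowed.some((d) => host === d || host.endsWith("." + d));
}

class SandboxTimeoutError extends Error {}

// Races the task against a timer and clears the timer either way.
// Note: this stops waiting but does not cancel the underlying work.
async function withTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new SandboxTimeoutError(`timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([task, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Refuses disallowed start URLs, then runs the agent under the time budget.
async function runSandboxed<T>(opts: SandboxOptions, startUrl: string, agent: (url: string) => Promise<T>): Promise<T> {
  if (!isDomainAllowed(startUrl, opts.allowedDomains)) {
    throw new Error(`blocked by domain policy: ${startUrl}`);
  }
  return withTimeout(agent(startUrl), opts.timeoutMs);
}
```

Matching on `"." + d` rather than a bare suffix keeps lookalike hosts such as `evil-example.com` from passing an `example.com` allowlist.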

Additional features mentioned in the README include an interactive session where you can type commands into a live browser prompt for testing and debugging, cost tracking so you can see how much each agent run is spending on AI API calls, and session replay recording. Configuration options cover step limits, screenshot frequency, allowed and blocked URLs, proxy settings, and more.
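A configuration object and per-run cost tracking of the kind described above might look like the sketch below. The field names (`maxSteps`, `screenshotEvery`, `allowedUrls`, `blockedUrls`, `proxy`), the `CostTracker` class, and the per-million-token prices are all hypothetical placeholders, not Open Browser's real options or any provider's actual pricing.

```typescript
// Hypothetical config shape mirroring the options listed above; field names are assumptions.
interface AgentConfig {
  maxSteps: number;
  screenshotEvery: number; // screenshot on step 1, then every N steps
  allowedUrls: string[];   // URL prefixes the agent may open (empty = allow all)
  blockedUrls: string[];   // URL prefixes that are always refused
  proxy?: string;          // e.g. "http://proxy.internal:8080"
}

// Steps are 1-based: with screenshotEvery = 3, screenshots happen on steps 1, 4, 7, ...
function shouldScreenshot(step: number, cfg: AgentConfig): boolean {
  const every = Math.max(1, cfg.screenshotEvery);
  return (step - 1) % every === 0;
}

// Caller-supplied placeholder prices in USD per million tokens.
interface TokenPrice {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Accumulates token usage across a run and converts it to a dollar estimate.
class CostTracker {
  private inputTokens = 0;
  private outputTokens = 0;
  private readonly price: TokenPrice;

  constructor(price: TokenPrice) {
    this.price = price;
  }

  // Call once per model response with the token counts the provider reports.
  record(input: number, output: number): void {
    this.inputTokens += input;
    this.outputTokens += output;
  }

  totalUsd(): number {
    return (this.inputTokens * this.price.inputPerMillion + this.outputTokens * this.price.outputPerMillion) / 1_000_000;
  }
}

const config: AgentConfig = {
  maxSteps: 25,
  screenshotEvery: 3,
  allowedUrls: ["https://example.com/"],
  blockedUrls: ["https://example.com/admin"],
};
```

Tracking raw token counts and applying prices only at the end keeps the tracker independent of which provider is in use.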

The project is MIT licensed, built with Bun (a JavaScript runtime), and requires your own API keys for whichever AI provider you want to use.

Where it fits

Build an AI agent that fills out web forms, navigates multi-step flows, and extracts page data based on plain-English task descriptions instead of brittle selectors.
Automate repetitive browser research tasks by describing the goal to an AI rather than writing step-by-step Playwright scripts.
Run sandboxed browser agents in production pipelines with memory limits, timeouts, and domain restrictions to keep resource usage predictable.
