page-agent
JavaScript in-page GUI agent. Control web interfaces with natural language.
JavaScript library that drives any web page from natural language instructions in the browser, using your own LLM API key and the page's DOM as text.
Page Agent is a JavaScript library that lets you control any web page using natural language instructions. Instead of clicking buttons and filling in forms manually, you describe what you want in plain text, such as "Click the login button" or "Fill in the shipping address", and the library works out which elements on the page it needs to interact with and performs the actions for you.
What sets Page Agent apart from other browser automation tools is that it runs directly inside the web page as ordinary JavaScript, not as a separate browser extension, a Python script, or a headless browser (a browser run programmatically without a visible window). It reads the page's structure (the DOM) as text rather than taking screenshots, so it does not need a multimodal AI model that can interpret images. You bring your own AI model by providing an API key, and the library handles the interaction logic.
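To make the "page structure as text" idea concrete, here is a minimal sketch of how interactive elements might be flattened into indexed text lines that a text-only model can read. It is not Page Agent's actual serialization format: the `UiNode` type, the `serializeInteractive` function, and the list of interactive tags are all hypothetical, and the sketch uses a plain object tree in place of the real DOM so that it runs outside a browser.

```typescript
// Hypothetical, simplified stand-in for a DOM element.
// This is NOT Page Agent's real data model.
interface UiNode {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: UiNode[];
}

// Tags treated as interactive in this sketch (an assumption for illustration).
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

// Walk the tree depth-first and emit one indexed text line per interactive element.
// The index gives a model a compact handle for referring to an element.
function serializeInteractive(root: UiNode): string[] {
  const lines: string[] = [];
  const visit = (node: UiNode): void => {
    if (INTERACTIVE_TAGS.has(node.tag)) {
      const attrs = Object.entries(node.attrs ?? {})
        .map(([key, value]) => ` ${key}="${value}"`)
        .join("");
      const label = node.text ? ` ${node.text}` : "";
      lines.push(`[${lines.length}] <${node.tag}${attrs}>${label}`);
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return lines;
}

// Example page: a login form with one non-interactive paragraph.
const page: UiNode = {
  tag: "form",
  children: [
    { tag: "input", attrs: { name: "email", placeholder: "Email" } },
    { tag: "p", text: "We never share your address." },
    { tag: "button", text: "Log in" },
  ],
};

console.log(serializeInteractive(page).join("\n"));
// [0] <input name="email" placeholder="Email">
// [1] <button> Log in
```

A text listing like this can be sent to any text-only LLM together with the user's instruction, and the model can answer with an element index rather than pixel coordinates, which is why no image understanding is required.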
The README describes several use cases: adding an AI copilot to a software product so users can navigate it with voice or text commands, automating repetitive multi-step workflows in enterprise tools like ERP or CRM systems, and improving accessibility by letting users control interfaces through natural language. An optional Chrome extension extends the capability across multiple browser tabs, and an MCP server (MCP, the Model Context Protocol, is a standard for connecting AI agents to external tools) lets external agents control the browser.
You install it via npm or include it as a script tag on your page. It is written in TypeScript and released under the MIT license. It is designed for client-side web enhancement in applications you own, not for automated scraping of third-party sites.
Where it fits
- Add a natural-language copilot to your SaaS product so users can drive the UI by typing or speaking.
- Automate repetitive multi-step workflows inside an internal ERP or CRM web app.
- Improve accessibility by letting users control a complex web UI with plain English.
- Expose your web app to external AI agents through the bundled MCP server.