gitmyhub

Agent-S

Python ★ 12k updated 1mo ago

Agent S: an open agentic framework that uses computers like a human

Agent S is an open-source AI framework that controls a computer by looking at the screen and clicking like a person would, completing real desktop tasks across Windows, macOS, and Linux without needing software APIs.

Python · PyPI · setup: hard · complexity: 4/5

Agent S is an open-source framework that lets an AI model control a computer the same way a person would: by looking at the screen, clicking, typing, and navigating applications. Instead of calling software APIs directly, the agent perceives the graphical interface as a human does and decides which buttons to click or which text to type to complete a given task. This approach makes it capable of working with almost any desktop application, including ones that do not have a programmable interface.
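The observe, decide, act cycle described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not Agent S's actual code: the `Action` type, its three action kinds, the `run_task` function, and the `max_steps` limit are all hypothetical, and the screenshot, model, and input-control pieces are passed in as plain callables so the loop itself stays runnable without any real model or GUI library.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str           # hypothetical action set: "click", "type", or "done"
    target: str = ""    # natural-language description of the UI element
    text: str = ""      # text to type, used by "type" actions


def run_task(
    task: str,
    take_screenshot: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 15,
) -> List[Action]:
    """Repeatedly observe the screen, ask the model for a step, and perform it."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()               # perceive: the GUI as pixels
        action = decide(task, screenshot, history)   # decide: model picks next step
        history.append(action)
        if action.kind == "done":                    # model reports the task finished
            break
        execute(action)                              # act: click or type like a person
    return history


# Demonstration with a scripted stub in place of a real model.
script = [Action("click", target="Search box"), Action("type", text="hello"), Action("done")]


def fake_decide(task: str, screenshot: bytes, history: List[Action]) -> Action:
    return script[len(history)]


performed: List[Action] = []
history = run_task("search for hello", lambda: b"", fake_decide, performed.append)
print([a.kind for a in history])  # ['click', 'type', 'done']
print(len(performed))             # 2
```

Passing the model and the input controller in as callables keeps the control flow separate from any particular provider, which mirrors the idea that the agent works through the screen rather than through application-specific APIs.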

The project has gone through several iterations, named S1, S2, and S3. The S3 version achieved a score of 72.60% on OSWorld, a benchmark that tests how well an AI can complete real computer tasks; the developers say this surpasses the average human score on the same benchmark. It also performs well on WindowsAgentArena and AndroidWorld, which shows it is not limited to one operating system. The framework runs on Linux, macOS, and Windows.

Installing the core package takes a single pip command, after which you configure API keys for whichever AI model provider you want to use (OpenAI, Anthropic, Gemini, and others are supported). Full setup takes more work, because the agent also requires a separate visual grounding model, which identifies the exact on-screen location of buttons and other interface elements. The recommended combination at the time of writing is GPT-5 as the main model, paired with a model called UI-TARS for grounding.
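The split between a main model and a grounding model can be illustrated with a small sketch: the main model describes *what* to click in words, and the grounding model turns that description into pixel coordinates. The sketch below assumes, hypothetically, that the grounding model reports points in a fixed reference resolution that must be rescaled to the real screen. `GroundingConfig`, `to_screen`, and `locate` are illustrative names, not part of the gui-agents package, and the grounding model is replaced by a stub.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class GroundingConfig:
    # Hypothetical settings: the resolution in which the grounding model
    # reports coordinates, and the physical screen resolution.
    model_width: int
    model_height: int
    screen_width: int
    screen_height: int


def to_screen(point: Tuple[int, int], cfg: GroundingConfig) -> Tuple[int, int]:
    """Rescale a point from grounding-model space to physical screen pixels."""
    x, y = point
    sx = round(x * cfg.screen_width / cfg.model_width)
    sy = round(y * cfg.screen_height / cfg.model_height)
    # Clamp so a slightly out-of-range prediction still lands on the screen.
    sx = min(max(sx, 0), cfg.screen_width - 1)
    sy = min(max(sy, 0), cfg.screen_height - 1)
    return sx, sy


def locate(
    description: str,
    screenshot: bytes,
    ground: Callable[[str, bytes], Tuple[int, int]],
    cfg: GroundingConfig,
) -> Tuple[int, int]:
    """The main model supplies a description; the grounding model maps it to a point."""
    return to_screen(ground(description, screenshot), cfg)


cfg = GroundingConfig(model_width=1920, model_height=1080,
                      screen_width=2880, screen_height=1620)
fake_ground = lambda desc, img: (960, 540)  # stub: always "finds" the center
print(locate("the Save button", b"", fake_ground, cfg))  # (1440, 810)
```

Keeping grounding behind its own interface is one reason the two models can come from different providers: the main model never needs to output raw coordinates, only a description precise enough for the grounding model to resolve.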

Because the agent runs Python code to control your computer and can click and type in any application, the README explicitly warns users to run it with care. It is designed for a single-monitor setup. A hosted cloud version is available for people who do not want to manage the setup themselves.

The research behind Agent S was accepted at ICLR 2025 and won a best paper award at a workshop there. The framework is also distributed as a Python package called gui-agents, installable from PyPI.

Where it fits