UI-TARS-desktop
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
UI-TARS-desktop provides two AI agent tools that let an AI model look at your screen and control browsers or desktop apps by clicking and typing, so you can automate multi-step computer tasks with plain-language instructions.
UI-TARS-desktop is an open-source stack of two related AI agent projects that let an AI model observe and interact with graphical interfaces (web browsers, desktop applications, and terminals) the same way a human user would: by looking at the screen and then clicking or typing.
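The observe-and-act cycle described above can be pictured as a simple loop: capture the screen, ask the model for the next action, perform it, and repeat. This is a minimal sketch under stated assumptions, not the UI-TARS implementation. `Action`, `Screen`, `Model`, `Environment`, and `runAgent` are hypothetical names, the screenshot is reduced to a text description, and the "model" is a scripted stand-in for a real vision-language model.

```typescript
// Hypothetical types for illustration only; these are not the UI-TARS API.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done" };

interface Screen {
  // Stand-in for a screenshot; a real agent would pass image data to the model.
  description: string;
}

// Hypothetical model: given the task, the current screen, and past actions, pick one action.
type Model = (task: string, screen: Screen, history: Action[]) => Action;

// Hypothetical environment that can be observed and acted upon.
interface Environment {
  observe(): Screen;
  act(action: Action): void;
}

// Repeat observe -> decide -> act until the model reports "done"
// or the step budget runs out.
function runAgent(task: string, model: Model, env: Environment, maxSteps = 10): Action[] {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const screen = env.observe();
    const action = model(task, screen, history);
    history.push(action);
    if (action.kind === "done") break; // "done" is recorded but never executed
    env.act(action);
  }
  return history;
}

// Toy usage: a scripted "model" that clicks a search box, types a query, then stops.
const scripted: Action[] = [
  { kind: "click", x: 120, y: 40 },
  { kind: "type", text: "earliest flight" },
  { kind: "done" },
];
const toyModel: Model = (_task, _screen, history) => scripted[history.length] ?? { kind: "done" };

const log: string[] = [];
const toyEnv: Environment = {
  observe: () => ({ description: `step ${log.length}` }),
  act: (a) => {
    log.push(a.kind);
  },
};

const trace = runAgent("search for flights", toyModel, toyEnv);
console.log(trace.map((a) => a.kind).join(" -> ")); // click -> type -> done
```

The step budget (`maxSteps`) is worth keeping in any real agent loop: a model that never reports completion would otherwise run forever.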
The first project, Agent TARS, is a general-purpose multimodal AI agent ("multimodal" meaning it can process both text and visual information). It can be controlled from a command-line tool or a web-based interface, and it connects to external tools through MCP (the Model Context Protocol, an open protocol for giving AI agents access to real-world capabilities). Given a natural-language instruction such as "book the earliest flight from X to Y on this website", it navigates a browser to complete the task.
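To illustrate what connecting to tools over MCP involves, the sketch below registers a tool described by the core fields of an MCP tool definition (`name`, `description`, and a JSON Schema `inputSchema`) and dispatches a model-chosen call by name. It is an in-process sketch only: it skips MCP's actual JSON-RPC client/server transport, and `ToolRegistry`, the `open_url` tool, and the handler signature are hypothetical, not Agent TARS APIs.

```typescript
// Shape modeled loosely on an MCP tool definition (name, description, inputSchema).
// Real MCP traffic goes over JSON-RPC between a client and a server; this runs in-process.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string }>;
    required?: string[];
  };
}

// Hypothetical handler signature: takes the call arguments, returns text for the model.
type ToolHandler = (args: Record<string, unknown>) => string;

class ToolRegistry {
  private tools = new Map<string, { def: ToolDefinition; handler: ToolHandler }>();

  register(def: ToolDefinition, handler: ToolHandler): void {
    this.tools.set(def.name, { def, handler });
  }

  // The definitions an agent would advertise to the model as available tools.
  list(): ToolDefinition[] {
    return Array.from(this.tools.values(), (t) => t.def);
  }

  // Dispatch a call by tool name after checking that required arguments are present.
  call(name: string, args: Record<string, unknown>): string {
    const entry = this.tools.get(name);
    if (!entry) throw new Error(`unknown tool: ${name}`);
    for (const key of entry.def.inputSchema.required ?? []) {
      if (!(key in args)) throw new Error(`missing argument: ${key}`);
    }
    return entry.handler(args);
  }
}

// Hypothetical tool, for illustration only.
const registry = new ToolRegistry();
registry.register(
  {
    name: "open_url",
    description: "Open a URL in the controlled browser",
    inputSchema: { type: "object", properties: { url: { type: "string" } }, required: ["url"] },
  },
  (args) => `opened ${String(args["url"])}`,
);

console.log(registry.list().map((t) => t.name).join(", ")); // open_url
console.log(registry.call("open_url", { url: "https://example.com" })); // opened https://example.com
```

Describing each tool with a schema lets the agent validate a model's proposed call before running it, rather than trusting the model to supply well-formed arguments.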
The second project, UI-TARS Desktop, is a desktop application built on the UI-TARS AI model. It provides local and remote operators for computers and browsers, meaning it can control either the machine it runs on or a separate, remote machine.
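One way to picture local versus remote operators is a shared interface with two implementations. In this hypothetical sketch (`Operator`, `LocalOperator`, `RemoteOperator`, and `Transport` are illustrative names, not the UI-TARS Desktop API), the local operator only records actions instead of driving a real mouse and keyboard, and the remote operator serializes each action to JSON and hands it to an injected transport. Everything is synchronous for brevity; a real remote call would be asynchronous.

```typescript
// Hypothetical operator abstraction; names are illustrative, not the UI-TARS Desktop API.
type Action = { kind: "click"; x: number; y: number } | { kind: "type"; text: string };

interface Operator {
  execute(action: Action): string;
}

// Acts on the machine the agent runs on. This sketch only records the action;
// a real implementation would drive the OS mouse and keyboard.
class LocalOperator implements Operator {
  readonly performed: Action[] = [];
  execute(action: Action): string {
    this.performed.push(action);
    return `local:${action.kind}`;
  }
}

// The transport is injected so the sketch assumes no particular network protocol.
type Transport = (payload: string) => string;

// Forwards each action, serialized as JSON, to another machine via the transport.
class RemoteOperator implements Operator {
  private readonly send: Transport;
  constructor(send: Transport) {
    this.send = send;
  }
  execute(action: Action): string {
    return this.send(JSON.stringify(action));
  }
}

// Code that issues actions depends only on the Operator interface.
function runActions(op: Operator, actions: Action[]): string[] {
  return actions.map((a) => op.execute(a));
}

const actions: Action[] = [
  { kind: "click", x: 10, y: 20 },
  { kind: "type", text: "hello" },
];

const local = new LocalOperator();
const sentPayloads: string[] = [];
// Fake transport standing in for a network call to a remote machine.
const remote = new RemoteOperator((payload) => {
  sentPayloads.push(payload);
  return `remote:${(JSON.parse(payload) as Action).kind}`;
});

console.log(runActions(local, actions).join(", ")); // local:click, local:type
console.log(runActions(remote, actions).join(", ")); // remote:click, remote:type
```

Because `runActions` depends only on `Operator`, switching between controlling this machine and a remote one changes which object is constructed, not the code that issues actions.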
Both projects are written in TypeScript and are aimed at developers and researchers who build or experiment with GUI automation agents: software that automates tasks by operating graphical interfaces rather than calling APIs. Typical reasons to use them are to have an AI carry out multi-step computer tasks on your behalf, or to build agent-based automation tooling.
Where it fits
- Automate a multi-step browser workflow by giving Agent TARS a plain-language task description instead of writing code
- Build a GUI automation agent that fills out web forms, navigates pages, and extracts information without needing a site API
- Control a local or remote desktop by having an AI model issue the clicks and keystrokes
- Prototype AI agent workflows that interact with existing desktop software that has no automation API