gitmyhub

UI-TARS-desktop

TypeScript · ★ 37k · updated 3d ago

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

UI-TARS Desktop provides two AI agent tools that let an AI model look at your screen and control browsers or desktop apps by clicking and typing, so you can automate multi-step computer tasks using plain-language instructions.

TypeScript · Electron · Node.js · MCP · setup: hard · complexity 4/5

The UI-TARS-desktop repository is an open-source stack of two related AI agent projects. Both let an AI model observe and interact with graphical interfaces (web browsers, desktop applications, and terminals) the same way a human user would: by looking at the screen and clicking or typing.

The first project, Agent TARS, is a general-purpose multimodal AI agent (multimodal meaning it can process both text and visual information). It can be controlled through a command-line tool or a web-based interface, and it connects to external tools via MCP (the Model Context Protocol, a protocol for giving AI agents access to real-world capabilities). You can give it a natural-language instruction like "book the earliest flight from X to Y on this website" and it will navigate a browser to complete the task.
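To make the MCP connection a little more concrete, here is a minimal sketch of the kind of message an agent sends when it invokes a tool. MCP is built on JSON-RPC 2.0 and tool invocations use the `tools/call` method. The helper `buildToolCall` and the tool name `browser_navigate` are assumptions made up for illustration; they are not taken from Agent TARS itself.

```typescript
// Shape of a JSON-RPC 2.0 request that asks an MCP server to run a tool.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Simple counter so every request gets a distinct id.
let nextId = 1;

// Hypothetical helper: wraps a tool name and its arguments in a tools/call request.
function buildToolCall(
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// "browser_navigate" is an illustrative tool name, not a documented one.
const request = buildToolCall("browser_navigate", { url: "https://example.com" });
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library would build and transport this message for you; the sketch only shows what travels over the wire.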

The second project, UI-TARS Desktop, is a desktop application built on a specific AI model called UI-TARS. It provides local or remote operators for computers and browsers, meaning it can control either the machine it runs on or a remote machine.
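The look-at-the-screen, then click-or-type cycle and the local or remote operator split can be sketched roughly as follows. Every name here (`Action`, `Operator`, `Model`, `runAgent`, `FakeOperator`, `scriptedModel`) is a hypothetical stand-in chosen for this example and does not reflect the project's actual API; the "model" is a scripted function rather than a real vision-language model.

```typescript
// Hypothetical action vocabulary; a real agent would define a richer one.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done" };

// An operator hides whether the controlled machine is local or remote.
interface Operator {
  screenshot(): Promise<string>; // e.g. a base64-encoded image
  execute(action: Action): Promise<void>;
}

// A model maps (instruction, current screenshot) to the next action.
type Model = (instruction: string, screenshot: string) => Promise<Action>;

// Observe, decide, act; stop when the model reports "done" or the step budget runs out.
async function runAgent(
  op: Operator,
  model: Model,
  instruction: string,
  maxSteps = 10
): Promise<Action[]> {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const shot = await op.screenshot();
    const action = await model(instruction, shot);
    history.push(action);
    if (action.kind === "done") break;
    await op.execute(action);
  }
  return history;
}

// In-memory stand-in for a real operator: records actions instead of performing them.
class FakeOperator implements Operator {
  executed: Action[] = [];
  async screenshot(): Promise<string> {
    return `frame-${this.executed.length}`;
  }
  async execute(action: Action): Promise<void> {
    this.executed.push(action);
  }
}

// Scripted stand-in for a model: click, then type, then finish.
const scriptedModel: Model = async (_instruction, screenshot) => {
  if (screenshot === "frame-0") return { kind: "click", x: 100, y: 200 };
  if (screenshot === "frame-1") return { kind: "type", text: "hello" };
  return { kind: "done" };
};

runAgent(new FakeOperator(), scriptedModel, "type hello into the search box").then(
  (history) => console.log(history.map((a) => a.kind).join(" -> "))
); // prints "click -> type -> done"
```

Because the loop depends only on the `Operator` interface, swapping a local implementation for a remote one would not change the agent logic, which is one plausible reason to structure an operator layer this way.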

Both projects are written in TypeScript and target developers and researchers building or experimenting with GUI automation agents, that is, software that automates tasks by operating graphical interfaces rather than calling APIs. They are useful when you want an AI to perform multi-step computer tasks on your behalf, or when you are building agent-based automation tooling.

Where it fits