
windows-computer-use

JavaScript ★ 25 updated 23d ago

Let agents control Windows desktop software

A plugin that lets AI agents (like Codex) control Windows desktop apps by seeing the screen, clicking buttons, typing text, and more, with no custom script needed per app.

Node.js · PowerShell · MCP · Windows Accessibility · setup: easy · complexity 3/5

Windows Computer Use is a plugin that lets AI agents control Windows desktop applications. It exposes a server that speaks the Model Context Protocol (MCP), a standard way for AI tools to call external capabilities. Once installed, agents like Codex can read what is on screen, click buttons, type text, scroll, drag, and interact with any Windows GUI program, including legacy apps that have no modern API.
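To make the idea of "calling an external capability" concrete, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client sends to run a tool. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification; the tool name `click` and its `x`/`y` arguments are hypothetical and may not match what this plugin actually exposes.

```javascript
// Build a JSON-RPC 2.0 request that asks an MCP server to run one tool.
// "tools/call" is the MCP method name; the tool name and arguments used
// in the example below are hypothetical, not taken from this plugin.
let nextId = 1;

function buildToolCall(toolName, args = {}) {
  return {
    jsonrpc: "2.0",
    id: nextId++, // each request gets a unique id so replies can be matched
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Hypothetical example: ask a "click" tool to click at screen coordinates.
const request = buildToolCall("click", { x: 640, y: 360 });
console.log(JSON.stringify(request));
```

In practice the agent client builds and sends these messages itself; the sketch only shows the shape of what travels between the agent and the server.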

The main use case is when an agent needs to automate a task that can only be done through a graphical interface, such as filling out a form in a settings dialog, running an installer, or working with a WinForms or WPF application. Rather than writing a custom script for each program, the agent can observe the screen and interact with it the way a human would.

The plugin exposes a collection of tools through its MCP server. Observation tools let the agent take screenshots, list open windows, read the accessibility tree (which describes every button and text field on screen), and find specific elements. Action tools let it move and click the mouse, double-click, drag, scroll, type text, and send keyboard shortcuts. There are also structured automation actions for focusing elements, invoking controls, and setting values directly through the Windows accessibility layer.
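The observe-then-act pattern described above can be sketched as follows: search an accessibility tree for an element, then compute a point to click. The tree shape (`role`, `name`, `rect`, `children`) and the helper names `findElement` and `centerOf` are assumptions made for illustration; the plugin's real output format may differ.

```javascript
// Simplified, hypothetical accessibility tree: each node has a role, a name,
// a bounding rectangle in screen pixels, and optional children.
const tree = {
  role: "window",
  name: "Settings",
  rect: { x: 0, y: 0, width: 800, height: 600 },
  children: [
    { role: "edit", name: "Server address", rect: { x: 40, y: 100, width: 300, height: 24 } },
    { role: "button", name: "Save", rect: { x: 680, y: 540, width: 80, height: 30 } },
  ],
};

// Depth-first search for the first node matching both role and name.
function findElement(node, role, name) {
  if (node.role === role && node.name === name) return node;
  for (const child of node.children ?? []) {
    const hit = findElement(child, role, name);
    if (hit) return hit;
  }
  return null; // nothing matched in this subtree
}

// Center of the bounding rectangle, a reasonable point to click.
function centerOf(rect) {
  return { x: rect.x + rect.width / 2, y: rect.y + rect.height / 2 };
}

const save = findElement(tree, "button", "Save");
const target = centerOf(save.rect);
console.log(target); // { x: 720, y: 555 }
```

Locating elements by role and name rather than by fixed pixel coordinates tends to survive window moves and resizes, which is one reason an agent might read the accessibility tree before clicking.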

There are three ways to install it. The simplest is to paste a prompt into an agent and let it install the plugin itself. You can also clone the repository and register it as a Codex plugin with two commands. The third option works with any MCP-compatible agent client: point the client at the server script using an absolute file path, and the server runs as a local process. No npm install step is required, since the server has no external dependencies beyond Node.js and Windows PowerShell.
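For the third option, the registration might look roughly like the sketch below, which builds an `mcpServers`-style entry (a JSON layout used by several MCP clients) and rejects relative paths. The server name, the script path, and the helper `buildServerEntry` are placeholders; check your client's documentation for its actual config file and key names.

```javascript
const path = require("node:path");

// Build an "mcpServers"-style config entry. The layout is an assumption:
// clients differ in file format and key names.
function buildServerEntry(name, scriptPath) {
  // Use win32 rules explicitly so the check behaves the same on any OS.
  if (!path.win32.isAbsolute(scriptPath)) {
    throw new Error(`Server script path must be absolute: ${scriptPath}`);
  }
  // The client launches the server as a local process: `node <script>`.
  return { mcpServers: { [name]: { command: "node", args: [scriptPath] } } };
}

const config = buildServerEntry(
  "windows-computer-use",
  "C:\\path\\to\\windows-computer-use\\server.js" // placeholder path
);
console.log(JSON.stringify(config, null, 2));
```

The absolute-path check mirrors the requirement in the text: the client typically starts the server from its own working directory, so a relative path would likely not resolve.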

Where it fits