
Open-AutoGLM

Python · ★ 26k · updated 3mo ago

An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone

Open-source AI phone agent that automates Android and iOS tasks by understanding screenshots and executing taps, swipes, and text input based on plain-language instructions.

Python · AutoGLM-Phone-9B · vLLM · SGLang · ADB · HDC · WebDriverAgent · setup: hard · complexity 4/5

Open-AutoGLM is an open-source AI phone agent framework built on the AutoGLM model. Most smartphone tasks still require you to tap through menus yourself. With this project you describe a task in plain language, such as "open Meituan and search for nearby hotpot restaurants", and the agent works out which on-screen actions are needed and performs them for you.

The system works by connecting to your Android or HarmonyOS phone via ADB (Android Debug Bridge, a standard developer tool for communicating with Android devices) or HDC (the equivalent for Huawei HarmonyOS). It takes screenshots of your screen, uses a vision-language model to understand what is shown, plans a sequence of actions, and then executes taps, swipes, and text input on your behalf. It includes a confirmation step for sensitive actions and supports remote control over Wi-Fi. The AI model (AutoGLM-Phone-9B) can be run via third-party API services or self-hosted using vLLM or SGLang inference frameworks.
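The execution half of that loop can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the action dictionary format, the `sensitive` flag, and the function names `build_adb_command`, `capture_screenshot`, and `run_action` are all assumptions made for this example. Only the underlying `adb` subcommands (`exec-out screencap -p`, `shell input tap`, `shell input swipe`, `shell input text`) are standard ADB usage.

```python
import subprocess
from typing import Callable, List, Optional


def build_adb_command(action: dict, serial: Optional[str] = None) -> List[str]:
    """Translate one action dict (hypothetical schema) into an adb argument list."""
    # `-s <serial>` selects a specific device when several are attached.
    base = ["adb"] + (["-s", serial] if serial else [])
    kind = action["type"]
    if kind == "tap":
        return base + ["shell", "input", "tap", str(action["x"]), str(action["y"])]
    if kind == "swipe":
        coords = [str(action[k]) for k in ("x1", "y1", "x2", "y2")]
        duration = str(action.get("duration_ms", 300))
        return base + ["shell", "input", "swipe", *coords, duration]
    if kind == "text":
        # `input text` reads "%s" as a space. This sketch only handles plain
        # ASCII text without shell metacharacters.
        return base + ["shell", "input", "text", action["text"].replace(" ", "%s")]
    raise ValueError(f"unsupported action type: {kind!r}")


def capture_screenshot(serial: Optional[str] = None) -> bytes:
    """Return the current screen as PNG bytes (requires a connected device)."""
    base = ["adb"] + (["-s", serial] if serial else [])
    result = subprocess.run(
        base + ["exec-out", "screencap", "-p"], capture_output=True, check=True
    )
    return result.stdout


def run_action(
    action: dict,
    confirm: Callable[[dict], bool],
    runner: Callable = subprocess.run,
    serial: Optional[str] = None,
) -> bool:
    """Execute one action, asking `confirm` first if it is marked sensitive."""
    if action.get("sensitive") and not confirm(action):
        return False  # the user declined, so nothing is sent to the device
    runner(build_adb_command(action, serial), check=True)
    return True
```

In a full agent, the PNG bytes from `capture_screenshot` would be sent to the vision-language model together with the task description, and the model's reply would be parsed into action dicts like the ones above. Passing `runner` as a parameter keeps the confirmation logic testable without a phone attached.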

You would use this for automating repetitive phone tasks, app testing, or research into AI phone agents. The framework supports iOS as well as Android, though iOS setup requires additional configuration via WebDriverAgent. It is written in Python and intended for research and educational use.

Where it fits