GenericAgent

Python ★ 13k updated 1d ago

Self-evolving agent: grows skill tree from 3.3K-line seed, achieving full system control with 6x less token consumption

A lightweight Python framework where an AI model controls your computer, browser, terminal, files, mouse and keyboard, and builds a personal skill library from each completed task so future similar tasks run faster.

Python · Streamlit · setup: moderate · complexity 4/5

GenericAgent is a Python framework that lets a large language model (an AI system like Claude or Gemini) control a real computer on your behalf. It can open and interact with a browser, run terminal commands, manage files, drive the mouse and keyboard, read the screen, and even control an Android phone over USB. You describe a task in plain language; the agent works out the steps, executes them, and reports back.

The framework's central design idea is that it learns from experience. When the agent successfully completes a task for the first time, it automatically saves the approach as a reusable skill. The next time you ask for something similar, it recalls that skill directly rather than working it out from scratch. Over time this builds a personal skill library unique to your setup, which the README describes as a growing skill tree.
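
The save-then-recall behavior described above can be illustrated with a minimal sketch. This is not GenericAgent's actual code: the names `SkillLibrary`, `run_task`, and `plan_from_scratch` are hypothetical, and matching "similar" tasks by word overlap is an assumption made only to keep the example short.

```python
import re


class SkillLibrary:
    """Toy skill store (hypothetical): maps past task descriptions to saved steps."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # minimum word overlap to count as "similar"
        self.skills = []            # list of (token set, steps) pairs

    @staticmethod
    def _tokens(text):
        # Lowercased alphanumeric words of a task description.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def save(self, task, steps):
        self.skills.append((self._tokens(task), list(steps)))

    def recall(self, task):
        # Return the steps of the most similar saved task, or None if nothing
        # clears the threshold. Similarity is Jaccard overlap (an assumption).
        query = self._tokens(task)
        best_steps, best_score = None, 0.0
        for tokens, steps in self.skills:
            union = query | tokens
            score = len(query & tokens) / len(union) if union else 0.0
            if score > best_score:
                best_steps, best_score = steps, score
        return best_steps if best_score >= self.threshold else None


def run_task(task, library, plan_from_scratch):
    """Reuse a saved skill when one matches; otherwise plan and save the result."""
    steps = library.recall(task)
    if steps is not None:
        return steps, True              # recalled, no fresh planning needed
    steps = plan_from_scratch(task)     # stands in for the model working it out
    library.save(task, steps)           # assumes the task completed successfully
    return steps, False
```

In this sketch the second request for a near-identical task skips `plan_from_scratch` entirely, which is the effect the README attributes to the growing skill tree.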

The codebase itself is deliberately small, around 3,000 lines of core code. The agent loop that drives behavior is roughly 100 lines. The authors claim this minimal footprint lets the agent run within a far smaller context window than competing frameworks need (the project tagline cites 6x less token consumption), which reduces cost and keeps the AI's attention focused on relevant information.
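
To show why an agent loop can be this compact, here is a generic sketch of the pattern, not the project's implementation. `agent_loop`, `choose_action`, and the `tools` mapping are hypothetical names; in a real agent `choose_action` would call the language model, while here it is any function that maps the history to the next action.

```python
def agent_loop(task, choose_action, tools, max_steps=10):
    """Generic act-observe loop (illustrative only).

    choose_action(history) returns a (name, argument) pair. The name "done"
    ends the loop and its argument is the final answer; any other name is
    looked up in `tools` and called with the argument.
    """
    history = [("task", task)]
    for _ in range(max_steps):
        name, arg = choose_action(history)
        if name == "done":
            return arg, history
        observation = tools[name](arg)       # execute the chosen tool
        history.append((name, observation))  # feed the result back next turn
    return None, history                     # step budget exhausted
```

Everything specific to browsers, terminals, or files lives in the `tools` mapping, so the loop itself stays small regardless of how many capabilities are plugged in.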

Several interface options are included: a desktop GUI, a terminal interface, a Streamlit web app, a Telegram bot, and a WeChat bot. You connect it to whichever AI model you already have API access to, configure your key, and launch.

The README notes that the entire repository, including its git history and commit messages, was created autonomously by the agent itself, with no manual terminal use by the author. The project has a published technical report on arXiv and is released under an open-source license.

Where it fits