Fay
Fay is an agent framework that helps digital humans (2.5D, 3D, mobile, PC, web) or large language models (OpenAI-compatible, DeepSeek) connect to business systems.
A Python framework for building talking AI avatar characters. It connects large language models, speech recognition, and text-to-speech with visual avatar software such as Unity, Unreal Engine, or web pages, and it is free for commercial use.
Fay is a Python framework for building AI-powered digital humans: virtual on-screen characters that can speak, listen, and carry on conversations. It acts as a bridge between visual avatar software (2.5D and 3D models, Unity, Unreal Engine, mobile apps, web pages) and the large language models that power the conversation. You build the character's appearance in one of these tools, then connect it to Fay to give it a voice, a personality, and real-time responses.
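To make the bridge idea concrete, here is a minimal Python sketch of one speech-in, speech-out turn. It is not Fay's implementation: the `AvatarBridge` class, its field names, and the stub lambdas are hypothetical, and the `sent` list only stands in for whatever channel actually pushes replies to a Unity, Unreal Engine, or web client.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stage signatures; Fay's real module interfaces may differ.
Recognizer = Callable[[bytes], str]   # audio in -> transcript
ChatModel = Callable[[str], str]      # transcript -> reply text
Synthesizer = Callable[[str], bytes]  # reply text -> audio out


@dataclass
class AvatarBridge:
    """Chains ASR, a language model and TTS, then records what would be
    pushed to the avatar client (such as Unity, Unreal Engine or a web page)."""
    recognize: Recognizer
    chat: ChatModel
    synthesize: Synthesizer
    sent: List[dict] = field(default_factory=list)  # stand-in for the client channel

    def handle_utterance(self, audio: bytes) -> dict:
        text = self.recognize(audio)
        reply = self.chat(text)
        speech = self.synthesize(reply)
        message = {"user": text, "reply": reply, "audio_bytes": len(speech)}
        self.sent.append(message)
        return message


# Stub stages so the sketch runs offline: "ASR" decodes bytes, "TTS" encodes text.
bridge = AvatarBridge(
    recognize=lambda audio: audio.decode("utf-8"),
    chat=lambda text: f"You said: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)
print(bridge.handle_utterance(b"hello"))
# {'user': 'hello', 'reply': 'You said: hello', 'audio_bytes': 15}
```

Because each stage is just a callable, swapping a recognizer, model, or TTS provider means passing a different function, which is one simple way to get interchangeable modules.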
On the language model side, Fay connects to any OpenAI-compatible API, including DeepSeek and locally hosted models, and lets you swap the model out without changing the rest of the setup. Speech recognition and text-to-speech modules are also interchangeable, so you can use different providers depending on quality or cost. The framework streams responses as they are generated, which means the character can begin speaking before the full reply is ready.
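Speaking before the full reply is ready typically means buffering streamed text until a sentence boundary and handing each finished sentence to TTS. The sketch below shows that idea under stated assumptions: the function name `sentences_from_stream` and the punctuation-based splitting rule are illustrative, not Fay's code, and a hard-coded list stands in for the token deltas of a real streaming chat-completion response.

```python
import re
from typing import Iterable, Iterator

# Assumed rule: a sentence ends after ASCII or full-width (Chinese) . ! ?
# This naive split also breaks on decimals such as "3.14".
_SENTENCE_END = re.compile(r"(?<=[.!?。！？])")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as the streamed text contains them."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last ends in punctuation, so it is complete.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]  # keep the unfinished tail for the next chunk
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends


# Stand-in for token deltas from a streaming chat-completion response.
fake_stream = ["Hel", "lo there. How", " are you? I am", " fine"]
spoken = list(sentences_from_stream(fake_stream))
print(spoken)
# ['Hello there.', 'How are you?', 'I am fine']
```

In a real loop each yielded sentence would go to the TTS engine immediately, so the avatar can start talking while the model is still generating the rest of the reply.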
Built-in modes include a virtual teacher, a virtual live streamer, and a news broadcast reader, where the character delivers prepared or generated content automatically on a schedule. There is also an interactive mode with voice and text input, wake-word detection to start conversations, and an agent mode where the character can call external tools on its own to answer questions or complete tasks. A knowledge base and a custom Q&A file let you shape what the character knows and how it responds.
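Wake-word gating and a custom Q&A file can be combined in front of the language model: ignore speech until the wake word appears, answer from the Q&A pairs when a question matches, and otherwise fall back to the model or agent. This is a hedged sketch that works on already-transcribed text; the `InteractionGate` class, the exact-match lookup after normalization, and the `fallback` callable are illustrative assumptions, and Fay's actual wake-word detection and Q&A matching may work differently.

```python
import string
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Characters trimmed from both ends before matching (ASCII plus common Chinese punctuation).
_TRIM = string.punctuation + "，。！？ \t\n"


def normalize(text: str) -> str:
    return text.lower().strip(_TRIM)


@dataclass
class InteractionGate:
    wake_word: str                  # hypothetical wake phrase
    qa_pairs: Dict[str, str]        # stands in for a custom Q&A file
    fallback: Callable[[str], str]  # stands in for the LLM or agent call
    awake: bool = False

    def __post_init__(self) -> None:
        self._wake = normalize(self.wake_word)
        self._qa = {normalize(q): a for q, a in self.qa_pairs.items()}

    def handle(self, transcript: str) -> Optional[str]:
        text = normalize(transcript)
        if not self.awake:
            start = text.find(self._wake)
            if start == -1:
                return None  # still asleep: ignore this utterance
            self.awake = True
            # Treat whatever follows the wake word as the first question.
            text = normalize(text[start + len(self._wake):])
            if not text:
                return None  # woken, but no question yet
        answer = self._qa.get(text)  # exact match after normalization
        return answer if answer is not None else self.fallback(text)


gate = InteractionGate(
    wake_word="Hello Fay",
    qa_pairs={"What are your hours?": "We open at nine."},
    fallback=lambda q: f"[LLM] {q}",
)
print(gate.handle("What are your hours?"))             # None (not woken yet)
print(gate.handle("Hello Fay, what are your hours?"))  # We open at nine.
print(gate.handle("Tell me a joke!"))                  # [LLM] tell me a joke
```

A fuller version would probably return to the sleeping state after a period of silence; that timeout is left out to keep the sketch short.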
The framework runs on Windows, macOS, and Ubuntu with Python 3.12. A management interface is accessible via a local web page for configuration and control. The project is fully open source, and the README states it is free to use for commercial purposes.
Where it fits
- Build a virtual live streamer character that listens, responds in real time, and syncs lip movements to a Unity or Unreal Engine avatar
- Create a virtual teacher that wakes on a keyword, answers domain-specific questions from a custom knowledge base, and speaks with a swappable text-to-speech engine