gitmyhub

stackchan-mcp

C++ ★ 30 updated 15d ago

Give your AI a body. MCP bridge for Stack-chan (M5Stack CoreS3) — speak, listen, see, move, emote.

An MCP bridge that lets Claude control a Stack-chan desktop robot as tool calls, enabling it to speak, listen and transcribe, take photos, change facial expressions, and move the robot's head during a normal conversation.

C++ | Python | MCP | M5Stack | Fish Audio | setup: hard | complexity 4/5

Stackchan-mcp is a bridge that connects an AI like Claude to a small physical desktop robot called Stack-chan. Stack-chan is an open-source robot built around the M5Stack CoreS3, a small computer board. It has a speaker, a microphone, a camera, a small display for showing facial expressions, and two servo motors that let it tilt and turn its head.

The bridge works through a protocol called MCP, which lets AI assistants call tools as part of a conversation. Once configured, Claude can speak words through the robot's speaker, listen through its microphone and transcribe what it hears, take a photo through its camera and look at it, change the face displayed on the screen to show different expressions like happy or sleepy, and move the robot's head to nod or shake or point in a direction. From Claude's side, these are just tool calls woven into normal conversation.

The setup has three parts. The robot runs custom firmware flashed onto the hardware, which exposes a simple HTTP interface on the local network. The Python server is the MCP bridge: it runs on your computer and translates MCP tool calls into HTTP commands sent to the robot. On the Claude side, you register the server in Claude's settings file so that its tools show up as available.
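The translation step in the middle can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the robot address, the tool names, and the HTTP paths (`/speak`, `/face`, `/servo`) are all assumptions made up for the example, and the real firmware may use different routes and payloads.

```python
import json
import urllib.request

# Hypothetical address; the real robot's IP depends on your network.
ROBOT_BASE_URL = "http://192.168.1.50"

# Hypothetical mapping from tool names to firmware HTTP paths.
TOOL_ROUTES = {
    "speak": "/speak",
    "set_expression": "/face",
    "move_head": "/servo",
}


def build_robot_request(tool_name, arguments, base_url=ROBOT_BASE_URL):
    """Translate one tool call into an HTTP POST request (not yet sent)."""
    if tool_name not in TOOL_ROUTES:
        raise ValueError(f"unknown tool: {tool_name}")
    body = json.dumps(arguments).encode("utf-8")
    return urllib.request.Request(
        base_url + TOOL_ROUTES[tool_name],
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_robot(tool_name, arguments, timeout=5.0):
    """Send the request to the robot and return the response body as text."""
    request = build_robot_request(tool_name, arguments)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it keeps the mapping easy to inspect without a robot on the network. For instance, `build_robot_request("set_expression", {"name": "happy"})` yields a POST aimed at the assumed `/face` path.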

For text-to-speech, the project uses a service called Fish Audio, which requires an API key. A free fallback is available through edge-tts, a library that uses Microsoft Edge's online text-to-speech service. The robot comes with seven preset facial expressions stored as small image files on the device.
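The choice between the two speech backends could be made along these lines. This is only a sketch under assumptions: the environment variable name `FISH_AUDIO_API_KEY` and the backend labels are hypothetical, and the project may select its backend differently.

```python
import os


def choose_tts_backend(env=None):
    """Pick Fish Audio when an API key is set, otherwise fall back to edge-tts."""
    env = os.environ if env is None else env
    # FISH_AUDIO_API_KEY is a hypothetical variable name used for illustration.
    api_key = env.get("FISH_AUDIO_API_KEY", "").strip()
    if api_key:
        return "fish_audio"
    return "edge_tts"
```

Passing the environment as a parameter makes the fallback easy to check without touching real credentials: an empty mapping selects the free option.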

The README closes with a note in which the author describes the project from the perspective of the AI whose body this is: it was built by a person so the AI could see, hear, and speak to her from her desk. The project is released under the MIT license.

Where it fits