gitmyhub

smolvlm-realtime-webcam

HTML ★ 5.6k updated 1y ago

Real-time webcam demo with SmolVLM and llama.cpp server

This repository is a small demo showing how to use a compact AI vision model to analyze your webcam feed in real time, directly on your own computer. The AI model involved is called SmolVLM 500M, a relatively small vision-language model that can look at images and describe or respond to questions about what it sees.

The setup requires installing a program called llama.cpp, which is a tool for running AI language and vision models locally without needing a cloud service. Once that is running as a local server, you open a single HTML file in your browser, which connects to the server and starts sending webcam frames to the model. The model then responds to a customizable instruction, such as describing objects in the frame or returning structured data.

The demo is intentionally minimal: the entire interface is one HTML file, and the instructions to get it running are four steps. GPU acceleration is optional but supported on Nvidia, AMD, and Intel graphics cards by adding a flag when starting the server. The author also notes you can swap in other compatible multimodal models listed in the llama.cpp documentation if you want to experiment beyond the default.

The README is sparse and describes this as a simple proof-of-concept rather than a production application. It is most useful for developers or curious users who want to see local AI vision running in a browser with minimal setup.