LocalClicky

Python · ★ 19 · updated 3d ago

A macOS menubar app that lets you control your computer with your voice using fully local AI models. Say a wake word, then give commands to open apps, control Spotify, click things on screen, and more, with no internet required.

Python · Whisper.cpp · Ollama · PyAutoGUI · macOS · setup: hard · complexity 3/5

LocalClicky is a Python application for macOS that lets you control your computer with your voice, with everything running locally on your own hardware. No audio, screenshots, or commands are sent to any external server. There are no API keys, no cloud subscriptions, and no internet connection required once the models are downloaded.

The application lives in the macOS menubar with no Dock icon. You activate it by saying "Hey Jarvis," which starts a session. From there you can give commands back-to-back without repeating the wake word. The session ends when you say goodbye or after 25 seconds of silence. A small icon in the menubar shows the current state: idle, listening, recording, thinking, or speaking.
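The session behavior described above can be sketched as a small state tracker. This is only an illustration under stated assumptions, not LocalClicky's actual code: the `Session` class, its method names, and the substring check for "goodbye" are all hypothetical, and the clock is injected so the 25-second timeout can be exercised without waiting.

```python
import enum
import time


class State(enum.Enum):
    # The five menubar states named in the description.
    IDLE = "idle"
    LISTENING = "listening"
    RECORDING = "recording"
    THINKING = "thinking"
    SPEAKING = "speaking"


SILENCE_TIMEOUT_S = 25.0  # silence timeout stated in the description


class Session:
    """Hypothetical session tracker (an assumption, not the project's class)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.state = State.IDLE
        self._last_activity = None

    def on_wake_word(self):
        # "Hey Jarvis" opens a session and starts the silence timer.
        self.state = State.LISTENING
        self._last_activity = self._clock()

    def on_command(self, text):
        """Return True if the session stays open after this utterance."""
        if self.state is State.IDLE:
            return False  # no active session, so the wake word is required
        if "goodbye" in text.lower():
            self.state = State.IDLE
            return False
        # Any other utterance counts as activity and resets the timer.
        self._last_activity = self._clock()
        return True

    def tick(self):
        """Close the session once the silence timeout has elapsed."""
        if (
            self.state is not State.IDLE
            and self._clock() - self._last_activity >= SILENCE_TIMEOUT_S
        ):
            self.state = State.IDLE
        return self.state
```

Calling `tick()` periodically from the audio loop would be one way to enforce the timeout; a real implementation would also move through the recording, thinking, and speaking states while a command is being handled.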

Under the hood, four tools work together. Whisper.cpp handles speech-to-text transcription and runs entirely on your machine. Ollama runs two local AI models: a reasoning model (qwen3) that interprets each command and decides what to do, and a vision model (gemma4) that looks at a screenshot of your screen and identifies where to click. PyAutoGUI moves the cursor and performs clicks. The macOS built-in text-to-speech command handles spoken responses.
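One way to picture how the four tools hand off to each other is a pipeline with one pluggable stage per tool. The sketch below is an assumption about the overall shape, not the project's architecture: the `Pipeline` class, its field names, and the dictionary used for the decided action are all hypothetical, and each stage is passed in as a plain callable so no real model or device is needed to follow the flow.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical wiring of the four stages; every name here is illustrative."""

    transcribe: Callable[[bytes], str]  # speech-to-text, e.g. a Whisper.cpp wrapper
    decide: Callable[[str], dict]       # reasoning model served by Ollama
    act: Callable[[dict], str]          # performs the action, e.g. via PyAutoGUI
    speak: Callable[[str], None]        # spoken reply via macOS text-to-speech

    def handle(self, audio: bytes) -> str:
        text = self.transcribe(audio)   # audio -> command text
        action = self.decide(text)      # command text -> structured action
        reply = self.act(action)        # run the action, get a short reply
        self.speak(reply)               # say the reply aloud
        return reply
```

With stub stages, such as a `transcribe` that returns a fixed string, the whole flow can be checked end to end before any model is installed.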

The range of things you can ask it to do is broad: open or quit applications, adjust system volume, control Spotify playback, create reminders using natural language dates, make folders, run shell commands, inject JavaScript into Chrome, and answer general questions. When you ask it to click something on screen, it automatically takes a screenshot, sends it to the vision model to locate the target element, and then clicks the center of whatever it found. You do not need to phrase these requests in any special way.
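The final step of the click flow, clicking the center of whatever the vision model found, comes down to a small piece of arithmetic. The sketch below assumes the model's answer has already been parsed into a pixel bounding box of the form (left, top, right, bottom); that format and the `bbox_center` helper are assumptions, not something the description specifies.

```python
def bbox_center(left, top, right, bottom):
    """Return the integer center point of a pixel bounding box.

    The (left, top, right, bottom) layout is an assumed format for the
    vision model's parsed output.
    """
    if right <= left or bottom <= top:
        raise ValueError("bounding box must have positive width and height")
    return ((left + right) // 2, (top + bottom) // 2)


# The resulting point could then be passed to PyAutoGUI, for example:
#   x, y = bbox_center(100, 200, 300, 260)
#   pyautogui.click(x, y)
```

Rejecting empty or inverted boxes before clicking is a cheap guard against a malformed model response sending the cursor somewhere unexpected.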

Setup requires installing Whisper.cpp via Homebrew, pulling the AI models through Ollama, and installing Python dependencies in a virtual environment. You also need to grant macOS permissions for microphone access, screen recording, and accessibility controls. The project is MIT licensed and runs on macOS 12 and later.

Where it fits

Control your Mac hands-free by voice to open apps, adjust volume, and run shell commands.
Click anything on your screen by describing it in plain English and letting the vision model locate and click it for you.
Control Spotify playback and create reminders using natural language, without touching the keyboard.
Build a fully offline voice assistant on your Mac that never sends audio or screenshots to any external server.
