mac-ocr

Swift · ★ 401 · updated 2d ago

macOS CLI for OCR and searchable PDFs using Apple's Vision framework

mac-ocr is a command-line tool for macOS that reads text from images and PDF files. It runs entirely on your computer using Apple's built-in Vision framework, so nothing is sent to an external server. You point it at an image or PDF and it prints the recognized text to your terminal, or saves it to a file.

The tool installs via npm, the package manager used for JavaScript projects. You run it as a command in your terminal: pass an image file, a batch of images, or a PDF, and you get the extracted text back. Output can be plain text for simple cases, or JSON if you need details like bounding box coordinates and confidence scores for each word. For PDFs, the tool can stream results page by page, so output from a large document starts appearing before the whole file has been processed.
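To make the JSON mode more concrete, here is a minimal TypeScript sketch of consuming such output. The field names (text, observations, confidence, boundingBox) are assumptions chosen for illustration and are not taken from the mac-ocr README, so check the real output before relying on them.

```typescript
// Hypothetical shape of the JSON output. All field names are assumptions.
interface WordObservation {
  text: string;
  confidence: number; // assumed to be in the range 0..1
  boundingBox: { x: number; y: number; width: number; height: number };
}

interface OcrResult {
  text: string;
  observations: WordObservation[];
}

// Parse a JSON string and check only the top-level shape.
// Individual observations are trusted as-is in this sketch.
function parseOcrJson(raw: string): OcrResult {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("expected a JSON object");
  }
  const obj = data as Record<string, unknown>;
  const text = obj["text"];
  const observations = obj["observations"];
  if (typeof text !== "string" || !Array.isArray(observations)) {
    throw new Error("missing 'text' or 'observations'");
  }
  return { text, observations: observations as WordObservation[] };
}

// Return the words whose confidence falls below a cutoff,
// for example to flag them for manual review.
function wordsBelow(result: OcrResult, cutoff: number): WordObservation[] {
  return result.observations.filter((w) => w.confidence < cutoff);
}
```

Keeping the parse step separate from the filtering step means only parseOcrJson has to change if the real field names differ.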

A second mode, called searchable-pdf, takes an image or a scanned PDF and produces a new PDF that looks identical to the original but carries a hidden text layer over the page content. You can open the output in any PDF viewer and select, copy, and search the text inside it. Pages that already have selectable text are skipped by default.

The tool also ships a Node.js API for developers who want to call it from code rather than the terminal. You pass file bytes to the ocr function and get back an object with the text and individual word observations. A streaming variant handles multi-page PDFs one page at a time.
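The page-at-a-time pattern described here can be sketched in TypeScript with an async iterable. This is an illustration under assumptions: the package's actual export names and result types are not given in this summary, so PageResult and stubPages below are hypothetical stand-ins for whatever the streaming variant really returns.

```typescript
// Hypothetical per-page result. The real API may use different fields.
interface PageResult {
  page: number;
  text: string;
}

// Stand-in for the library's streaming call, so the sketch runs on its own.
// It yields one result per page, the way a streaming OCR call might.
async function* stubPages(texts: string[]): AsyncGenerator<PageResult> {
  let page = 0;
  for (const text of texts) {
    page += 1;
    yield { page, text };
  }
}

// Consume pages as they arrive: report each one immediately,
// then return the full text once the stream ends.
async function collectPages(
  pages: AsyncIterable<PageResult>,
  onPage: (result: PageResult) => void,
): Promise<string> {
  const parts: string[] = [];
  for await (const result of pages) {
    onPage(result);
    parts.push(result.text);
  }
  return parts.join("\n");
}
```

Because collectPages accepts any AsyncIterable, the stub can be swapped for the real streaming call without changing the consumer, provided the real results are mapped to the same shape.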

Supported options include recognition speed versus accuracy trade-offs, custom vocabulary words, a minimum confidence threshold, multiple recognition languages at once, and a region-of-interest flag to restrict recognition to a specific area of the image.
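For the region-of-interest option, it helps to know that Apple's Vision framework expresses a region as a normalized rectangle (values from 0 to 1) with its origin at the lower-left corner of the image. Whether mac-ocr's flag expects normalized or pixel values is not stated in this summary, so the TypeScript helper below is only a sketch of the conversion, with hypothetical names, for the case where you start from a pixel rectangle measured from the top-left corner.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Convert a pixel rectangle with a top-left origin into a normalized
// rectangle with a lower-left origin, the convention Vision uses.
function toNormalizedRoi(pixelRect: Rect, imageWidth: number, imageHeight: number): Rect {
  if (imageWidth <= 0 || imageHeight <= 0) {
    throw new Error("image dimensions must be positive");
  }
  return {
    x: pixelRect.x / imageWidth,
    // Flip the vertical axis: distance from the bottom edge of the image
    // to the bottom edge of the rectangle.
    y: (imageHeight - pixelRect.y - pixelRect.height) / imageHeight,
    width: pixelRect.width / imageWidth,
    height: pixelRect.height / imageHeight,
  };
}
```

For a 1000 by 500 pixel image, a rectangle at (100, 50) measuring 200 by 100 pixels maps to x 0.1, y 0.7, width 0.2, height 0.2.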

The README notes one practical use case beyond normal document work: AI coding agents can run this tool locally to extract text from documents instead of sending images to a vision API, which saves on per-token costs. A bundled skill file lets compatible agents discover and use the tool automatically. The project requires macOS 10.15 or later.