LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
A Python tool that takes a photo or screenshot of a mathematical equation and converts it into the LaTeX source code needed to typeset that formula, saving the tedious work of transcribing equations by hand.
LaTeX-OCR (also called pix2tex) is a Python tool that looks at a picture of a mathematical formula and gives you back the LaTeX source code that would render it. LaTeX is the typesetting language scientists and mathematicians use to write equations cleanly; converting an equation back into that code by hand is tedious, so this project automates it with a machine-learning model.
Under the hood it is an image-to-text neural network: a Vision Transformer (ViT) encoder with a ResNet backbone reads the image, and a Transformer decoder writes out the LaTeX token by token. A small extra network first predicts the best resolution to resize the image to, because the main model works better on lower-resolution images that resemble its training data.
Once installed via pip, there are several ways to use it:
- a command-line tool called pix2tex that can read images from disk or your clipboard
- a desktop GUI called latexocr that lets you screenshot part of your screen, then renders the predicted LaTeX with MathJax and copies it to your clipboard
- a Streamlit web demo plus an HTTP API (also available as a Docker image)
- a Python import for use inside your own code
Training your own model is also supported, with scripts for building datasets of paired equation images and LaTeX source drawn from arXiv and the im2latex-100k dataset.
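The Python import route can be sketched roughly as follows. The `LatexOCR` class and the `pix2tex.cli` import path follow the project's documented usage; the helper names `load_model` and `image_to_latex`, and the `open_image` parameter, are hypothetical and exist only for this illustration. The sketch assumes `pix2tex` and Pillow are installed, and that model weights may be downloaded the first time the model is created.

```python
def load_model():
    """Create the pix2tex model object (assumes `pip install pix2tex`)."""
    # Imported lazily so this module loads even where pix2tex is absent.
    from pix2tex.cli import LatexOCR
    return LatexOCR()


def image_to_latex(path, model, open_image=None):
    """Return the predicted LaTeX string for the image stored at `path`.

    `model` is any callable that maps an image to a string, such as the
    object returned by load_model(). `open_image` is a hypothetical hook
    that makes the function easy to test; it defaults to PIL's Image.open.
    """
    if open_image is None:
        from PIL import Image  # Pillow
        open_image = Image.open
    return model(open_image(path))
```

A typical call would be `print(image_to_latex("equation.png", load_model()))`, where `equation.png` is a placeholder file name. Because predictions are not always correct, the returned string is best treated as a draft to be checked against the original image.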
You would reach for this if you are taking notes from a textbook or paper, copying formulas out of slides, or building a tool that ingests scanned scientific documents. Results are not perfect, so the author recommends always double-checking the output.
Where it fits
- Screenshot a math formula from a paper or textbook and get LaTeX code ready to paste into your document
- Batch-convert a folder of equation images into LaTeX strings for a document digitization pipeline
- Add equation recognition to a note-taking app via the HTTP API or Docker image
- Build a tool that ingests scanned scientific papers and extracts editable mathematical expressions
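
The batch-conversion use case above could look something like the sketch below. It is not part of the project: `convert_folder`, `IMAGE_SUFFIXES`, and the `predict` parameter are hypothetical names. `predict` stands for any function that takes an image path and returns a LaTeX string, for example a thin wrapper around the pix2tex model.

```python
from pathlib import Path

# File extensions treated as equation images (an assumption for this sketch).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def convert_folder(folder, predict):
    """Map each image file name in `folder` to its predicted LaTeX string.

    `predict` is a caller-supplied callable: Path -> str.
    Files with other extensions are skipped.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        results[path.name] = predict(path)
    return results
```

Passing the predictor in as an argument keeps the folder-walking logic independent of the model, so the same loop works whether predictions come from the Python import, the HTTP API, or a stub used in tests.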