OmniDoc-TokenBench

Python ★ 66 updated 1mo ago

OmniDoc-TokenBench is a benchmark dataset and evaluation toolkit for testing how well AI image compression and reconstruction models handle text-heavy document images. The core problem it addresses is that traditional image quality metrics, such as pixel-level error or visual similarity scores, do not capture whether readable text is preserved accurately after an image is compressed and rebuilt. A reconstructed document image can look fine at a glance yet contain garbled letters that make it unreadable.

The repository contains roughly 3,000 sample images drawn from nine document categories — books, slides, textbooks, exam papers, academic papers, magazines, financial reports, newspapers, and handwritten notes — in both English and Chinese. Each sample is a small 256x256 pixel crop of text from a document.

The key evaluation metric it introduces is NED (Normalized Edit Distance), which works by running optical character recognition on both the original and reconstructed images, then measuring how different the extracted text strings are. This directly catches cases where compression scrambles characters even when the image looks visually acceptable to the human eye.
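The text-comparison step of NED can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: it assumes the OCR output is already available as plain strings, uses the Levenshtein distance as the edit distance, and normalizes by the length of the longer string. The function names and the sample strings are hypothetical, and the toolkit's exact normalization and OCR engine may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the longer string's length (assumed
    normalization). 0.0 means identical text, 1.0 means nothing matches."""
    longest = max(len(reference), len(hypothesis))
    if longest == 0:
        return 0.0  # two empty strings are treated as identical
    return edit_distance(reference, hypothesis) / longest


# Hypothetical OCR outputs: the reconstruction turned a digit 0 into a letter O.
ocr_original = "Total revenue: 1,204"
ocr_reconstructed = "Total revenue: 1,2O4"
score = normalized_edit_distance(ocr_original, ocr_reconstructed)
print(f"NED = {score:.3f}")
```

Under this definition a lower score is better, and a single swapped character in a short crop already produces a nonzero score even if the two images are nearly identical pixel for pixel.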

Researchers would use this repository when building or comparing AI models that compress images into compact representations, such as VAEs (variational autoencoders), and need to verify that text documents survive the compression faithfully. The evaluation script accepts any pair of original and reconstructed image folders and outputs scores across all supported metrics. The project is written in Python and was developed by Alibaba Group's Qwen Team.
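As a rough sketch of what evaluating a pair of folders involves, the snippet below matches each original image to its reconstruction before any metric is computed. It assumes the two folders use identical filenames for corresponding images, which this summary does not confirm. The helper name pair_images and the accepted file extensions are hypothetical and are not taken from the repository's script.

```python
from pathlib import Path

# Assumed set of image extensions; the real toolkit may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def pair_images(original_dir, reconstructed_dir):
    """Return (original, reconstructed) path pairs that share a filename.

    Originals without a matching reconstruction are skipped, as are
    files that do not look like images.
    """
    original_dir = Path(original_dir)
    reconstructed_dir = Path(reconstructed_dir)
    pairs = []
    for original in sorted(original_dir.iterdir()):
        if original.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        reconstructed = reconstructed_dir / original.name
        if reconstructed.is_file():
            pairs.append((original, reconstructed))
    return pairs
```

Each returned pair could then be passed to whichever metric is being computed, with the per-pair scores averaged over the whole folder.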
