BabelDOC
Yet Another Document Translator
Python tool that translates PDF documents into bilingual output with the original and translated text side by side, focused on academic papers and scientific content.
BabelDOC is a Python library and command-line tool that translates PDF documents and produces bilingual output, meaning you get both the original and the translated text together in one file for side-by-side comparison. It focuses on scientific papers and academic documents, where preserving the layout and correctly handling formulas and technical notation matters.
The primary translation direction is English to Chinese, though basic support for other language combinations is included. Translation requires access to a large language model API, such as OpenAI's GPT models or a compatible service. You supply the API key when running the tool, and BabelDOC handles parsing the PDF, sending the text to the model, and reassembling the output.
You can use BabelDOC in three ways. The hosted online service at Immersive Translate needs no setup and includes a free quota of 1000 pages per month. The self-hosted option, PDFMathTranslate-next, bundles BabelDOC with a web interface and supports a wider range of translation services. The command-line interface and Python API let you embed the library directly into your own programs or scripts.
The command-line tool accepts one or more PDF files, a source language, and a target language. Options let you restrict translation to specific pages, control how the bilingual output is arranged (original and translated side by side, or on alternating pages), and manage watermarks on the output. The tool also integrates with Zotero, a popular academic reference manager, through third-party plugins. The README notes the CLI is primarily for debugging and that most end users are better served by the hosted service or the self-hosted web interface.
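As a rough illustration of the inputs described above, the following Python sketch assembles a command line for the tool without running it. The helper `build_babeldoc_command` is hypothetical, and the flag names (`--files`, `--lang-in`, `--lang-out`, `--pages`, `--openai`, `--openai-model`, `--openai-api-key`) as well as the default model name are assumptions based on the project's README at the time of writing; check them against `babeldoc --help` before relying on them.

```python
import shlex
from typing import List, Optional, Sequence


def build_babeldoc_command(
    files: Sequence[str],
    lang_in: str = "en",
    lang_out: str = "zh",
    pages: Optional[str] = None,
    model: str = "gpt-4o-mini",  # assumed model name, replace with your own
    api_key: Optional[str] = None,
) -> List[str]:
    """Build an argv list for the babeldoc CLI (flag names are assumptions)."""
    if not files:
        raise ValueError("at least one PDF file is required")

    cmd = ["babeldoc", "--openai", "--openai-model", model]
    cmd += ["--lang-in", lang_in, "--lang-out", lang_out]

    # One --files entry per input PDF.
    for path in files:
        cmd += ["--files", path]

    # Optional page restriction, passed through unchanged (for example "1-5").
    if pages:
        cmd += ["--pages", pages]

    # The key is only appended when supplied, so it never appears by accident.
    if api_key:
        cmd += ["--openai-api-key", api_key]

    return cmd


if __name__ == "__main__":
    command = build_babeldoc_command(["paper.pdf"], pages="1-5")
    # Print a shell-safe rendering; pass the list to subprocess.run to execute it.
    print(shlex.join(command))
```

Building the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces, and keeps the command easy to inspect before it is executed.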
Where it fits
- Translate an English scientific paper to Chinese while preserving layout, formulas, and technical notation, producing a bilingual PDF.
- Process multiple PDF files in batch from the command line, restricting translation to specific page ranges.
- Embed BabelDOC into a Python script or pipeline to automate PDF translation for a research workflow.
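For the batch and pipeline use cases, a minimal sketch of the surrounding loop might look like the following. It deliberately does not call BabelDOC's Python API, whose exact functions are not described here. Instead, the hypothetical `translate_folder` helper takes a `translate_one` callable that you would supply yourself, for example a wrapper around the library or the command-line tool.

```python
from pathlib import Path
from typing import Callable, Dict, List


def translate_folder(
    folder: Path,
    translate_one: Callable[[Path], None],
) -> Dict[str, List[Path]]:
    """Run translate_one on every PDF in folder and record the outcome.

    translate_one is a caller-supplied hook (hypothetical); it should raise
    an exception when a file cannot be translated.
    """
    results: Dict[str, List[Path]] = {"ok": [], "failed": []}

    # Sort so that runs are reproducible across platforms.
    for pdf in sorted(folder.glob("*.pdf")):
        try:
            translate_one(pdf)
            results["ok"].append(pdf)
        except Exception:
            # Keep going so one bad file does not stop the whole batch.
            results["failed"].append(pdf)

    return results
```

Passing the translation step in as a callable keeps the loop independent of how BabelDOC is invoked, so the same code works whether the pipeline uses the Python API or shells out to the CLI.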