Use an LLM (Ollama, QWEN, ChatGPT) to translate PDFs in place.

LLM PDF Translator

New Feature Introduction

This repository is inspired by ppisljar/pdf-translator and adds the following features:

  • [GUI] Support a download-save-translate workflow on the server (better for mobile devices)
  • Support Ollama and QWEN as translation backends (via their APIs)
  • Support multi-threaded translation
  • Better support for in-place translation into different languages (e.g. Chinese)
  • Support batch translation of PDF files without API calls (an example is provided below)
  • Use a single process for the OCR / layout models to save VRAM
  • Use the LLM for reference detection (fixes the bug of translating supplemental material)
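Since translation is API-bound, the multi-threading feature maps naturally onto a thread pool, which works well here despite the GIL. This is only a sketch; `translate_block` is a hypothetical stand-in for the real API call:

```python
from concurrent.futures import ThreadPoolExecutor

def translate_block(text):
    # Hypothetical stand-in for a call to the LLM translation API;
    # here it just uppercases the text so the example is runnable.
    return text.upper()

def translate_blocks(blocks, workers=4):
    # map() preserves input order, so translated blocks can be
    # rendered back into the layout in the right positions.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(translate_block, blocks))
```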

1. Introduction

1.1 About the GUI

This repository offers a WebUI and an API endpoint that translate PDF files with an LLM (e.g. OpenAI GPT) while preserving the original layout.
Option 1: Upload a PDF file and download the translated PDF file

Option 2: Fetch the PDF file from a URL and save/download the translated PDF file

Model Settings Page: change the settings in the GUI

OCR Models Page: download the OCR models

Features

  • translate PDF files while preserving layout

  • translation engines:

    • Ollama (recently added; works well for translation)
    • OpenAI (best quality)
    • QWEN
    • Google Translate
  • layout recognition engines:

    • UniLM DiT
  • OCR engines:

    • PaddleOCR
  • Render engine:

    • ReportLab
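Conceptually, these engines fit together in one page-by-page pipeline. The sketch below uses hypothetical stub functions in place of the real engines, just to show the stage order:

```python
# Hypothetical stubs standing in for the real engines:
def detect_layout(page):            # UniLM DiT in the real pipeline
    return [(0, 0, 100, 20)]        # bounding boxes of text regions

def ocr(page, box):                 # PaddleOCR
    return f"text@{box}"

def translate(text):                # LLM backend (Ollama / OpenAI / QWEN / Google)
    return f"zh:{text}"

def render(page, boxes, texts):     # ReportLab
    return list(zip(boxes, texts))

def translate_page(page):
    boxes = detect_layout(page)
    texts = [ocr(page, b) for b in boxes]
    return render(page, boxes, [translate(t) for t in texts])
```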

2. Installation

2.0 Prerequisites

  1. Clone this repository
   git clone https://github.com/poppanda/LLM_PDF_Translator.git
   cd LLM_PDF_Translator

2.1 Local installation

  1. Prerequisites:
  • Basically ffmpeg, poppler-utils, and the font you want to use
  • Decide whether you want to use uv or venv for an isolated environment
  2. Install the dependencies
  • (Recommended) If you are using uv, run the following commands
uv sync
uv pip install --no-build-isolation git+https://github.com/facebookresearch/detectron2.git
  • Otherwise, run the following command
pip install --no-build-isolation -r requirements.txt

2.2 Docker installation

  1. Build the docker image via Makefile
make docker-build

3. Run the server

3.1 Download the model

make get_models

3.2 Check the config.yaml and the fonts

  1. Edit config.yaml
  • The type can be ollama, openai, qwen, or google.
  • The api_key is the corresponding API key.
  • The model is the specific model name you want to use.
  2. Check the font you want to use in the 'render' part of config.yaml. For a basic demo, you can download a font with:
make install-cn-font
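As a sketch, a minimal config.yaml might look like the fragment below. The type, api_key, and model keys are named above; the section names and the font path are assumptions, so check the config.yaml shipped with the repository:

```yaml
translator:
  type: ollama          # ollama | openai | qwen | google
  api_key: ""           # the corresponding API key (not needed for a local Ollama server)
  model: "qwen2.5:7b"   # the backend-specific model name
render:
  font: fonts/cn-font.ttf  # hypothetical path; use the font installed by make install-cn-font
```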

3.3 Local run

If you are using uv:

uv run server.py

Otherwise

python3 server.py

3.4 Docker run

  1. Run the docker container via Makefile
make docker-run

GUI Usage

Access the GUI via a browser:

http://localhost:8765

Requirements

  • NVIDIA GPU (currently, only NVIDIA GPUs are supported)
  • Docker

License

This repository does not allow commercial use.

This repository is licensed under CC BY-NC 4.0. See LICENSE for more information.

Some details of the new feature

Feature 1: Use a single process for the OCR / layout models to save VRAM

  • The scenario: if you run an LLM locally (e.g. with Ollama), the OCR/layout models would otherwise stay loaded during translation for no reason, wasting about 5 GB of VRAM.
  • This problem is fixed by:
    • Separating the OCR / layout work from the translation process.
    • Using a single process for the OCR/layout models.
    • Killing that process before translation starts.
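The single-process pattern above can be sketched with multiprocessing: the OCR/layout models live in one worker process that is shut down (freeing its VRAM) before translation starts. `layout_worker` and its echoed output are hypothetical stand-ins for the real model calls:

```python
import multiprocessing as mp

def layout_worker(task_q, result_q):
    # In the real server this process would load the OCR / layout models
    # once; here it just echoes each page so the sketch is runnable.
    for page in iter(task_q.get, None):  # None is the shutdown sentinel
        result_q.put(f"layout({page})")

def run_layout_stage(pages):
    task_q, result_q = mp.Queue(), mp.Queue()
    proc = mp.Process(target=layout_worker, args=(task_q, result_q))
    proc.start()
    for page in pages:
        task_q.put(page)
    task_q.put(None)                     # ask the worker to exit
    results = [result_q.get() for _ in pages]
    proc.join()                          # worker gone -> its VRAM is freed
    return results                       # translation can start now
```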

Feature 2: Use the LLM for reference detection

  • The original code detects the reference section by looking for the 'reference' keyword in section titles.
  • The problems with that:
    • There may be supplemental material after the references, which the original code would translate as well.
    • The 'reference' keyword may not be recognized in some cases.
  • This is fixed by:
    • Using the LLM to detect the reference section and skip its translation.
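A minimal sketch of such an LLM check; the prompt text and the yes/no parse rule are assumptions (the actual prompt lives in the repository), and the LLM call itself is abstracted away:

```python
def build_reference_check_prompt(section_title, first_lines):
    # Hypothetical prompt; the real one is defined in the repository.
    return (
        "Does the following section start the bibliography/references "
        "of a paper? Answer only 'yes' or 'no'.\n"
        f"Title: {section_title}\n"
        f"Text: {first_lines}"
    )

def is_reference_section(llm_answer):
    # Tolerate casing, whitespace, and punctuation in the model's reply.
    return llm_answer.strip().lower().startswith("yes")
```

Once `is_reference_section` returns True for a section, that section and everything after it can be skipped during translation.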

TODOs

  • Support M1 Mac or CPU
  • Switch to VGT for layout detection
  • Add font detection (family/style/color/size/alignment)
  • Add support for translating lists
  • Add support for translating tables
  • Add support for translating text within images

Batch Translation Example

# batch.py -- translate every PDF under pdf_dir, skipping already-translated files
import warnings
warnings.filterwarnings("ignore")
import os

from loguru import logger

import server

pdf_dir = ""  # path to the directory with pdf files

if __name__ == "__main__":
    translator = server.TranslateApi()
    # Walk the tree: directories found during iteration are appended to the
    # list, so their contents get scanned as well.
    files = list(os.scandir(pdf_dir))
    for file in files:
        if file.is_dir():
            files.extend(os.scandir(file.path))
        elif file.is_file() and file.name.endswith(".pdf"):
            out_path = file.path.replace(".pdf", "_translated.pdf")
            # Skip files that are themselves outputs or already have one
            if file.name.endswith("_translated.pdf") or os.path.exists(out_path):
                logger.info(f"Skip {file.path}")
                continue
            logger.info(f"Translating {file.path}")
            translator._translate_pdf(
                file.path,
                translator.temp_dir_name,
                "English",
                "Chinese",
                translate_all=True,
                p_from=0,
                p_to=0,
                side_by_side=True,
                output_file_path=out_path,
            )

References

  • For PDF layout analysis: DiT.

  • For PDF-to-text conversion: a PaddlePaddle model.
