📄 Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

PaperCoder Overview

📄 Read the paper on arXiv

PaperCoder is a multi-agent LLM system that transforms a machine learning paper into a code repository. It follows a three-stage pipeline (planning, analysis, and code generation), with each stage handled by specialized agents.
Our method outperforms strong baselines on both the Paper2Code and PaperBench benchmarks and produces faithful, high-quality implementations.


πŸ—ΊοΈ Table of Contents


⚡ Quick Start

🔑 API Keys Setup

First, configure your API keys by creating a .env file:

# Copy the example file and edit with your keys
cp .env.example .env

# Edit .env file with your actual API keys:
# OPENAI_API_KEY=sk-proj-your-openai-key
# ANTHROPIC_API_KEY=sk-ant-api03-your-anthropic-key  
# GEMINI_API_KEY=your-gemini-key

Using OpenAI API

  • 💵 Estimated cost for using o3-mini: $0.50–$0.70
pip install openai

export OPENAI_API_KEY="<OPENAI_API_KEY>"

cd scripts
bash run.sh

🔀 LLM Router

The router configuration lives in llm_router/config.yaml.

| Task Pattern | Primary Model | Fallback |
|---|---|---|
| `chat\|faq\|rag` | gemini_flash_25 | claude_sonnet_35 |
| `code\|unit_tests` | claude_sonnet_37 | o4mini |
| `long_doc>300k` | gpt41 | claude_sonnet_35 |
| `tool_reasoning` | o4mini | gemini_flash_25 |

Override the config by setting LLM_CFG:

export LLM_CFG=/path/to/custom.yaml
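For illustration, the routing rules above can be sketched in Python. The route table below mirrors this README's table, but the function name and the fallback mechanics are assumptions about router behavior, not the repo's actual API:

```python
# Hypothetical sketch of pattern-based model routing; ROUTES mirrors the
# table from llm_router/config.yaml, pick_model() is illustrative.
import re

ROUTES = [
    (r"chat|faq|rag",    "gemini_flash_25",  "claude_sonnet_35"),
    (r"code|unit_tests", "claude_sonnet_37", "o4mini"),
    (r"long_doc",        "gpt41",            "claude_sonnet_35"),
    (r"tool_reasoning",  "o4mini",           "gemini_flash_25"),
]

def pick_model(task: str, primary_available: bool = True) -> str:
    """Return the model for a task label, falling back if the primary is unavailable."""
    for pattern, primary, fallback in ROUTES:
        if re.search(pattern, task):
            return primary if primary_available else fallback
    raise ValueError(f"no route for task: {task}")
```

For example, pick_model("unit_tests") routes to claude_sonnet_37 and falls back to o4mini when the primary is down.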

Using Open Source Models with vLLM

  • If you encounter any issues installing vLLM, please refer to the official vLLM repository.
  • The default model is deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct.
pip install vllm

cd scripts
bash run_llm.sh

Output Folder Structure (Only Important Files)

outputs
β”œβ”€β”€ Transformer
β”‚   β”œβ”€β”€ analyzing_artifacts
β”‚   β”œβ”€β”€ coding_artifacts
β”‚   └── planning_artifacts
└── Transformer_repo # Final output repository

📚 Detailed Setup Instructions

🛠️ Environment Setup

  • 💡 To use the o3-mini version, make sure you have the latest openai package installed.
  • 📦 Install only what you need:
    • For OpenAI API: openai
    • For open-source models: vllm
pip install openai 
pip install vllm 
  • Or install all dependencies at once from the requirements file:
pip install -r requirements.txt

📄 (Option) Convert PDF to JSON

The following process describes how to convert a paper PDF into JSON format. If you have access to the LaTeX source and plan to use it with PaperCoder, you may skip this step and proceed to 🚀 Running PaperCoder.

Note: In our experiments, we converted all paper PDFs to JSON format. The original workflow relied on the s2orc-doc2json repository; as of 2025, more capable open-source libraries exist, so we provide multiple approaches below.

Option 1: Modern Vision-based Approach (Recommended)

We now provide a modern PDF to JSON converter that uses vision models (Gemini 2.5 Flash) instead of the legacy GROBID approach. This method is:

  • 95% cheaper than traditional approaches
  • Faster (no Java services required)
  • More accurate for complex layouts, formulas, and tables
# Install dependencies
pip install pdf2image pytesseract aiohttp tqdm

# With Gemini API (best quality)
export GEMINI_API_KEY="your-api-key"
python codes/pdf_to_json_modern.py -i paper.pdf -o output.json

# Or use the convenience script
cd scripts
./run_modern_pdf2json.sh ../examples/Transformer.pdf

For more details, see Modern PDF to JSON Documentation.

Option 2: Legacy GROBID Approach

If you prefer the traditional method, you can still use the s2orc-doc2json repository:

  1. Clone s2orc-doc2json and run its processing service:
git clone https://github.com/allenai/s2orc-doc2json.git
cd ./s2orc-doc2json/grobid-0.7.3
./gradlew run
  2. Convert the PDF into JSON format using the bundled script:
mkdir -p ./s2orc-doc2json/output_dir/paper_coder
python ./s2orc-doc2json/doc2json/grobid2json/process_pdf.py \
    -i ${PDF_PATH} \
    -t ./s2orc-doc2json/temp_dir/ \
    -o ./s2orc-doc2json/output_dir/paper_coder

Hybrid approach (recommended for 2025)

  1. Install modern PDF processing libraries.
pip install PyMuPDF pdfplumber layoutparser
  2. Ensure the latest grobid server (v0.8 or later) is running.

  3. Use the script codes/pdf_to_json_hybrid.py to combine page-level text extraction with metadata from grobid and produce a single JSON file:

python codes/pdf_to_json_hybrid.py \
    --pdf_path ${PDF_PATH} \
    --output_json ./paper_coder_output/paper.json \
    --grobid_url http://localhost:8070

This hybrid pipeline leverages modern layout analysis tools for accurate page content while still using grobid for reliable metadata extraction.

Simple approach (no grobid)

  1. Install lightweight dependencies.
pip install PyMuPDF pdf2image pytesseract camelot-py
  2. Run the script codes/pdf_to_json_simple.py:
python codes/pdf_to_json_simple.py \
    --pdf_path ${PDF_PATH} \
    --output_json ./paper_coder_output/paper.json

This method relies solely on PyMuPDF and OCR, optionally using camelot to extract tables.
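As a rough illustration of such a converter, here is a minimal sketch assuming PyMuPDF is installed; the output schema (title plus per-page text) is illustrative and simpler than what pdf_to_json_simple.py actually emits:

```python
# Minimal sketch of a PyMuPDF-based PDF -> JSON converter (no grobid, no OCR).
# Assumes `pip install PyMuPDF`; function and field names are illustrative.

def pages_to_paper_json(title, pages):
    """Assemble extracted page texts into a simple paper JSON structure."""
    return {
        "title": title,
        "pages": [
            {"page": i + 1, "text": text.strip()}
            for i, text in enumerate(pages)
        ],
    }

def pdf_to_json(pdf_path):
    import fitz  # PyMuPDF; imported lazily so the helper above stays dependency-free
    doc = fitz.open(pdf_path)
    pages = [page.get_text() for page in doc]
    return pages_to_paper_json(doc.metadata.get("title") or pdf_path, pages)
```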

🚀 Running PaperCoder

  • Note: The following command runs on the example paper (Attention Is All You Need).
    To run PaperCoder on your own paper, modify the environment variables accordingly.

Using OpenAI API

  • 💵 Estimated cost for using o3-mini: $0.50–$0.70
# Using the PDF-based JSON format of the paper
export OPENAI_API_KEY="<OPENAI_API_KEY>"

cd scripts
bash run.sh
# Using the LaTeX source of the paper
export OPENAI_API_KEY="<OPENAI_API_KEY>"

cd scripts
bash run_latex.sh

Using Open Source Models with vLLM

  • The default model is deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct.
# Using the PDF-based JSON format of the paper
cd scripts
bash run_llm.sh
# Using the LaTeX source of the paper
cd scripts
bash run_latex_llm.sh

📦 Paper2Code Benchmark Datasets

  • Huggingface dataset: paper2code

  • You can find the description of the Paper2Code benchmark dataset in data/paper2code.

  • For more details, refer to Section 4.1 "Paper2Code Benchmark" in the paper.


πŸ–ΌοΈ Enhanced Pipeline with Image Analysis

We've extended the original Paper2Code pipeline with advanced image analysis capabilities, using o4-mini-2025-04-16 for image processing and o3-2025-04-16 for code generation.

Complete Pipeline Steps

  1. Copy and Setup PDF
# Copy your paper to the working directory
cp /path/to/your/paper.pdf ./custom_paper/paper.pdf
  2. Start GROBID in a separate terminal
cd $HOME/grobid-0.7.3 && ./gradlew run

GROBID is required for extracting structured text from scientific PDFs.

  3. Convert PDF to JSON using GROBID
python s2orc-doc2json/doc2json/grobid2json/process_pdf.py -i "custom_paper/paper.pdf" -t custom_paper/temp_dir/ -o custom_paper/

This transforms the PDF into structured JSON with sections, paragraphs, and references.

  4. Preprocess JSON
python codes/0_pdf_process.py --input_json_path custom_paper/paper.json --output_json_path custom_paper/paper_cleaned.json

Cleans and enhances the JSON for better analysis.
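As a rough illustration of this kind of cleanup (the exact fields 0_pdf_process.py touches are an assumption based on the s2orc JSON shape), dropping empty paragraphs and collapsing whitespace might look like:

```python
# Illustrative sketch of paper-JSON cleanup; the "body_text" field name is
# an assumption based on the s2orc-doc2json output format.
def clean_paper_json(paper):
    """Drop empty paragraphs and collapse whitespace in a parsed-paper dict."""
    cleaned = dict(paper)
    cleaned["body_text"] = [
        {**p, "text": " ".join(p["text"].split())}  # collapse runs of whitespace
        for p in paper.get("body_text", [])
        if p.get("text", "").strip()                 # skip empty paragraphs
    ]
    return cleaned
```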

  5. Extract and Analyze Images with o4-mini-2025-04-16
python codes/extract_figures.py --pdf_path custom_paper/paper.pdf --json_path custom_paper/paper_cleaned.json --output_dir custom_paper --gpt_version o4-mini-2025-04-16

This step:

  • Extracts all images from the PDF
  • Uses o4-mini-2025-04-16 to create detailed descriptions of each image
  • Adds these descriptions to the JSON, creating enhanced_paper.json
  6. Planning with o3-2025-04-16
python codes/1_planning.py --paper_name YourPaperName --gpt_version o3-2025-04-16 --pdf_json_path custom_paper/enhanced_paper.json --output_dir outputs/YourPaperName_enhanced

Creates a detailed implementation plan using the enriched JSON with image descriptions.

  7. Configuration Extraction
python codes/1.1_extract_config.py --paper_name YourPaperName --output_dir outputs/YourPaperName_enhanced

Extracts configuration parameters from the plan for use in subsequent steps.

  8. Analysis with o3-2025-04-16
python codes/2_analyzing.py --paper_name YourPaperName --gpt_version o3-2025-04-16 --pdf_json_path custom_paper/enhanced_paper.json --output_dir outputs/YourPaperName_enhanced

Performs detailed analysis of system components, creating logical schemas for each module.

  9. Code Generation with o3-2025-04-16
python codes/3_coding.py --paper_name YourPaperName --gpt_version o3-2025-04-16 --pdf_json_path custom_paper/enhanced_paper.json --output_dir outputs/YourPaperName_enhanced --output_repo_dir outputs/YourPaperName_repo_enhanced

Generates the actual code implementing all system components based on planning and analysis results.
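The image-analysis step above folds model-written figure descriptions back into the paper JSON to produce enhanced_paper.json. A minimal sketch of that merge, with assumed field names ("figures", "description") rather than the script's exact schema:

```python
# Hypothetical sketch of enriching the cleaned paper JSON with figure
# descriptions produced by the vision model; field names are illustrative.
def enrich_with_figures(paper, descriptions):
    """descriptions: mapping of figure id -> description text from the vision model."""
    enhanced = dict(paper)
    enhanced["figures"] = [
        {"id": fig_id, "description": text}
        for fig_id, text in sorted(descriptions.items())
    ]
    return enhanced
```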

One-Step Execution

For convenience, you can use the enhanced script:

./scripts/run_custom_enhanced.sh

This script runs the entire pipeline with the appropriate configuration.

Key Pipeline Features

1. Two-Stage Processing

  • o4-mini-2025-04-16 for image analysis
  • o3-2025-04-16 for planning, analysis, and code generation

2. Cost Optimization via Prompt Caching

  • Static content (text + image descriptions) is placed at the beginning
  • Token caching between consecutive API calls
  • Cost reduction of approximately 50% for cached content
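The prompt layout this implies can be sketched as follows, under standard chat-message assumptions; actual cache behavior is provider-specific:

```python
# Keep the static paper context first and byte-identical across calls so the
# provider can cache the shared prefix; only the per-stage instruction varies.
def build_messages(static_paper_context, stage_instruction):
    return [
        {"role": "system", "content": static_paper_context},  # identical prefix -> cacheable
        {"role": "user", "content": stage_instruction},       # varies per pipeline stage
    ]
```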

3. Enhanced Image Processing

  • Automatic extraction of all figures from PDF
  • Image analysis using o4-mini-2025-04-16
  • Integration of descriptions into JSON for use by o3-2025-04-16

4. Modular Approach

  • Logical division into stages: planning, analysis, coding
  • Saving intermediate results
  • Ability to restart individual stages

5. Result

  • Structured implementation of the entire system
  • Complete reproduction of the paper methodology
  • Ready-to-use code in output_repo_dir

📊 Model-based Evaluation of Repositories Generated by PaperCoder

  • We evaluate repository quality using a model-based approach, supporting both reference-based and reference-free settings.
    The model critiques key implementation components, assigns severity levels, and generates a 1–5 correctness score averaged over 8 samples using o3-mini-high.

  • For more details, please refer to Section 4.3.1 (Paper2Code Benchmark) of the paper.

  • Note: The following examples evaluate the sample repository (Transformer_repo).
    Please modify the relevant paths and arguments if you wish to evaluate a different repository.
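The aggregation described above (a 1–5 correctness score averaged over the valid subset of 8 samples, matching the "Score" and "Valid" lines in the example output below) can be sketched as follows; names are illustrative:

```python
# Average a 1-5 correctness score over n sampled judgments, counting only
# valid ones; mirrors the "Score: 4.5000 / Valid: 8/8" style of output.
def aggregate_scores(samples):
    """samples: list of ints in 1..5, or None for invalid generations."""
    valid = [s for s in samples if s is not None and 1 <= s <= 5]
    score = sum(valid) / len(valid) if valid else 0.0
    return score, len(valid), len(samples)
```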

πŸ› οΈ Environment Setup

pip install tiktoken
export OPENAI_API_KEY="<OPENAI_API_KEY>"

πŸ“ Reference-free Evaluation

  • target_repo_dir is the generated repository.
cd codes/
python eval.py \
    --paper_name Transformer \
    --pdf_json_path ../examples/Transformer_cleaned.json \
    --data_dir ../data \
    --output_dir ../outputs/Transformer \
    --target_repo_dir ../outputs/Transformer_repo \
    --eval_result_dir ../results \
    --eval_type ref_free \
    --generated_n 8 \
    --papercoder

πŸ“ Reference-based Evaluation

  • target_repo_dir is the generated repository.
  • gold_repo_dir should point to the official repository (e.g., author-released code).
cd codes/
python eval.py \
    --paper_name Transformer \
    --pdf_json_path ../examples/Transformer_cleaned.json \
    --data_dir ../data \
    --output_dir ../outputs/Transformer \
    --target_repo_dir ../outputs/Transformer_repo \
    --gold_repo_dir ../examples/Transformer_gold_repo \
    --eval_result_dir ../results \
    --eval_type ref_based \
    --generated_n 8 \
    --papercoder

📄 Example Output

========================================
🌟 Evaluation Summary 🌟
📄 Paper name: Transformer
🧪 Evaluation type: ref_based
📝 Target repo directory: ../outputs/Transformer_repo
📊 Evaluation result:
        📈 Score: 4.5000
        ✅ Valid: 8/8
========================================
🌟 Usage Summary 🌟
[Evaluation] Transformer - ref_based
🛠️ Model: o3-mini
📥 Input tokens: 44318 (Cost: $0.04874980)
📦 Cached input tokens: 0 (Cost: $0.00000000)
📤 Output tokens: 26310 (Cost: $0.11576400)
💵 Current total cost: $0.16451380
🪙 Accumulated total cost so far: $0.16451380
========================================
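The printed costs can be recomputed from per-million-token rates; the rates that reproduce them exactly are $1.10 input and $4.40 output per million tokens (the o4-mini rates in the pricing section of this README):

```python
# Recompute the costs in the usage summary above from per-token rates.
def api_cost(input_tokens, output_tokens, in_per_m, out_per_m):
    """Return (input_cost, output_cost) in dollars for the given token counts."""
    return input_tokens * in_per_m / 1e6, output_tokens * out_per_m / 1e6

inp, out = api_cost(44318, 26310, 1.10, 4.40)
# inp ~ $0.0487498, out ~ $0.1157640, total ~ $0.1645138 -- matching the summary
```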


💵 Official AI Model API Pricing (May 2025)

The following prices were collected from official documentation in May 2025. All values are shown per million tokens.

OpenAI Models

  • o4-mini-2025-04-16: Input $1.10, Output $4.40 – fast, cost‑efficient reasoning with multimodal support.
  • gpt-4.1-2025-04-14: Input $2.00, Output $8.00 – improved coding and instruction following with a 1M token context window.
  • o3-2025-04-16: Input $10.00 (cached input $2.50), Output $40.00 – OpenAI's most powerful reasoning model with a 200K token context window.

Google Gemini Models

  • Gemini 2.5 Flash (preview):
    • Input: Text/Image/Video $0.15, Audio $1.00
    • Output: Non-thinking mode $0.60, Thinking mode $3.50
    • First Flash model with thinking capabilities (preview).
  • Gemini 2.5 Pro (preview):
    • Input ≀ 200k tokens $1.25, > 200k tokens $2.50
    • Output ≀ 200k tokens $10.00, > 200k tokens $15.00
    • Most advanced Gemini reasoning model with a 1M token context window.

Prices may change as these models move from preview to general availability. Consult the respective provider pages for the latest information.
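As a worked example of the tiered Gemini 2.5 Pro input pricing above, assuming the higher rate applies to the whole request once it crosses 200k tokens (providers differ on marginal vs. whole-request tiering, so verify against the official docs):

```python
# Illustrative tiered input-cost calculator for Gemini 2.5 Pro (preview) pricing.
def gemini_pro_input_cost(tokens):
    """Dollar cost for a prompt: $1.25/M up to 200k tokens, $2.50/M beyond."""
    rate = 1.25 if tokens <= 200_000 else 2.50
    return tokens * rate / 1e6
```

Under this assumption, a 150k-token prompt costs $0.1875 and a 300k-token prompt costs $0.75.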
