Introduction

Visual Physics Comprehension Test Leaderboard

This is the benchmark runner code for the Visual Physics Comprehension Test (VPCT). To run the benchmark, execute run-vpct.py with the options specified below. You'll need to acquire a VPCT dataset and place it in the data directory that you pass to the runner; the VPCT-1 dataset can be found here.

Usage

Check model_registry.py to see a list of configured model slugs.

usage: run-vpct.py [-h] [-d DATA_DIR] [-o OUTPUT_DIR] [-p PROMPT_FILE]
                   [-m MODELS] [--runs RUNS] [--batch-size BATCH_SIZE]
                   [--subset SUBSET] [--max-retries MAX_RETRIES]
                   [--base-delay BASE_DELAY] [--overwrite]
                   [--max-tokens MAX_TOKENS]
                   [--timeout-seconds TIMEOUT_SECONDS]
                   [--thinking-budget THINKING_BUDGET]
                   [--openai-base-url OPENAI_BASE_URL]
                   [--openai-api-key OPENAI_API_KEY]

Run image-based bucket-prediction benchmarks across OpenAI-compatible and Anthropic endpoints.

options:

-h, --help - show this help message and exit
-d DATA_DIR, --data-dir DATA_DIR
-o OUTPUT_DIR, --output-dir OUTPUT_DIR
-p PROMPT_FILE, --prompt-file PROMPT_FILE
-m MODELS, --models MODELS - Comma-separated model slugs (see MODEL_REGISTRY). (default: None)
--runs RUNS
--batch-size BATCH_SIZE
--subset SUBSET - Run a smaller subset of the VPCT benchmark. (default: None)
--max-retries MAX_RETRIES
--base-delay BASE_DELAY
--overwrite
--max-tokens MAX_TOKENS
--timeout-seconds TIMEOUT_SECONDS
--thinking-budget THINKING_BUDGET - Anthropic thinking_budget (ignored by OpenAI). (default: 0)
--openai-base-url OPENAI_BASE_URL - Override base_url for OpenAI-compatible endpoints (e.g. https://openrouter.ai/api/v1). (default: None)
--openai-api-key OPENAI_API_KEY - API key for that endpoint. If omitted, falls back to OPENAI_API_KEY env var. (default: None)
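
Two of the behaviors documented above, the comma-separated `--models` value and the `--openai-api-key` fallback to the `OPENAI_API_KEY` environment variable, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the runner's actual code: the helper names `parse_model_slugs` and `resolve_openai_api_key` are hypothetical, and whitespace trimming around slugs is an assumption.

```python
import os
from typing import List, Mapping, Optional


def parse_model_slugs(raw: Optional[str]) -> List[str]:
    """Split a comma-separated --models value into individual slugs.

    Hypothetical helper: trimming whitespace and dropping empty
    entries is an assumption, not documented runner behavior.
    """
    if not raw:
        return []
    return [slug.strip() for slug in raw.split(",") if slug.strip()]


def resolve_openai_api_key(
    cli_value: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Prefer the --openai-api-key value, else the OPENAI_API_KEY env var.

    Hypothetical helper mirroring the documented fallback; `env` can be
    passed explicitly to make the function easy to test.
    """
    env = os.environ if env is None else env
    if cli_value:
        return cli_value
    return env.get("OPENAI_API_KEY")
```

With these helpers, `parse_model_slugs("gpt-4o-mini, gemini-2.0-flash")` yields the two slugs as separate list entries, and `resolve_openai_api_key(None)` reads the key from the environment if it is set.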

Example:

python run-vpct.py -d ./data -o ./runs/gpt-4o-mini-test-subset -m gpt-4o-mini --runs 1 --batch-size 5 --max-tokens 16384 --subset 5

Or, for a third-party provider with an OpenAI compatibility layer:

python run-vpct.py -d ./data -o ./runs/gemini-flash-test -m gemini-2.0-flash --runs 1 --batch-size 5 --max-tokens 4096 --subset 5 --openai-base-url https://generativelanguage.googleapis.com/v1beta/openai/ --openai-api-key $GEMINI_API_KEY
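
If you want to launch the runner from another Python script (for example, to sweep over several models), the command lines above can be assembled programmatically. The sketch below is one possible approach; `build_vpct_command` is a hypothetical helper, and its default values simply mirror the examples above rather than the runner's own defaults. It uses only flags listed in the usage text.

```python
import sys
from typing import List, Optional, Sequence


def build_vpct_command(
    data_dir: str,
    output_dir: str,
    models: Sequence[str],
    runs: int = 1,
    batch_size: int = 5,
    max_tokens: int = 4096,
    subset: Optional[int] = None,
    openai_base_url: Optional[str] = None,
    openai_api_key: Optional[str] = None,
) -> List[str]:
    """Build an argv list for run-vpct.py (hypothetical helper).

    Defaults here copy the README examples; they are not the
    runner's built-in defaults.
    """
    cmd = [
        sys.executable, "run-vpct.py",
        "-d", data_dir,
        "-o", output_dir,
        "-m", ",".join(models),  # --models takes comma-separated slugs
        "--runs", str(runs),
        "--batch-size", str(batch_size),
        "--max-tokens", str(max_tokens),
    ]
    # Optional flags are appended only when a value is supplied.
    if subset is not None:
        cmd += ["--subset", str(subset)]
    if openai_base_url:
        cmd += ["--openai-base-url", openai_base_url]
    if openai_api_key:
        cmd += ["--openai-api-key", openai_api_key]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing a list rather than a shell string avoids quoting problems with URLs and API keys.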

