The APA Benchmark: A People-centric Benchmark for Testing Vision-Language Models

We introduce the APA Benchmark, which consists of images of actors, politicians, and athletes paired with a series of text prompts. The benchmark serves as a tool for practitioners and researchers who are considering VLMs for people-centric tasks. It tests VLMs on their ability to associate text with pictures of public figures in three domains: Athletics, Politics, and Acting. We issue text prompts against photos of famous people in each of these domains and score each VLM according to its matching capability. Images are mostly sourced from Wikimedia Commons and Wikipedia, which means they are either in the public domain or carry a license that permits redistribution. Other metadata is manually curated from Wikipedia or from official sources.


Environment Setup

1. Create Conda Environment (Optional)

If needed, create a new conda environment with Python 3.10:

conda create -n apa_benchmark python=3.10
conda activate apa_benchmark

2. Install Required Packages

Install the following Python packages:

  • pandas
  • pytorch==1.7.1
  • torchvision
  • cudatoolkit==11.0
  • ftfy
  • regex
  • tqdm
  • transformers
  • accelerate

Note: Gemma 3 is supported starting from transformers==4.50.0.
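For reference, the packages above can be installed with conda and pip along these lines (a sketch only; the pytorch channel and the exact pins are assumptions based on the versions listed):

# PyTorch stack, pinned as listed above (pytorch channel assumed)
conda install pytorch==1.7.1 torchvision cudatoolkit=11.0 -c pytorch

# Remaining packages; transformers is constrained so Gemma 3 is available
pip install pandas ftfy regex tqdm "transformers>=4.50.0" accelerate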


CLIP Benchmark

Additional Installation

Make sure you have the latest version of CLIP installed:

pip install git+https://github.com/openai/CLIP.git

Running Benchmarks

Compute scores by prompt type: category, occupation, and specialty

python clip/base-benchmark.py

Compute scores for Image-to-Text tasks

python clip/pks_benchmark.py

Compute scores for Text-to-Image tasks

python clip/pks_benchmark_rev.py
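All three scripts build on CLIP's joint image-text embedding space. The snippet below is a minimal sketch of the underlying scoring pattern using the openai/CLIP package, not a reproduction of the repository's scripts; the backbone, image path, and prompts are placeholders:

import torch
import clip
from PIL import Image

# Load a CLIP checkpoint (ViT-B/32 is an assumption; the benchmark
# scripts may use a different backbone)
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder image and prompts illustrating the matching task
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
prompts = ["a photo of an athlete", "a photo of a politician", "a photo of an actor"]
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    # logits_per_image holds the similarity between the image and each prompt
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

for prompt, p in zip(prompts, probs[0]):
    print(f"{prompt}: {p:.3f}")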

Gemma 3 (Generative Vision-Language Model)

Running Benchmarks

Compute scores by prompt type: category, occupation, and specialty

python gemma3/base-benchmark.py

Compute scores for Image-to-Text tasks

python gemma3/pks_benchmark.py

Compute scores for Text-to-Image tasks

python gemma3/pks_benchmark_rev.py
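Unlike CLIP, Gemma 3 is queried generatively: the model is shown an image and asked a question in natural language. A minimal sketch using the transformers image-text-to-text pipeline follows; the checkpoint id, image URL, and prompt are illustrative assumptions, not values taken from the benchmark scripts:

from transformers import pipeline

# Instruction-tuned Gemma 3 checkpoint (id assumed; requires
# transformers>=4.50.0 and access to the gated weights on the Hub)
pipe = pipeline("image-text-to-text", model="google/gemma-3-4b-it")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},
            {"type": "text", "text": "Is the person in this photo an athlete, a politician, or an actor?"},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=32)
print(out[0]["generated_text"][-1]["content"])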

Evaluation Scripts

These scripts summarize benchmark results from different analytical perspectives.

1. Basic associative abilities

python eval/exp1/cal_three_mean_actor.py
python eval/exp1/cal_three_mean_politician.py
python eval/exp1/cal_three_mean_athletes.py

2. Influence of societal biases

python eval/exp2/cal_bias_actor_athlete.py
python eval/exp2/cal_bias_politician.py

3. Identity recognition capability

python eval/exp3/cal_image_score_mean.py
python eval/exp3/cal_text_score_mean.py
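Each of these scripts aggregates per-example scores into summary statistics. Conceptually, the aggregation looks like the following pandas sketch; the file name and column names are hypothetical, not the repository's actual output schema:

import pandas as pd

# Hypothetical results file produced by one of the benchmark runs above
df = pd.read_csv("results.csv")

# Mean matching score per domain and prompt type (columns assumed)
print(df.groupby(["domain", "prompt_type"])["score"].mean())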

📚 Citation

If you use this work, please cite us using the following BibTeX entry:

@misc{apabench,
  author       = {Yuri Ishitoya and Veronica Flores and Ziyan Yang and Paola Cascante-Bonilla and Vicente Ordonez},
  title        = {The APA Benchmark: A People-centric Benchmark for Testing Vision-Language Models},
  year         = {2025},
  howpublished = {\url{https://github.com/uvavision/apa-bench}}
}

License

This project is licensed under the MIT License.
