Skip to content

URRealHero/JudgeAnything

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 

Repository files navigation

JudgeAnything

See our demo at web

Todo

  • Release the Benchmark and the Dataset
  • Gather code for auto-evaluation on our benchmark
  • Release the human-annotated ground truth
  • Construct the OmniArena platform

Data

We have released our benchmark and models responses in dataset, you should download them into dataset/

Reproduction of auto-evaluation

You can use scripts under scripts/ to re-run the evaluation on existing data.

API

We now provide the judging code for Gemini-1.5-pro, Gemini-2.0-flash, Gemini-2.0-flash-lite.

Local

We now provide the judging code for Phi4Multimodal We will provide the judging code for Qwen2.5 Omni and InternOmni-7B We also provides an extensive huggingface-support interface to evaluate, the prompt template is Phi4v-multimodal, you can modify the local special token template in scripts/utils/config.py or modify the procedure of building local prompt template in scripts/utils/prompt_builder.py.

General Evaluation

To evaluate your own private generated data, transform the data into following format to do the score evaluation.(We are still working on gathering general code for our OmniArena's pair comparison code). We encourage you to get the prompt in scripts/utils/prompt_builder.py and evaluate yourself

[
    {
        "uniq_id": "the uniq_id of task",
        "response_id": "create a response_id for your response",
        "task_name": "task_name of task",
        "model_name": "your model name",
        "response": {
            "type": "text/image/video/audio",
            "content": "string or list, if non-text type, provide the abspath here"
        }
    }
]

Citation

@article{pu2025judge,
  title={Judge Anything: MLLM as a Judge Across Any Modality},
  author={Pu, Shu and Wang, Yaochen and Chen, Dongping and Chen, Yuhang and Wang, Guohao and Qin, Qi and Zhang, Zhongyi and Zhang, Zhiyuan and Zhou, Zetong and Gong, Shuang and others},
  journal={arXiv preprint arXiv:2503.17489},
  year={2025}
}

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages