EvoCUA: Evolving Computer Use Agent

🥇 #1 Open-Source Model on OSWorld | A General-Purpose Multimodal Model Excelling at Computer Use

🥇 #1 Open-Source Model on OSWorld Leaderboard (Jan 2026)

🌟 Highlights

🥇 #1 Open-Source Model on OSWorld: Achieves 56.7% task completion rate, #1 among all open-source models
📈 Significant Improvements: +11.7% over OpenCUA-72B (45.0%→56.7%), +15.1% over Qwen3-VL thinking (41.6%→56.7%), with fewer parameters and half the steps
🖥️ End-to-End Multi-Turn Automation: Operates Chrome, Excel, PowerPoint, VSCode and more through screenshots and natural language instructions
🧠 Novel Training Method: Our data synthesis and training approach consistently improves Computer Use capability across multiple open-source VLMs without degrading general performance

📊 Performance Comparison

Rank	Model	Open/Closed	Type	Max Steps	Score
1	Claude-sonnet-4-5	🔒 Closed	General	100	62.9%
2	Seed-1.8	🔒 Closed	General	100	61.9%
3	Claude-sonnet-4-5	🔒 Closed	General	50	58.1%
4	EvoCUA-20260105 (Ours)	🟢 Open	General	50	56.7% 🥇
5	DeepMiner-Mano-72B	🔒 Closed	Specialized	100	53.9%
6	UI-TARS-2-2509	🔒 Closed	General	100	53.1%
7	EvoCUA (Previous Version)	🔒 Closed	General	50	50.3%
8	OpenCUA-72B	🟢 Open	Specialized	100	45.0%
...	...	...	...	...	...
13	Qwen3-VL-Flash	🔒 Closed	General	100	41.6%

EvoCUA is #1 among all open-source models, achieving competitive results with only 50 steps. Human-level performance remains significantly higher, indicating substantial room for improvement.

🚀 Quick Start

Installation

Python 3.12 is recommended.

git clone https://github.com/meituan/EvoCUA.git
cd EvoCUA
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Model Download & Deployment

EvoCUA requires downloading the model weights from HuggingFace and deploying with vLLM as an OpenAI-compatible inference server.

Recommended versions:

torch: 2.8.0+cu126
transformers: 4.57.3
vllm: 0.11.0

# 1) Download model weights
huggingface-cli download meituan/EvoCUA-32B-20260105 \
  --local-dir /path/to/EvoCUA-32B \
  --local-dir-use-symlinks False

# 2) Launch vLLM serving (recommend separate environment)
vllm serve /path/to/EvoCUA-32B \
  --served-model-name EvoCUA \
  --host 0.0.0.0 \
  --port 8080 \
  --tensor-parallel-size 2

# 3) Set environment variables
# Environment variables can be configured in .env file (see env.template for reference):
cp env.template .env
# Edit .env with your configurations, e.g.,
export OPENAI_API_KEY="dummy"
export OPENAI_BASE_URL="http://127.0.0.1:8080/v1"

Run Evaluation on OSWorld

python3 run_multienv_evocua.py \
  --headless \
  --provider_name aws \
  --observation_type screenshot \
  --model EvoCUA-S2 \
  --result_dir ./evocua_results \
  --test_all_meta_path evaluation_examples/test_nogdrive.json \
  --max_steps 50 \
  --num_envs 30 \
  --temperature 0.01 \
  --max_history_turns 4 \
  --coordinate_type relative \
  --resize_factor 32 \
  --prompt_style S2

📁 Project Structure

EvoCUA/
├── run_multienv_evocua.py      # Main entry point (multi-env parallel evaluation)
├── lib_run_single.py           # Single task rollout logic (trajectory, screenshots, recording, scoring)
├── lib_results_logger.py       # Real-time result aggregation to results.json
├── desktop_env/                # OSWorld environment implementation
│   ├── providers/              # VM providers (AWS/VMware/Docker/etc.)
│   ├── controllers/            # Environment controllers
│   └── evaluators/             # Task evaluators
├── mm_agents/
│   └── evocua/                 # EvoCUA agent (prompts, parsing, action generation)
└── evaluation_examples/        # OSWorld task configurations

📖 About OSWorld

OSWorld is the most influential benchmark in the Computer Use Agent domain. It is adopted by leading AI organizations including OpenAI, Anthropic, ByteDance Seed, Moonshot AI, Zhipu AI, Step, and more. OSWorld evaluates agents' ability to complete real-world computer tasks through multi-turn interactions with actual desktop environments.

🔗 Resources

🤗 Model Weights: meituan/EvoCUA-32B-20260105
📊 OSWorld Benchmark: os-world.github.io
📄 Technical Report: Coming Soon
🚀 More Model Sizes: Models of various sizes are on the way and will be open-sourced soon!

🙏 Acknowledgements

We sincerely thank the open-source community for their outstanding contributions to the Computer Use Agent field. We are grateful to Xinyuan Wang (OpenCUA) and Tianbao Xie (OSWorld) for their insightful discussions, valuable feedback on evaluation, and continuous support throughout this project. Their pioneering work has greatly inspired and advanced our research. We are committed to giving back to the community and will continue to open-source our research to advance the field.

📝 Citation

If you find EvoCUA useful in your research, please consider citing:

@misc{evocua2026,
  title={EvoCUA: Evolving Computer Use Agent},
  author={Chong Peng* and Taofeng Xue*},
  year={2026},
  url={https://github.com/meituan/EvoCUA},
  note={* Equal contribution}
}

📜 License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Built with ❤️ by Meituan LongCat Team

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

EvoCUA: Evolving Computer Use Agent

🌟 Highlights

📊 Performance Comparison

🚀 Quick Start

Installation

Model Download & Deployment

Run Evaluation on OSWorld

📁 Project Structure

📖 About OSWorld

🔗 Resources

🙏 Acknowledgements

📝 Citation

📜 License

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
assets/images		assets/images
desktop_env		desktop_env
evaluation_examples		evaluation_examples
logs		logs
mm_agents		mm_agents
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
README_CN.md		README_CN.md
env.template		env.template
lib_results_logger.py		lib_results_logger.py
lib_run_single.py		lib_run_single.py
requirements.txt		requirements.txt
run_multienv_evocua.py		run_multienv_evocua.py
show_result.py		show_result.py

License

meituan/EvoCUA

Folders and files

Latest commit

History

Repository files navigation

EvoCUA: Evolving Computer Use Agent

🌟 Highlights

📊 Performance Comparison

🚀 Quick Start

Installation

Model Download & Deployment

Run Evaluation on OSWorld

📁 Project Structure

📖 About OSWorld

🔗 Resources

🙏 Acknowledgements

📝 Citation

📜 License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages