Update agent display names to include model and scaffold #412

Open
Chesars wants to merge 1 commit into SWE-bench:main from Chesars:update-agent-display-names-v2


Chesars (Contributor) commented Feb 9, 2026

Summary

  • Updates the display names of 16 leaderboard entries that were missing either the model or the scaffold.

Changes

| Entry | Before | After |
|---|---|---|
| Augment Agent v1 | Augment Agent v1 | Augment Agent v1 + Claude Sonnet 4 |
| Augment Agent v0 | Augment Agent v0 | Augment Agent v0 + Sonnet 3.7 + O1 |
| OpenHands + 4x Scaled | OpenHands + 4x Scaled (2024-02-03) | OpenHands + 4x Scaled + Claude 3.5 Sonnet + o3-mini (2024-02-03) |
| AppMap Navie v2 | AppMap Navie v2 | AppMap Navie v2 + Claude 3.5 Sonnet + GPT-4o |
| PatchPilot-v1.1 | PatchPilot-v1.1 | PatchPilot-v1.1 + o4-mini |
| SWE-Exp | SWE-Exp | SWE-Exp + DeepSeek-V3-0324 |
| SWE-Rizzo | SWE-Rizzo | SWE-Rizzo + Claude 3.7 |
| Nemotron-CORTEXA | Nemotron-CORTEXA | Nemotron-CORTEXA + NV-EmbedCode + Claude 3.5 Sonnet + DeepSeek-V3 + o3-mini + GPT-4o + GPT-4-turbo + Qwen2.5-72B + Llama-3.1-405B + Llama-3.3-70B |
| GLM-4.5 | GLM-4.5 | OpenHands + GLM-4.5 |
| Skywork-SWE-32B | Skywork-SWE-32B | OpenHands + Skywork-SWE-32B |
| Skywork-SWE-32B + TTS(Bo8) | Skywork-SWE-32B + TTS(Bo8) | OpenHands + Skywork-SWE-32B + TTS(Bo8) |
| MCTS-Refine-7B | MCTS-Refine-7B | Agentless + MCTS-Refine-7B |
| DeepSWE-Preview | DeepSWE-Preview | R2E-Agent + DeepSWE-Preview |
| DeepSWE-Preview + TTS(Bo16) | DeepSWE-Preview + TTS(Bo16) | R2E-Agent + DeepSWE-Preview + TTS(Bo16) |
| FrogBoss-32B-2510 | FrogBoss-32B-2510 | debug-gym + FrogBoss-32B-2510 |
| FrogMini-14B-2510 | FrogMini-14B-2510 | debug-gym + FrogMini-14B-2510 |
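The renames in the Changes table amount to an old-name to new-name mapping. A minimal Python sketch of applying such a mapping is shown here, assuming (hypothetically) that leaderboard entries are a list of dicts with a `name` key; the schema, the `RENAMES` excerpt, and the `apply_renames` helper are illustrative and not taken from the SWE-bench repository. The helper also reports mapping keys that matched no entry, so a typo in an old name is noticed instead of silently ignored.

```python
import json

# Hypothetical excerpt of the old-name -> new-name mapping from the table.
RENAMES = {
    "Augment Agent v1": "Augment Agent v1 + Claude Sonnet 4",
    "GLM-4.5": "OpenHands + GLM-4.5",
    "MCTS-Refine-7B": "Agentless + MCTS-Refine-7B",
    "FrogMini-14B-2510": "debug-gym + FrogMini-14B-2510",
}


def apply_renames(entries, renames):
    """Return (renamed copies of entries, mapping keys that matched nothing).

    Assumes each entry is a dict with a "name" key (hypothetical schema).
    Entries whose name is not in `renames` are copied unchanged; the input
    list and its dicts are not modified.
    """
    updated = []
    seen = set()
    for entry in entries:
        new_entry = dict(entry)  # shallow copy so the input stays untouched
        old_name = entry["name"]
        if old_name in renames:
            new_entry["name"] = renames[old_name]
            seen.add(old_name)
        updated.append(new_entry)
    unmatched = set(renames) - seen  # likely typos or already-renamed entries
    return updated, unmatched


if __name__ == "__main__":
    leaderboard = [{"name": "GLM-4.5"}, {"name": "Some Other Agent"}]
    renamed, unmatched = apply_renames(leaderboard, RENAMES)
    print(json.dumps(renamed, indent=2))
    print("Unmatched mapping keys:", sorted(unmatched))
```

Returning the unmatched keys, rather than raising, leaves it to the caller to decide whether a missing entry is an error or simply an entry that was already renamed.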

Related: #406
Closes: SWE-bench/swe-bench.github.io#40

- Augment Agent v1 → + Claude Sonnet 4
- Augment Agent v0 → + Sonnet 3.7 + O1
- OpenHands + 4x Scaled → + Claude 3.5 Sonnet + o3-mini
- AppMap Navie v2 → + Claude 3.5 Sonnet + GPT-4o
- PatchPilot-v1.1 → + o4-mini
- SWE-Exp → + DeepSeek-V3-0324
- SWE-Rizzo → + Claude 3.7
- Nemotron-CORTEXA → + all 9 models used
- GLM-4.5 → OpenHands + GLM-4.5
- Skywork-SWE-32B → OpenHands + Skywork-SWE-32B
- Skywork-SWE-32B + TTS(Bo8) → OpenHands + Skywork-SWE-32B + TTS(Bo8)
- MCTS-Refine-7B → Agentless + MCTS-Refine-7B
- DeepSWE-Preview → R2E-Agent + DeepSWE-Preview
- DeepSWE-Preview + TTS(Bo16) → R2E-Agent + DeepSWE-Preview + TTS(Bo16)
- FrogBoss-32B-2510 → debug-gym + FrogBoss-32B-2510
- FrogMini-14B-2510 → debug-gym + FrogMini-14B-2510
Successfully merging this pull request may close this issue: [Leaderboard] Column "Model" displays Agent Systems