Pull requests: openai/evals

Pull requests list

Update to python 3.12
#1607 opened Dec 21, 2025 by omonimus1
Update custom-eval.md
#1598 opened Aug 19, 2025 by rajeshkp
13 tasks
Fix AttributeError: Update OpenAI error imports (Closes #1564)
#1577 opened Jan 27, 2025 by SaiKrishna-KK
6 of 13 tasks
Update completion-fn-protocol.md
#1575 opened Jan 18, 2025 by NinoRisteski
13 tasks
Ice linguistic benchmark
#1561 opened Oct 1, 2024 by bjarkiarmanns
1 task
anthropic_solver.py
#1554 opened Sep 4, 2024 by iHuydang
13 tasks done
Fix a bug in examples/mmlu.ipynb when using gpt-4o or gpt-4o-mini
#1551 opened Aug 25, 2024 by RobinWitch
13 tasks done
Fix the is_chat_model function to work with gpt-4o
#1550 opened Aug 22, 2024 by LoryPack
3 tasks done
Added Icelandic QA evaluation data from news texts
#1548 opened Aug 20, 2024 by thorunna
12 of 13 tasks
Added Icelandic QA evaluation data from Wikipedia
#1547 opened Aug 20, 2024 by thorunna
12 of 13 tasks
Updating make-me-say to be compatible with Solvers
#1546 opened Aug 18, 2024 by lennart-finke
1 task done
Fix Information exposure alert through an exception #1543
#1545 opened Aug 8, 2024 by arpitjain099
13 tasks done
Fix log injection error
#1544 opened Aug 8, 2024 by arpitjain099
13 tasks done
Remove global OpenAI client initialization
#1539 opened Jul 21, 2024 by michaelAlvarino
Fix problematic sample in Schelling Point
#1534 opened May 22, 2024 by JunShern
Update README: Add Langtrace as an Eval vendor
#1531 opened May 21, 2024 by karthikscale3
5 of 13 tasks
Add support for gpt-4o
#1530 opened May 16, 2024 by androettop
show evals in wandb weave
#1522 opened Apr 19, 2024 by yogeshg Draft
13 tasks
Added Quran Eval & Simple Fact Model-Graded Definition
#1511 opened Apr 1, 2024 by sakher
13 tasks done