State of the Art, Verified
Independent ML benchmarks across 17 research areas. Track progress, find implementations, compare models.
Vision, NLP, reasoning, code, speech, medical, robotics, and more. All results verified with source links.
Explore Research Areas
17 domains. 286+ benchmarks. Find SOTA for your task.
Computer Vision
Detection, segmentation, classification, OCR
10 tasks

NLP
Language models, QA, translation, NER
9 tasks

Reasoning
Mathematical, logical, commonsense
MATH, GSM8K

Code
Generation, SWE-bench, debugging
6 tasks

Speech
ASR, TTS, speaker verification
5 tasks

Medical
Imaging, diagnosis, clinical NLP
4 tasks

Multimodal
Vision-language, VQA, text-to-image
5 tasks

Agentic AI
Autonomous agents, time horizon, HCAST
5 tasks

The Zen of AI Composition
Building intelligent systems from first principles. A philosophical guide to AI transformations, modular composition, and evidence-based prompting.
Kacper Wikiel
AI Building Blocks
Stop searching. Start building. See which tools transform your data - with production-ready implementations.
In-Depth Comparisons
All guides

SWE-bench SOTA
Which AI agents solve real GitHub issues? Latest scores on SWE-bench Verified.
Mathematical Reasoning
MATH, GSM8K, GPQA benchmarks. How models tackle competition-level problems.

Speech Recognition
LibriSpeech WER scores. Whisper, Conformer, and multilingual ASR compared.

Document OCR
OmniDocBench, OCRBench results. 50+ models tested on real documents.

Chest X-ray AI
CheXpert, MIMIC-CXR benchmarks. AUROC scores for radiology models.
AI Building Blocks
Input-to-output transformations. Find the right architecture for your task.
Can I trust these numbers?
Numbers from published papers, verified with our own tests where possible. No marketing claims, no sponsored rankings.
Which model fits my use case?
Compare accuracy, speed, cost, and deployment complexity. We show you the tradeoffs that matter for production.
Can I use this data?
Yes. All benchmark data available as JSON. Build dashboards, cite it in papers, integrate it into your tools.
Use This Data
All benchmark data available as JSON
Build dashboards, cite in papers, integrate into your tools. No API key needed. Updated weekly with new results.
Frequently Asked Questions
What is CodeSOTA?
CodeSOTA is an independent ML benchmark tracking platform. We provide verified state-of-the-art results across 17 research areas including computer vision, NLP, reasoning, code generation, speech, medical AI, robotics, and more.
Is this a Papers with Code replacement?
CodeSOTA builds on the Papers with Code legacy after Meta shut it down in July 2025. We track 286+ benchmark results with links to implementations. Read the full story.
Are these benchmarks verified?
Yes. We run benchmarks independently where possible, rather than just aggregating paper claims. All data includes source URLs and access dates for verification. See our methodology.
Can I use this benchmark data?
Yes. All benchmark data is available as JSON at /data/benchmarks.json. Build dashboards, cite it in papers, or integrate it into your tools.
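For example, a minimal Python sketch that pulls the dump and inspects it (the endpoint path comes from the answer above; the exact shape of the JSON is an assumption and may differ):

import json
import urllib.request

# Full benchmark dump; no API key needed.
URL = "https://codesota.com/data/benchmarks.json"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# The schema isn't documented here, so just look at the top level.
if isinstance(data, dict):
    print("keys:", list(data.keys()))
else:
    print("entries:", len(data))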
What People Say
Piotr Zaczek
AI Consultant, scaling Voice-AI for 15M+ calls/year
"Zajebista robota. Doslownie wczoraj szukalem dobrych porownywarek OCRow i jedynie marketingowy BS. Good job!"
December 2024
Anonymous
AI Engineer
"Super czysty, slop-free UI, ale przede wszystkim copy: bardzo precyzyjne pozycjonowanie i przeglad projektow."
December 2024
Cite CodeSOTA
If you use CodeSOTA in your research, please cite:
@misc{wikiel2025codesota,
  author = {Wikieł, Kacper},
  title = {CodeSOTA: Independent ML Benchmark Tracking},
  year = {2025},
  url = {https://codesota.com},
  note = {Accessed: 2025}
}

Or in plain text: Wikieł, K. (2025). CodeSOTA: Independent ML Benchmark Tracking. https://codesota.com
Want updates on new benchmarks?
We'll let you know when we add tests for new models or tasks.
No spam. Unsubscribe anytime.