Is CodeSOTA a Papers with Code replacement?

CodeSOTA builds on the Papers with Code legacy after Meta shut it down in July 2025. We track 286+ benchmark results across 86 datasets with links to code implementations and paper sources.

Can I use CodeSOTA benchmark data?

Yes. All benchmark data is available as JSON at codesota.com/data/benchmarks.json. You can build dashboards, cite it in papers, or integrate it into your tools.

The Papers with Code successor

State of the Art, Verified

Q: What is CodeSOTA?

CodeSOTA is an independent ML benchmark tracking platform that provides verified state-of-the-art results across 17 research areas including computer vision, NLP, reasoning, code generation, speech, medical AI, and more. It serves as the Papers with Code successor with fresh, maintained data and links to implementations.

Q: Are CodeSOTA benchmarks verified?

Yes. CodeSOTA runs benchmarks independently where possible, rather than just aggregating paper claims. All data includes source URLs and access dates for verification.

Independent ML benchmarks across 17 research areas. Track progress, find implementations, compare models.

Vision, NLP, reasoning, code, speech, medical, robotics, and more. All results verified with source links.


286+ benchmark results

17 research areas

143 models tracked

Links to implementations

Explore Research Areas

17 domains. 286+ benchmark results. Find SOTA for your task.

Computer Vision

Detection, segmentation, classification, OCR

10 tasks

NLP

Language models, QA, translation, NER

9 tasks

Reasoning

Mathematical, logical, commonsense

MATH, GSM8K

Code

Generation, SWE-bench, debugging

6 tasks

Speech

ASR, TTS, speaker verification

5 tasks

Medical

Imaging, diagnosis, clinical NLP

4 tasks

Multimodal

Vision-language, VQA, text-to-image

5 tasks

Agentic AI

Autonomous agents, time horizon, HCAST

5 tasks

View all 17 research areas

Free PDF Download

The Zen of AI Composition

Building intelligent systems from first principles. A philosophical guide to AI transformations, modular composition, and evidence-based prompting.


By Kacper Wikieł

New Feature

AI Building Blocks

Stop searching. Start building. See which tools transform your data - with production-ready implementations.

In-Depth Comparisons


Agentic

SWE-bench SOTA

Which AI agents solve real GitHub issues? Latest scores on SWE-bench Verified.

Reasoning

Mathematical Reasoning

MATH, GSM8K, GPQA benchmarks. How models tackle competition-level problems.

Speech

Speech Recognition

LibriSpeech WER scores. Whisper, Conformer, and multilingual ASR compared.

Vision

Document OCR

OmniDocBench, OCRBench results. 50+ models tested on real documents.

Medical

Chest X-ray AI

CheXpert, MIMIC-CXR benchmarks. AUROC scores for radiology models.

Reference

AI Building Blocks

Input-to-output transformations. Find the right architecture for your task.

Can I trust these numbers?

Numbers from published papers, verified with our own tests where possible. No marketing claims, no sponsored rankings.

Which model fits my use case?

Compare accuracy, speed, cost, and deployment complexity. We show you the tradeoffs that matter for production.

Can I use this data?

Yes. All benchmark data is available as JSON. Build dashboards, cite it in papers, or integrate it into your tools.

286+

Benchmark results

17

Research areas

86

Datasets tracked

143

Models compared

Open Data

Use This Data

All benchmark data available as JSON

Build dashboards, cite it in papers, or integrate it into your tools. No API key is needed. The data is updated weekly with new results.


Free to use

Source links included

Updated weekly
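As a rough sketch of how the JSON file could be pulled into a script, the Python below downloads it and reports its top-level shape. The `https://` scheme and the helper names `fetch_benchmarks` and `describe` are assumptions made for illustration. The file's schema is not documented here, so the code inspects the structure instead of relying on specific keys.

```python
import json
from urllib.request import urlopen

# Path as given in the FAQ; the https:// scheme is an assumption.
BENCHMARKS_URL = "https://codesota.com/data/benchmarks.json"


def fetch_benchmarks(url=BENCHMARKS_URL, timeout=30):
    """Download and parse the benchmark JSON (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def describe(data):
    """Summarize the top-level shape of parsed JSON without assuming a schema."""
    if isinstance(data, dict):
        return {"type": "object", "keys": sorted(data), "length": len(data)}
    if isinstance(data, list):
        first = data[0] if data else None
        keys = sorted(first) if isinstance(first, dict) else []
        return {"type": "array", "keys": keys, "length": len(data)}
    return {"type": type(data).__name__, "keys": [], "length": 0}


# Example usage (requires network access):
# print(describe(fetch_benchmarks()))
```

Checking the shape first makes it easier to decide which fields to use in a dashboard or analysis before writing code against them.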

Frequently Asked Questions

What is CodeSOTA?

CodeSOTA is an independent ML benchmark tracking platform. We provide verified state-of-the-art results across 17 research areas including computer vision, NLP, reasoning, code generation, speech, medical AI, robotics, and more.

Is this a Papers with Code replacement?

CodeSOTA builds on the Papers with Code legacy after Meta shut it down in July 2025. We track 286+ benchmark results with links to implementations. Read the full story.

Are these benchmarks verified?

Yes. We run benchmarks independently where possible, rather than just aggregating paper claims. All data includes source URLs and access dates for verification. See our methodology.
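If you want to spot-check the source URLs and access dates yourself, a check along these lines may help. It is a minimal sketch: the field names `source_url` and `access_date`, the ISO date format, and the helper `missing_provenance` are all hypothetical, and the real keys in the JSON may differ.

```python
from datetime import date

# Hypothetical field names; the real keys in benchmarks.json may differ.
SOURCE_FIELD = "source_url"
ACCESSED_FIELD = "access_date"


def missing_provenance(records):
    """Return indexes of records lacking an http(s) source URL or an ISO access date."""
    flagged = []
    for index, record in enumerate(records):
        url = record.get(SOURCE_FIELD, "")
        accessed = record.get(ACCESSED_FIELD, "")
        has_url = isinstance(url, str) and url.startswith(("http://", "https://"))
        try:
            # Assumes dates are stored as YYYY-MM-DD strings.
            date.fromisoformat(accessed)
            has_date = True
        except (TypeError, ValueError):
            has_date = False
        if not (has_url and has_date):
            flagged.append(index)
    return flagged
```

An empty list means every record carried both fields in the assumed format. Any returned indexes point at entries to check by hand.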

Can I use this benchmark data?

Yes. All benchmark data is available as JSON at /data/benchmarks.json. Build dashboards, cite it in papers, or integrate it into your tools.

What People Say

Piotr Zaczek

AI Consultant, scaling Voice-AI for 15M+ calls/year

"Kick-ass work. Literally yesterday I was looking for good OCR comparison tools and found nothing but marketing BS. Good job!"

December 2024

Anonymous

AI Engineer

"Super clean, slop-free UI, but above all the copy: very precise positioning and overview of the projects."

December 2024

Cite CodeSOTA

If you use CodeSOTA in your research, please cite:

@misc{wikiel2025codesota,
  author = {Wikieł, Kacper},
  title = {CodeSOTA: Independent ML Benchmark Tracking},
  year = {2025},
  url = {https://codesota.com},
  note = {Accessed: 2025}
}

Or in plain text: Wikieł, K. (2025). CodeSOTA: Independent ML Benchmark Tracking. https://codesota.com

