Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".
[ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling"
Source code for 'Understanding impacts of human feedback via influence functions'
Experiments for the Neural Interactive Proofs paper
Adversarial Deliberation Trees with Mechanistic Verification for scalable LLM oversight