on-policy-distillation

Here are 3 public repositories matching this topic...

OpenClaw-RL: Train any agent simply by talking

async gui-application coding slime memory-systems skill-learning rlhf sglang grpo on-policy-distillation openclaw-skills open-claw

🛠️ Apply on-policy distillation to improve Qwen3-0.6B's performance on GSM8K by training it on its own outputs, reducing bias during inference.

Train and customize OpenClaw agents using reinforcement learning with simple language feedback and fully asynchronous optimization.

agent async gui-application slime memory-systems skill-learning rlhf sglang grpo agentic-rl on-policy-distillation openclaw openclaw-skills open-claw
