KnowRL: Self-Knowledge Reinforcement Learning for Large Language Models

This repository contains the official implementation of
📄 “KnowRL: Reinforcement Learning for Reliable Self-Knowledge in Large Language Models”.
KnowRL is inspired by the SeRL framework but significantly modified to strengthen self-knowledge rather than task reasoning.


🌟 Overview

Recent research shows that even the best LLMs misjudge their own competence in more than 1 in 5 cases, limiting trust in their responses.
KnowRL addresses this by enabling models to recognize what they know and what they don't know through a lightweight self-improvement process. *(Figure: RL flow diagram.)*

Key Ideas

  • Introspection – The model generates and classifies tasks it judges feasible or infeasible for itself.
  • Consensus-based Rewarding – Rewards are derived from the stability of internal agreement, requiring no external labels.
  • Minimal Data – Training begins from a small manually verified seed set and scales via model-generated tasks.
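The consensus-based rewarding idea above can be sketched in a few lines. This is an illustrative assumption, not the repository's actual reward code (which lives in the adapted OpenRLHF scripts); the function name `consensus_reward` and the 0.75 agreement threshold are hypothetical:

```python
from collections import Counter

def consensus_reward(self_assessments, threshold=0.75):
    """Reward derived from the stability of internal agreement.

    self_assessments: labels the model assigns to the same task across
    repeated samples, e.g. ["feasible", "feasible", "infeasible"].
    Returns 1.0 when the majority label's share reaches the threshold
    (a stable self-judgment), else 0.0 -- no external labels needed.
    """
    counts = Counter(self_assessments)
    _, top_count = counts.most_common(1)[0]
    agreement = top_count / len(self_assessments)
    return 1.0 if agreement >= threshold else 0.0
```

With 8 samples per prompt, 7 agreeing votes (agreement 0.875) would earn the reward, while a 4-4 split (agreement 0.5) would not.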

In experiments on LLaMA-3.1-8B and Qwen-2.5-7B, KnowRL improved self-knowledge accuracy by up to 28% and F1 by up to 12% within a few RL iterations —
all without human annotations.


🗂 Directory Overview

```
openrlhf/            # Core RL training scripts (adapted from SeRL/OpenRLHF)
knowrl_prompts/      # Seed prompts and validation instructions
evaluation/          # Intrinsic and extrinsic evaluation code
scripts/             # Example launch scripts for different models
```

⚙️ Installation

We recommend Python 3.11 on Ubuntu 20.04 or later.

```bash
# 1. Clone repository
git clone <url>
cd KnowRL

# 2. Install dependencies
pip install -r requirements.txt

# 3. Install OpenRLHF backend (needed for RL training)
cd openrlhf
pip install -e .
```

🧩 Training Configuration

KnowRL supports the same training infrastructure as SeRL but with new prompts and reward logic.
Recommended starter script for LLaMA-3.1-8B using Reinforce++:

```bash
ray start --head --node-ip-address 0.0.0.0
cd openrlhf
bash ../scripts/train_llama31_8b_knowrl.sh
```

Key hyperparameters (adjust to hardware):

```bash
--micro_train_batch_size 2
--train_batch_size 16
--micro_rollout_batch_size 4
--rollout_batch_size 16
--n_samples_per_prompt 8
--max_epochs 1
--prompt_max_len 1024
--generate_max_len 1024
--actor_learning_rate 5e-7
--init_kl_coef 1e-4
```

🔬 Evaluation

Evaluation includes:

  • Intrinsic self-consistency – agreement across repeated generations.
  • Extrinsic benchmarking – domain-expert verification on held-out feasible/infeasible tasks.
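Intrinsic self-consistency reduces to a simple agreement statistic over repeated generations. A minimal sketch, assuming answers have already been sampled and normalized (the function name `self_consistency` is illustrative, not the repository's API):

```python
from collections import Counter

def self_consistency(generations):
    """Agreement across repeated generations for the same prompt.

    generations: list of (normalized) answers sampled from the model.
    Returns the fraction of generations matching the modal answer,
    ranging from 1/len(generations) (no agreement) to 1.0 (unanimous).
    """
    counts = Counter(generations)
    _, modal_count = counts.most_common(1)[0]
    return modal_count / len(generations)
```

For example, four samples of which three agree yield a self-consistency of 0.75.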

Example:

```bash
python evaluation/run_eval.py --model-path /path/to/your/checkpoint
```

🚀 Results

| Model        | Accuracy Gain | F1 Gain | Iterations |
|--------------|---------------|---------|------------|
| LLaMA-3.1-8B | +28%          | +12%    | ~30        |
| Qwen-2.5-7B  | +25%          | +10%    | ~30        |

All improvements were achieved without external supervision.


⚠️ Known Limitations

  • Single-language focus: experiments are currently English-only.
  • Limited training horizon: performance beyond ~30 RL iterations remains untested due to compute constraints.
  • Scaling uncertainty: effectiveness on models larger than 8B parameters is unverified.
