This repository contains the official implementation of
📄 “KnowRL: Reinforcement Learning for Reliable Self-Knowledge in Large Language Models”.
KnowRL is inspired by the SeRL framework but significantly modified to strengthen self-knowledge rather than task reasoning.
Recent research shows that even the best LLMs misjudge their own competence in more than 1 in 5 cases, limiting trust in their responses.
KnowRL addresses this by enabling models to recognize what they know and what they don’t through a lightweight self-improvement process.

- Introspection – The model generates and classifies tasks it judges feasible or infeasible for itself.
- Consensus-based Rewarding – Rewards are derived from the stability of internal agreement, requiring no external labels.
- Minimal Data – Training begins from a small manually verified seed set and scales via model-generated tasks.
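The consensus-based rewarding step can be sketched roughly as follows. This is a minimal illustration only: the function name `consensus_reward` and the exact-match agreement criterion are assumptions for exposition, not the repository's actual reward logic.

```python
from collections import Counter

def consensus_reward(answers: list[str]) -> float:
    """Label-free reward sketch: fraction of sampled answers that
    agree with the majority answer.

    High agreement across repeated generations is treated as a proxy
    for the model reliably 'knowing' the answer; low agreement signals
    a task the model should classify as infeasible for itself.
    """
    if not answers:
        return 0.0
    counts = Counter(a.strip().lower() for a in answers)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(answers)
```

For example, four sampled generations of which three agree would yield a reward of 0.75, while a fully unstable set of answers drives the reward toward `1 / n_samples`.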
In experiments on LLaMA-3.1-8B and Qwen-2.5-7B, KnowRL improved self-knowledge by up to 28 % in accuracy and 12 % in F1 within roughly 30 RL iterations, all without human annotations.
```
openrlhf/        # Core RL training scripts (adapted from SeRL/OpenRLHF)
knowrl_prompts/  # Seed prompts and validation instructions
evaluation/      # Intrinsic and extrinsic evaluation code
scripts/         # Example launch scripts for different models
```
We recommend Python 3.11 on Ubuntu 20.04 or later.
```bash
# 1. Clone repository
git clone <url>
cd KnowRL

# 2. Install dependencies
pip install -r requirements.txt

# 3. Install OpenRLHF backend (needed for RL training)
cd openrlhf
pip install -e .
```

KnowRL supports the same training infrastructure as SeRL but with new prompts and reward logic.
Recommended starter script for LLaMA-3.1-8B using Reinforce++:
```bash
ray start --head --node-ip-address 0.0.0.0
cd openrlhf
bash ../scripts/train_llama31_8b_knowrl.sh
```

Key hyperparameters (adjust to your hardware):
```
--micro_train_batch_size 2
--train_batch_size 16
--micro_rollout_batch_size 4
--rollout_batch_size 16
--n_samples_per_prompt 8
--max_epochs 1
--prompt_max_len 1024
--generate_max_len 1024
--actor_learning_rate 5e-7
--init_kl_coef 1e-4
```
Evaluation includes:
- Intrinsic self-consistency – agreement across repeated generations.
- Extrinsic benchmarking – domain-expert verification on held-out feasible/infeasible tasks.
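The extrinsic metrics (accuracy and F1 on held-out feasible/infeasible tasks) can be computed as below. This is an illustrative sketch of standard binary-classification metrics, not code from `evaluation/`; the function name and signature are assumptions.

```python
def accuracy_f1(preds: list[bool], labels: list[bool]) -> tuple[float, float]:
    """Accuracy and F1 for binary feasible(True)/infeasible(False) judgments.

    `preds` are the model's self-knowledge judgments; `labels` are the
    expert-verified ground truth on the held-out set.
    """
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum(not p and l for p, l in zip(preds, labels))
    accuracy = sum(p == l for p, l in zip(preds, labels)) / len(preds)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1
```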
Example:

```bash
python evaluation/run_eval.py --model-path /path/to/your/checkpoint
```

| Model | Accuracy Gain | F1 Gain | Iterations |
|---|---|---|---|
| LLaMA-3.1-8B | +28 % | +12 % | ~30 |
| Qwen-2.5-7B | +25 % | +10 % | ~30 |
All improvements were achieved without external supervision.
- Single-language focus: experiments are currently English-only.
- Limited training horizon: performance beyond ~30 RL iterations remains untested due to compute constraints.
- Scaling uncertainty: effectiveness on models larger than 8B parameters is unverified.