
RFTKit - Reinforcement Fine-Tuning Toolkit

A comprehensive Python toolkit for OpenAI's Reinforcement Fine-Tuning (RFT), featuring Pydantic-based grader creation, cluster-based rubric evaluation, and seamless OpenAI API integration.

Features

  • 🎯 Cluster-Based Rubric System: Automatically filter rubrics based on activity type
  • 🔧 Pydantic-Based Graders: Type-safe, validated grader definitions
  • 📊 Multiple Grader Types: Python, Score Model, and Multi graders
  • 🎨 Flexible Aggregation: Combine graders with custom weighted formulas
  • 🚀 OpenAI RFT Ready: Export configs directly for OpenAI's RFT API

Installation

pip install rftkit

Or install from source:

git clone https://github.com/yourusername/rftkit.git
cd rftkit
pip install -e .

Quick Start

from rftkit import (
    FormattingGrader,
    RubricGrader,
    IndicatorGrader,
    AggregatorGrader,
    RubricItem
)

# 1. Create a formatting grader
format_grader = FormattingGrader(
    name="format_validator",
    weight=0.2
)

# 2. Create a rubric grader
rubric = RubricItem(
    id=1,
    content="Accuracy: How correct and precise the output is. 0.0 = incorrect, 1.0 = perfect"
)
rubric_grader = RubricGrader(
    name="rubric_accuracy",
    rubric=rubric,
    weight=1.0
)

# 3. Create an indicator grader (cluster-based filtering)
indicator = IndicatorGrader(
    name="indicator_accuracy",
    rubric_name="accuracy"
)

# 4. Combine with aggregator
aggregator = AggregatorGrader(
    name="final_score",
    graders=[format_grader, rubric_grader, indicator],
    rubric_to_indicator_map={"rubric_accuracy": "indicator_accuracy"},
    formatting_grader_name="format_validator"
)

# 5. Export for OpenAI RFT API
config = aggregator.config

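The exported config can be serialized for inspection or submission to the API. A minimal sketch, assuming `aggregator.config` produces a plain JSON-serializable dict (the placeholder dict below stands in for a real exported config):

```python
import json

# Stand-in for `aggregator.config`; the real export is assumed to be a plain dict.
config = {"name": "final_score", "weight": 1.0}

# Serialize for inspection or for submitting alongside an RFT job request.
serialized = json.dumps(config, indent=2)
print(serialized)
```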
Architecture

Cluster-Based Rubric System

RFTKit implements a cluster-based evaluation system where each data type/cluster has its own specific set of rubrics:

┌──────────────────────────────────────────────────────────┐
│                    Input Data                            │
│                  (e.g., Model output)                    │
└────────────────────┬─────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────────────────────┐
│                  Cluster Type Detection                  │
│              (Extract type from content)                 │
└────────────────────┬─────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────────────────────┐
│                   Indicator Grader                       │
│         (Determines which rubrics apply)                 │
└────────────────────┬─────────────────────────────────────┘
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
┌──────────┐  ┌──────────┐  ┌──────────┐
│ Rubric 1 │  │ Rubric 2 │  │ Rubric N │
│ (Active) │  │ (Active) │  │(Inactive)│
│ Score:0.8│  │ Score:0.9│  │ Score:0  │
└──────────┘  └──────────┘  └──────────┘
        │            │            │
        └────────────┼────────────┘
                     ▼
┌──────────────────────────────────────────────────────────┐
│                   Aggregator Grader                      │
│                                                          │
│  Final = Format_Score × Format_Weight +                  │
│          Σ(Indicator[i] × Weight[i] × Rubric_Score[i])   │
└──────────────────────────────────────────────────────────┘

Mathematical Model

The final score is calculated using the formula:

S = s_F · w_F + Σᵢ (s_c,i · w_r,i · s_r,i)

Where:

  • S: Final score
  • s_F: FormattingGrader score
  • w_F: FormattingGrader weight
  • s_r,i: Rubric grader i's score
  • w_r,i: Rubric grader i's weight
  • s_c,i: Indicator grader i's score (0 or 1)

The indicator score acts as a binary mask, zeroing out irrelevant rubrics for each activity type.
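The masking behavior can be checked with a small worked example (all scores and weights below are hypothetical):

```python
# Worked example of S = s_F * w_F + sum(s_c * w_r * s_r) with hypothetical values.
format_score, format_weight = 1.0, 0.2

# Each tuple: (indicator score s_c, rubric weight w_r, rubric score s_r)
rubrics = [
    (1, 1.0, 0.8),  # active rubric, contributes 0.8
    (1, 0.5, 0.9),  # active rubric, contributes 0.45
    (0, 1.0, 0.7),  # inactive rubric, zeroed out by the indicator
]

final = format_score * format_weight + sum(c * w * s for c, w, s in rubrics)
print(final)  # 0.2 + 0.8 + 0.45 + 0.0 ≈ 1.45
```

Note how the third rubric contributes nothing despite a nonzero raw score: its indicator is 0 for this activity type.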

Grader Types

FormattingGrader

Validates JSON structure and required fields using Python code:

format_grader = FormattingGrader(
    name="json_validator",
    weight=0.2
)

RubricGrader

Uses LLMs to evaluate outputs against specific criteria:

rubric = RubricItem(
    id=1,
    content="Detailed rubric description with scoring guidelines..."
)
rubric_grader = RubricGrader(
    name="rubric_name",
    rubric=rubric,
    weight=1.0,
    model="gpt-4o-2024-08-06"
)

IndicatorGrader

Determines rubric relevance based on activity type:

indicator = IndicatorGrader(
    name="indicator_name",
    rubric_name="rubric_identifier"
)

AggregatorGrader

Combines multiple graders with a weighted formula:

aggregator = AggregatorGrader(
    name="final_score",
    graders=[format_grader, rubric_grader, indicator],
    rubric_to_indicator_map={"rubric": "indicator"},
    formatting_grader_name="format"
)

Cluster Configuration

Define which rubrics apply to which data types/clusters by modifying the cluster mapping in indicator_validator.py:

cluster_to_rubric_map = {
    "cluster_a": [
        "accuracy",
        "completeness",
        "clarity",
        # ... more rubrics
    ],
    "cluster_b": [
        "efficiency",
        "scalability",
        "performance",
        # ... different rubrics
    ],
}
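Conceptually, an indicator's score is just a membership test against this map. A minimal sketch with a hypothetical helper (not the actual rftkit implementation):

```python
# Hypothetical illustration of cluster-based rubric filtering.
cluster_to_rubric_map = {
    "cluster_a": ["accuracy", "completeness", "clarity"],
    "cluster_b": ["efficiency", "scalability", "performance"],
}

def indicator_score(cluster: str, rubric_name: str) -> int:
    """Return 1 if the rubric applies to the detected cluster, else 0."""
    return int(rubric_name in cluster_to_rubric_map.get(cluster, []))

print(indicator_score("cluster_a", "accuracy"))    # 1
print(indicator_score("cluster_b", "accuracy"))    # 0
```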

OpenAI RFT Integration

Use the exported config directly with OpenAI's API:

import openai

client = openai.OpenAI()

# Create an RFT job with your grader. Reinforcement fine-tuning is configured
# through the `method` parameter and currently targets o-series reasoning models.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",
    model="o4-mini-2025-04-16",
    method={
        "type": "reinforcement",
        "reinforcement": {
            "grader": aggregator.config,  # your exported grader config
            "hyperparameters": {"n_epochs": 3},
        },
    },
)

Examples

See the examples/ directory for Jupyter notebooks demonstrating:

  • Basic grader usage
  • Cluster-based rubric configuration
  • Multi-grader aggregation
  • OpenAI RFT integration

Development

# Clone the repository
git clone https://github.com/yourusername/rftkit.git
cd rftkit

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

License

MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Citation

If you use RFTKit in your research, please cite:

@software{rftkit2024,
  title={RFTKit: Reinforcement Fine-Tuning Toolkit},
  author={Barbosa, Artur},
  year={2024},
  url={https://github.com/yourusername/rftkit}
}
