A comprehensive Python toolkit for OpenAI's Reinforcement Fine-Tuning (RFT), featuring Pydantic-based grader creation, cluster-based rubric evaluation, and seamless OpenAI API integration.
- **Cluster-Based Rubric System**: Automatically filter rubrics based on activity type
- **Pydantic-Based Graders**: Type-safe, validated grader definitions
- **Multiple Grader Types**: Python, Score Model, and Multi graders
- **Flexible Aggregation**: Combine graders with custom weighted formulas
- **OpenAI RFT Ready**: Export configs directly for OpenAI's RFT API
```bash
pip install rftkit
```

Or install from source:

```bash
git clone https://github.com/yourusername/rftkit.git
cd rftkit
pip install -e .
```

```python
from rftkit import (
    FormattingGrader,
    RubricGrader,
    IndicatorGrader,
    AggregatorGrader,
    RubricItem,
)

# 1. Create a formatting grader
format_grader = FormattingGrader(
    name="format_validator",
    weight=0.2,
)

# 2. Create a rubric grader
rubric = RubricItem(
    id=1,
    content="Accuracy: How correct and precise the output is. 0.0 = incorrect, 1.0 = perfect",
)
rubric_grader = RubricGrader(
    name="rubric_accuracy",
    rubric=rubric,
    weight=1.0,
)

# 3. Create an indicator grader (cluster-based filtering)
indicator = IndicatorGrader(
    name="indicator_accuracy",
    rubric_name="accuracy",
)

# 4. Combine with an aggregator
aggregator = AggregatorGrader(
    name="final_score",
    graders=[format_grader, rubric_grader, indicator],
    rubric_to_indicator_map={"rubric_accuracy": "indicator_accuracy"},
    formatting_grader_name="format_validator",
)

# 5. Export for the OpenAI RFT API
config = aggregator.config
```

RFTKit implements a cluster-based evaluation system where each data type (cluster) has its own specific set of rubrics:
```
+---------------------------------------------------------+
|                       Input Data                        |
|                  (e.g., model output)                   |
+---------------------------+-----------------------------+
                            |
                            v
+---------------------------------------------------------+
|                 Cluster Type Detection                  |
|               (Extract type from content)               |
+---------------------------+-----------------------------+
                            |
                            v
+---------------------------------------------------------+
|                    Indicator Grader                     |
|             (Determines which rubrics apply)            |
+---------------------------+-----------------------------+
                            |
            +---------------+---------------+
            v               v               v
     +------------+  +------------+  +------------+
     |  Rubric 1  |  |  Rubric 2  |  |  Rubric N  |
     |  (Active)  |  |  (Active)  |  | (Inactive) |
     | Score: 0.8 |  | Score: 0.9 |  |  Score: 0  |
     +------------+  +------------+  +------------+
            |               |               |
            +---------------+---------------+
                            |
                            v
+---------------------------------------------------------+
|                    Aggregator Grader                    |
|                                                         |
|  Final = Format_Score +                                 |
|          Sum(Indicator[i] * Weight[i] * Rubric_Score[i])|
+---------------------------------------------------------+
```
The final score is calculated using the formula:
S = s_F · w_F + Σ(s_c,i · w_r,i · s_r,i)

Where:

- S: final score
- s_F: FormattingGrader score
- w_F: FormattingGrader weight
- s_r,i: rubric grader i's score
- w_r,i: rubric grader i's weight
- s_c,i: indicator grader i's score (0 or 1)
The indicator score acts as a binary mask, zeroing out irrelevant rubrics for each activity type.
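The masking and aggregation can be sketched in a few lines of plain Python. This is an illustration of the formula only, not RFTKit's actual implementation; the cluster map, names, and scores below are made up:

```python
# Hypothetical cluster-to-rubric mapping (illustrative, not RFTKit internals).
cluster_to_rubric_map = {
    "cluster_a": ["accuracy", "completeness"],
    "cluster_b": ["efficiency"],
}

def aggregate(cluster, format_score, format_weight, rubric_scores, rubric_weights):
    """Compute S = s_F * w_F + sum(s_c,i * w_r,i * s_r,i)."""
    active = set(cluster_to_rubric_map.get(cluster, []))
    total = format_score * format_weight
    for name, score in rubric_scores.items():
        indicator = 1.0 if name in active else 0.0  # binary mask s_c,i
        total += indicator * rubric_weights[name] * score
    return total

# For cluster_a, "accuracy" contributes but "efficiency" is masked out:
score = aggregate(
    "cluster_a",
    format_score=1.0, format_weight=0.2,
    rubric_scores={"accuracy": 0.8, "efficiency": 0.9},
    rubric_weights={"accuracy": 1.0, "efficiency": 1.0},
)
# 0.2*1.0 + 1*1.0*0.8 + 0*1.0*0.9 = 1.0
```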
The FormattingGrader validates JSON structure and required fields using Python code:

```python
format_grader = FormattingGrader(
    name="json_validator",
    weight=0.2,
)
```

The RubricGrader uses LLMs to evaluate outputs against specific criteria:
```python
rubric = RubricItem(
    id=1,
    content="Detailed rubric description with scoring guidelines...",
)
rubric_grader = RubricGrader(
    name="rubric_name",
    rubric=rubric,
    weight=1.0,
    model="gpt-4o-2024-08-06",
)
```

The IndicatorGrader determines rubric relevance based on activity type:
```python
indicator = IndicatorGrader(
    name="indicator_name",
    rubric_name="rubric_identifier",
)
```

The AggregatorGrader combines multiple graders with a weighted formula:
```python
aggregator = AggregatorGrader(
    name="final_score",
    graders=[format_grader, rubric_grader, indicator],
    rubric_to_indicator_map={"rubric": "indicator"},
    formatting_grader_name="format",
)
```

Define which rubrics apply to which data types/clusters by modifying the cluster mapping in `indicator_validator.py`:
```python
cluster_to_rubric_map = {
    "cluster_a": [
        "accuracy",
        "completeness",
        "clarity",
        # ... more rubrics
    ],
    "cluster_b": [
        "efficiency",
        "scalability",
        "performance",
        # ... different rubrics
    ],
}
```

Use the exported config directly with OpenAI's API:
```python
import openai

client = openai.OpenAI()

# Create an RFT job with your grader
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",
    model="gpt-4o-2024-08-06",
    hyperparameters={
        "n_epochs": 3,
    },
    grader=aggregator.config,  # Use your grader config
)
```

See the `examples/` directory for Jupyter notebooks demonstrating:
- Basic grader usage
- Cluster-based rubric configuration
- Multi-grader aggregation
- OpenAI RFT integration
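Because the exported grader config is a plain data structure, it can be serialized for inspection or versioning before submitting a job. A minimal sketch, where the dict stands in for `aggregator.config` (a hypothetical shape, not OpenAI's actual grader schema):

```python
import json

# Hypothetical stand-in for aggregator.config (illustrative shape only).
config = {
    "type": "multi",
    "graders": {
        "format_validator": {"type": "python", "weight": 0.2},
        "rubric_accuracy": {"type": "score_model", "weight": 1.0},
    },
}

# Serialize to JSON to inspect, diff, or store the grader definition.
payload = json.dumps(config, indent=2)
print(payload)
```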
```bash
# Clone the repository
git clone https://github.com/yourusername/rftkit.git
cd rftkit

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/
```

MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
If you use RFTKit in your research, please cite:
```bibtex
@software{rftkit2024,
  title={RFTKit: Reinforcement Fine-Tuning Toolkit},
  author={Barbosa, Artur},
  year={2024},
  url={https://github.com/yourusername/rftkit}
}
```