# DeepEvolve - An Engineering-Focused LLM Training Framework
## Commands

- SFT (supervised fine-tuning)

  ```bash
  python scripts/run_sft.py --config configs/sft.yaml
  ```

- GRPO (Group Relative Policy Optimization)

  ```bash
  # Note: requires num_generations >= 2, and generation_batch_size must be
  # divisible by num_generations
  python scripts/run_grpo.py --config configs/grpo.yaml
  ```

- DPO (Direct Preference Optimization)

  ```bash
  # Use the default model
  python scripts/run_dpo.py --config configs/dpo.yaml
  # Use an SFT checkpoint (recommended)
  python scripts/run_dpo.py --config configs/dpo.yaml \
    --model_path output/sft/DialoGPT-small_YYYYMMDD_HHMMSS/
  ```

- OpenEvolve (code evolution)

  ```bash
  python scripts/run_evolve.py --config configs/evolve.local.yaml
  ```
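The `--model_path` value above is a timestamped placeholder. A small helper along these lines (hypothetical, not part of the repo) could resolve the newest SFT run automatically, assuming the `output/sft/{model}_{timestamp}/` layout described under Architecture below:

```python
# Hypothetical helper: pick the most recently modified SFT run directory
# so it can be passed to run_dpo.py via --model_path.
from pathlib import Path

def latest_sft_run(output_root: str = "output/sft") -> Path:
    runs = [p for p in Path(output_root).iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no SFT runs found under {output_root}")
    return max(runs, key=lambda p: p.stat().st_mtime)

print(latest_sft_run())  # e.g. output/sft/DialoGPT-small_YYYYMMDD_HHMMSS
```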
---
## Data Formats
- SFT: plain-text samples or conversational messages
```json
{
  "text": "Human: Hello!\nAssistant: Hello! Happy to help."
}
```
or
```json
{
  "messages": [
    {"role": "user", "content": "Please give a brief introduction to Python"},
    {"role": "assistant", "content": "Python is a readable, powerful programming language..."}
  ]
}
```
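Either SFT format can be stored as JSON Lines. As a minimal sketch (the file name is illustrative, and the repo's actual loader may differ), the Hugging Face `datasets` library reads both shapes directly:

```python
# Sketch: load SFT samples from a JSONL file; each line is one of the
# two shapes shown above ({"text": ...} or {"messages": [...]}).
from datasets import load_dataset

ds = load_dataset("json", data_files="data/sft_train.jsonl", split="train")
sample = ds[0]
assert "text" in sample or "messages" in sample
```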
- DPO: preference pairs (chosen/rejected)
```json
{
  "prompt": "How can I study effectively?",
  "chosen": "Make a plan, practice deliberately, seek timely feedback, and review over the long term.",
  "rejected": "Just read more and you'll pick it up."
}
```
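For intuition on the `beta` and `loss_type` settings in `configs/dpo.yaml`, the standard sigmoid DPO loss on a single preference pair looks like this (a self-contained sketch; the log-probability values in the example call are placeholders):

```python
import math

def dpo_sigmoid_loss(policy_chosen_logp: float, policy_rejected_logp: float,
                     ref_chosen_logp: float, ref_rejected_logp: float,
                     beta: float = 0.1) -> float:
    """-log(sigmoid(beta * (policy margin - reference margin)))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# The wider the chosen-vs-rejected margin relative to the reference
# model, the smaller the loss.
print(dpo_sigmoid_loss(-10.0, -20.0, -12.0, -18.0))
```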
- GRPO: minimal fields needed for reward computation
```json
{
  "prompt": "Explain overfitting.",
  "meta": {"id": "sample-0001"}
}
```
> During training, `num_generations` completions are sampled per prompt, and rewards are computed by `scripts/reward_functions.py`.
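A reward function in `scripts/reward_functions.py` might look like the sketch below, which follows the convention used by TRL's `GRPOTrainer` (completions in, one float per completion out); the length-based scoring itself is purely illustrative:

```python
# Illustrative GRPO reward function: TRL passes the sampled completions
# (plus extra dataset columns as keyword arguments) and expects one
# float reward per completion.
def conciseness_reward(completions, **kwargs):
    rewards = []
    for completion in completions:
        # Completions may be plain strings or chat-style message lists.
        text = completion if isinstance(completion, str) else completion[0]["content"]
        rewards.append(1.0 if len(text) <= 200 else 200.0 / len(text))
    return rewards
```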
---
## Architecture
```
deepevolve/
├── configs/
│   ├── sft.yaml              # SFT config (CPU-optimized, mixed precision disabled)
│   ├── grpo.yaml             # GRPO config (a valid combination with num_generations >= 2)
│   ├── dpo.yaml              # DPO config (beta, loss_type, etc.)
│   └── evolve.local.yaml     # OpenEvolve config (API, iterations, etc.)
├── scripts/
│   ├── run_sft.py            # Invokes `trl sft`; saves timestamped outputs and config
│   ├── run_grpo.py           # Invokes `trl grpo`; includes CPU/memory optimization flags
│   ├── run_dpo.py            # Invokes `trl dpo`; supports `--model_path`
│   ├── run_evolve.py         # Runs the OpenEvolve pipeline
│   └── reward_functions.py   # GRPO reward functions (customizable)
├── output/
│   ├── sft/
│   │   └── {model}_{timestamp}/    # config.yaml, checkpoint-*, logs/
│   ├── grpo/
│   │   └── {model}_{timestamp}/    # config.yaml, training_summary.txt, checkpoint-*
│   ├── dpo/
│   │   └── {model}_{timestamp}/    # config.yaml, training_summary.txt, checkpoint-*
│   └── evolve/
│       └── evolve_{timestamp}/     # config.yaml, best/, logs/
└── models/                   # Local model cache (optional)
```
---
## Essentials
- GRPO:
  - `num_generations >= 2` is required
  - `generation_batch_size % num_generations == 0` is required (see the sketch after this list)
  - When memory is tight: reduce `num_generations`/`group_size`/`max_*_length` and increase `gradient_accumulation_steps`
- DPO: run SFT first, then start DPO from the SFT checkpoint (recommended)
- SFT: disable mixed precision on CPU; reduce `batch_size`/`max_steps` if necessary
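The two GRPO batching rules are mechanical, so they are easy to check before launching a run; the helper below is illustrative and simply mirrors the config key names:

```python
def check_grpo_batching(num_generations: int, generation_batch_size: int) -> None:
    """Enforce the two GRPO constraints listed above (illustrative helper)."""
    if num_generations < 2:
        raise ValueError("num_generations must be >= 2 so each prompt's "
                         "completions can be compared within a group")
    if generation_batch_size % num_generations != 0:
        raise ValueError(f"generation_batch_size ({generation_batch_size}) must "
                         f"be divisible by num_generations ({num_generations})")

check_grpo_batching(num_generations=4, generation_batch_size=8)  # a valid combination
```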
For reference implementations and parameters, see the TRL project: [https://github.com/huggingface/trl](https://github.com/huggingface/trl)