SGLang + KT Automated Performance Benchmarking Tool

Automatically measures the TTFT (Time To First Token) and TPOT (Time Per Output Token) of SGLang + KTransformers across different configurations.
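The sketch below makes the two metrics concrete. It shows how they are conventionally derived for a single streamed request. TTFT is the delay until the first output token arrives. TPOT averages the gaps between the remaining tokens, so the first token is excluded. This is the common definition, not code taken from `sglang.bench_serving`, and the function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RequestTiming:
    ttft_ms: float
    tpot_ms: Optional[float]  # None when only one token was generated
    e2e_ms: float


def compute_timing(start_s: float, token_times_s: Sequence[float]) -> RequestTiming:
    """Derive TTFT, TPOT and end-to-end latency for one streamed request.

    start_s:       time the request was sent, in seconds
    token_times_s: arrival time of each output token, in seconds, in order
    """
    if not token_times_s:
        raise ValueError("request produced no output tokens")
    ttft = token_times_s[0] - start_s   # wait for the first token
    e2e = token_times_s[-1] - start_s   # wait for the last token
    n = len(token_times_s)
    # TPOT averages the n - 1 gaps after the first token.
    tpot = (e2e - ttft) / (n - 1) if n > 1 else None
    return RequestTiming(
        ttft_ms=ttft * 1000,
        tpot_ms=None if tpot is None else tpot * 1000,
        e2e_ms=e2e * 1000,
    )
```

Under this definition, an `output_len` of 2 yields a TPOT based on a single inter-token gap. The pairs with longer outputs are therefore likely to give a steadier TPOT reading.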

Features

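- Manages the SGLang server lifecycle automatically (start, wait until ready, shut down)
- Tests multiple models and multiple parameter combinations
- Generates the Cartesian product of parameter values automatically, reducing manual configuration
- Saves the results of each test configuration separately
- Produces reports in JSON and CSV formats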
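Waiting until the server is ready amounts to polling it until it answers or a timeout expires. The helper below is a hypothetical sketch, not the code in `server_manager.py`. The probe is passed in as a callable, and the clock and sleep functions can be injected, so the loop can be exercised without a real server.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses.

    Returns True if the server became ready, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except Exception:
            # e.g. connection refused while the server is still loading weights
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In practice the probe might be `lambda: requests.get(f"http://127.0.0.1:{port}/health", timeout=5).ok`. This assumes your SGLang version exposes a `/health` route. The 600-second default matches the default server startup timeout (`server_startup_timeout`).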

Install dependencies

```bash
pip install pyyaml requests matplotlib
```

Quick start

1. Configure models

Edit `config.yaml` and configure your models in the `models` section:

```yaml
models:
  - name: "DeepSeek-V3.2"
    model_path: "/mnt/data/models/DeepSeek-V3.2"
    tokenizer_path: "/mnt/data/models/DeepSeek-V3.2"
    kt_weight_path: "/mnt/data/models/DeepSeek-V3.2"
```
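The sketch below shows how the entries named by `--model` could be selected. It works on the dictionary that `yaml.safe_load` returns for `config.yaml`. `select_models` and `missing_paths` are hypothetical helpers. The existence check simply mirrors the requirement that all three paths exist.

```python
import os
from typing import Any, Dict, List, Sequence

# The three path fields every model entry is expected to carry.
PATH_KEYS = ("model_path", "tokenizer_path", "kt_weight_path")


def select_models(config: Dict[str, Any], names: Sequence[str]) -> List[Dict[str, Any]]:
    """Return the `models` entries whose `name` matches, in the order requested."""
    by_name = {m["name"]: m for m in config.get("models", [])}
    unknown = [n for n in names if n not in by_name]
    if unknown:
        raise KeyError(f"models not found in config: {unknown}")
    return [by_name[n] for n in names]


def missing_paths(model: Dict[str, Any]) -> List[str]:
    """List the path fields of one model entry that do not exist on disk."""
    return [k for k in PATH_KEYS if not os.path.exists(model.get(k, ""))]
```

For example, `select_models(yaml.safe_load(open("config.yaml")), ["DeepSeek-V3.2"])` returns the single DeepSeek-V3.2 entry. An empty `missing_paths(...)` result means all three paths are present.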

2. Run tests

```bash
# Test a single model
python benchmark.py --config config.yaml --model DeepSeek-V3.2

# Test multiple models
python benchmark.py --config config.yaml --model DeepSeek-V3.2 --model MiMo-V2-Flash

# Specify the output directory
python benchmark.py --config config.yaml --model DeepSeek-V3.2 --output-dir ./my_results

# Preview the test plan without running anything
python benchmark.py --config config.yaml --model DeepSeek-V3.2 --dry-run
```

Command-line arguments

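| Argument | Required | Description |
|---|---|---|
| `--config` | Yes | Path to the configuration file |
| `--model` | Yes | Name of a model to test (may be given multiple times) |
| `--output-dir` | No | Output directory (default: `result/<timestamp>`) |
| `--port` | No | Override the server port |
| `--dry-run` | No | Print the test plan only, without running it |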
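These flags map naturally onto `argparse`. Using `action="append"` collects repeated `--model` options into a list. This sketch mirrors the documented interface. It is not the actual parser in `benchmark.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="SGLang + KT benchmark (sketch)")
    p.add_argument("--config", required=True, help="path to the configuration file")
    # action="append" turns repeated --model flags into a list of names
    p.add_argument("--model", action="append", required=True, help="model name (repeatable)")
    p.add_argument("--output-dir", default=None, help="defaults to result/<timestamp>")
    p.add_argument("--port", type=int, default=None, help="override the server port")
    p.add_argument("--dry-run", action="store_true", help="only print the test plan")
    return p
```

For example, `build_parser().parse_args(["--config", "config.yaml", "--model", "A", "--model", "B"]).model` evaluates to `["A", "B"]`.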

Configuration file reference

parameter_grid - parameter combinations

Every parameter accepts a list of values, and the script generates their Cartesian product automatically:

```yaml
parameter_grid:
  # Server parameters (changing them requires a server restart)
  tp_size: [2, 4]                    # tensor parallel size
  kt_num_gpu_experts: [0, 1, 2]      # number of GPU experts
  kt_cpuinfer: [96]                  # number of CPU inference cores
  kt_threadpool_count: [2]           # number of thread pools
  kt_method: ["FP8"]                 # KT method

  # Benchmark parameters (no server restart needed)
  # input_output: each [input_len, output_len] pair enters the Cartesian product as one unit
  input_output:
    - [128, 2]
    - [512, 2]
    - [1024, 2]
    - [2048, 16]
    - [4096, 16]
  num_prompts: [5]
  max_concurrency: [1]
```

Parameter categories:

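Server parameters: tp_size, kt_num_gpu_experts, kt_cpuinfer, kt_threadpool_count, kt_method
- Changing any of these requires restarting the SGLang server
Benchmark parameters: input_output, num_prompts, max_concurrency
- input_output is an [input_len, output_len] pair that enters the Cartesian product as a single value
- Changing these only requires re-running the benchmark; the server does not need to restart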
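The sketch below shows one way the grid could be expanded. Server parameters form the outer loop and benchmark parameters the inner loop, so the server only restarts when a server parameter changes. The key lists and `build_plan` are assumptions based on the categories above, not the script's actual code.

```python
import itertools
from typing import Any, Dict, List, Tuple

# Assumed split, following the parameter categories described above.
SERVER_KEYS = ["tp_size", "kt_num_gpu_experts", "kt_cpuinfer", "kt_threadpool_count", "kt_method"]
BENCH_KEYS = ["input_output", "num_prompts", "max_concurrency"]


def product_of(grid: Dict[str, List[Any]], keys: List[str]) -> List[Dict[str, Any]]:
    """Cartesian product over whichever of `keys` appear in the grid."""
    present = [k for k in keys if k in grid]
    combos = itertools.product(*(grid[k] for k in present))
    return [dict(zip(present, combo)) for combo in combos]


def build_plan(grid: Dict[str, List[Any]]) -> List[Tuple[Dict[str, Any], List[Dict[str, Any]]]]:
    """Pair each server config (one restart) with every benchmark config run on it."""
    bench_configs = []
    for cfg in product_of(grid, BENCH_KEYS):
        # Each input_output entry is one [input_len, output_len] pair, varied as a unit.
        if "input_output" in cfg:
            cfg["input_len"], cfg["output_len"] = cfg.pop("input_output")
        bench_configs.append(cfg)
    # Server params form the outer loop, so the server restarts only when they change.
    # The same bench_configs list is shared (read-only) by every server config.
    return [(server_cfg, bench_configs) for server_cfg in product_of(grid, SERVER_KEYS)]
```

`itertools.product` treats each `[input_len, output_len]` entry of the `input_output` list as one value. That is why input and output lengths are never crossed with each other.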

Calculating the number of test runs

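Total server restarts = number of models × number of server-parameter combinations
Total benchmark runs = total server restarts × number of benchmark-parameter combinations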

For example:

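2 models
tp_size: 2 values [2, 4] × kt_num_gpu_experts: 3 values [0, 1, 2] = 6 server configurations (all other server parameters have a single value)
input_output: 5 pairs × num_prompts: 1 value × max_concurrency: 2 values = 10 benchmark configurations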

Then:

Server restarts: 2 × 6 = 12
Benchmark runs: 12 × 10 = 120
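The same counting rule can be written as a formula. Here $M$ is the number of models, $S$ and $B$ are the sets of server and benchmark parameters, and $\lvert v_p \rvert$ is the number of values listed for parameter $p$. Each `input_output` pair counts as one value.

```latex
N_{\text{restart}} = M \prod_{p \in S} \lvert v_p \rvert,
\qquad
N_{\text{bench}} = N_{\text{restart}} \prod_{q \in B} \lvert v_q \rvert

N_{\text{restart}} = 2 \times (2 \times 3 \times 1 \times 1 \times 1) = 12,
\qquad
N_{\text{bench}} = 12 \times (5 \times 1 \times 2) = 120
```

Parameters with a single value contribute a factor of 1. This is why only `tp_size` and `kt_num_gpu_experts` matter on the server side of the example.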

Output structure

```
result/
└── 20241219_120000/               # timestamped run directory
    ├── config_used.yaml           # copy of the configuration used
    ├── test_plan.json             # test plan
    ├── summary.json               # summary report
    ├── logs/                      # server logs
    │   └── server_DeepSeek-V3.2_tp4_gpuexp1_RAWFP8_xxx.log
    └── DeepSeek-V3.2_tp4_gpuexp1_RAWFP8_20241219_120100/
        ├── config.json            # server configuration for this test
        ├── results.json           # detailed test results
        └── results.csv            # results in CSV format
```
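The per-test directory names appear to follow a `<model>_tp<tp_size>_gpuexp<kt_num_gpu_experts>_<method>_<timestamp>` pattern. The hypothetical helper below reproduces that pattern. It assumes the method segment is the `kt_method` value used verbatim. That assumption may not hold, since the tree shows `RAWFP8` while the example grid uses `"FP8"`.

```python
from datetime import datetime
from pathlib import Path
from typing import Any, Dict


def run_dir(root: Path, model: str, server_cfg: Dict[str, Any], when: datetime) -> Path:
    """Hypothetical helper: <root>/<model>_tp<N>_gpuexp<N>_<method>_<YYYYmmdd_HHMMSS>."""
    name = (
        f"{model}_tp{server_cfg['tp_size']}"
        f"_gpuexp{server_cfg['kt_num_gpu_experts']}"
        f"_{server_cfg['kt_method']}"  # assumed: kt_method used verbatim
        f"_{when.strftime('%Y%m%d_%H%M%S')}"
    )
    return root / name
```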

results.csv fields

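| Field | Description |
|---|---|
| `input_len` | Input length (tokens) |
| `output_len` | Output length (tokens) |
| `num_prompts` | Number of requests |
| `max_concurrency` | Maximum concurrency |
| `success` | Whether the run succeeded |
| `ttft_avg_ms` | Mean TTFT (ms) |
| `ttft_p99_ms` | P99 TTFT (ms) |
| `tpot_avg_ms` | Mean TPOT (ms) |
| `tpot_p99_ms` | P99 TPOT (ms) |
| `throughput_req_per_sec` | Request throughput (req/s) |
| `throughput_output_tok_per_sec` | Output token throughput (tok/s) |
| `e2e_latency_avg_ms` | Mean end-to-end latency (ms) |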
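Each test configuration gets its own `results.csv`, so comparing configurations means gathering those files. The sketch below is a hypothetical post-processing helper, not part of the tool. It assumes `success` is written as `True`/`False`, which is how Python's `csv` module serializes booleans. It also assumes the directory layout matches the output structure described above.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional


def load_results(root: Path) -> List[Dict[str, str]]:
    """Collect rows from every <run>/results.csv under one timestamped result directory."""
    rows: List[Dict[str, str]] = []
    for csv_path in sorted(root.glob("*/results.csv")):
        with csv_path.open(newline="") as f:
            for row in csv.DictReader(f):
                row["run"] = csv_path.parent.name  # which test configuration produced the row
                rows.append(row)
    return rows


def best_ttft(rows: List[Dict[str, str]], input_len: int) -> Optional[Dict[str, str]]:
    """Among successful rows with the given input length, return the lowest mean TTFT."""
    candidates = [
        r for r in rows
        if r["success"].strip().lower() in ("true", "1")  # assumed boolean encoding
        and int(r["input_len"]) == input_len
    ]
    return min(candidates, key=lambda r: float(r["ttft_avg_ms"]), default=None)
```

For example, `best_ttft(load_results(Path("result/20241219_120000")), 1024)["run"]` names the test directory with the lowest mean TTFT at an input length of 1024.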

Examples

Example 1: Comparing different numbers of GPU experts

```yaml
parameter_grid:
  tp_size: [4]
  kt_num_gpu_experts: [0, 1, 2, 4]
  input_output:
    - [1024, 16]
    - [2048, 16]
    - [4096, 16]
  num_prompts: [10]
  max_concurrency: [1]
```

```bash
python benchmark.py --config config.yaml --model DeepSeek-V3.2
```

Example 2: Comparing TP sizes and concurrency levels

```yaml
parameter_grid:
  tp_size: [1, 2, 4, 8]
  kt_num_gpu_experts: [1]
  input_output:
    - [2048, 64]
  num_prompts: [20]
  max_concurrency: [1, 4, 8]
```

Example 3: Comparing multiple models

```bash
python benchmark.py --config config.yaml \
  --model DeepSeek-V3.2 \
  --model MiMo-V2-Flash \
  --model MiniMax-M2
```

Files

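| File | Description |
|---|---|
| `benchmark.py` | Main entry-point script |
| `server_manager.py` | SGLang server lifecycle management |
| `bench_runner.py` | Benchmark runner |
| `result_parser.py` | Result parser (extracts metrics from the `sglang.bench_serving` output) |
| `report_generator.py` | Report generator |
| `config.yaml` | Configuration file |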
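`result_parser.py` extracts metrics from the `sglang.bench_serving` output. One approach is a line-oriented regex plus a label-to-column map, sketched below. It assumes the summary prints `<label>: <number>` lines such as `Mean TTFT (ms): ...`. The exact labels can differ between SGLang versions, so `FIELD_MAP` is a hypothetical mapping to check against your actual output.

```python
import re
from typing import Dict, Optional

# Assumed "<label>:<spaces><number>" line format of the bench_serving summary.
LINE_RE = re.compile(r"^\s*([^:]+?):\s+(-?\d+(?:\.\d+)?)\s*$")

# Hypothetical mapping from printed labels to results.csv column names.
FIELD_MAP = {
    "Mean TTFT (ms)": "ttft_avg_ms",
    "P99 TTFT (ms)": "ttft_p99_ms",
    "Mean TPOT (ms)": "tpot_avg_ms",
    "P99 TPOT (ms)": "tpot_p99_ms",
    "Request throughput (req/s)": "throughput_req_per_sec",
    "Output token throughput (tok/s)": "throughput_output_tok_per_sec",
    "Mean E2E Latency (ms)": "e2e_latency_avg_ms",
}


def parse_metrics(stdout: str) -> Dict[str, Optional[float]]:
    """Pull the results.csv metrics out of a bench_serving text summary.

    Labels that are not found come back as None, so a failed run still yields a full row.
    """
    found: Dict[str, float] = {}
    for line in stdout.splitlines():
        m = LINE_RE.match(line)
        if m:  # non-numeric lines (e.g. "Backend: sglang") do not match
            found[m.group(1).strip()] = float(m.group(2))
    return {col: found.get(label) for label, col in FIELD_MAP.items()}
```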

Notes

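- Server startup timeout: 600 seconds by default; adjust it with `server_startup_timeout` in the configuration
- GPU resources: make sure enough GPUs are available for the requested tp_size
- Model paths: make sure model_path, tokenizer_path, and kt_weight_path all exist
- Port conflicts: make sure the configured port is not already in use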
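The port check can be automated with a best-effort bind test: if binding to the port fails, something is already using it. This hypothetical helper is not part of the tool. Its result can go stale if another process takes the port before the server starts.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if host:port can currently be bound (best effort)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # EADDRINUSE (or a permission error): treat the port as unavailable
            return False
        return True
```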
