50+ Production-grade Graders
A broad set of graders covering many evaluation scenarios: the Agent lifecycle, general-purpose LLM evaluation, multimodal inputs (image/video), code generation, and math reasoning. Each grader is validated against benchmark datasets to help ensure that its evaluation results are accurate and reliable.