τ²-Bench-Verified is a corrected and verified version of the original τ²-bench benchmark. This release addresses issues discovered in the original dataset where task definitions, expected actions, …

Python 25 2 Updated Dec 15, 2025

Some reading notes edited in LaTeX.

Jupyter Notebook 78 5 Updated Jan 5, 2026

MCP-based Agent Deep Evaluation System

Python 142 16 Updated Sep 26, 2025

Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.

Python 4,574 335 Updated Jan 17, 2026

τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment

Python 650 160 Updated Dec 18, 2025

An extremely fast Python package and project manager, written in Rust.

Rust 77,253 2,459 Updated Jan 19, 2026

Designing Multi-Agent Systems with Zero Supervision

Python 109 12 Updated Jul 8, 2025

xLAM: A Family of Large Action Models to Empower AI Agent Systems

Python 600 49 Updated Aug 21, 2025
Python 4 1 Updated Jan 14, 2025
Python 1,343 53 Updated Nov 21, 2024

An extensible benchmark for evaluating large language models on planning

PDDL 441 47 Updated Sep 17, 2025
Python 39 2 Updated May 2, 2024

A reading list on LLM based Synthetic Data Generation 🔥

1,510 91 Updated Jun 5, 2025

Chat with any codebase in under two minutes | Fully local or via third-party APIs

Python 1,261 117 Updated Nov 11, 2024
Jupyter Notebook 2,149 471 Updated Jan 9, 2026

Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024 Best Paper]

Python 237 22 Updated Jan 3, 2026

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

Python 9,481 744 Updated Jun 7, 2025

🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org

Python 15,599 1,717 Updated Jan 19, 2026

AndroidWorld is an environment and benchmark for autonomous agents

Python 589 123 Updated Jan 15, 2026

m&ms: A Benchmark to Evaluate Tool-Use for multi-step multi-modal tasks

Python 44 5 Updated Sep 26, 2024

Code and Data for Tau-Bench

Python 1,061 175 Updated Aug 28, 2025

Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"

1,808 151 Updated Jun 17, 2025

The Fast Cross-Platform Package Manager

C++ 7,874 427 Updated Jan 16, 2026

A conda-forge distribution.

Shell 9,175 467 Updated Jan 1, 2026

ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral)

Python 267 27 Updated Apr 18, 2024

[ACL2024 Findings] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models

355 9 Updated Mar 22, 2024

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing, and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…

Python 34,079 5,381 Updated Jan 19, 2026

Chat language model that can use tools and interpret the results

Python 1,592 118 Updated Dec 3, 2025
Jupyter Notebook 641 83 Updated Nov 10, 2025