This repo contains a pipeline to benchmark Claude 3 Haiku, Gemini 1.5 Flash, and GPT-4o on function calling. The dataset used is the one introduced by Salesforce, available at https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k.
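As a quick sanity check, the dataset can be inspected directly with the Hugging Face `datasets` library. This is an illustrative sketch, not part of the pipeline; the field names follow the public dataset card, and access may require accepting the dataset license on Hugging Face.

```python
# Illustrative sketch: peek at one example of the Salesforce dataset.
# Assumes the `datasets` package is installed and the dataset license has
# been accepted on Hugging Face.
from datasets import load_dataset

ds = load_dataset("Salesforce/xlam-function-calling-60k", split="train")
example = ds[0]
print(example["query"])    # natural-language user request
print(example["tools"])    # JSON string describing the available functions
print(example["answers"])  # JSON string with the expected function call(s)
```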
- Create a virtual environment:

  ```bash
  python -m venv env_function_calling
  source env_function_calling/bin/activate
  ```

- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Add your API keys in a `.env` file in the project root directory:

  ```
  GEMINI_API_KEY=your_gemini_api_key
  OPENAI_API_KEY=your_openai_api_key
  CLAUDE_API_KEY=your_claude_api_key
  ```
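The scripts are expected to read these variables from the environment. A minimal sketch of how that might look with `python-dotenv`, assuming it is among the dependencies in requirements.txt:

```python
# Minimal sketch, assuming python-dotenv is installed. Loads the .env file
# and checks that all three keys are present before running anything.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root when run from there

required = ["GEMINI_API_KEY", "OPENAI_API_KEY", "CLAUDE_API_KEY"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing keys in .env: {', '.join(missing)}")
```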
Run the main script to execute all function-calling benchmarks:

```bash
python -m core.main
```
Benchmark results will be saved in the results/ directory in .json format. The full run may take some time (around 45 minutes).
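Once the run finishes, a quick way to confirm that result files were written is shown below. This is a generic sketch: the exact JSON schema of each file is defined by the pipeline, so only loading and counting are shown.

```python
# Generic sketch: list the result files and how many top-level entries each holds.
import json
from pathlib import Path

for path in sorted(Path("results").glob("*.json")):
    with path.open() as f:
        data = json.load(f)
    count = len(data) if isinstance(data, (list, dict)) else 1
    print(f"{path.name}: {count} top-level entries")
```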
Explore and analyze the benchmark results using the notebook `Notebooks/data_viz.ipynb`.
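If you prefer to work outside the notebook, one possible starting point is to flatten the result files into a pandas DataFrame. This is an assumption-laden sketch: it assumes each .json file holds a list of per-example records, and column names will depend on the schema actually written by core.main.

```python
# Hypothetical sketch: load every results file into a single DataFrame.
# Adjust to the actual schema produced by core.main.
import json
from pathlib import Path
import pandas as pd

frames = []
for path in Path("results").glob("*.json"):
    with path.open() as f:
        records = json.load(f)
    df = pd.json_normalize(records)
    df["source_file"] = path.name  # keep track of which file each row came from
    frames.append(df)

results = pd.concat(frames, ignore_index=True)
print(results.head())
```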