Custom Evaluations
Custom evaluations allow you to define and track your own criteria for what makes a successful call or conversation. Since every business and use case is different, you can set unique success metrics that matter most to you—whether that’s booking an appointment, providing accurate information, or ensuring customer satisfaction. This flexibility means you get actionable insights based on your own definition of success, not a generic standard.
Custom evaluations are ideal if you want to measure AI agent performance against your own goals, if your business has unique workflows or compliance needs, or if you need to track outcomes that can’t be inferred automatically.
Set up custom evaluations
Step 1. Create a custom evaluation
Fill in the details. Give your evaluation a unique, descriptive name. Write a prompt that describes the evaluation criteria (e.g., “Did the agent greet the caller by name and resolve the issue?”). Choose a category for how you want to score the call—options include Numeric, Descriptive, Likert Scale, or Pass/Fail. Finally, set the expected result based on the category you selected.
Once you’re done, click Save.
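If you manage evaluations through an API rather than the dashboard, the request body carries the same four fields. The endpoint path, field names, and category values below are illustrative assumptions, not the documented schema:

```python
import requests

API_BASE = "https://api.example.com/v1"  # assumption: replace with your real API base URL
API_KEY = "YOUR_API_KEY"

# Hypothetical request body mirroring the four fields from Step 1.
payload = {
    "name": "Greeting_and_resolution",  # unique, descriptive name (hypothetical example)
    "prompt": "Did the agent greet the caller by name and resolve the issue?",
    "category": "pass_fail",  # assumed values: numeric, descriptive, likert, pass_fail
    "expected_result": True,  # must match the chosen category
}

response = requests.post(
    f"{API_BASE}/evaluations",
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
)
response.raise_for_status()
evaluation_id = response.json()["id"]  # keep the ID for Step 2 (assumed response shape)
```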
Step 2. Attach the evaluation to an agent
Attach the saved evaluation to the agent whose calls you want to score. Results only appear for agents the evaluation is attached to, so this step is required before any calls are evaluated.
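In API terms, attaching usually means associating the evaluation’s ID with an agent. The endpoint and payload below are illustrative assumptions in the same hypothetical schema as above:

```python
import requests

API_BASE = "https://api.example.com/v1"  # assumption: replace with your real API base URL
API_KEY = "YOUR_API_KEY"

agent_id = "agent_123"      # hypothetical agent identifier
evaluation_id = "eval_456"  # ID returned when the evaluation was created

# Hypothetical attach endpoint: links the evaluation to the agent so that
# the agent's future calls are scored against your criteria.
response = requests.post(
    f"{API_BASE}/agents/{agent_id}/evaluations",
    json={"evaluation_id": evaluation_id},
    headers={"Authorization": f"Bearer {API_KEY}"},
)
response.raise_for_status()
```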
What problems does this solve?
Custom evaluations eliminate one-size-fits-all metrics by letting you define exactly what success looks like for your business. They make it easy to prove ROI for your specific use case and support internal quality assurance and compliance.
Use cases
Appointment Booking (Pass/Fail)
This evaluation checks if the bot successfully booked an appointment. Set the name to Appointment_booked, write a prompt like “Check if the bot successfully booked an appointment”, choose the Pass/Fail category, and set the expected result to True.
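As a request body, using the same hypothetical field names as in Step 1, this might look like:

```python
appointment_booked = {
    "name": "Appointment_booked",
    "prompt": "Check if the bot successfully booked an appointment",
    "category": "pass_fail",  # assumed machine name for Pass/Fail
    "expected_result": True,
}
```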
User Satisfaction (Numeric)
This evaluation rates overall user satisfaction on a numeric scale. Set the name to User_satisfaction, write a prompt like “Analyze the conversation and rate the user’s overall satisfaction with the call. Consider the tone, language, and resolution”, choose the Numeric category, and set the expected result to 8.
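In the same hypothetical schema:

```python
user_satisfaction = {
    "name": "User_satisfaction",
    "prompt": (
        "Analyze the conversation and rate the user's overall satisfaction "
        "with the call. Consider the tone, language, and resolution"
    ),
    "category": "numeric",  # assumed machine name for Numeric
    "expected_result": 8,
}
```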
Compliance Check (Descriptive)
This evaluation verifies compliance with privacy requirements. Set the name to Privacy_notice, write a prompt like “Did the agent inform the caller about data privacy and consent requirements at the start of the call?”, choose the Descriptive category, and set the expected result to Excellent.
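Again in the hypothetical schema:

```python
privacy_notice = {
    "name": "Privacy_notice",
    "prompt": (
        "Did the agent inform the caller about data privacy and consent "
        "requirements at the start of the call?"
    ),
    "category": "descriptive",  # assumed machine name for Descriptive
    "expected_result": "Excellent",
}
```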
Customer Experience (Likert Scale)
This evaluation measures overall customer experience using a Likert scale. Set the name to Customer_experience, write a prompt like “The agent was friendly and professional and the user was happy with the interaction”, choose the Likert Scale category, and set the expected result to Strongly Agree.
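And in the hypothetical schema:

```python
customer_experience = {
    "name": "Customer_experience",
    "prompt": (
        "The agent was friendly and professional and the user was happy "
        "with the interaction"
    ),
    "category": "likert",  # assumed machine name for Likert Scale
    "expected_result": "Strongly Agree",
}
```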
Best Practices
Write clear, measurable prompts to avoid ambiguity: the more specific your criteria, the more accurate your results. Choose the simplest category that gets the job done: Pass/Fail for binary outcomes, Numeric for ratings, Descriptive for qualitative judgments, and Likert Scale for agreement statements. Review evaluation results regularly and refine your prompts when scores don’t match your expectations.
Troubleshooting
If your evaluation is not appearing, make sure you’ve saved the evaluation and attached it to an agent. If you’re seeing incorrect scoring, double-check that the expected result matches the scoring category. For API errors, ensure your JSON input follows the correct schema and matches the allowed category/result pairs.
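When debugging schema errors, a quick local check that the expected result fits the category can save round trips. The category names, Likert labels, and type rules below are assumptions inferred from the examples in this article, not the documented schema:

```python
# Assumed standard 5-point Likert labels -- verify against the actual API schema.
LIKERT_VALUES = {
    "Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree",
}

def validate_expected_result(category: str, expected_result) -> bool:
    """Return True if expected_result is plausible for the given category."""
    if category == "pass_fail":
        return isinstance(expected_result, bool)
    if category == "numeric":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(expected_result, (int, float)) and not isinstance(expected_result, bool)
    if category == "descriptive":
        return isinstance(expected_result, str) and bool(expected_result.strip())
    if category == "likert":
        return expected_result in LIKERT_VALUES
    return False  # unknown category

assert validate_expected_result("pass_fail", True)
assert validate_expected_result("numeric", 8)
assert not validate_expected_result("likert", "Maybe")
```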
