An AI-powered platform for testing and iteratively improving voice agents, built with Next.js, Twilio, and Anthropic Claude. It runs automated tests over real phone calls, scores each conversation, and uses AI analysis to rewrite the agent script.
- Claude Opus 4.1: Advanced AI model for natural voice conversations
- Real Phone Calls: Actually dials customers using Twilio Programmable Voice
- Speech Recognition: Real-time speech-to-text with confidence scoring
- Dynamic Responses: Context-aware, natural conversation flow
- Professional Scripts: Specialized for debt collection scenarios
- Persona Generation: AI-generated diverse customer profiles for testing
- Real Voice Testing: Actual phone calls with real customers
- Comprehensive Metrics: 5-dimensional performance scoring
- Batch Testing: Run multiple tests with different personas
- Conversation Tracking: Full conversation logging and analysis
- Repetition Score: Measures agent's tendency to repeat responses
- Negotiation Score: Evaluates flexibility and payment option variety
- Relevance Score: Assesses response appropriateness to customer input
- Empathy Score: Measures emotional intelligence and understanding
- Overall Score: Composite performance metric (0-100)
- AI-Driven Analysis: Identifies failure points and improvement areas
- Script Rewriting: Automatically generates improved agent scripts
- Iterative Testing: Continuous improvement through multiple iterations
- Performance Tracking: Monitors score improvements over time
- Script Persistence: Maintains improved scripts across sessions
- Real-time Logging: Comprehensive console logging for debugging
- Error Handling: Robust error handling with fallback responses
- Webhook Management: Dynamic ngrok URL handling for local development
- State Persistence: localStorage for script and test data persistence
- Modern UI: Clean, responsive interface with Tailwind CSS
- Next.js 15: React framework with App Router
- React 19: Latest React with hooks and modern patterns
- TypeScript: Type-safe development
- Tailwind CSS: Utility-first CSS framework
- Next.js API Routes: Serverless API endpoints
- Anthropic Claude: AI model for conversations and analysis
- Twilio Programmable Voice: Phone call infrastructure
- TwiML: Twilio Markup Language for call control
- ngrok: Local tunnel for webhook testing
- ESLint: Code linting and formatting
- Yarn: Package management
Before running this application, ensure you have:
- Node.js: Version 18 or higher
- Yarn: Package manager
- Twilio Account: With a voice-capable phone number
- Anthropic Account: With an API key
- ngrok: Installed locally for webhook tunneling
git clone https://github.com/ShivankK26/AI-Voice-Agent .
yarn install

Create a .env.local file in the root directory:
# Twilio Configuration
TWILIO_PHONE_NUMBER=+1XXXXXXXXXX
ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
AUTH_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Anthropic AI Configuration
ANTHROPIC_API_KEY=sk-ant-api03-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Base URL (update with your ngrok URL)
NEXT_PUBLIC_BASE_URL=https://your-ngrok-url.ngrok-free.app

# Terminal 1: Start Next.js
yarn dev
# Terminal 2: Start ngrok tunnel
ngrok http 3000

- Go to Twilio Console
- Navigate to Phone Numbers → Manage → Active numbers
- Click on your phone number
- Set the webhook URL to: https://your-ngrok-url.ngrok-free.app/api/call/interactive
- Set HTTP method to POST
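The configuration steps above can be sanity-checked before any call is placed. The following is a minimal sketch rather than the project's own code: `missingEnvVars` and `interactiveWebhookUrl` are hypothetical helpers, and the environment is passed in as a plain object (in a Next.js server context that would typically be `process.env`). The variable names and the webhook path are taken from this README.

```typescript
// Hypothetical startup check. The variable names come from the .env.local
// example in this README; the helper names are illustrative only.
const REQUIRED_ENV_VARS = [
  "TWILIO_PHONE_NUMBER",
  "ACCOUNT_SID",
  "AUTH_TOKEN",
  "ANTHROPIC_API_KEY",
  "NEXT_PUBLIC_BASE_URL",
] as const;

type Env = Record<string, string | undefined>;

// Returns the required variables that are unset or blank, in declaration order.
function missingEnvVars(env: Env): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Builds the webhook URL to paste into the Twilio Console from the ngrok base
// URL, tolerating a trailing slash on the base.
function interactiveWebhookUrl(baseUrl: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/api/call/interactive`;
}
```

Because free ngrok URLs usually change between sessions, recomputing the webhook URL from `NEXT_PUBLIC_BASE_URL` keeps the Twilio Console setting and the app configuration in step.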
- Navigate to /voice-testing
- Click "Generate Personas"
- The system creates 5 diverse customer profiles:
  - Different ages, occupations, and personalities
  - Various financial situations and payment capabilities
  - Realistic debt collection scenarios
- Select a Persona: Choose from generated personas
- Enter Phone Number: Use format +[country code][number]
- Set Test Duration: 120-300 seconds (default: 120s)
- Click "Run Voice Test"
- Monitor Progress: Watch real-time conversation logs
- Review Results: See detailed metrics and analysis
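The input rules in the steps above (phone number format and the 120-300 second duration range) could be enforced before a test starts. This is a sketch under stated assumptions: both helper names are hypothetical, and the phone check is a loose E.164-style pattern that does not prove a number is dialable.

```typescript
// Hypothetical input checks for a single voice test.
const MIN_DURATION_S = 120; // also the documented default
const MAX_DURATION_S = 300;

// Accepts "+" followed by a country code and number, 8 to 15 digits in total.
function isValidPhoneNumber(phoneNumber: string): boolean {
  return /^\+[1-9]\d{7,14}$/.test(phoneNumber);
}

// Falls back to the 120 s default when no usable value is given, and clamps
// anything else into the 120-300 s range.
function normalizeTestDuration(seconds?: number): number {
  if (seconds === undefined || isNaN(seconds)) return MIN_DURATION_S;
  return Math.min(MAX_DURATION_S, Math.max(MIN_DURATION_S, Math.round(seconds)));
}
```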
- Generate Personas first
- Enter Phone Number for testing
- Click "Run Batch Tests"
- Monitor All Tests: System runs tests sequentially
- Review Comprehensive Results: Compare performance across personas
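Sequential batch execution, as described in the steps above, can be sketched as a simple awaited loop. The `Persona` and `TestResult` shapes and the `runOne` callback are assumptions made for illustration; the platform's real result objects likely carry more fields.

```typescript
// Hypothetical shapes, reduced to what the loop needs.
interface BatchPersona {
  name: string;
}

interface BatchTestResult {
  personaName: string;
  overallScore: number;
}

// Runs one test per persona and awaits each before starting the next,
// so only one phone call is in flight at a time.
async function runBatchSequentially(
  personas: BatchPersona[],
  runOne: (persona: BatchPersona) => Promise<BatchTestResult>,
): Promise<BatchTestResult[]> {
  const results: BatchTestResult[] = [];
  for (const persona of personas) {
    results.push(await runOne(persona));
  }
  return results;
}
```

Running calls one after another is slower than running them in parallel, but with a single test phone number it avoids overlapping calls to the same line.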
- Repetition Score (0-100): Lower is better
- Negotiation Score (0-100): Higher is better
- Relevance Score (0-100): Higher is better
- Empathy Score (0-100): Higher is better
- Overall Score (0-100): Weighted average
- Issues Found: Specific problems identified
- Recommendations: Actionable improvement suggestions
- Conversation Summary: Key interaction points
- Performance Insights: Detailed breakdown
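To make the scoring above concrete, here is one way a weighted overall score could be computed. The equal weights are an assumption, since this README does not state the actual weighting. The repetition score is inverted because it is the only dimension where lower is better.

```typescript
interface DimensionScores {
  repetition: number; // 0-100, lower is better
  negotiation: number; // 0-100, higher is better
  relevance: number; // 0-100, higher is better
  empathy: number; // 0-100, higher is better
}

// Assumed equal weights; the platform's real weighting is not documented here.
const DEFAULT_WEIGHTS: DimensionScores = {
  repetition: 0.25,
  negotiation: 0.25,
  relevance: 0.25,
  empathy: 0.25,
};

// Weighted average on a 0-100 scale, rounded to the nearest integer.
function overallScore(
  scores: DimensionScores,
  weights: DimensionScores = DEFAULT_WEIGHTS,
): number {
  const totalWeight =
    weights.repetition + weights.negotiation + weights.relevance + weights.empathy;
  if (totalWeight <= 0) throw new Error("weights must sum to a positive number");
  const weighted =
    weights.repetition * (100 - scores.repetition) + // invert: low repetition is good
    weights.negotiation * scores.negotiation +
    weights.relevance * scores.relevance +
    weights.empathy * scores.empathy;
  return Math.round(weighted / totalWeight);
}
```

For example, a repetition score of 20 with 80 on the other three dimensions gives an overall score of 80 under these weights.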
- Run Voice Tests: Generate test results first
- Click "Improve Agent Script"
- AI Analysis: System analyzes all test results
- Script Generation: Creates improved agent script
- Persistence: Script saved to localStorage
- Iterative Testing: Run new tests with improved script
- Issue Identification: Finds common failure points
- Script Enhancement: Addresses specific problems
- Performance Prediction: Estimates score improvements
- Change Tracking: Logs all improvements made
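Script persistence across sessions could look roughly like the sketch below. The storage key, the record shape, and the helper names are hypothetical. The helpers take a small storage interface rather than `localStorage` directly so they can be exercised outside the browser.

```typescript
// Minimal subset of the Web Storage API used by the helpers.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical storage key and record shape.
const SCRIPT_KEY = "improvedAgentScript";

interface StoredScript {
  script: string;
  iteration: number;
}

function saveScript(store: KeyValueStore, record: StoredScript): void {
  store.setItem(SCRIPT_KEY, JSON.stringify(record));
}

// Returns the stored script, or the fallback when nothing usable is stored.
function loadScript(store: KeyValueStore, fallback: StoredScript): StoredScript {
  const raw = store.getItem(SCRIPT_KEY);
  if (raw === null) return fallback;
  try {
    const parsed = JSON.parse(raw);
    const ok = typeof parsed.script === "string" && typeof parsed.iteration === "number";
    return ok ? { script: parsed.script, iteration: parsed.iteration } : fallback;
  } catch {
    return fallback; // corrupted or non-JSON value
  }
}

// In-memory stand-in for environments without localStorage.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
  };
}
```

In the browser, `window.localStorage` satisfies `KeyValueStore` and can be passed in directly.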
src/
├── app/
│   ├── api/
│   │   ├── ai/
│   │   │   └── conversation/
│   │   │       └── route.ts          # AI conversation handling
│   │   ├── call/
│   │   │   ├── interactive/
│   │   │   │   └── route.ts          # Twilio interactive webhook
│   │   │   ├── recording/
│   │   │   │   └── route.ts          # Call recording webhook
│   │   │   ├── status/
│   │   │   │   └── route.ts          # Call status webhook
│   │   │   └── route.ts              # Call initiation
│   │   ├── testing/
│   │   │   ├── generate-personas/
│   │   │   │   └── route.ts          # AI persona generation
│   │   │   ├── self-correct/
│   │   │   │   └── route.ts          # Script improvement
│   │   │   └── voice-test/
│   │   │       └── route.ts          # Voice test orchestration
│   │   └── token/
│   │       └── route.ts              # LiveKit token generation
│   ├── room/
│   │   └── page.tsx                  # Voice agent interface
│   ├── voice-testing/
│   │   └── page.tsx                  # Testing platform UI
│   ├── globals.css                   # Global styles
│   ├── layout.tsx                    # Root layout
│   └── page.tsx                      # Landing page
├── components/
│   └── AnthropicDebtCollectionAgent.tsx
├── lib/
│   └── test-tracker.ts               # Conversation tracking
└── types/                            # TypeScript definitions
Initiates outbound phone calls.
Request Body:
{
  "phoneNumber": "+1234567890",
  "customerName": "John Doe",
  "amount": "$1,250.00",
  "roomName": "test-room",
  "script": "Custom agent script..."
}

Handles Twilio interactive webhooks for speech recognition.
Query Parameters:
- script: URL-encoded agent script
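Passing the agent script as a URL-encoded query parameter can be sketched as follows. Both helper names are hypothetical; the `/api/call/interactive` path and the `script` parameter name come from this README.

```typescript
// Appends the agent script as a URL-encoded `script` query parameter.
function interactiveUrlWithScript(baseUrl: string, script: string): string {
  const base = baseUrl.replace(/\/+$/, "");
  return `${base}/api/call/interactive?script=${encodeURIComponent(script)}`;
}

// Recovers the script from a raw query string such as "?script=Hello%20there".
// Returns null when no `script` parameter is present.
function scriptFromQuery(query: string): string | null {
  for (const pair of query.replace(/^\?/, "").split("&")) {
    const eq = pair.indexOf("=");
    const key = eq === -1 ? pair : pair.slice(0, eq);
    if (key === "script") {
      const value = eq === -1 ? "" : pair.slice(eq + 1);
      // Treat "+" as a space, as form-style encoders produce it.
      return decodeURIComponent(value.replace(/\+/g, " "));
    }
  }
  return null;
}
```

Encoding matters because scripts typically contain spaces, ampersands, and currency symbols that would otherwise break the query string. Very long scripts may run into URL length limits, in which case storing the script server-side and passing an identifier would be one alternative.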
Generates diverse customer personas for testing.
Request Body:
{
  "count": 5
}

Orchestrates a complete voice test.
Request Body:
{
  "persona": {
    "name": "John Doe",
    "age": 35,
    "occupation": "Teacher",
    "personality": "Cooperative"
  },
  "phoneNumber": "+1234567890",
  "testDuration": 120,
  "script": "Agent script..."
}

Analyzes test results and generates improved scripts.
Request Body:
{
  "testResults": [...],
  "currentScript": "Current script...",
  "iteration": 1
}
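The request bodies above can be given types on the client side. The sketch below builds a voice test request. The `/api/testing/voice-test` path is inferred from the `src/app/api/testing/voice-test/route.ts` location under the Next.js App Router convention, and the POST method is an assumption based on the presence of a request body. The type and function names are hypothetical.

```typescript
interface Persona {
  name: string;
  age: number;
  occupation: string;
  personality: string;
}

interface VoiceTestBody {
  persona: Persona;
  phoneNumber: string;
  testDuration: number; // seconds
  script: string;
}

// Plain description of a JSON request, independent of any HTTP client.
interface JsonRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the voice test request; testDuration defaults to the documented 120 s.
function voiceTestRequest(
  persona: Persona,
  phoneNumber: string,
  script: string,
  testDuration: number = 120,
): JsonRequest {
  const body: VoiceTestBody = { persona, phoneNumber, testDuration, script };
  return {
    url: "/api/testing/voice-test", // inferred path, see note above
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}
```

The returned object can be handed to `fetch` as `fetch(request.url, { method: request.method, headers: request.headers, body: request.body })`.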