Layer 1: Technical Voice Metrics (The Foundation)
This first layer is the foundation of any high-quality voice agent. It measures the raw performance and conversational dynamics of the system. These are the table stakes: if these metrics are poor, the user experience will suffer regardless of how intelligent the agent is. Tuner provides these metrics out-of-the-box for every call.

| Metric | Description | Target Benchmark |
|---|---|---|
| Time to First Audio (TTFA) | The time from when the user finishes speaking to when the agent begins its audible response. | < 800ms |
| Barge-In Latency | The time it takes for the agent to stop speaking when a user interrupts. | < 500ms |
| False Endpoint Rate | The percentage of turns where the agent incorrectly cuts the user off. | < 5% |
| Talk Ratio (Agent vs Caller) | The balance of agent vs caller talk time across the conversation. | 40:60 to 50:50 |
| Hallucination Rate | The percentage of responses where the agent invents facts or provides incorrect information. | < 1% |
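
To make two of the definitions above concrete, here is a minimal sketch of how TTFA and False Endpoint Rate could be computed from per-turn timestamps and compared with the target benchmarks. This is not Tuner's implementation or API: the `Turn` record, its field names, and the choice of the median as the aggregate for TTFA are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Turn:
    # Hypothetical per-turn record; field names are illustrative only.
    user_end_ms: int           # timestamp when the user finished speaking
    agent_audio_start_ms: int  # timestamp when the agent's reply became audible
    cut_off_user: bool         # True if the agent started while the user was still talking


def median_ttfa_ms(turns):
    """Median TTFA over turns where the user was allowed to finish."""
    gaps = [t.agent_audio_start_ms - t.user_end_ms for t in turns if not t.cut_off_user]
    return median(gaps) if gaps else None


def false_endpoint_rate(turns):
    """Fraction of turns in which the agent cut the user off."""
    return sum(t.cut_off_user for t in turns) / len(turns) if turns else 0.0


def meets_benchmarks(turns):
    """Compare a call against the TTFA (< 800ms) and False Endpoint Rate (< 5%) targets."""
    ttfa = median_ttfa_ms(turns)
    return {
        "ttfa_under_800ms": ttfa is not None and ttfa < 800,
        "false_endpoints_under_5pct": false_endpoint_rate(turns) < 0.05,
    }
```

Turns where the user was cut off are excluded from the TTFA calculation because the gap between "user finished" and "agent started" is not meaningful for them; they are counted by the False Endpoint Rate instead.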
Layer 2: Business Foundation (Outcomes & Intents)
This is the most critical shift in moving to a business-first mindset, and it’s the core of the Tuner platform. Before you can evaluate performance, you must first define what you are trying to achieve. We enable this by classifying every call along two dimensions: Intent and Outcome.

- Intent: Why did the customer call? (e.g., “Schedule Appointment,” “Check Order Status,” “File a Claim”).
- Outcome: What happened on the call? (e.g., “Appointment Booked,” “User Hung Up,” “Transferred to Human”).
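
Once every call carries an Intent and an Outcome label, performance can be read as "for a given intent, how often did we reach the outcome we wanted?". The sketch below shows one way to tabulate that. The call records are plain dictionaries invented for the example, and the function names are hypothetical; nothing here reflects how Tuner stores or exposes these labels.

```python
from collections import Counter, defaultdict


def outcome_breakdown(calls):
    """Count outcomes per intent: {intent: Counter({outcome: n})}."""
    by_intent = defaultdict(Counter)
    for call in calls:
        by_intent[call["intent"]][call["outcome"]] += 1
    return by_intent


def outcome_rate(calls, intent, outcome):
    """Share of calls with the given intent that ended in the given outcome."""
    counts = outcome_breakdown(calls)[intent]
    total = sum(counts.values())
    return counts[outcome] / total if total else 0.0


# Made-up sample data using the labels from the examples above.
sample_calls = [
    {"intent": "Schedule Appointment", "outcome": "Appointment Booked"},
    {"intent": "Schedule Appointment", "outcome": "Appointment Booked"},
    {"intent": "Schedule Appointment", "outcome": "User Hung Up"},
    {"intent": "Check Order Status", "outcome": "Transferred to Human"},
]

booking_rate = outcome_rate(sample_calls, "Schedule Appointment", "Appointment Booked")
```

Grouping by intent first matters: an overall "Appointment Booked" rate would be diluted by calls that were never about scheduling in the first place.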
Layer 3: Industry-Specific Evals (The Context)
Once you have your business foundation, you can add context by layering on metrics specific to your industry. Our philosophy is that the most effective way to build your voice AI evaluation framework is to start with the established QA scorecards and KPIs used for human agents and translate them into automated checks in Tuner. For example:

- Healthcare: A human agent is evaluated on their adherence to HIPAA. Your voice AI should be too. In Tuner, you can create a custom evaluation: “Did the AI verify at least two patient identifiers before providing any PHI?”
- Debt Collection: A human agent must provide the “mini-Miranda” warning. Your voice AI must as well. In Tuner, you can create a custom evaluation: “Did the AI provide the required FDCPA disclosure within the first 30 seconds of the call?”
See our full list of best practices per industry: Vertical Best Practices.
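
As an illustration of what translating a human QA scorecard item into an automated check can look like, the debt-collection example above could be approximated with a timestamped transcript and a phrase match. This is a deliberately simple stand-in, not Tuner's Custom Evaluations mechanism: the `Utterance` type, the marker phrase, and the rule that an utterance counts if it starts within the time limit are all assumptions, and a production check would likely need more robust matching than a substring test.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    # Hypothetical transcript record; field names are illustrative only.
    speaker: str    # "agent" or "caller"
    start_s: float  # seconds from the start of the call
    text: str


# Assumed marker phrase for the disclosure; adjust to your approved script.
DISCLOSURE_MARKER = "attempt to collect a debt"


def disclosure_within(transcript, limit_s=30.0, marker=DISCLOSURE_MARKER):
    """True if an agent utterance starting within limit_s contains the marker."""
    return any(
        u.speaker == "agent" and u.start_s <= limit_s and marker in u.text.lower()
        for u in transcript
    )
```

A call would then pass the check if `disclosure_within(transcript)` returns `True`, and be flagged for review otherwise.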
Layer 4: Business-Specific Evals (The Differentiator)
The final layer is a set of metrics unique to your specific business goals. This is where you move beyond industry standards and measure what creates a competitive advantage for your business. Tuner’s flexible Custom Evaluations feature is designed for this. For example:

- If your brand is built on premium customer service, you might create a custom evaluation for “Agent Politeness and Empathy”.
- A sales organization might create a custom check for “Upsell Attempted on Eligible Call”.
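
The upsell example above has a detail worth making explicit: the denominator is eligible calls, not all calls. A minimal sketch, assuming each call has already been labeled with two hypothetical boolean fields, `eligible` and `upsell_attempted` (how those labels are produced is outside the scope of this example):

```python
def upsell_attempt_rate(calls):
    """Share of eligible calls on which an upsell was attempted.

    Returns None when no call was eligible, so "no opportunity" is not
    confused with "0% attempted".
    """
    eligible = [c for c in calls if c["eligible"]]
    if not eligible:
        return None
    return sum(c["upsell_attempted"] for c in eligible) / len(eligible)
```

Calls that were never eligible are ignored entirely, so an agent is not penalized for skipping an upsell that would have been inappropriate.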