At Tuner, we believe a robust evaluation strategy must be built in layers, starting with the technical foundation and moving up to your unique business drivers. This is why we have built our entire platform around this 4-Layer Framework.

Layer 1: Technical Voice Metrics (The Foundation)

This first layer is the foundation of any high-quality voice agent. It measures the raw performance and conversational dynamics of the system. These are the table stakes: if these metrics are poor, the user experience will suffer regardless of how intelligent the agent is. Tuner provides these metrics out of the box for every call.
| Metric | Description | Target Benchmark |
|---|---|---|
| Time to First Audio (TTFA) | The time from when the user finishes speaking to when the agent begins its audible response. | < 800 ms |
| Barge-In Latency | The time it takes for the agent to stop speaking when a user interrupts. | < 500 ms |
| False Endpoint Rate | The percentage of turns where the agent incorrectly cuts the user off. | < 5% |
| Talk Ratio (Agent vs Caller) | The balance of agent vs caller talk time across the conversation. | 40:60 to 50:50 |
| Hallucination Rate | The percentage of responses where the agent invents facts or provides incorrect information. | < 1% |
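
To make two of these definitions concrete, the sketch below computes TTFA and False Endpoint Rate from per-turn timestamps and compares them with the benchmarks listed above. It is only an illustration: the `Turn` record and its field names are hypothetical and are not Tuner's data model, and comparing the mean TTFA against the target is an assumption, since the benchmark could equally be read per turn or as a percentile.

```python
from dataclasses import dataclass

# Targets copied from the Layer 1 benchmarks above.
TTFA_TARGET_MS = 800
FALSE_ENDPOINT_TARGET = 0.05


@dataclass
class Turn:
    # Hypothetical fields; timestamps are milliseconds from call start.
    user_speech_end_ms: int
    agent_audio_start_ms: int
    agent_cut_user_off: bool = False


def ttfa_ms(turn: Turn) -> int:
    """Time from the end of user speech to the start of agent audio."""
    return turn.agent_audio_start_ms - turn.user_speech_end_ms


def summarize(turns: list[Turn]) -> dict:
    """Aggregate one call's turns and compare against the targets."""
    mean_ttfa = sum(ttfa_ms(t) for t in turns) / len(turns)
    rate = sum(t.agent_cut_user_off for t in turns) / len(turns)
    return {
        "mean_ttfa_ms": mean_ttfa,
        "ttfa_ok": mean_ttfa < TTFA_TARGET_MS,  # assumption: target applies to the mean
        "false_endpoint_rate": rate,
        "false_endpoint_ok": rate < FALSE_ENDPOINT_TARGET,
    }


# Made-up example data for a four-turn call.
turns = [
    Turn(1000, 1600),
    Turn(5000, 5900),
    Turn(9000, 9500, agent_cut_user_off=True),
    Turn(14000, 14600),
]
report = summarize(turns)
```

In this made-up call the mean TTFA passes, but one cut-off in four turns puts the False Endpoint Rate well above the 5% target.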

Layer 2: Business Foundation (Outcomes & Intents)

This is the most critical shift in moving to a business-first mindset, and it’s the core of the Tuner platform. Before you can evaluate performance, you must first define what you are trying to achieve. We enable this by classifying every call along two dimensions: Intent and Outcome.
  • Intent: Why did the customer call? (e.g., “Schedule Appointment,” “Check Order Status,” “File a Claim”).
  • Outcome: What happened on the call? (e.g., “Appointment Booked,” “User Hung Up,” “Transferred to Human”).
By defining and automatically classifying every call into an Intent-Outcome Matrix, Tuner creates the foundational dataset for all business-level analysis.
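
As a rough sketch of what an Intent-Outcome Matrix looks like as data, the snippet below counts calls per (intent, outcome) pair and derives a per-intent outcome rate. The call list is invented, the classification step is assumed to have already happened, and `outcome_rate` is a hypothetical helper rather than a Tuner function.

```python
from collections import Counter

# Hypothetical, already-classified calls as (intent, outcome) pairs.
calls = [
    ("Schedule Appointment", "Appointment Booked"),
    ("Schedule Appointment", "User Hung Up"),
    ("Schedule Appointment", "Appointment Booked"),
    ("Check Order Status", "Transferred to Human"),
    ("File a Claim", "Transferred to Human"),
]

# The matrix is a count of calls for each (intent, outcome) cell.
matrix = Counter(calls)


def outcome_rate(intent: str, outcome: str) -> float:
    """Share of calls with the given intent that ended in the given outcome."""
    total = sum(n for (i, _), n in matrix.items() if i == intent)
    return matrix[(intent, outcome)] / total if total else 0.0


booking_rate = outcome_rate("Schedule Appointment", "Appointment Booked")
```

Reading the matrix per intent is the point of the design: a "Transferred to Human" outcome may be acceptable for one intent and a failure for another.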

Layer 3: Industry-Specific Evals (The Context)

Once you have your business foundation, you can add context by layering on metrics specific to your industry. Our philosophy is that the most effective way to build your voice AI evaluation framework is to start with the established QA scorecards and KPIs used for human agents and translate them into automated checks in Tuner. For example:
  • Healthcare: A human agent is evaluated on their adherence to HIPAA. Your voice AI should be too. In Tuner, you can create a custom evaluation: “Did the AI verify at least two patient identifiers before providing any PHI?”
  • Debt Collection: A human agent must provide the “mini-Miranda” warning. Your voice AI must as well. In Tuner, you can create a custom evaluation: “Did the AI provide the required FDCPA disclosure within the first 30 seconds of the call?”
See our full list of best practices per industry: Vertical Best Practices.
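
To show how a human QA scorecard item can become an automated check, here is a deliberately simple stand-in for the debt collection example. It is not how Tuner evaluates calls: the `Segment` record is hypothetical, and a fixed marker phrase is a crude substitute for a plain-English evaluation. The phrase itself is illustrative only, and the required wording should come from your compliance team.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical transcript segment.
    speaker: str    # "agent" or "caller"
    start_s: float  # seconds from call start
    text: str


# Illustrative marker phrase, not an authoritative disclosure text.
DISCLOSURE_PHRASE = "this is an attempt to collect a debt"


def disclosure_within(segments: list[Segment], limit_s: float = 30.0) -> bool:
    """True if an agent segment starting within limit_s contains the phrase.

    Only the segment start time is checked, so a disclosure that begins
    inside the window but finishes after it still counts as a pass.
    """
    return any(
        seg.speaker == "agent"
        and seg.start_s <= limit_s
        and DISCLOSURE_PHRASE in seg.text.lower()
        for seg in segments
    )


compliant_call = [
    Segment("agent", 0.0, "Hello, this is Sam calling from Example Recovery."),
    Segment("agent", 6.5, "This is an attempt to collect a debt."),
    Segment("caller", 12.0, "Okay."),
]
late_call = [
    Segment("agent", 0.0, "Hello, this is Sam calling from Example Recovery."),
    Segment("agent", 45.0, "This is an attempt to collect a debt."),
]
```

Here `disclosure_within(compliant_call)` passes and `disclosure_within(late_call)` fails, because the disclosure starts after the 30-second limit.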

Layer 4: Business-Specific Evals (The Differentiator)

The final layer is a set of metrics unique to your specific business goals. This is where you move beyond industry standards and measure what creates a competitive advantage for your business. Tuner’s flexible Custom Evaluations feature is designed for this. For example:
  • If your brand is built on premium customer service, you might create a custom evaluation for “Agent Politeness and Empathy”.
  • A sales organization might create a custom check for “Upsell Attempted on Eligible Call”.
These business-specific evaluations are what connect your AI’s performance directly to your company’s bottom line, and Tuner gives you the tools to build them in plain English.
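
A check such as "Upsell Attempted on Eligible Call" only becomes a business metric once it is aggregated over the right denominator. The sketch below assumes each call already carries two hypothetical boolean fields, `eligible` and `upsell_attempted`, however they were produced, and computes the attempt rate over eligible calls only.

```python
from typing import Optional


def upsell_attempt_rate(calls: list[dict]) -> Optional[float]:
    """Share of eligible calls on which an upsell was attempted.

    Returns None when there are no eligible calls, so that "no data"
    is not confused with a 0% attempt rate.
    """
    eligible = [c for c in calls if c["eligible"]]
    if not eligible:
        return None
    return sum(c["upsell_attempted"] for c in eligible) / len(eligible)


# Made-up evaluation results; field names are assumptions.
calls = [
    {"eligible": True, "upsell_attempted": True},
    {"eligible": True, "upsell_attempted": False},
    {"eligible": True, "upsell_attempted": True},
    {"eligible": False, "upsell_attempted": True},  # excluded: not eligible
]
rate = upsell_attempt_rate(calls)
```

Restricting the denominator to eligible calls keeps the metric from being diluted by calls where an upsell would not have been appropriate.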