Conversation quality is a custom LLM-as-a-judge session-level metric, with a pre-created prompt available from Galileo.

Conversation Quality is a binary evaluation metric that assesses whether a chatbot interaction left the user feeling satisfied and positive, or frustrated and dissatisfied, based on tone, engagement, and overall experience.
This is a boolean metric, returning either 0% (false) or 100% (true): 0% means the interaction left the user feeling frustrated and dissatisfied; 100% means it left the user feeling satisfied and positive. If you use multiple judges, the score is the percentage of judges who scored true rather than false. For example, if 4 out of 5 judges scored the metric as true, the score would be 80%.
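
As a quick illustration (a sketch, not Galileo's implementation), the multi-judge score is simply the fraction of true votes expressed as a percentage:

```python
def conversation_quality_score(judge_votes: list[bool]) -> float:
    """Average boolean judge votes into a percentage score.

    Illustrative sketch of the aggregation described above,
    not Galileo's actual implementation.
    """
    if not judge_votes:
        raise ValueError("at least one judge vote is required")
    return 100 * sum(judge_votes) / len(judge_votes)


# 4 of 5 judges voted true -> 80.0
print(conversation_quality_score([True, True, True, True, False]))
```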

Create the conversation quality metric

This metric must be created manually, using a prompt defined by Galileo.

1. Create a new LLM-as-a-judge metric

Create a new LLM-as-a-judge metric by following the instructions in our LLM-as-a-judge concept guide. Use the following settings:
| Setting | Value |
| --- | --- |
| Name | Conversation quality |
| LLM Model | Select your preferred model |
| Apply to | Session |
| Advanced Settings | Configure these as required for your needs |

2. Set the prompt

Set the prompt to the following:
## Task Overview
You are an expert conversation analyst tasked with evaluating the quality of chatbot interactions. 
Your job is to classify each conversation session as either "GOOD" (true) or "BAD" (false) based on user satisfaction, tone, engagement, and overall experience.

## Rubric

#### What Makes a GOOD Conversation (true):
- User does not express harassment, irritation, or frustration directed at the bot
- An out-of-scope query (where the bot cannot fulfill the user's request due to functional limitations) does not automatically qualify as bad quality.
- User frustration is directed at external circumstances rather than the bot (e.g., delivery issues, third-party services)
- Bot successfully de-escalates user frustration about external factors

#### What Makes a BAD Conversation (false):
- Impatient, frustrated, or hostile tone from the user directed at the bot
- Repeated bot clarifying questions without meaningful progress  
- User frustration is specifically about bot performance or errors (e.g., wrong information, system crashes)
- User makes negative comparisons between bot and other services

#### Important Note on Abrupt Endings:
**Not all abrupt endings indicate BAD conversations.** Distinguish between:
- **Task-completed departures (potentially GOOD)**: User receives complete answer/solution and leaves without further comment. No negative sentiment expressed.
- **Out-of-scope departures (potentially GOOD)**: User's request is outside bot's capabilities but bot clearly communicates this limitation. User accepts explanation without frustration directed at bot.
- **Frustration-driven departures (BAD)**: User leaves due to bot failures, with expressions of frustration, giving up statements, or clear dissatisfaction before departure.

3. Save the metric

Save the metric, then turn it on for your Log stream.
Your metric is now ready to use in your project.
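
Once the metric is enabled, sessions logged to that Log stream will be scored. Below is a minimal sketch of logging a scorable session with the Galileo Python SDK; it assumes the `galileo` package's `GalileoLogger` and a `GALILEO_API_KEY` in your environment, and method names such as `start_session`, `start_trace`, `add_llm_span`, and `conclude` should be verified against the SDK reference for your version:

```python
from galileo import GalileoLogger

# Assumed SDK surface: verify these names against the Galileo
# Python SDK reference for your version.
logger = GalileoLogger(project="my-project", log_stream="my-log-stream")

# Group the exchange into a session so the session-level
# conversation quality metric has something to score.
logger.start_session(name="support-chat-example")

logger.start_trace(input="Where is my order?")
logger.add_llm_span(
    input="Where is my order?",
    output="Your order shipped yesterday and should arrive tomorrow.",
    model="gpt-4o",
)
logger.conclude(output="Your order shipped yesterday and should arrive tomorrow.")
logger.flush()  # send the logged session to Galileo
```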