Response quality metrics help you measure how well your AI system answers user questions, follows instructions, and provides useful information. These metrics are key for building reliable, helpful, and user-friendly AI applications. Use these metrics when you want to:
  • Ensure your AI’s responses are factually correct and complete.
  • Check that the model follows instructions and uses retrieved information effectively.
  • Evaluate how well your system grounds answers in context or source material.
Below is a quick reference table of all response quality metrics:
| Name | Description | Supported Nodes | When to Use | Example Use Case |
| --- | --- | --- | --- | --- |
| Chunk Attribution Utilization | Assesses whether the response uses the retrieved chunks and properly attributes information to source documents. | Retriever span | When implementing RAG systems and you want to ensure proper attribution and efficient use of retrieved information. | A legal research assistant that must cite specific cases and statutes when providing legal information. |
| Completeness | Measures how thoroughly the response covers the relevant information available in the provided context. | LLM span | When evaluating whether responses fully address the user's intent. | A healthcare chatbot that, when given a patient's medical record as context, must include all relevant critical information from that record in its response. |
| Context Adherence | Measures how well the response aligns with the provided context. | LLM span | When you want to ensure the model grounds its responses in the provided context. | A financial advisor bot that must base investment recommendations on the client's specific financial situation and goals. |
| Context Relevance (Query Adherence) | Evaluates whether the retrieved context is relevant to the user's query. | Retriever span | When assessing the quality of your retrieval system's results. | An internal knowledge base search that retrieves company policies relevant to specific employee questions. |
| Correctness (Factuality) | Evaluates the factual accuracy of information provided in the response. | LLM span | When accuracy of information is critical to your application. | A medical information system providing drug interaction details to healthcare professionals. |
| Ground Truth Adherence | Measures how well the response aligns with established ground truth. Available only for experiments, since it requires ground truth set in your dataset. | Trace | When evaluating model responses against known correct answers. | A customer service AI that must provide accurate product specifications from an official catalog. |
| Instruction Adherence | Assesses whether the model followed the instructions in your prompt template. | LLM span | When using complex prompts and you need to verify the model is following all instructions. | A content generation system that must follow specific brand guidelines and formatting requirements. |
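Metrics like Context Adherence are typically scored by an LLM judge rather than by string matching. Still, the underlying idea (how much of the response is grounded in the provided context) can be illustrated with a deliberately naive token-overlap heuristic. The function name and example strings below are hypothetical, purely for illustration; this is not how the platform computes the metric:

```python
import re


def context_adherence_score(response: str, context: str) -> float:
    """Illustrative only: fraction of response tokens that also appear
    in the context. A score near 1.0 suggests grounding; near 0.0
    suggests the response introduces information not in the context."""
    response_tokens = set(re.findall(r"[a-z0-9']+", response.lower()))
    context_tokens = set(re.findall(r"[a-z0-9']+", context.lower()))
    if not response_tokens:
        return 0.0
    return len(response_tokens & context_tokens) / len(response_tokens)


# Hypothetical example: a grounded vs. an ungrounded answer.
context = "The refund window is 30 days from the date of purchase."
grounded = "Refunds are available within 30 days of purchase."
ungrounded = "Refunds are available within 90 days, no questions asked."

print(context_adherence_score(grounded, context))    # higher score
print(context_adherence_score(ungrounded, context))  # lower score
```

In practice, an LLM judge can catch paraphrase and contradiction that token overlap misses, which is why production response-quality metrics use model-based evaluation.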

Next steps