Evaluate the accuracy, completeness, and relevance of your AI’s responses using Galileo’s response quality metrics
Name | Description | When to Use | Example Use Case |
---|---|---|---|
Chunk Attribution | Assesses whether the response properly attributes information to source documents. | When implementing RAG systems and want to ensure proper attribution. | A legal research assistant that must cite specific cases and statutes when providing legal information. |
Chunk Utilization | Measures how effectively the model uses the retrieved chunks in its response. | When optimizing RAG performance to ensure retrieved information is used efficiently. | A technical support chatbot that needs to incorporate relevant product documentation in troubleshooting responses. |
Completeness | Measures whether the response addresses all aspects of the user’s query. | When evaluating if responses fully address the user’s intent. | A healthcare chatbot that must address all symptoms mentioned by a patient when suggesting next steps. |
Context Adherence | Measures how well the response aligns with the provided context. | When you want to ensure the model is grounding its responses in the provided context. | A financial advisor bot that must base investment recommendations on the client’s specific financial situation and goals. |
Context Relevance (Query Adherence) | Evaluates whether the retrieved context is relevant to the user’s query. | When assessing the quality of your retrieval system’s results. | An internal knowledge base search that retrieves company policies relevant to specific employee questions. |
Correctness (factuality) | Evaluates the factual accuracy of information provided in the response. | When accuracy of information is critical to your application. | A medical information system providing drug interaction details to healthcare professionals. |
Ground Truth Adherence | Measures how well the response aligns with established ground truth. This metric is only available for experiments as it needs ground truth set in your dataset. | When evaluating model responses against known correct answers. | A customer service AI that must provide accurate product specifications from an official catalog. |
Instruction Adherence | Assesses whether the model followed the instructions in your prompt template. | When using complex prompts and need to verify the model is following all instructions. | A content generation system that must follow specific brand guidelines and formatting requirements. |