Evaluate factual accuracy in AI outputs using Galileo Guardrail Metrics to detect and prevent hallucinations in your AI systems
Correctness measures whether a given model response contains factually accurate information.
Correctness is a continuous metric ranging from 0 to 1:
- Low Correctness (near 0): the response contains factual errors.
- High Correctness (near 1): the response is factually accurate.
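Because the score is continuous, a common pattern is to pick an application-specific threshold and route low-scoring responses to review rather than serving them directly. Below is a minimal sketch of that pattern; the threshold value and the `review_response` helper are illustrative assumptions, not part of any Galileo API.

```python
# The threshold and helper below are illustrative assumptions,
# not part of any Galileo API.
CORRECTNESS_THRESHOLD = 0.7  # tune per application

def review_response(response: str, correctness: float) -> str:
    """Route a response based on its 0-1 Correctness score."""
    if correctness >= CORRECTNESS_THRESHOLD:
        return response  # likely factually accurate
    # Low scores suggest factual errors: flag instead of serving.
    return f"[FLAGGED for fact-check, score={correctness:.2f}] {response}"
```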
This metric is particularly valuable for uncovering open-domain hallucinations: factual errors that don’t relate to any specific documents or context provided to the model.
It’s important to understand the distinction between two closely related metrics:
Correctness: Measures whether a model response has factually correct information, regardless of whether that information is contained in the provided context.
Context Adherence: Measures whether the response adheres specifically to the information provided in the context.
Example: In a text-to-SQL scenario, a response could be factually correct (high Correctness) but not derived from the provided context (low Context Adherence). Conversely, a response could faithfully represent the context (high Context Adherence) but contain factual errors if the context itself is incorrect.
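To make the contrast concrete, here are hypothetical scores for those two failure modes; the numbers are invented for illustration and not produced by any real evaluation run.

```python
# Hypothetical scores for the two failure modes above; values are
# invented for illustration, not produced by a real evaluation run.
examples = [
    # Factually right, but not supported by the provided context:
    {"case": "correct but ungrounded", "correctness": 0.95, "context_adherence": 0.20},
    # Faithful to the context, but the context itself was wrong:
    {"case": "grounded but incorrect", "correctness": 0.15, "context_adherence": 0.90},
]

for ex in examples:
    print(f"{ex['case']}: Correctness={ex['correctness']:.2f}, "
          f"Context Adherence={ex['context_adherence']:.2f}")
```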
Implement Automated Fact-Checking
For critical applications, implement automated fact-checking against trusted knowledge bases or databases.
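A minimal sketch of what such a check could look like, assuming claims can be keyed against a trusted store; the `trusted_facts` dict stands in for a real database or retrieval layer.

```python
# Minimal fact-check sketch; `trusted_facts` stands in for a real
# knowledge base or database query (an assumption for illustration).
trusted_facts = {
    "boiling point of water at sea level": "100 °C",
}

def fact_check(claim_key: str, claimed_value: str) -> bool:
    """Return False only when a claim contradicts a trusted record."""
    expected = trusted_facts.get(claim_key)
    if expected is None:
        return True  # no trusted record: defer rather than reject
    return expected == claimed_value
```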
Use Grounding Techniques
Instruct models to ground their responses in verifiable information and cite sources when possible.
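One way to express this is a grounding instruction in the system prompt, with sources numbered so the model can cite them. The wording below is an assumption for illustration, not a prescribed template.

```python
# A grounding-oriented system prompt; the exact wording is an assumption,
# not a Galileo-prescribed template.
GROUNDED_SYSTEM_PROMPT = (
    "Answer using only information you can verify. "
    "When you rely on a provided document, cite it inline as [source: <id>]. "
    "If you cannot verify a claim, say so explicitly instead of guessing."
)

def build_messages(question: str, documents: list[str]) -> list[dict]:
    """Assemble chat messages with numbered sources the model can cite."""
    context = "\n\n".join(f"[source: {i}] {doc}" for i, doc in enumerate(documents))
    return [
        {"role": "system", "content": GROUNDED_SYSTEM_PROMPT},
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
    ]
```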
Monitor Domain-Specific Accuracy
Track Correctness scores across different knowledge domains to identify areas where your model may be less reliable.
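Assuming each evaluated response is logged with a domain tag and its Correctness score, per-domain tracking can be as simple as an aggregation like this; the record shape is a hypothetical example.

```python
from collections import defaultdict
from statistics import mean

def correctness_by_domain(records: list[dict]) -> dict[str, float]:
    """records look like [{'domain': 'medical', 'correctness': 0.82}, ...]."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for record in records:
        buckets[record["domain"]].append(record["correctness"])
    # Domains with low averages are candidates for better retrieval,
    # fine-tuning, or stricter guardrails.
    return {domain: mean(scores) for domain, scores in buckets.items()}
```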
Create Factual Guardrails
Develop domain-specific guardrails that can catch common factual errors before they reach users.
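A minimal sketch of a pre-delivery guardrail that combines the Correctness score with domain-specific rules; the threshold and the banned-claim patterns are illustrative placeholders, not a shipped rule set.

```python
import re

# Illustrative domain rules; a real rule set would be curated per domain.
BANNED_CLAIM_PATTERNS = [
    re.compile(r"guaranteed returns", re.IGNORECASE),  # e.g., a finance rule
]

def passes_guardrails(response: str, correctness: float,
                      threshold: float = 0.7) -> bool:
    """Block responses that score low on Correctness or match a banned claim."""
    if correctness < threshold:
        return False  # likely hallucination: block or regenerate
    return not any(p.search(response) for p in BANNED_CLAIM_PATTERNS)
```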
When optimizing for Correctness, remember that even human experts can disagree on certain facts. Consider implementing confidence levels for responses, especially in domains with evolving knowledge or subjective elements.
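A simple way to surface such confidence levels is to map the Correctness score to a user-facing tier; the thresholds below are assumptions to be tuned per domain.

```python
def confidence_label(correctness: float) -> str:
    """Map a Correctness score to a user-facing confidence tier."""
    if correctness >= 0.9:
        return "high confidence"
    if correctness >= 0.6:
        return "moderate confidence: consider verifying key facts"
    return "low confidence: verification recommended"
```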