Out-of-the-box metrics reference
The table below gives the constants used in code to access each metric. To use these metrics, import the relevant enum.
LLM-as-a-judge Metrics
| Metric | Enum Value |
|---|---|
| Action Advancement | GalileoScorers.action_advancement |
| Action Completion | GalileoScorers.action_completion |
| Agent Efficiency | GalileoScorers.agent_efficiency |
| Agent Flow | GalileoScorers.agent_flow |
| BLEU | GalileoScorers.bleu |
| Chunk Attribution Utilization | GalileoScorers.chunk_attribution_utilization |
| Completeness | GalileoScorers.completeness |
| Context Adherence | GalileoScorers.context_adherence |
| Context Relevance (Query Adherence) | GalileoScorers.context_relevance |
| Conversation Quality | GalileoScorers.conversation_quality |
| Correctness (factuality) | GalileoScorers.correctness |
| Ground Truth Adherence | GalileoScorers.ground_truth_adherence |
| Instruction Adherence | GalileoScorers.instruction_adherence |
| PII (personally identifiable information) | GalileoScorers.input_pii, GalileoScorers.output_pii |
| Prompt Injection | GalileoScorers.prompt_injection |
| Prompt Perplexity | GalileoScorers.prompt_perplexity |
| ROUGE | GalileoScorers.rouge |
| Sexism / Bias | GalileoScorers.input_sexism, GalileoScorers.output_sexism |
| Tone | GalileoScorers.input_tone, GalileoScorers.output_tone |
| Tool Errors | GalileoScorers.tool_error_rate |
| Tool Selection Quality | GalileoScorers.tool_selection_quality |
| Toxicity | GalileoScorers.input_toxicity, GalileoScorers.output_toxicity |
| User Intent Change | GalileoScorers.user_intent_change |
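For example, the enum members in the table above can be referenced like this in Python (the import path is an assumption; check your installed SDK, and note that each enum member is assumed to resolve to a metric name string):

```python
# Hedged sketch: selecting out-of-the-box metrics from the table above.
# The import path is an assumption -- check your installed SDK version:
# from galileo.schema.metrics import GalileoScorers

# Each enum member is assumed to resolve to its metric name as a string, e.g.:
metrics = [
    "correctness",          # GalileoScorers.correctness
    "context_adherence",    # GalileoScorers.context_adherence
    "input_pii",            # GalileoScorers.input_pii
]
```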
Luna-2 metrics
If you are using the Galileo Luna-2 model, use the following metric values.
| Metric | Enum Value |
|---|---|
| Action Advancement | GalileoScorers.action_advancement_luna |
| Action Completion | GalileoScorers.action_completion_luna |
| Chunk Attribution Utilization | GalileoScorers.chunk_attribution_utilization_luna |
| Completeness | GalileoScorers.completeness_luna |
| Context Adherence | GalileoScorers.context_adherence_luna |
| PII (personally identifiable information) | GalileoScorers.input_pii, GalileoScorers.output_pii |
| Prompt Injection | GalileoScorers.prompt_injection_luna |
| Sexism / Bias | GalileoScorers.input_sexism_luna, GalileoScorers.output_sexism_luna |
| Tone | GalileoScorers.input_tone, GalileoScorers.output_tone |
| Tool Errors | GalileoScorers.tool_error_rate_luna |
| Tool Selection Quality | GalileoScorers.tool_selection_quality_luna |
| Toxicity | GalileoScorers.input_toxicity_luna, GalileoScorers.output_toxicity_luna |
| Uncertainty | GalileoScorers.uncertainty |
How do I use metrics in experiments?
The `run_experiment` function (Python, TypeScript) takes a list of metrics as part of its arguments.
Preset metrics
Supply a list of one or more metric names to the `run_experiment` function.
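A minimal sketch of this call is below. The parameter names and import path are assumptions based on the docs above; the actual call requires the Galileo SDK and an API key, so it is shown commented out:

```python
# Preset metric names, as listed in the reference tables above.
metrics = ["correctness", "completeness"]

# Assumed call shape -- consult your installed SDK for exact signatures:
# from galileo.experiments import run_experiment
# results = run_experiment(
#     "my-experiment",
#     dataset=my_dataset,           # dataset to evaluate against
#     prompt_template=my_template,  # prompt under test
#     metrics=metrics,              # metrics to compute per row
# )
```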
Custom metrics
You can use custom metrics in the same way as Galileo's preset metrics. At a high level, this involves the following steps:
- Create your metric in the Galileo Console (or in code). Your custom metric returns a numerical score based on its input.
- Pass the name of your new metric to `run_experiment`.
"Compliance - do not recommend any financial actions":

Ground truth data
Ground truth is the authoritative, validated answer or label used to benchmark model performance. For LLM metrics, this often means a gold-standard answer, fact, or supporting evidence against which outputs are compared. Certain metrics require ground truth data to compute their scores, as they involve direct comparison to a reference answer, label, or fact. These metrics are only supported in experiments, because the ground truth must be set in the dataset used by the experiment.
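To illustrate, a hedged sketch of a dataset row carrying ground truth; the field names (`input`/`output`) are assumptions, with the idea being that ground-truth metrics compare the model's response against the reference value:

```python
# A single dataset row for an experiment. Ground-truth metrics compare
# the model's response against the "output" reference answer.
row = {
    "input": "What year was the Eiffel Tower completed?",
    "output": "1889",  # ground truth reference answer
}
```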
Are metrics LLM-agnostic?
Yes, all metrics are designed to work across any LLM integrated with Galileo.
Next steps
Metrics Overview
Explore Galileo’s comprehensive metrics framework for evaluating and improving AI system performance across multiple dimensions.
Experiments Overview
Learn how to use datasets and experiments to improve your application.
Run experiments
Learn how to run experiments in Galileo using the Galileo SDKs and custom metrics.