- Registered custom metrics: Metrics that can be shared across your organization
- Local metrics: Metrics that run in your local notebook environment
Registered custom metrics
Registered custom metrics are stored and run in Galileo’s environment and can be used across your organization.
Creating a registered custom metric
You can create a registered custom metric either through the Python SDK or directly in the Galileo UI. Let’s walk through the UI approach:
1. Navigate to the Metrics section
In the Galileo platform, go to the Metrics section and select the Create New Metric button in the top right corner.

2. Select the Code metric type
From the dialog that appears, choose the Code-powered metric type. This option allows you to write custom Python code to evaluate your LLM outputs.

3. Write your custom metric
Use the code editor to write your custom metric. The editor provides a template with the required functions and helpful comments to guide you.
The code editor allows you to write and test your metric directly in the browser. You’ll need to define at least the scorer_fn and aggregator_fn functions as described below.

4. Save your metric
After writing your custom metric code, select the Save button in the top right corner of the code editor. Your metric will be validated and, if there are no errors, it will be saved and become available for use across your organization.
You can now select this metric when running evaluations.
1. The scorer function (scorer_fn)
This function evaluates individual responses and returns a score.
The function signature should include **kwargs to ensure forward compatibility. A typical scorer might, for example, measure the difference in length between the output and the ground truth.
The scorer receives these parameters:
- index: Row index in the dataset
- node_input: Input to the node
- node_output: Output from the node
- node_name, node_type, node_id, tools: Workflow/chain-specific parameters
- dataset_variables: Key-value pairs from the dataset (includes ground truth)
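As a minimal sketch of such a scorer, the function below compares the length of the node output with a ground-truth value. The keyword-only signature and the dataset_variables key "target" are assumptions made for illustration and are not confirmed by this page; adjust them to match your dataset.

```python
from typing import Any, Dict, Optional, Union


def scorer_fn(
    *,
    index: Union[int, str],
    node_input: str,
    node_output: str,
    dataset_variables: Optional[Dict[str, str]] = None,
    **kwargs: Any,  # absorbs parameters that may be added in future versions
) -> int:
    """Return the absolute length difference between output and ground truth."""
    # Assumption: the ground truth lives under the hypothetical key "target".
    ground_truth = (dataset_variables or {}).get("target", "")
    return abs(len(node_output) - len(ground_truth))


score = scorer_fn(
    index=0,
    node_input="What is the capital of France?",
    node_output="Paris is the capital.",
    dataset_variables={"target": "Paris"},
    node_name="llm",  # extra workflow parameters fall into **kwargs
)
print(score)
```

Because unknown keyword arguments land in **kwargs, the call keeps working even when the caller passes parameters the scorer does not use.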
2. The aggregator function (aggregator_fn)
This function aggregates individual scores into summary metrics:
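A minimal sketch of an aggregator, assuming it receives the per-row scores as a keyword-only list named scores and returns a dictionary of named summary values. Both the parameter name and the summary keys are illustrative assumptions.

```python
from typing import Dict, List, Optional, Union

Number = Union[int, float]


def aggregator_fn(*, scores: List[Optional[Number]]) -> Dict[str, Number]:
    """Summarise per-row scores, skipping rows that produced no score."""
    valid = [s for s in scores if s is not None]
    if not valid:
        return {"Count": 0}
    return {
        "Count": len(valid),
        "Average": sum(valid) / len(valid),
        "Max": max(valid),
    }


summary = aggregator_fn(scores=[16, 4, None, 10])
print(summary)
```

Filtering out None first keeps a single unscored row from breaking the average.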
Optional functions
Score type function
Defines the type of the score returned by scorer_fn (defaults to float).
Node type restriction
LLM credentials access
LLM credentials can be made available during scorer execution. When enabled, they are passed to scorer_fn as a dictionary.
Complete example: response length scorer
Let’s create a custom metric that measures response length:
Execution environment
Registered custom metrics run in a Python 3.10 environment with these libraries:
Local metrics
A Local metric (or Local scorer) is a custom metric that you can attach to an experiment, just like a Galileo preset metric. The key difference is that a Local metric lives in code on your machine, so you share it by sharing your code. Local metrics are ideal for running isolated tests and refining outcomes when you need more control than built-in metrics offer. You can also use any library or custom Python code with your Local metrics, including calling out to LLMs or other APIs.
Galileo currently only supports Local scorers in Python.
Local scorer components
A Local scorer consists of three main parts:
- Scorer Function: Receives a single Span or Trace containing the LLM input and output, and computes a score. The exact measurement is up to you. For example, you might measure the length of the output or rate it based on the presence or absence of specific words.
- Aggregator Function: Aggregates the scores generated by the Scorer Function and returns a final metric value. This function receives a list of the type returned by your Scorer. For instance, if your Scorer returns a str, the Aggregator will be called with a list[str]. The Aggregator’s return value can also be any type (e.g., str, bool, int), depending on how you want to represent the final metric.
- LocalMetricConfig[type]: A typed callable provided by Galileo’s Python SDK that combines your Scorer and Aggregator into a custom metric.
  - The generic type should match the type returned by your Aggregator.
  - Example: If your Scorer returns bool values, you would use LocalMetricConfig[bool](…), and your Aggregator must accept a list[bool] and return a bool.
Once you have defined your LocalMetricConfig, running the experiment is as simple as calling run_experiment. The results appear alongside Galileo’s built-in metrics, so you can compare, visualize, and analyze everything in one place.
With local metrics, you have full control over how you measure LLM behavior—unlocking deeper insights and more targeted evaluations for your AI applications.
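The type contract between the Scorer and the Aggregator described above can be sketched without the SDK. In this illustration, FakeSpan is a hypothetical stand-in for the Span or Trace object that Galileo passes to the Scorer, and its input and output attributes are assumptions. In real code, the two functions would be combined through LocalMetricConfig[bool].

```python
from dataclasses import dataclass


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a Galileo Span; holds only what the sketch needs."""
    input: str
    output: str


def mentions_refund(span: FakeSpan) -> bool:
    """Scorer: True when the output mentions the word 'refund'."""
    return "refund" in span.output.lower()


def all_mention_refund(scores: list[bool]) -> bool:
    """Aggregator: receives list[bool] (the Scorer's type) and returns a bool."""
    return bool(scores) and all(scores)


spans = [
    FakeSpan("Can I return this?", "Yes, a refund is issued within 5 days."),
    FakeSpan("Where is my order?", "It ships tomorrow."),
]
scores = [mentions_refund(s) for s in spans]
result = all_mention_refund(scores)
print(scores, result)
```

The Scorer returns a bool, so the Aggregator accepts a list[bool] and returns a bool, which is the pairing the LocalMetricConfig[bool] example calls for.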
Create a local metric
Learn how to create a local metric in Python to use in your experiments.
Comparison: registered custom metrics vs. local metrics
Feature | Registered Custom Metrics | Local Metrics |
---|---|---|
Creation | Python SDK or Galileo UI | Python SDK only |
Sharing | Organization-wide | Current project only |
Environment | Server-side | Local Python environment |
Libraries | Limited to Galileo environment | Any available library |
Resources | Restricted by Galileo | Local resources |
Common use cases
Custom metrics are ideal for:
- Heuristic evaluation: Checking for specific patterns, keywords, or structural elements
- Model-guided evaluation: Using pre-trained models to detect entities or LLMs to grade outputs
- Business-specific metrics: Measuring domain-specific quality indicators
- Comparative analysis: Comparing outputs against ground truth or reference data
Simple example: sentiment scorer
A simple custom metric can evaluate the sentiment of responses. Such a metric:
- Counts positive and negative words in responses
- Calculates a sentiment score between -1 (negative) and 1 (positive)
- Aggregates results to show the distribution of positive, neutral, and negative responses
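A minimal sketch of a metric with that behaviour follows. The word lists, the ±0.1 neutral band, and the function signatures are illustrative assumptions rather than Galileo’s reference implementation.

```python
import re
from typing import Any, Dict, List

# Tiny illustrative lexicons; a real metric would use a fuller word list.
POSITIVE = {"good", "great", "excellent", "happy", "love"}
NEGATIVE = {"bad", "poor", "terrible", "sad", "hate"}


def scorer_fn(*, node_output: str, **kwargs: Any) -> float:
    """Score sentiment from -1 (negative) to 1 (positive) by counting words."""
    words = re.findall(r"[a-z']+", node_output.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # no sentiment words found: treat as neutral
    return (pos - neg) / (pos + neg)


def aggregator_fn(*, scores: List[float]) -> Dict[str, float]:
    """Report the share of positive, neutral, and negative responses."""
    total = len(scores) or 1  # avoid division by zero on an empty run
    return {
        "Positive": sum(s > 0.1 for s in scores) / total,
        "Neutral": sum(-0.1 <= s <= 0.1 for s in scores) / total,
        "Negative": sum(s < -0.1 for s in scores) / total,
    }


outputs = [
    "I love it, great job",
    "This is terrible",
    "The order shipped today",
    "Good but sad",
]
scores = [scorer_fn(node_output=o) for o in outputs]
distribution = aggregator_fn(scores=scores)
print(scores)
print(distribution)
```

Normalising by the number of sentiment words keeps the score within the -1 to 1 range regardless of response length.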
Next steps
Create custom LLM-as-a-judge metrics
Learn how to create custom LLM-as-a-judge metrics in the Galileo console or in code.
LLM-as-a-Judge Prompt Engineering Guide
Learn best practices for prompt engineering with custom LLM-as-a-judge metrics.
Metrics overview
Explore Galileo’s comprehensive metrics framework for evaluating and improving AI system performance across multiple dimensions.
Create a local metric
Learn how to create a local metric in Python to use in your experiments.
Run experiments
Learn how to run experiments in Galileo using the Galileo SDKs and custom metrics.