Learn how to run experiments in Galileo using the Galileo SDKs
You can run experiments in code using the `run_experiment` function (see the `run_experiment` Python SDK docs, or the `runExperiment` TypeScript SDK docs for more details).
Experiments take a dataset and pass each row either to a prompt template or to a custom function. The custom function can be anything from a single LLM call to a full agentic workflow. Experiments also take a list of one or more metrics to evaluate the resulting traces; each metric can be one of the out-of-the-box metrics, referenced via the constants provided by the Galileo SDK, or the name of a custom metric.
For each row in the dataset, a new trace is created: either the prompt template is logged as an LLM span, or every span created in the custom function is logged to that trace.
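As a sketch, a minimal experiment over a dataset with a custom function might look like the following in Python. The import paths, the parameter names (`dataset`, `function`, `metrics`, `project`), and the dataset and metric names are illustrative; check the `run_experiment` SDK docs for the exact signature.

```python
from galileo.datasets import get_dataset
from galileo.experiments import run_experiment


def ask_llm(input: str) -> str:
    # Custom function: anything from a single LLM call to a full
    # agentic workflow. It receives the input from each dataset row,
    # and the spans it creates are logged to that row's trace.
    return f"Answer to: {input}"


run_experiment(
    "my-first-experiment",
    dataset=get_dataset(name="questions-dataset"),
    function=ask_llm,            # or pass a prompt template instead
    metrics=["correctness"],     # built-in metric name or a custom metric name
    project="my-project",
)
```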
If you are building experiments into your production application, you will need a way to invoke the experiment runner. For example, you can call it from a unit test.
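A hedged sketch of driving the experiment runner from a pytest unit test is shown below; the test, dataset, and metric names are illustrative, and `my_app.ask_llm` stands in for your own application function.

```python
# test_experiments.py: run with `pytest`
from galileo.datasets import get_dataset
from galileo.experiments import run_experiment

from my_app import ask_llm  # hypothetical application function


def test_llm_regression_experiment():
    # Running the experiment from a test wires it into CI without
    # adding a separate entry point to the production application.
    results = run_experiment(
        "ci-regression-experiment",
        dataset=get_dataset(name="regression-dataset"),
        function=ask_llm,
        metrics=["correctness"],
        project="my-project",
    )
    # What the runner returns varies by SDK version; treating
    # "no exception raised" as a passing smoke test is the simplest check.
    assert results is not None
```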
"Compliance - do not recommend any financial actions"
:
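A sketch of this, assuming a custom metric has already been created in Galileo under that exact name (the other parameter names are illustrative):

```python
from galileo.datasets import get_dataset
from galileo.experiments import run_experiment

from my_app import ask_llm  # hypothetical application function

run_experiment(
    "compliance-experiment",
    dataset=get_dataset(name="financial-questions"),
    function=ask_llm,
    # Reference the custom metric by the name it was created with in Galileo:
    metrics=["Compliance - do not recommend any financial actions"],
    project="my-project",
)
```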
If your custom function logs spans using the Galileo `log` decorator or a third-party SDK integration, then all the spans created by these are logged to the experiment. The following example uses the `log` decorator; the workflow span it creates is logged to the experiment.
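A minimal sketch of such a function, assuming Galileo's OpenAI wrapper import path and that the `log` decorator accepts a `span_type` argument (both may differ across SDK versions):

```python
from galileo import log
from galileo.openai import openai  # assumption: Galileo's OpenAI wrapper

client = openai.OpenAI()


@log(span_type="workflow")
def answer_question(question: str) -> str:
    # The wrapped OpenAI client records an LLM span, and the @log
    # decorator wraps it in a workflow span; both are attached to the
    # experiment trace for the current dataset row.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content
```

Passing `answer_question` as the experiment's custom function is then enough; no extra logging calls are needed inside it.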
During the `run_experiment` call, a logger is created by the experiment runner and a trace is started. This logger can be passed through the application, accessed using the `@log` decorator, or retrieved by calling `galileo_context.get_logger_instance()` in Python, or `getLogger` in TypeScript.
You will need to change your code to use this instead of creating a new logger and starting a new trace.
Use `galileo_context.get_logger_instance()` (Python) or `getLogger()` (TypeScript) to get the current logger. To check whether there is an active trace, call the `current_parent()` (Python) or `currentParent()` (TypeScript) method on the logger; it returns `None` in Python or `undefined` in TypeScript if there isn't an active trace.
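A sketch of that pattern, where the application reuses the active logger inside an experiment and only manages its own trace otherwise; the `start_trace`, `conclude`, and `flush` method names are assumptions about the logger API, and `ask_llm` stands in for your application logic.

```python
from galileo import galileo_context

from my_app import ask_llm  # hypothetical application function


def handle_request(question: str) -> str:
    # Reuse the logger created by the experiment runner (or by the
    # surrounding application) instead of constructing a new one.
    logger = galileo_context.get_logger_instance()

    # current_parent() returns None when there is no active trace,
    # e.g. when this code runs outside run_experiment.
    started_here = logger.current_parent() is None
    if started_here:
        logger.start_trace(input=question)  # method name is an assumption

    answer = ask_llm(question)

    # Only conclude and flush the trace if this function started it;
    # inside an experiment, the runner owns the trace lifecycle.
    if started_here:
        logger.conclude(output=answer)
        logger.flush()

    return answer
```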
Some metrics use the values in the dataset's `output` column. If you include an `output` column when using other metrics, the value is not used in the calculation of the metric, but it is surfaced in the console. This can be helpful for providing reference output for manual review.
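For illustration, dataset rows with a reference `output` value might look like the following; these are plain records, and how you load them into a Galileo dataset depends on the SDK version you use.

```python
# The "output" value acts as reference output. Metrics that compare
# against reference output use it directly; for other metrics it is
# only surfaced in the console for manual review.
dataset_rows = [
    {
        "input": "Should I move my savings into stocks?",
        "output": "Explain the trade-offs without recommending a specific financial action.",
    },
    {
        "input": "Is now a good time to buy this fund?",
        "output": "Decline to give a buy/sell recommendation and suggest consulting an advisor.",
    },
]
```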