Introduction
This document covers the design and developer experience of the Galileo TypeScript client library. It reflects the latest release of the Galileo TypeScript SDK.
Note: This library is in pre-release mode and may not be stable.
Installation
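The examples in this guide import from the galileo package, so installing from npm (or the equivalent command for your package manager) should be all that is needed:
npm install galileo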
Initialization/Authentication
You can configure Galileo using environment variables:
# Scoped to an Organization
GALILEO_API_KEY=...
# (Optional) set a default Project
GALILEO_PROJECT=...
# (Optional) set a default Log Stream
GALILEO_LOG_STREAM=...
# (Optional) set a path to your custom Galileo console deployment
GALILEO_CONSOLE_URL=...
In Node.js, you can use process.env to set these variables:
process.env.GALILEO_API_KEY = "your-api-key";
process.env.GALILEO_PROJECT = "your-project";
process.env.GALILEO_LOG_STREAM = "your-log-stream";
Logging
OpenAI Client Wrapper
The simplest way to get started is to use our OpenAI client wrapper. This example automatically produces a single-span trace in the Log Stream UI:
import { OpenAI } from "openai";
import { wrapOpenAI } from "galileo";

const openai = wrapOpenAI(new OpenAI({ apiKey: process.env.OPENAI_API_KEY }));

// Use the wrapped client as you normally would
async function callOpenAI() {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ content: "Say hello world!", role: "user" }],
  });
  return response;
}

// Call the function
callOpenAI();
Log Function Wrapper
The log function wrapper allows you to wrap functions with spans of different types. This is useful for creating workflow spans with nested LLM calls or tool spans.
import { OpenAI } from "openai";
import { wrapOpenAI, flush, log, init } from "galileo";

async function runExample() {
  const openai = wrapOpenAI(new OpenAI({ apiKey: process.env.OPENAI_API_KEY }));

  // This will automatically create an llm span since we're using the `wrapOpenAI` wrapper
  const callOpenAI = async (input: any) => {
    const result = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ content: `Say hello ${input}!`, role: "user" }],
    });
    return result;
  };

  // Optionally initialize the logger if you haven't set GALILEO_PROJECT and GALILEO_LOG_STREAM environment variables
  await init({
    projectName: "my-project",
    logStreamName: "my-log-stream",
  });

  const wrappedToolCall = log({ name: "tool span", spanType: "tool" }, (input: any) => {
    return "tool call result";
  });

  const wrappedFunc = await log({ name: "workflow span" }, async (input: any) => {
    const result = await callOpenAI(input);
    return wrappedToolCall(result);
  });

  // This will create a workflow span with an llm span and a tool span
  const result = await wrappedFunc("world");

  await flush();

  return result;
}

// Run the example
runExample();
Span Types
Here are the different span types:
- Workflow: A span that can contain child spans, useful for nesting several child spans to denote a thread within a trace. If you wrap a parent function with log, calls made within that scope are automatically logged in the same trace.
- Llm: Captures the input, output, and settings of an LLM call. This span is created automatically when our client library wrappers (OpenAI and Anthropic) are used. Cannot have nested children.
- Retriever: Contains the output documents of a retrieval operation (a sketch follows this list).
- Tool: Captures the input and output of a tool call. Used to decorate functions that are invoked as tools.
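A retriever span can be wrapped in the same way as the tool span above. This is a minimal sketch, assuming the log wrapper accepts spanType: "retriever" and that the wrapped function returns the retrieved documents; the retrieval logic and document shape here are placeholders:

import { log } from "galileo";

// Minimal sketch: wrap a retrieval step so its output documents are captured in a retriever span.
const wrappedRetriever = log({ name: "retriever span", spanType: "retriever" }, async (query: string) => {
  // Replace with your real retrieval call (vector store, search API, etc.)
  return [
    { content: "Galileo was an Italian astronomer.", metadata: { source: "doc-1" } },
    { content: "He improved the telescope.", metadata: { source: "doc-2" } },
  ];
});

const documents = await wrappedRetriever("Who was Galileo?");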
GalileoLogger
For more advanced use cases, you can use the GalileoLogger directly:
import { GalileoLogger } from "galileo";

async function runLoggerExample() {
  // You can set the GALILEO_PROJECT and GALILEO_LOG_STREAM environment variables
  const logger = new GalileoLogger({
    projectName: "my-project",
    logStreamName: "my-log-stream",
  });

  console.log("Creating trace with spans...");

  // Create a new trace
  const trace = logger.startTrace({
    input: "Example trace input", // input
    output: undefined, // output (will be set later)
    name: "Example Trace", // name
    createdAt: Date.now() * 1000000, // createdAt in nanoseconds
    durationNs: undefined, // durationNs
    metadata: { source: "test-script" }, // metadata
    tags: ["test", "example"], // tags
  });

  // Add a workflow span (parent span)
  const workflowSpan = logger.addWorkflowSpan({
    input: "Processing workflow", // input
    output: undefined, // output (will be set later)
    name: "Main Workflow", // name
    durationNs: undefined, // durationNs
    createdAt: Date.now() * 1000000, // createdAt in nanoseconds
    metadata: { workflow_type: "test" }, // metadata
    tags: ["workflow"], // tags
  });

  // Add an LLM span as a child of the workflow span
  logger.addLlmSpan({
    input: [{ role: "user", content: "Hello, how are you?" }], // input messages
    output: {
      role: "assistant",
      content: "I am doing well, thank you for asking!",
    }, // output message
    model: "gpt-4o", // model name
    name: "Chat Completion", // name
    durationNs: 1000000000, // durationNs (1s)
    metadata: { temperature: "0.7" }, // metadata
    tags: ["llm", "chat"], // tags
  });

  // Conclude the workflow span
  logger.conclude({
    output: "Workflow completed successfully",
    durationNs: 2000000000, // 2 seconds
  });

  // Conclude the trace
  logger.conclude({
    output: "Final trace output with all spans completed",
    durationNs: 3000000000, // 3 seconds
  });

  // Flush the traces to Galileo
  const flushedTraces = await logger.flush();

  return flushedTraces;
}

// Run the example
runLoggerExample();
Prompts
Create and use a prompt template:
import { createPromptTemplate } from "galileo";

const template = await createPromptTemplate({
  template: [
    { role: "system", content: "You are a great storyteller." },
    { role: "user", content: "Please write a short story about the following topic: {topic}" },
  ],
  projectName: "my-project",
  name: "storyteller-prompt",
});
You can also use an existing template:
import { getPromptTemplate } from "galileo";

async function retrievePromptTemplate() {
  // Get a prompt template
  const template = await getPromptTemplate({
    projectName: "my-project",
    name: "Hello name prompt",
  });
  return template;
}
Datasets
Creating and Using Datasets
You can retrieve a dataset by name and use it in experiments:
import { getDataset } from "galileo";
const dataset = await getDataset(undefined, "names");
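Datasets can also be created programmatically. The sketch below assumes the SDK exports a createDataset helper that accepts the dataset rows and a name; the helper name, argument order, and row shape are assumptions, so check the package's exports for the exact signature in your release:

import { createDataset } from "galileo"; // assumed export; verify against your installed SDK version

// Minimal sketch: create a small dataset named "names" (row shape is an assumption)
const namesDataset = await createDataset([{ input: "Alice" }, { input: "Bob" }], "names");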
Experiments
Evaluating with Runner Function
You can use a runner function to run an experiment with a dataset:
import { runExperiment } from "galileo";
import { OpenAI } from "openai";

async function runFunctionExperiment() {
  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

  const runner = async (input: any) => {
    const result = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [
        { role: "system", content: "You are a great storyteller." },
        { role: "user", content: `Write a story about ${input["topic"]}` },
      ],
    });
    return [result.choices[0].message.content];
  };

  await runExperiment({
    name: "story-function-experiment",
    datasetName: "storyteller-dataset",
    function: runner,
    metrics: ["correctness"],
    projectName: "my-project",
  });
}

// Run the experiment
runFunctionExperiment();
Running an Experiment with a Prompt Template
import { runExperiment, getPromptTemplate, getDataset } from "galileo";

async function runPromptExperiment() {
  const template = await getPromptTemplate({
    projectName: "my-project",
    name: "storyteller-prompt",
  });

  const dataset = await getDataset(undefined, "storyteller-dataset");

  await runExperiment({
    name: "Test Experiment",
    dataset: dataset,
    promptTemplate: template,
    metrics: ["toxicity"],
    projectName: "my-project",
  });
}
Running an Experiment with Custom Dataset
You can also use a locally generated dataset with a runner function:
import { runExperiment } from "galileo";
import { OpenAI } from "openai";

async function runCustomDatasetExperiment() {
  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

  const dataset = [{ input: "Spain", output: "Europe" }];

  const runner = async (input: any) => {
    const result = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [
        { role: "system", content: "You are a geography expert" },
        {
          role: "user",
          content: `Which continent does the following country belong to: ${input["input"]}`,
        },
      ],
    });
    return [result.choices[0].message.content];
  };

  await runExperiment({
    name: "geography-experiment",
    dataset: dataset,
    function: runner,
    metrics: ["correctness"],
    projectName: "my-project",
  });
}

// Run the experiment
runCustomDatasetExperiment();