Datasets in Galileo allow you to store and manage collections of examples for testing, evaluation, and experimentation. They are essential for running experiments and evaluating the performance of your LLM applications.

Creating Datasets

You can create a new dataset using the create_dataset function:

from galileo.datasets import create_dataset

# Create a dataset with test data
test_data = [
    {
        "input": "Which continent is Spain in?",
        "output": "Europe",
    },
    {
        "input": "Which continent is Japan in?",
        "output": "Asia",
    },
]

dataset = create_dataset(
    name="countries", 
    content=test_data,
)

Getting Existing Datasets

You can retrieve an existing dataset using the get_dataset function:

from galileo.datasets import get_dataset

# Get a dataset by name
dataset = get_dataset(name="countries")

# Get a dataset by ID
dataset = get_dataset(id="dataset-id")

# Get a dataset with its content
dataset = get_dataset(name="countries", with_content=True)

Adding to Existing Datasets

You can add rows to an existing dataset using the add_rows method:

from galileo.datasets import get_dataset

# Get an existing dataset
dataset = get_dataset(name="countries")

# Add new rows to the dataset
dataset.add_rows([
    {
        "input": "Which continent is Morocco in?",
        "output": "Africa",
    },
    {
        "input": "Which continent is Australia in?",
        "output": "Oceania",
    },
])

Listing Datasets

You can list all available datasets using the list_datasets function:

from galileo.datasets import list_datasets

# List all datasets (limited to 100 by default)
datasets = list_datasets()

# List datasets with a custom limit
datasets = list_datasets(limit=50)

Deleting Datasets

You can delete a dataset using the delete_dataset function:

from galileo.datasets import delete_dataset

# Delete a dataset by name
delete_dataset(name="countries")

# Delete a dataset by ID
delete_dataset(id="dataset-id")

Using Datasets in Experiments

Datasets are primarily used for running experiments to evaluate the performance of your LLM applications:

from galileo.datasets import get_dataset
from galileo.experiments import run_experiment
from galileo.prompts import get_prompt_template

# Get an existing dataset
dataset = get_dataset(name="countries")

# Get an existing prompt template
prompt = get_prompt_template(
    project="my-project",
    name="geography-prompt"
)

# Run an experiment with the dataset and prompt
results = run_experiment(
    "geography-experiment",
    dataset=dataset,
    prompt=prompt,
    metrics=["correctness"],
    project="my-project",
)

Best Practices for Dataset Management

When working with datasets in Galileo, consider these tips:

  1. Start Small: Begin with a core set of representative test cases
  2. Grow Incrementally: Add new test cases as you discover edge cases or failure modes
  3. Use Consistent Formats: Maintain a consistent format for your datasets to make them easier to use
  4. Include Expected Outputs: Always include expected outputs for evaluation
  5. Document Your Datasets: Add descriptions and metadata to make it clear what each dataset is for

By following these practices and utilizing Galileo’s dataset management features, you can build a robust and maintainable test suite that grows with your application’s needs.

List the versions of a dataset and grab a particular version:

from galileo.datasets import get_dataset

dataset = get_dataset(
    name="countries",    
    project="my-project",
)

print(dataset.modified_at)

versions = dataset.get_version_history()

oldest_version = get_dataset(
	name="countries",
	version=versions[-1].version
	project="my-project"
)