Datasets allow you to store and reuse well-defined data in experiments. Datasets can be stored and versioned in Galileo, and are available to experiments run both in the console and in code. Dataset fields can be passed to the function under test in your application, or used as input variables in prompts.

Work with datasets

Datasets can be used in two ways:
  1. Using the Galileo Console
    • Create and manage datasets directly through the Galileo Console
    • Visually organize and track test cases
    • No coding required
  2. Using the Galileo SDK
    • Programmatically create and manage datasets using Python
    • Integrate dataset management into your existing workflows
    • Automate dataset operations
Choose the approach that best fits your workflow and team’s needs. Many users combine both approaches, using code for bulk operations and the console for visualization and quick edits.

Dataset fields

Each record in a Galileo dataset can have three top-level fields:
  1. input - Input variables that can be passed to your application or prompt to recreate a test case.
  2. output - Reference outputs used to evaluate your application. These can serve as ground truth for the BLEU, ROUGE, and Ground Truth Adherence metrics, or as reference outputs for manual review.
  3. metadata - Additional data you can use to filter or group your dataset.
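For example, a single record combining all three top-level fields might look like the following sketch (the field values are illustrative, not from a real dataset):

```python
# A single dataset record with all three top-level fields.
# Values here are illustrative examples only.
record = {
    "input": "Which continent is Spain in?",  # passed to your app or prompt
    "output": "Europe",  # reference (ground truth) output for evaluation
    "metadata": {"category": "geography", "difficulty": "easy"},  # for filtering/grouping
}

print(record["metadata"]["category"])
```

Metadata can hold any extra attributes you want to filter or group by later, such as topic or difficulty.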

Create and manage datasets in the Galileo console

Create a new dataset

The dataset creation button is your starting point for organizing test cases in Galileo's interface. From the Datasets page of your project, click the + Create Dataset button. You can also create a dataset from a Playground page: click the Add Dataset button, then select + Create new dataset. The creation dialog offers three approaches, described in the sections below: uploading a file, generating synthetic data, or entering rows manually.

Dataset file uploads

An uploaded file can be in CSV, JSON/JSONC, or Feather format. The file needs at least one column that maps to the input values; the columns can have any names. Once you have uploaded the file, name the dataset, then map the file's columns to the dataset's input, reference output, and metadata by dragging them from the Original uploaded dataset column to the relevant dataset column. Select the Save dataset button when you are done.
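As a sketch of what an uploadable file might contain, the snippet below builds a minimal CSV in memory (on disk you would write to a .csv file instead). The column names question, answer, and topic are arbitrary, since you map columns to input, output, and metadata after uploading:

```python
import csv
import io

# Example rows with arbitrary column names; these get mapped to the
# dataset's input, output, and metadata columns in the console.
rows = [
    {"question": "Which continent is Spain in?", "answer": "Europe", "topic": "geography"},
    {"question": "Which continent is Japan in?", "answer": "Asia", "topic": "geography"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["question", "answer", "topic"])
writer.writeheader()
writer.writerows(rows)

print(buf.getvalue())
```

Any header row works, as long as at least one column can be mapped to the input values.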

Synthetic data generation

You can use Large Language Models (LLMs) to generate datasets for testing your AI applications, both before and after your app is deployed to production. This feature requires an integration with a supported LLM provider (for example, OpenAI, Azure, or Mistral). To configure an integration, obtain an API key from the LLM provider's platform, then add the key from the model selection dialog or from Galileo's integrations page. To generate data, provide Input Examples for the AI model. At least one example is required, though more examples can improve the synthetic data. After data generation completes, select Save Dataset to continue working with the data (including editing, exporting, and sharing it). You can also customize the generated data by setting:
  • The number of rows that you ask the LLM to generate.
  • The LLM model that you’re utilizing.
  • Your AI app’s use case (Optional): What task is your AI app doing? For example, chatbot to answer customer service questions.
  • Special instructions (Optional): Additional guidance to further refine the generated output.
  • The generated data types (Optional): Customize data types that the generated data should follow.
    Data types can be used for testing specific scenarios. For example, testing your app’s resilience to prompt injection scenarios where attackers try to get your app to produce harmful output.
Synthetically generated data is useful in many scenarios, such as expanding your existing datasets to increase test coverage and help you improve your AI applications more quickly.

Manual dataset creation

The console also allows you to manually add and edit data rows. Select the Save dataset button when you are done.

Add rows to your dataset

You can manually add new rows to your dataset through the console, allowing you to capture problematic inputs or edge cases as you discover them. After making changes to your dataset, select the Save changes button to create a new version that preserves your modifications while maintaining the history of previous versions.

View version history

The version history view allows you to track changes to your dataset over time, see when modifications were made, and access previous versions for comparison or regression testing. After you add a new row to the dataset, you can see the new version by clicking the Version History tab.

Create and manage datasets in code

Create datasets

When you create a dataset, it is uploaded to Galileo and becomes available to future experiments. Datasets must have unique names and are available to all projects across your organization.
from galileo.datasets import create_dataset

# Create a dataset with test data
test_data = [
    {
        "input": "Which continent is Spain in?",
        "output": "Europe",
    },
    {
        "input": "Which continent is Japan in?",
        "output": "Asia",
    },
]

dataset = create_dataset(
    name="countries",
    content=test_data
)
See the create_dataset Python SDK docs or createDataset TypeScript SDK docs for more details.

Get existing datasets

Once a dataset has been created in Galileo, you can retrieve it to use in your experiments by name or ID.
from galileo.datasets import get_dataset

# Get a dataset by name
dataset = get_dataset(
    name="countries"
)

# Get a dataset by ID
dataset = get_dataset(
    id="dataset-id"
)

# Get its content
dataset.get_content()
See the get_dataset Python SDK docs or getDataset TypeScript SDK docs for more details.

Add rows to existing datasets

Once a dataset has been created, you can manually add rows to it.
from galileo.datasets import get_dataset

# Get an existing dataset
dataset = get_dataset(
    name="countries"
)

# Add new rows to the dataset
dataset.add_rows([
    {
        "input": "Which continent is Morocco in?",
        "output": "Africa",
    },
    {
        "input": "Which continent is Australia in?",
        "output": "Oceania",
    },
])
See the add_rows Python SDK docs or addRowsToDataset TypeScript SDK docs for more details.

Generate synthetic data to extend a dataset

Galileo can use an LLM integration to generate rows of synthetic data that you can then add to a dataset. This synthetic data is generated using a mixture of prompts, instructions, few-shot examples, and data types. Once these rows have been generated, they can be added to a new or existing dataset.
from galileo.datasets import extend_dataset

# Generate synthetic data
dataset = extend_dataset(
    prompt_settings={'model_alias': 'GPT-4o'},
    prompt="Nutrition and health chatbot",
    instructions="Questions that an average-health person would be interested in",
    examples=[
        "Is cereal for breakfast healthy?",
        "How many cups of coffee is unhealthy?",
    ],
    data_types=['General Query'],
    count=10,
)

# Print the generated dataset contents
for row in dataset:  
    print(row.values_dict.additional_properties["input"])
See the extend_dataset Python SDK docs or extendDataset TypeScript SDK docs for more details.

List datasets

You can retrieve all the datasets available to your organization.
from galileo.datasets import list_datasets

# List all datasets
datasets = list_datasets()

# List datasets with a custom limit
datasets = list_datasets(
    limit=50,
)
See the list_datasets Python SDK docs or getDatasets TypeScript SDK docs for more details.

Delete datasets

If a dataset is no longer needed, you can delete it by name or ID.
from galileo.datasets import delete_dataset

# Delete a dataset by name
delete_dataset(name="countries")

# Delete a dataset by ID
delete_dataset(id="dataset-id")
See the delete_dataset Python SDK docs or deleteDataset TypeScript SDK docs for more details.

Work with dataset versions

Galileo automatically creates new versions of datasets when they are modified. You can access different versions by getting the dataset history.
from galileo.datasets import get_dataset_version_history

# Get the version history
datasets = get_dataset_version_history(
    dataset_name="countries"
)

# List out the rows added with each version
for dataset in datasets.versions:
    print(f"""
    Version index: {dataset.version_index},
    rows added: {dataset.rows_added}
    """)
See the get_dataset_version_history Python SDK docs for more details.

Use datasets in experiments

Datasets are primarily used for running experiments to evaluate the performance of your LLM applications:
from galileo.datasets import get_dataset
from galileo.experiments import run_experiment
from galileo.prompts import get_prompt
from galileo.schema.metrics import GalileoScorers

# Get an existing dataset
dataset = get_dataset(
    name="countries"
)

# Get an existing prompt
prompt = get_prompt(
    name="geography-prompt"
)

# Run an experiment with the dataset and prompt
results = run_experiment(
    "geography-experiment",
    dataset=dataset,
    prompt_template=prompt,
    metrics=[GalileoScorers.completeness],
    project="my-project",
)

Best practices for dataset management

When working with datasets, consider these tips:
  1. Start small and representative: begin with a handful of diverse examples to validate quickly.
  2. Grow incrementally: add cases as you find bugs, edge cases, or new scenarios.
  3. Version thoughtfully: create new versions for significant changes and compare results over time.
  4. Document changes: record the rationale behind additions and versions in comments or changelogs.
  5. Organize by purpose: separate datasets for basics, edge cases, and regressions.
  6. Choose the right approach: use the console for quick edits/visualization and the SDK for automation/bulk.
  7. Track progress: monitor metrics/dashboards or review results to catch regressions.
  8. Keep history: archive old cases and maintain version history—don’t delete.
  9. Keep your dataset schema consistent: ensure every row includes all fields referenced by prompts.
  10. Use nested access for dictionaries: reference nested fields with dot notation (e.g., input.metadata.days).
  11. Test your prompt templates: render with sample rows to verify variable substitution.
  12. Document your prompt templates: note required fields and assumptions near the template.
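As an illustration of the schema-consistency, dot-notation, and template-testing tips above, here is a plain-Python sketch of rendering a template against a nested row. The {{...}} placeholder syntax and the helper functions are hypothetical, not the Galileo SDK's actual template engine; the point is to verify variable substitution with a sample row before running an experiment:

```python
import re
from functools import reduce

def get_path(record: dict, path: str):
    """Resolve a dot-notation path like 'input.metadata.days' in a nested dict."""
    return reduce(lambda d, key: d[key], path.split("."), record)

def render(template: str, record: dict) -> str:
    """Substitute {{path}} placeholders; raises KeyError if a field is missing,
    which surfaces schema inconsistencies in a dataset row."""
    return re.sub(
        r"\{\{([\w.]+)\}\}",
        lambda m: str(get_path(record, m.group(1))),
        template,
    )

# A sample row with nested input fields, as in tip 10.
row = {"input": {"city": "Madrid", "metadata": {"days": 3}}}
template = "Plan a {{input.metadata.days}}-day trip to {{input.city}}."

print(render(template, row))  # verify substitution with a sample row
```

Running every template against a representative row like this catches missing fields early, before they fail mid-experiment.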