Datasets
The data driving your experiments
Datasets are a fundamental building block in Galileo’s experimentation workflow. They provide a structured way to organize, version, and manage your test cases. Whether you’re evaluating prompts, testing application functionality, or analyzing model behavior, having well-organized datasets is crucial for systematic testing and continuous improvement.
Working with Datasets
Datasets can be used in two ways:
- Using the Galileo UI
  - Create and manage datasets directly through Galileo’s web interface
  - Visually organize and track test cases
  - No coding required
- Using the Galileo SDK
  - Programmatically create and manage datasets using Python
  - Integrate dataset management into your existing workflows
  - Automate dataset operations
Choose the approach that best fits your workflow and team’s needs. Many users combine both approaches, using code for bulk operations and the UI for visualization and quick edits.
For detailed information about dataset management, see our Datasets Documentation.
Path 1: Creating and Managing Datasets via UI
Creating a New Dataset
The dataset creation button, shown above, is your starting point for organizing test cases in Galileo’s interface.
The dataset configuration dialog provides options for naming, describing, and setting up your dataset with the appropriate schema for your testing needs.
Adding Samples to Your Dataset
As shown above, you can manually add samples to your dataset through the interface, allowing you to quickly capture problematic inputs or edge cases as you discover them.
Saving Changes and Creating Versions
After making changes to your dataset, use the save button to create a new version that preserves your modifications while maintaining the history of previous versions.
Viewing Version History
The version history view allows you to track changes to your dataset over time, see when modifications were made, and access previous versions for comparison or regression testing.
After you add a new sample to the dataset, you can view the version history by clicking the “Version History” tab.
Path 2: Creating and Managing Datasets via Code
Creating and Growing Your Dataset
When building your test suite programmatically, you can create datasets using the Galileo SDK.
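Here is a minimal sketch of what that can look like. It assumes the SDK exposes a `create_dataset` helper that takes a `name` and initial `content`; check the SDK reference for your installed version:

```python
from galileo.datasets import create_dataset

# A hypothetical travel-planning test suite with two representative samples.
# `create_dataset` and its parameters are assumptions based on the Galileo
# Python SDK's documented pattern; verify against your SDK version.
dataset = create_dataset(
    name="travel-planner-tests",
    content=[
        {"city": "Paris", "metadata": {"days": 3}},
        {"city": "Tokyo", "metadata": {"days": 5}},
    ],
)
```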
As you discover new test cases, you can append them to an existing dataset.
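A sketch of what that might look like, assuming the SDK provides a `get_dataset` lookup and that the returned dataset object supports `add_rows` (verify both against the SDK reference):

```python
from galileo.datasets import get_dataset

# Fetch the existing dataset by name, then append a newly discovered case.
# `get_dataset` and `add_rows` are assumed names; check your SDK version.
dataset = get_dataset(name="travel-planner-tests")
dataset.add_rows(
    [
        {"city": "Reykjavik", "metadata": {"days": 2}},  # edge case found in testing
    ]
)
```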
Version Management and History
A key benefit of Galileo’s dataset management is automatic versioning, which lets you track how your test suite evolves over time and makes your experiments reproducible. You can always reference a specific version of a dataset or work with the latest version.
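For example (the `version` parameter below is an assumption about how historical versions are addressed; consult the SDK reference for the exact mechanism):

```python
from galileo.datasets import get_dataset

# Work with the latest version (typically the default)...
latest = get_dataset(name="travel-planner-tests")

# ...or pin an experiment to a specific version for reproducibility.
# The `version` keyword is an assumption; your SDK may address
# historical versions differently.
pinned = get_dataset(name="travel-planner-tests", version=2)
```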
Accessing Dataset Variables in Prompt Templates
When you use datasets in Galileo, each sample in your dataset is made available to your prompt template as the `input` object. This allows you to create dynamic prompts that adapt to the data in each sample.
Example Dataset
Suppose you have the following dataset:
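(A hypothetical example; its field names match the prompt template in the next section.)

```json
[
  { "city": "Paris", "metadata": { "days": 3 } },
  { "city": "Tokyo", "metadata": { "days": 5 } }
]
```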
Example Prompt Template
To reference fields from your dataset in your prompt, use double curly braces and the `input` object:
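For instance, a hypothetical template for the travel dataset above:

```
Write a {{ input.metadata.days }}-day itinerary for a trip to {{ input.city }}.
```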
- `{{ input.city }}` will be replaced with the value of the `city` field from each sample.
- `{{ input.metadata.days }}` will be replaced with the value of the `days` field inside the `metadata` dictionary.
How It Works
For each sample in your dataset, Galileo will render the prompt template, replacing the variables with the corresponding values from the sample.
If a field is missing in a sample, the variable will be empty or may cause an error, so ensure your dataset is consistent.
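Conceptually, the rendering step behaves like the simplified sketch below. This is an illustration of the substitution rule, not Galileo’s actual implementation (note that here a missing field renders as an empty string, whereas Galileo may raise an error instead):

```python
import re

def render(template: str, sample: dict) -> str:
    """Replace each {{ input.path.to.field }} with the value from `sample`."""

    def lookup(match: re.Match) -> str:
        value = sample
        # Walk the dotted path, skipping the leading "input" segment.
        for key in match.group(1).split(".")[1:]:
            if not isinstance(value, dict):
                break
            value = value.get(key, "")  # missing fields render as empty here
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)

template = "Write a {{ input.metadata.days }}-day itinerary for {{ input.city }}."
print(render(template, {"city": "Paris", "metadata": {"days": 3}}))
# -> Write a 3-day itinerary for Paris.
```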
Creating Focus Sets
When you find problems, you can create focused subsets of data:
- Create subsets of data that trigger specific issues
- Track how well your fixes work on these subsets
- Make sure fixes don’t cause new problems
- Build a library of test cases for future testing
This can be done either through the UI or programmatically, depending on your workflow.
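For a programmatic sketch, you could save the problematic rows as their own named dataset, reusing the assumed `create_dataset` helper from earlier (the rows below are placeholders for cases you have actually identified):

```python
from galileo.datasets import create_dataset

# Placeholder rows that were observed to trigger a specific issue,
# e.g. itineraries longer than two weeks.
problem_rows = [
    {"city": "Ushuaia", "metadata": {"days": 21}},
    {"city": "Longyearbyen", "metadata": {"days": 18}},
]

# Keep the focus set as its own named dataset so fixes can be re-verified
# against it later without rerunning the full suite.
create_dataset(name="travel-planner-long-trips", content=problem_rows)
```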
Best Practices for Dataset Management
When working with datasets, consider these tips:
- Start Small and Representative
  - Why: Beginning with a core set of representative test cases helps you quickly validate your workflow and catch obvious issues before scaling up.
  - How: Select a handful of diverse, meaningful examples that reflect the range of inputs your model will see.
- Grow Incrementally
  - Why: Adding new test cases as you discover edge cases or failure modes ensures your dataset evolves alongside your understanding of the problem.
  - How: Whenever you encounter a new bug, edge case, or user scenario, add it to your dataset.
- Version Thoughtfully
  - Why: Versioning lets you track major changes, reproduce past experiments, and understand how your test suite evolves.
  - How: Create a new version when you make significant changes, and use version history to compare results over time.
- Document Changes
  - Why: Keeping a record of why you added certain test cases or created new versions helps future you (and your teammates) understand the reasoning behind your dataset’s evolution.
  - How: Use comments, changelogs, or dataset descriptions to note the purpose of additions or modifications.
- Organize by Purpose
  - Why: Separate datasets for different types of tests (e.g., basic functionality, edge cases, regression tests) make it easier to target specific goals and analyze results.
  - How: Create and name datasets according to their intended use.
- Choose the Right Approach
  - Why: The UI is great for quick edits and visual exploration, while code is better for automation and bulk operations.
  - How: Use both as needed: the UI for ad hoc changes, the SDK for systematic or large-scale updates.
- Track Progress
  - Why: Monitoring how changes affect both specific issues and overall performance helps you measure improvement and catch regressions.
  - How: Use metrics, dashboards, or manual review to assess the impact of dataset and prompt changes.
- Keep History
  - Why: Saving problematic inputs and maintaining version history prevents regressions and helps you understand past issues.
  - How: Never delete old test cases; archive or version them instead.
- Keep Your Dataset Schema Consistent
  - Why: Inconsistent schemas (e.g., missing fields) can cause prompt rendering errors or unexpected results.
  - How: Ensure every sample contains all fields referenced in your prompt templates.
- Use Nested Access for Dictionaries
  - Why: Many real-world datasets have nested structures. Dot notation (e.g., `input.metadata.days`) lets you access these fields cleanly in your prompt templates.
  - How: Reference nested fields using dot notation in your prompt templates.
- Test Your Prompt Templates
  - Why: Testing ensures that variables are replaced as expected and helps catch typos or missing fields before running large experiments.
  - How: Render your prompt with a sample input to verify correct variable substitution.
- Document Your Prompt Templates
  - Why: Clear documentation of which fields are used in your prompt templates helps maintainers and collaborators understand dependencies between your data and prompts.
  - How: Add comments or documentation near your prompt templates explaining expected input fields.
By following these practices and using Galileo’s dataset management features, you can build a robust and maintainable test suite that grows with your application’s needs.
Summary
Galileo’s dataset management capabilities provide a foundation for systematic testing and continuous improvement of your AI applications. With two distinct paths for creating and managing datasets, through the UI or programmatically, you can choose the approach that best fits your workflow and team’s needs.
By leveraging datasets and the best practices provided, you can build a comprehensive test suite that helps you identify and address issues before they impact your users.