Datasets
Datasets are a fundamental building block in Galileo’s experimentation workflow. They provide a structured way to organize, version, and manage your test cases. Whether you’re evaluating prompts, testing application functionality, or analyzing model behavior, having well-organized datasets is crucial for systematic testing and continuous improvement.
Working with Datasets
You can use datasets in two ways:
- Using the Galileo UI
  - Create and manage datasets directly through Galileo’s intuitive interface
  - Visually organize and track test cases
  - No coding required
- Using the Galileo SDK
  - Programmatically create and manage datasets using Python
  - Integrate dataset management into your existing workflows
  - Automate dataset operations
Choose the approach that best fits your workflow and team’s needs. Many users combine both approaches, using code for bulk operations and the UI for visualization and quick edits.
Path 1: Creating and Managing Datasets via UI
Creating a New Dataset
The dataset creation button, shown above, is your starting point for organizing test cases in Galileo’s interface.
The dataset configuration dialog provides options for naming, describing, and setting up your dataset with the appropriate schema for your testing needs.
Adding Samples to Your Dataset
As shown above, you can manually add samples to your dataset through the interface, allowing you to quickly capture problematic inputs or edge cases as you discover them.
Saving Changes and Creating Versions
After making changes to your dataset, use the save button to create a new version that preserves your modifications while maintaining the history of previous versions.
Viewing Version History
The version history view allows you to track changes to your dataset over time, see when modifications were made, and access previous versions for comparison or regression testing.
After you add a new sample to the dataset, click the “Version History” tab to see the resulting version alongside the earlier ones.
Path 2: Creating and Managing Datasets via Code
Creating a New Dataset
When building your test suite programmatically, you can create datasets using the Galileo SDK:
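Before handing data to the SDK, it helps to assemble the test cases as a list of rows. The following is a minimal sketch in plain Python, not the Galileo SDK API: the `input` and `output` field names and the `build_rows` helper are assumptions made for illustration, and the actual dataset-creation call should be taken from the SDK reference.

```python
from typing import Any


def build_rows(pairs: list[tuple[str, str]]) -> list[dict[str, Any]]:
    """Turn (input, expected output) pairs into dataset rows.

    Hypothetical helper for illustration; not part of the Galileo SDK.
    """
    rows = []
    for question, expected in pairs:
        if not question.strip():
            raise ValueError("each row needs a non-empty input")
        # Assumed schema: one "input" and one "output" field per row.
        rows.append({"input": question, "output": expected})
    return rows


rows = build_rows([
    ("Which continent is Spain in?", "Europe"),
    ("Which continent is Japan in?", "Asia"),
])
print(len(rows), "rows ready to upload")
```

Validating rows locally before upload keeps malformed test cases out of the first dataset version.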
Adding Samples to Your Dataset
As you discover new test cases, you can easily add them to your dataset programmatically:
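As a hedged sketch of the bookkeeping involved, the snippet below merges newly discovered samples into an existing list of rows while skipping inputs that are already present. It is plain Python: `add_samples` and the `input`/`output` field names are assumptions for illustration, not Galileo SDK calls.

```python
def add_samples(rows, new_rows):
    """Return a new list with new_rows appended, skipping duplicate inputs.

    Hypothetical helper for illustration; not part of the Galileo SDK.
    """
    seen = {row["input"] for row in rows}
    merged = list(rows)  # copy, so the original list is left untouched
    for row in new_rows:
        if row["input"] not in seen:
            merged.append(row)
            seen.add(row["input"])
    return merged


existing = [{"input": "Which continent is Spain in?", "output": "Europe"}]
updated = add_samples(existing, [
    {"input": "Which continent is Spain in?", "output": "Europe"},  # duplicate, skipped
    {"input": "Which continent is Kenya in?", "output": "Africa"},  # new, appended
])
print(len(updated), "rows after merge")
```

Deduplicating on the input text is one possible policy; a dataset that deliberately repeats an input with different expected outputs would need a different key.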
Working with Dataset Versions
One of the key benefits of Galileo’s dataset management is automatic versioning. This allows you to track how your test suite evolves over time and ensures reproducibility of your experiments:
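To make the idea of automatic versioning concrete, here is a toy in-memory model in which every save stores an immutable snapshot and earlier versions stay loadable. The `VersionedDataset` class and its `save`/`load` methods are hypothetical and only illustrate the concept; they do not describe how Galileo stores versions or what the SDK exposes.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedDataset:
    """Toy model: each save appends a snapshot; nothing is overwritten."""

    name: str
    versions: list = field(default_factory=list, repr=False)

    def save(self, rows):
        """Store a snapshot of rows and return its 1-based version number."""
        self.versions.append(tuple(dict(row) for row in rows))
        return len(self.versions)

    def load(self, version=None):
        """Return a copy of the rows at a version (latest when omitted)."""
        index = len(self.versions) if version is None else version
        if not 1 <= index <= len(self.versions):
            raise IndexError(f"no version {index} in dataset {self.name!r}")
        return [dict(row) for row in self.versions[index - 1]]


ds = VersionedDataset("countries")
v1 = ds.save([{"input": "Which continent is Spain in?", "output": "Europe"}])
v2 = ds.save(ds.load() + [{"input": "Which continent is Japan in?", "output": "Asia"}])

# An experiment pinned to v1 still sees exactly one row after v2 exists.
print(v1, len(ds.load(v1)), v2, len(ds.load(v2)))
```

Pinning an experiment to a version number rather than to "latest" is what makes a rerun comparable with the original.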
Creating Focus Sets
When you find a problem, you can create a focus set: a subset of your data that targets that problem. Focus sets let you:
- Isolate the samples that trigger a specific issue
- Track how well your fixes work on these subsets
- Make sure fixes don’t cause new problems
- Build a library of test cases for future testing
This can be done through either the UI or programmatically, depending on your workflow.
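A minimal sketch of the programmatic route, assuming you already have one pass/fail result per row from an earlier evaluation. `build_focus_set`, `pass_rate`, and the sample results are hypothetical and written in plain Python; they are not Galileo SDK functions.

```python
def build_focus_set(rows, passed):
    """Return the rows whose evaluation did not pass."""
    if len(rows) != len(passed):
        raise ValueError("rows and passed must have the same length")
    return [row for row, ok in zip(rows, passed) if not ok]


def pass_rate(passed):
    """Fraction of rows that passed (0.0 for an empty list)."""
    return sum(passed) / len(passed) if passed else 0.0


rows = [
    {"input": "Which continent is Spain in?", "output": "Europe"},
    {"input": "Which continent is Turkey in?", "output": "Europe and Asia"},
    {"input": "Which continent is Egypt in?", "output": "Africa and Asia"},
]
passed_before = [True, False, False]  # hypothetical results from a first run
focus = build_focus_set(rows, passed_before)

# After a fix, re-run on the focus set AND on the full dataset:
# the first shows whether the fix works, the second catches regressions.
focus_after = [True, True]        # hypothetical results
full_after = [True, True, True]   # hypothetical results
print(pass_rate(focus_after), pass_rate(full_after))
```

Saving the focus set as its own dataset keeps the failing samples available for later regression runs.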
Best Practices for Dataset Management
When working with datasets in Galileo, consider these tips:
- Start Small: Begin with a core set of representative test cases
- Grow Incrementally: Add new test cases as you discover edge cases or failure modes
- Version Thoughtfully: Use versioning to track major changes in your test suite
- Document Changes: Keep track of why you added certain test cases or created new versions
- Organize by Purpose: Create separate datasets for different types of tests (e.g., basic functionality, edge cases, regression tests)
- Choose the Right Approach: Use the UI for visual exploration and quick edits, and code for automation and bulk operations
By following these practices and using Galileo’s dataset management features, you can build a maintainable test suite that grows with your application’s needs.
Summary
Galileo’s dataset management capabilities provide a foundation for systematic testing and continuous improvement of your AI applications. With two paths for creating and managing datasets, through the UI or programmatically, you can choose the approach that best fits your workflow and team’s needs.
By leveraging both approaches, you can build a comprehensive test suite that helps you identify and address issues before they impact your users.