Datasets
Datasets in Galileo allow you to store and manage collections of examples for testing, evaluation, and experimentation. They are essential for running experiments and evaluating the performance of your LLM applications.
Creating Datasets
You can create a new dataset using the create_dataset function.
Getting Existing Datasets
You can retrieve an existing dataset using the get_dataset function.
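A lookup by name might look like the sketch below. It again uses a local stand-in, not the SDK. The name parameter and the choice to return None for a missing dataset are assumptions; the real function may raise an error instead.

```python
# Local in-memory stand-in, NOT the Galileo SDK.
_datasets = {
    "countries": [{"input": "Which continent is Spain in?", "output": "Europe"}],
}


def get_dataset(name):
    """Return the named dataset, or None if it does not exist (assumed behavior)."""
    rows = _datasets.get(name)
    if rows is None:
        return None
    return {"name": name, "rows": rows}


dataset = get_dataset(name="countries")
missing = get_dataset(name="no-such-dataset")

print(dataset["rows"][0]["output"])
print(missing)
```

Checking the result before use avoids surprises when a dataset has been renamed or deleted.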
Adding to Existing Datasets
You can add rows to an existing dataset using the add_rows method.
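The sketch below shows the idea with a hypothetical Dataset class that stands in for the object the SDK returns. The list-of-dicts argument and the type check are assumptions, not documented behavior.

```python
class Dataset:
    """Local stand-in for a dataset object; NOT the Galileo SDK class."""

    def __init__(self, name, rows=None):
        self.name = name
        self.rows = list(rows or [])

    def add_rows(self, new_rows):
        """Append rows after checking that each one is a dict (assumed behavior)."""
        new_rows = list(new_rows)
        for row in new_rows:
            if not isinstance(row, dict):
                raise TypeError("each row must be a dict")
        self.rows.extend(dict(row) for row in new_rows)
        return self


dataset = Dataset(
    "countries",
    [{"input": "Which continent is Spain in?", "output": "Europe"}],
)
dataset.add_rows([
    {"input": "Which continent is Kenya in?", "output": "Africa"},
    {"input": "Which continent is Peru in?", "output": "South America"},
])
print(len(dataset.rows))
```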
Listing Datasets
You can list all available datasets using the list_datasets function.
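As an illustration only, the stand-in below returns one summary per dataset. The summary fields (name, num_rows) and the alphabetical ordering are assumptions; the SDK may return richer objects in a different order.

```python
# Local in-memory stand-in, NOT the Galileo SDK.
_datasets = {
    "countries": [{"input": "Which continent is Spain in?", "output": "Europe"}],
    "capitals": [],
}


def list_datasets():
    """Return a name and row count for every stored dataset, sorted by name."""
    return [
        {"name": name, "num_rows": len(_datasets[name])}
        for name in sorted(_datasets)
    ]


for summary in list_datasets():
    print(summary["name"], summary["num_rows"])
```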
Deleting Datasets
You can delete a dataset using the delete_dataset function.
Using Datasets in Experiments
Datasets are primarily used for running experiments to evaluate the performance of your LLM applications.
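To make the idea concrete, the sketch below runs a toy application over dataset rows and scores each answer by case-insensitive exact match. It is a local approximation of what an experiment does, not Galileo's experiment runner. The run_local_experiment and toy_app names, and the scoring rule, are hypothetical.

```python
def run_local_experiment(rows, app):
    """Run app on each row's input and score exact matches against the expected output."""
    results = []
    for row in rows:
        actual = app(row["input"])
        passed = actual.strip().lower() == row["output"].strip().lower()
        results.append({
            "input": row["input"],
            "expected": row["output"],
            "actual": actual,
            "passed": passed,
        })
    accuracy = sum(r["passed"] for r in results) / len(results) if results else 0.0
    return {"results": results, "accuracy": accuracy}


def toy_app(question):
    """Hypothetical stand-in for an LLM application: a fixed lookup table."""
    answers = {
        "Which continent is Spain in?": "Europe",
        "Which continent is Japan in?": "Europe",  # deliberately wrong
    }
    return answers.get(question, "unknown")


rows = [
    {"input": "Which continent is Spain in?", "output": "Europe"},
    {"input": "Which continent is Japan in?", "output": "Asia"},
]

report = run_local_experiment(rows, toy_app)
print(report["accuracy"])
```

Exact match is the simplest possible metric; free-form LLM output usually calls for more forgiving scoring.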
Best Practices for Dataset Management
When working with datasets in Galileo, consider these tips:
- Start Small: Begin with a core set of representative test cases
- Grow Incrementally: Add new test cases as you discover edge cases or failure modes
- Use Consistent Formats: Use the same field names and value types in every row so your datasets are easier to reuse
- Include Expected Outputs: Always include expected outputs for evaluation
- Document Your Datasets: Add descriptions and metadata to make it clear what each dataset is for
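The consistency and expected-output tips above can be checked mechanically before rows are uploaded. The helper below is a minimal sketch: validate_rows is a hypothetical name, and the input/output keys are assumed row fields rather than a schema the source specifies.

```python
def validate_rows(rows, required_keys=("input", "output")):
    """Return a list of problems; an empty list means every row has the required fields."""
    problems = []
    for index, row in enumerate(rows):
        missing = [key for key in required_keys if key not in row]
        if missing:
            problems.append(f"row {index}: missing {', '.join(missing)}")
        empty = [
            key for key in required_keys
            if key in row and not str(row[key]).strip()
        ]
        if empty:
            problems.append(f"row {index}: empty {', '.join(empty)}")
    return problems


rows = [
    {"input": "Which continent is Spain in?", "output": "Europe"},
    {"input": "Which continent is Japan in?"},                  # no expected output
    {"input": "Which continent is Kenya in?", "output": " "},   # blank expected output
]

problems = validate_rows(rows)
for problem in problems:
    print(problem)
```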
By following these practices and utilizing Galileo’s dataset management features, you can build a robust and maintainable test suite that grows with your application’s needs.
You can also list the versions of a dataset and retrieve a particular version.
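One way to picture versioning is a dataset that keeps a snapshot after every change. The sketch below is a local model of that idea, not the Galileo SDK. The VersionedDataset class, its list_versions and get_version methods, and the 1-based version numbers are all hypothetical.

```python
class VersionedDataset:
    """Local stand-in that snapshots rows on every change; NOT the Galileo SDK."""

    def __init__(self, name, rows):
        self.name = name
        self._versions = [list(rows)]  # index 0 holds version 1

    def add_rows(self, new_rows):
        """Create a new version containing the previous rows plus the new ones."""
        self._versions.append(self._versions[-1] + list(new_rows))

    def list_versions(self):
        """Return the version number and row count of every snapshot."""
        return [
            {"version": i + 1, "num_rows": len(rows)}
            for i, rows in enumerate(self._versions)
        ]

    def get_version(self, version):
        """Return a copy of the rows as they were at the given 1-based version."""
        if not 1 <= version <= len(self._versions):
            raise ValueError(f"unknown version {version}")
        return list(self._versions[version - 1])


dataset = VersionedDataset(
    "countries",
    [{"input": "Which continent is Spain in?", "output": "Europe"}],
)
dataset.add_rows([{"input": "Which continent is Japan in?", "output": "Asia"}])

print(dataset.list_versions())
print(len(dataset.get_version(1)))
```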