Learn how to create and manage datasets for use in your experiments with our SDKs
Datasets let you store and reuse well-defined data in your experiments. Datasets are stored and versioned in Galileo, and are available to experiments run both in the console and in code.
Each record in a Galileo dataset can have three top-level fields:
- input: Input variables that can be passed to your application to recreate a test case.
- output: Reference outputs to evaluate your application. These can be the ground truth for BLEU, ROUGE, and Ground Truth Adherence metrics, or reference outputs for manual comparison.
- metadata: Additional data you can use to filter or group your dataset.
When you create a dataset, it is uploaded to Galileo and becomes available to future experiments. Dataset names must be unique, and datasets are available to all projects across your organization.
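The three record fields described above can be sketched with plain Python dictionaries. This is a minimal illustration that does not call the Galileo SDK: the helpers `make_record` and `group_by_metadata`, the `difficulty` metadata key, and the choice to make `input` required are all assumptions made for the example. Only the field names `input`, `output`, and `metadata` come from the text above.

```python
from collections import defaultdict

# The three top-level field names described in this guide.
ALLOWED_FIELDS = {"input", "output", "metadata"}


def make_record(input_value, output=None, metadata=None):
    """Hypothetical helper: build one record as a dict, omitting unset fields.

    Treating `input` as required is an assumption of this sketch.
    """
    record = {"input": input_value}
    if output is not None:
        record["output"] = output
    if metadata is not None:
        record["metadata"] = metadata
    return record


def group_by_metadata(records, key):
    """Hypothetical helper: group records by one metadata key.

    Records without that key are collected under None.
    """
    groups = defaultdict(list)
    for record in records:
        unknown = set(record) - ALLOWED_FIELDS
        if unknown:
            raise ValueError(f"unexpected top-level fields: {sorted(unknown)}")
        groups[record.get("metadata", {}).get(key)].append(record)
    return dict(groups)


records = [
    make_record("Which continent is Spain in?", "Europe", {"difficulty": "easy"}),
    make_record("Which continent is Japan in?", "Asia", {"difficulty": "easy"}),
    # A record without a reference output is still a valid test case here.
    make_record("Which continent is Turkey in?", metadata={"difficulty": "hard"}),
]

by_difficulty = group_by_metadata(records, "difficulty")
print({level: len(rows) for level, rows in by_difficulty.items()})
```

A list of dictionaries shaped like `records` is the kind of content you would then pass to the SDK when creating the dataset. Check the SDK docs referenced below for the exact call.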
See the create_dataset Python SDK docs or createDataset TypeScript SDK docs for more details.
Once a dataset has been created in Galileo, you can retrieve it to use in your experiments by name or ID.
See the get_dataset Python SDK docs or getDataset TypeScript SDK docs for more details.
You can also add rows to an existing dataset.
See the add_rows Python SDK docs for more details.
You can retrieve all the datasets for a project.
See the list_datasets Python SDK docs or getDatasets TypeScript SDK docs for more details.
If a dataset is no longer needed, you can delete it by name or ID.
See the delete_dataset Python SDK docs or deleteDataset TypeScript SDK docs for more details.
Galileo automatically creates new versions of datasets when they are modified. You can access different versions by getting the dataset history.
See the get_dataset_version_history Python SDK docs for more details.
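To make the idea of automatic versioning concrete, here is a toy in-memory model in which every modification appends a new snapshot and older versions stay readable. It is a conceptual sketch only, not Galileo's implementation or API: the class `ToyVersionedDataset` and its methods `history` and `get_version` are hypothetical names invented for this example, and 1-based version numbering is an assumption.

```python
import copy


class ToyVersionedDataset:
    """Hypothetical stand-in: each modification creates a new version."""

    def __init__(self, name, rows):
        self.name = name
        # Version 1 is the initial content; snapshots are never edited in place.
        self._versions = [copy.deepcopy(rows)]

    @property
    def rows(self):
        """Rows of the latest version (a copy, so callers cannot mutate history)."""
        return copy.deepcopy(self._versions[-1])

    def add_rows(self, new_rows):
        """Append rows by recording a new snapshot rather than changing the old one."""
        self._versions.append(self._versions[-1] + copy.deepcopy(new_rows))

    def history(self):
        """Return (version_number, row_count) pairs, oldest first."""
        return [(index + 1, len(rows)) for index, rows in enumerate(self._versions)]

    def get_version(self, number):
        """Return the rows as they were at a given 1-based version number."""
        if not 1 <= number <= len(self._versions):
            raise IndexError(f"no version {number} for dataset {self.name!r}")
        return copy.deepcopy(self._versions[number - 1])


dataset = ToyVersionedDataset(
    "countries", [{"input": "Which continent is Spain in?", "output": "Europe"}]
)
dataset.add_rows([{"input": "Which continent is Japan in?", "output": "Asia"}])

print(dataset.history())
print(dataset.get_version(1))
```

The point of keeping snapshots immutable is that an experiment can name the exact version it ran against and still reproduce that run after the dataset has grown.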
Datasets are primarily used for running experiments that evaluate the performance of your LLM applications.
Learn more about datasets, the data driving your experiments.
Learn how to use datasets and experiments to improve your application.
Learn how to create and use prompt templates in experiments.