Dataset fields
Each record in a Galileo dataset can have three top-level fields:
- input: Input variables that can be passed to your application to recreate a test case.
- output: Reference outputs used to evaluate your application. These can serve as the ground truth for the BLEU, ROUGE, and Ground Truth Adherence metrics, or simply as reference outputs to consult manually.
- metadata: Additional data you can use to filter or group your dataset.
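To make the three fields concrete, here is a minimal Python sketch of what records might look like. It assumes input is a dictionary of named variables and output is a single reference string. The record values and the group_by_metadata helper are hypothetical illustrations, not part of the Galileo SDK. The helper shows the kind of grouping that metadata makes possible.

```python
from collections import defaultdict

# Hypothetical records: the values are illustrative, the keys are the three top-level fields.
records = [
    {
        "input": {"question": "What is the capital of France?"},
        "output": "Paris",
        "metadata": {"topic": "geography"},
    },
    {
        "input": {"question": "What is 2 + 2?"},
        "output": "4",
        "metadata": {"topic": "math"},
    },
    {
        "input": {"question": "Which river flows through Cairo?"},
        "output": "The Nile",
        "metadata": {"topic": "geography"},
    },
]


def group_by_metadata(rows, key):
    """Hypothetical helper: group records by one metadata key (missing key -> None)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.get("metadata", {}).get(key)].append(row)
    return dict(groups)


by_topic = group_by_metadata(records, "topic")
for topic, rows in by_topic.items():
    print(topic, len(rows))  # geography 2, then math 1
```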
Create datasets
When you create a dataset, it is uploaded to Galileo and becomes available to future experiments. Dataset names must be unique, and datasets are available to all projects across your organization.
See the create_dataset Python SDK docs or the createDataset TypeScript SDK docs for more details.
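You can illustrate the unique-name rule with a small in-memory model. This is a hypothetical sketch, not the Galileo create_dataset implementation. The create_in_store helper and the dictionary store keyed by generated IDs are assumptions made for illustration.

```python
import uuid


def create_in_store(store, name, rows):
    """Hypothetical helper: add a new dataset to an in-memory store keyed by ID.

    Rejects the call if any existing dataset already uses `name`.
    """
    if any(d["name"] == name for d in store.values()):
        raise ValueError(f"a dataset named {name!r} already exists")
    dataset_id = str(uuid.uuid4())
    store[dataset_id] = {"id": dataset_id, "name": name, "rows": list(rows)}
    return store[dataset_id]


store = {}
created = create_in_store(store, "qa-smoke-test", [{"input": {"question": "hi"}, "output": "hello"}])
try:
    create_in_store(store, "qa-smoke-test", [])
except ValueError as err:
    print(err)  # a dataset named 'qa-smoke-test' already exists
```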
Get existing datasets
Once a dataset has been created in Galileo, you can retrieve it by name or ID to use in your experiments.
See the get_dataset Python SDK docs or the getDataset TypeScript SDK docs for more details.
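You can sketch a lookup by either name or ID the same way. The find_dataset helper below is hypothetical and operates on a plain dictionary that maps IDs to datasets. It only illustrates that either key identifies a single dataset, because names are unique.

```python
def find_dataset(store, *, name=None, dataset_id=None):
    """Hypothetical lookup in an in-memory store that maps dataset IDs to datasets.

    Exactly one of name or dataset_id must be given; returns None if nothing matches.
    """
    if (name is None) == (dataset_id is None):
        raise ValueError("pass exactly one of name or dataset_id")
    if dataset_id is not None:
        return store.get(dataset_id)
    # Names are unique, so the first match is the only match.
    return next((d for d in store.values() if d["name"] == name), None)


store = {"ds-1": {"id": "ds-1", "name": "qa-smoke-test", "rows": []}}
print(find_dataset(store, name="qa-smoke-test")["id"])  # ds-1
print(find_dataset(store, dataset_id="ds-1")["name"])   # qa-smoke-test
print(find_dataset(store, name="missing"))              # None
```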
Add rows to existing datasets
Once a dataset has been created, you can add rows to it manually.
See the add_rows Python SDK docs or the addRowsToDataset TypeScript SDK docs for more details.
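Here is a sketch of appending rows. It assumes each new row may only use the three top-level fields described earlier. The append_rows helper and its field check are illustrative assumptions rather than Galileo's validation rules. Every row in a batch is checked before any are appended, so an invalid batch leaves the dataset unchanged.

```python
ALLOWED_FIELDS = {"input", "output", "metadata"}  # assumption: only these keys are accepted


def append_rows(dataset, new_rows):
    """Hypothetical helper: append rows to an in-memory dataset dict.

    Every row is validated first, so an invalid batch leaves the dataset unchanged.
    """
    new_rows = list(new_rows)
    for row in new_rows:
        unknown = set(row) - ALLOWED_FIELDS
        if unknown:
            raise ValueError(f"unexpected fields: {sorted(unknown)}")
    dataset["rows"].extend(new_rows)
    return dataset


dataset = {"name": "qa-smoke-test", "rows": [{"input": {"question": "hi"}, "output": "hello"}]}
append_rows(dataset, [{"input": {"question": "bye"}, "output": "goodbye", "metadata": {"lang": "en"}}])
print(len(dataset["rows"]))  # 2
```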
Generate synthetic data to extend a dataset
Galileo can use an LLM integration to generate rows of synthetic data. Generation is driven by a combination of prompts, instructions, few-shot examples, and data types. Once generated, the rows can be added to a new or existing dataset.
See the extend_dataset Python SDK docs or the extendDataset TypeScript SDK docs for more details.
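The generation flow has three steps: combine the ingredients into one prompt, send it to an LLM, and turn the response into rows. Everything here is hypothetical. SyntheticDataRequest, generate_rows, the prompt layout, the data-type label, and fake_llm (a stand-in so the sketch runs offline) illustrate the idea. They are not Galileo's extend_dataset parameters or prompt template.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SyntheticDataRequest:
    """Hypothetical description of a synthetic-data job (not Galileo's schema)."""
    prompt: str
    instructions: str
    examples: list[str] = field(default_factory=list)
    data_types: list[str] = field(default_factory=list)
    count: int = 3

    def render(self) -> str:
        # Combine prompt, instructions, data types, and few-shot examples into one LLM prompt.
        lines = [self.instructions, "", f"Application prompt: {self.prompt}"]
        if self.data_types:
            lines.append("Data types: " + ", ".join(self.data_types))
        for i, example in enumerate(self.examples, start=1):
            lines.append(f"Example {i}: {example}")
        lines.append(f"Generate {self.count} new inputs, one per line.")
        return "\n".join(lines)


def generate_rows(request: SyntheticDataRequest, llm: Callable[[str], str]) -> list[dict]:
    """Call an LLM (any callable str -> str) and turn each non-empty output line into a row."""
    raw = llm(request.render())
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    return [{"input": line} for line in lines[: request.count]]


def fake_llm(prompt: str) -> str:
    # Offline stand-in; a real setup would call your configured LLM integration.
    return "How do I reset my password?\nCan I change my billing date?\nWhere is my invoice?"


request = SyntheticDataRequest(
    prompt="You are a helpful support assistant.",
    instructions="Write realistic customer questions.",
    examples=["How do I cancel my subscription?"],
    data_types=["customer support question"],  # placeholder label
)
rows = generate_rows(request, fake_llm)
dataset = {"name": "support-questions", "rows": []}
dataset["rows"].extend(rows)  # generated rows can go into a new or existing dataset
print(len(dataset["rows"]))  # 3
```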
List datasets
You can retrieve a list of all the datasets for a project.
See the list_datasets Python SDK docs or the getDatasets TypeScript SDK docs for more details.
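One way to picture listing is as producing one summary per dataset. The DatasetSummary type, the fields it carries, and the sort by name are assumptions for illustration. Check the SDK docs for what list_datasets actually returns.

```python
from dataclasses import dataclass


@dataclass
class DatasetSummary:
    """Hypothetical summary entry produced when listing datasets."""
    name: str
    num_rows: int


def list_dataset_summaries(datasets):
    """Return one summary per dataset, sorted by name (the ordering is an assumption)."""
    return sorted(
        (DatasetSummary(d["name"], len(d["rows"])) for d in datasets),
        key=lambda s: s.name,
    )


datasets = [
    {"name": "support-questions", "rows": [{"input": "a"}, {"input": "b"}]},
    {"name": "qa-smoke-test", "rows": [{"input": "c"}]},
]
for summary in list_dataset_summaries(datasets):
    print(summary.name, summary.num_rows)  # qa-smoke-test 1, then support-questions 2
```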
Delete datasets
If a dataset is no longer needed, you can delete it by name or ID.
See the delete_dataset Python SDK docs or the deleteDataset TypeScript SDK docs for more details.
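Deleting by name or ID follows the same pattern as a lookup. The hypothetical delete_from_store helper works on an in-memory dictionary keyed by ID and raises KeyError when no dataset matches. It is a sketch, not Galileo's delete_dataset.

```python
def delete_from_store(store, *, name=None, dataset_id=None):
    """Hypothetical helper: remove a dataset from an in-memory store by name or ID.

    `store` maps dataset IDs to dataset dicts with a "name" key.
    Returns the deleted dataset, or raises KeyError if nothing matches.
    """
    if (name is None) == (dataset_id is None):
        raise ValueError("pass exactly one of name or dataset_id")
    if dataset_id is None:
        # Resolve the unique name to its ID first.
        matches = [key for key, d in store.items() if d["name"] == name]
        if not matches:
            raise KeyError(name)
        dataset_id = matches[0]
    return store.pop(dataset_id)  # raises KeyError for an unknown ID


store = {
    "ds-1": {"id": "ds-1", "name": "qa-smoke-test", "rows": []},
    "ds-2": {"id": "ds-2", "name": "old-eval", "rows": []},
}
removed = delete_from_store(store, name="old-eval")
print(removed["id"])  # ds-2
delete_from_store(store, dataset_id="ds-1")
print(store)  # {}
```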
Work with dataset versions
Galileo automatically creates a new version of a dataset whenever the dataset is modified. You can access earlier versions by retrieving the dataset's version history.
See the get_dataset_version_history Python SDK docs for more details.
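You can model automatic versioning as keeping a snapshot after every modification. The VersionedDataset class below is a hypothetical, in-memory illustration of that idea. Its version numbering (starting at 1) and its (version, row count) history format are assumptions, not the format returned by get_dataset_version_history.

```python
import copy


class VersionedDataset:
    """Hypothetical in-memory model of automatic dataset versioning."""

    def __init__(self, name, rows):
        self.name = name
        # Version 1 is the dataset as created; each snapshot is a separate list.
        self._history = [copy.deepcopy(list(rows))]

    @property
    def current_version(self):
        return len(self._history)

    def append_rows(self, rows):
        # A modification stores a new snapshot instead of overwriting the previous one.
        self._history.append(self._history[-1] + copy.deepcopy(list(rows)))

    def version_history(self):
        """Return (version number, row count) pairs, oldest first."""
        return [(number, len(rows)) for number, rows in enumerate(self._history, start=1)]

    def rows_at(self, version):
        if not 1 <= version <= len(self._history):
            raise ValueError(f"no version {version}")
        return copy.deepcopy(self._history[version - 1])


ds = VersionedDataset("qa-smoke-test", [{"input": "hi", "output": "hello"}])
ds.append_rows([{"input": "bye", "output": "goodbye"}])
print(ds.current_version)    # 2
print(ds.version_history())  # [(1, 1), (2, 2)]
print(len(ds.rows_at(1)))    # 1
```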