Work with datasets
Datasets can be used in two ways:- 
Using the Galileo Console
- Create and manage datasets directly through the Galileo Console
- Visually organize and track test cases
- No coding required
 
- 
Using the Galileo SDK
- Programmatically create and manage datasets using Python
- Integrate dataset management into your existing workflows
- Automate dataset operations
 
Dataset fields
Each record in a Galileo dataset can have three top-level fields:- input- Input variables that can be passed to your application or prompt to recreate a test case.
- output- Reference outputs to evaluate your application. These can be the ground truth for BLEU, ROUGE, and Ground Truth Adherence metrics, or reference outputs for manual reference.
- metadata- Additional data you can use to filter or group your dataset.
Create and manage datasets in the Galileo console
Create a new dataset
The dataset creation button, is your starting point for organizing test cases in Galileo’s interface. From the Datasets page of your project, click the + Create Dataset button.


Dataset file uploads
An uploaded file can be in CSV, JSON/JSONC, or Feather format. The file needs to have at least one column that maps to the input values. These columns can have any name. Once you have uploaded the file, you can name the dataset, and map the columns in the file to the dataset’s input, reference output, and metadata by dragging them from the Original uploaded dataset column to the relevant dataset column. Select the Save dataset button when you are done.
Synthetic data generation
You can utilize Large Language Models (LLMs) to generate datasets that you can use to test your AI applications. These test datasets can be used before and after your app is deployed to production. This feature requires an integration with a supported LLM provider (for example, OpenAI, Azure, Mistral). To configure an integration, visit the LLM provider’s platform to obtain an API key, then add the key from the model selection dialog, or from Galileo’s integrations page.

- The number of rows that you ask the LLM to generate.
- The LLM model that you’re utilizing.
- Your AI app’s use case (Optional): What task is your AI app doing? For example, chatbot to answer customer service questions.
- Special instructions (Optional): Additional guidance to further refine the generated output.
- The generated data types (Optional): Customize data types that the generated data should follow.
Data types can be used for testing specific scenarios. For example, testing your app’s resilience to prompt injection scenarios where attackers try to get your app to produce harmful output. 

Manual dataset creation
The console allows you to manually add and edit data rows. Select the Save dataset button when you are done.
Add rows to your dataset
You can manually add new rows to your dataset through the console, allowing you to capture problematic inputs or edge cases as you discover them.
View version history
The version history view allows you to track changes to your dataset over time, see when modifications were made, and access previous versions for comparison or regression testing.

Create and manage datasets in code
Create datasets
When you create a dataset, it is uploaded to Galileo and available to future experiments. Datasets need to have unique names, and are available to all projects across your organization.create_dataset Python SDK docs or createDataset TypeScript SDK docs for more details.
Get existing datasets
Once a dataset has been created in Galileo, you can retrieve it to use in your experiments by name or ID.get_dataset Python SDK docs or getDataset TypeScript SDK docs for more details.
Add rows to existing datasets
Once a dataset has been created, you can manually add rows to it.add_rows Python SDK docs or addRowsToDataset TypeScript SDK docs for more details.
Generate synthetic data to extend a dataset
Galileo can use an LLM integration to generate rows of synthetic data that you can then add to a dataset. This synthetic data is generated using a mixture of prompts, instructions, few-shot examples, and data types. Once these rows have been generated, they can be added to a new or existing dataset.extend_dataset Python SDK docs or extendDataset TypeScript SDK docs for more details.
List datasets
You can retrieve all the datasets for a project.list_datasets Python SDK docs or getDatasets TypeScript SDK docs for more details.
Delete datasets
If a dataset is no longer needed, you can delete it by name or ID.delete_dataset Python SDK docs or deleteDataset TypeScript SDK docs for more details.
Work with dataset versions
Galileo automatically creates new versions of datasets when they are modified. You can access different versions by getting the dataset history.get_dataset_version_history Python SDK docs for more details.
Use datasets in experiments
Datasets are primarily used for running experiments to evaluate the performance of your LLM applications:Best practices for dataset management
When working with datasets consider these tips:- Start small and representative: begin with a handful of diverse examples to validate quickly.
- Grow incrementally: add cases as you find bugs, edge cases, or new scenarios.
- Version thoughtfully: create new versions for significant changes and compare results over time.
- Document changes: record the rationale behind additions and versions in comments or changelogs.
- Organize by purpose: separate datasets for basics, edge cases, and regressions.
- Choose the right approach: use the console for quick edits/visualization and the SDK for automation/bulk.
- Track progress: monitor metrics/dashboards or review results to catch regressions.
- Keep history: archive old cases and maintain version history—don’t delete.
- Keep your dataset schema consistent: ensure every row includes all fields referenced by prompts.
- Use nested access for dictionaries: reference nested fields with dot notation (e.g., input.metadata.days).
- Test your prompt templates: render with sample rows to verify variable substitution.
- Document your prompt templates: note required fields and assumptions near the template.