You can create a new dataset using the create_dataset function:

    from galileo.datasets import create_dataset

    # Create a dataset with test data
    test_data = [
        {
            "input": "Which continent is Spain in?",
            "output": "Europe",
        },
        {
            "input": "Which continent is Japan in?",
            "output": "Asia",
        },
    ]

    dataset = create_dataset(
        name="countries",
        content=test_data
    )

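Test data often starts out in a spreadsheet rather than as Python literals. The following is a minimal sketch, using only the standard library, of how CSV text could be converted into the list of `{"input": ..., "output": ...}` rows used above. The helper name `rows_from_csv` and the column names `question` and `answer` are hypothetical and are not part of the Galileo SDK.

```python
import csv
import io


def rows_from_csv(csv_text, input_column="question", output_column="answer"):
    """Convert CSV text into a list of {"input": ..., "output": ...} rows.

    Hypothetical helper: the column names are assumptions for illustration.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        rows.append(
            {
                "input": record[input_column].strip(),
                "output": record[output_column].strip(),
            }
        )
    return rows


sample_csv = """question,answer
Which continent is Spain in?,Europe
Which continent is Japan in?,Asia
"""

# The resulting list has the same shape as the test_data literal above
test_data = rows_from_csv(sample_csv)
print(test_data)
```

The resulting `test_data` list can then be passed as the `content` argument of `create_dataset`, exactly as in the example above.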
You can retrieve an existing dataset using the get_dataset function:

    from galileo.datasets import get_dataset

    # Get a dataset by name
    dataset = get_dataset(
        name="countries"
    )

    # Get a dataset by ID
    dataset = get_dataset(
        id="dataset-id"
    )

    # Get its content
    dataset.get_content()

You can add rows to an existing dataset using the add_rows method:

    from galileo.datasets import get_dataset

    # Get an existing dataset
    dataset = get_dataset(
        name="countries"
    )

    # Add new rows to the dataset
    dataset.add_rows([
        {
            "input": "Which continent is Morocco in?",
            "output": "Africa",
        },
        {
            "input": "Which continent is Australia in?",
            "output": "Oceania",
        },
    ])

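Before appending rows, it can help to check that each new row has the expected keys and does not repeat an input the dataset already holds. The sketch below is plain Python and does not call Galileo. The helper name `prepare_new_rows` is hypothetical, and it assumes you already have the existing inputs available as a list of strings.

```python
def prepare_new_rows(existing_inputs, candidate_rows):
    """Return well-formed candidate rows whose "input" is not already present.

    Hypothetical helper: not part of the Galileo SDK.
    """
    seen = set(existing_inputs)
    accepted = []
    for row in candidate_rows:
        # Every row must be a dict with at least "input" and "output" keys
        if not isinstance(row, dict) or {"input", "output"} - set(row):
            raise ValueError(f"Row is missing 'input' or 'output': {row!r}")
        # Skip rows that duplicate an existing or earlier input
        if row["input"] in seen:
            continue
        seen.add(row["input"])
        accepted.append(row)
    return accepted


existing = [
    "Which continent is Spain in?",
    "Which continent is Japan in?",
]
candidates = [
    {"input": "Which continent is Morocco in?", "output": "Africa"},
    {"input": "Which continent is Spain in?", "output": "Europe"},
    {"input": "Which continent is Australia in?", "output": "Oceania"},
]

new_rows = prepare_new_rows(existing, candidates)
print(len(new_rows))
```

The filtered `new_rows` list has the same shape as the argument passed to `add_rows` above.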
You can list all available datasets using the list_datasets function:

    from galileo.datasets import list_datasets

    # List all datasets in a project
    datasets = list_datasets()

    # List datasets with a custom limit
    datasets = list_datasets(
        limit=50,
    )

You can delete a dataset using the delete_dataset function:

    from galileo.datasets import delete_dataset

    # Delete a dataset by name
    delete_dataset(
        name="countries",
        project="my-project",
    )

    # Delete a dataset by ID
    delete_dataset(
        id="dataset-id",
        project="my-project",
    )

Galileo automatically creates a new version of a dataset whenever it is modified. By default, get_dataset returns the latest version, and you can check when that version was last modified:

    from galileo.datasets import get_dataset

    # Get the latest version by default
    dataset = get_dataset(
        name="countries"
    )

    # Check when this version was last modified
    print(dataset.modified_at)

Datasets are primarily used for running experiments to evaluate the performance of your LLM applications:

    from galileo.datasets import get_dataset
    from galileo.experiments import run_experiment
    from galileo.prompts import get_prompt_template

    # Get an existing dataset
    dataset = get_dataset(
        name="countries"
    )

    # Get an existing prompt template
    prompt_template = get_prompt_template(
        project="my-project",
        name="geography-prompt"
    )

    # Run an experiment with the dataset and prompt
    results = run_experiment(
        "geography-experiment",
        dataset=dataset,
        prompt_template=prompt_template,
        metrics=["correctness"],
        project="my-project",
    )