Evaluate metrics with the Luna 2 model
Learn how to evaluate metrics cheaper and faster using the Luna 2 model
Overview
This guide shows you how to use Luna 2 metrics to evaluate your AI applications.
You will be running a basic AI app using OpenAI as an LLM, and evaluating it for input toxicity and prompt injections using Luna 2.
Luna is only available in the Enterprise tier of Galileo. Contact us to learn more and get started.
Before you start
To complete this how-to, you will need:
- An OpenAI API key
- A Galileo project configured to use the Luna 2 model
- Your Galileo API key
Install dependencies
To use Galileo, you need to install some package dependencies, and configure environment variables.
Install Required Dependencies
Install the required dependencies for your app. If you are using Python, create a virtual environment using your preferred method, then install dependencies inside that environment:
Create a `.env` file, and add the following values
Create your AI application
Create a file for your app called `app.py` or `app.ts`.
Add the following code to this file
This code makes a call to OpenAI using the Galileo OpenAI wrapper, making a compliment and asking a question about sunflowers.
If you are using TypeScript, you will also need to configure your code to use ESM. Add the following to your package.json
file:
Run the app to ensure everything is working
View the app in the Galileo console
Open the Galileo console and view the log stream for your app. You should see a single session with a single trace.
Configure Luna 2 metrics
Now you can configure metrics using Luna 2. You will be adding metrics to look for toxicity and prompt injection attacks in the input.
Configure metrics for the logstream
Select the Configure metrics button.
Turn on the Luna input toxicity and prompt injection metrics
Locate the Input Toxicity (SLM) and Prompt Injection (SLM) metrics, and turn these on.
You will see 2 versions of these metrics, the LLM as a judge versions which use whatever integrations you have set up to third party LLMs, and the Luna versions.
The Luna versions are labelled (SLM), so make sure to select these.
For example, ensure you turn Input Toxicity (SLM) on, NOT Input Toxicity.
Save and close the metric configuration tab
Run your app again to evaluate these metrics
Run your app again
Run your app as before to generate a new trace. This time the metrics will be evaluated.
View the traces for your app in the Galileo console
Open the Galileo console and view the log stream for your app. You should see a single session with a single trace.
Select this session to see the details of the trace, then select the Metrics tab from the Trace Summary. You will see an evaluation of the toxicity and prompt injection from the input, showing no toxicity or prompt injection attacks.
Adjust your prompt to increase toxicity and add a prompt injection
Now that your app is evaluating metrics using Luna 2, you will change the prompt to see them in action.
Update the prompt
Update the prompt in your code to the following.
Run your app again
Run your app as before to generate a new trace. This time the metrics will be evaluated using the new prompt.
View the traces for your app in the Galileo console
Navigate to the latest session in the Galileo console. You will now see evaluations showing both toxicity in the input, and a context switch attack in the prompt.
You’ve successfully evaluated an app using the Luna 2 model.