In the Log to Galileo guide, you logged your first trace to Galileo. In this guide, you will evaluate the LLM's response using the context adherence metric, improve the prompt, and re-evaluate your application.

Configure metrics for your Log stream

To calculate metrics, you first need to set up an LLM integration, then configure which metrics are evaluated for each logged trace.
1. Open the integrations page

Navigate to the LLM integrations page. Select the user menu in the top right, then select Integrations.
2. Add an integration

Locate the option for the LLM platform you are using, then select the +Add Integration button.
3. Add the settings

Set the relevant settings for your integration, such as your API keys or endpoints. Then select Save.
4. Open your project

From the Galileo console, select your project.
5. Open the Log stream

Open the Log stream for your project by selecting the View all logs button.
6. Open the configure metrics pane

Open the configure metrics pane by selecting the Configure Metrics button.
7. Turn on Context Adherence

For this getting started guide, you will use Context Adherence to evaluate the AI response. Search for this metric and turn it on. Once it is on, select the Save and close button.
Your Log stream is now configured. Every time a trace with an LLM span is logged, it will be evaluated for context adherence. You can read more about context adherence in our metrics guide.

Log a trace with the calculated metric

1. Run your application

Now that you have metrics turned on for your Log stream, re-run your application to generate another trace. This time the context adherence metric will be calculated.
python app.py
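If no new trace appears, check your credentials: the Galileo SDK reads its connection settings from environment variables. A typical setup looks like this (variable names are from the Galileo Python SDK; adjust the values to match your own project and Log stream):

```shell
# Credentials the Galileo SDK reads at startup
export GALILEO_API_KEY="your-galileo-api-key"
# Optional: route logs to a specific project and Log stream
export GALILEO_PROJECT="your-project-name"
export GALILEO_LOG_STREAM="your-log-stream-name"
```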
2. Open the Log stream in the Galileo console

In the Galileo console, select your project, choose the Log stream, and select View all logs.
3. Select the Traces tab

You can see the trace that was just logged in the Traces tab. The context adherence metric will be calculated, showing a low score.
4. Get more information on the evaluation

Select the trace to drill down for more information. Select the LLM span, and use the arrow next to the context adherence score to see an explanation of the metric.
This shows a typical problem with an AI application: the LLM doesn't have enough relevant context to answer a question correctly, so it hallucinates or uses irrelevant information from its training data. Let's now fix this, and show the fix with an improved evaluation score.
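To build some intuition for what the metric measures, context adherence asks how much of the response is supported by the supplied context. The toy heuristic below (not Galileo's actual metric, which is LLM-based) illustrates the idea with simple word overlap:

```python
def toy_adherence(response: str, context: str) -> float:
    """Fraction of response words that also appear in the context.
    A crude stand-in for context adherence, for intuition only."""
    context_words = set(context.lower().split())
    response_words = response.lower().split()
    if not response_words:
        return 0.0
    supported = sum(1 for word in response_words if word in context_words)
    return supported / len(response_words)

context = "Galileo brings automation and insight to AI evaluations"

# A grounded answer scores high; an answer drawn from elsewhere scores low.
print(toy_adherence("Galileo brings automation to AI evaluations", context))  # → 1.0
print(toy_adherence("The telescope was invented in 1608", context))  # → 0.0
```

A real adherence metric judges claims semantically rather than by word overlap, but the failure mode is the same: text that isn't grounded in the provided context scores low.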

Improve your application

To improve the context adherence score, you can provide relevant context to the LLM in the system prompt.
1. Add relevant context to your system prompt

Adding relevant context to the system prompt is similar to injecting extra information retrieved by a RAG system.

Update your code, replacing the code to set the system prompt with the following:
relevant_documents = [
    """
    Galileo is the fastest way to ship reliable apps.
    Galileo brings automation and insight to AI evaluations so you can
    ship with confidence.
    """,
    """
    Galileo has Automated evaluations
    Eliminate 80% of evaluation time by replacing manual reviews
    with high-accuracy, adaptive metrics. Test your AI features,
    offline and online, and bring CI/CD rigor to your AI workflows.
    """,
    """
    Galileo allows Rapid iteration
    Ship iterations 20% faster by automating testing numerous
    prompts and models. Find the best performance for any given
    test set. When something breaks, Galileo helps identify
    failure modes and root cause.
    """
]

system_prompt = f"""
You are a helpful assistant that wants to provide a user as much information
as possible. Avoid saying I don't know.

Here is some relevant information:
{relevant_documents}
"""
2. Run your application

Run your application again to log a new trace.
3. View the results in your terminal

Now the results should show relevant information:
Galileo is an advanced platform designed to streamline the development and deployment of reliable AI applications. It focuses on enhancing the efficiency of AI evaluations through automation and insightful metrics. Here are some of the key features and benefits of using Galileo:

1. **Automated Evaluations**: Galileo significantly reduces the time spent on manual reviews by automating the evaluation process. This can eliminate up to 80% of evaluation time through the use of high-accuracy, adaptive metrics. Both offline and online testing of AI features are supported, allowing for a more structured and rigorous CI/CD (Continuous Integration/Continuous Delivery) approach within AI workflows.

2. **Rapid Iteration**: The platform accelerates the iteration process, enabling teams to ship new features 20% faster. It automates the testing of multiple prompts and models, helping teams quickly identify the best performance for different test sets. When issues arise, Galileo aids in pinpointing failure modes and root causes, which streamlines the troubleshooting process.

3. **CI/CD Integration**: By introducing CI/CD rigor to AI workflows, Galileo ensures that AI models undergo continuous testing and improvement, ultimately boosting the quality and reliability of applications being deployed.

In summary, Galileo is a powerful tool for teams seeking to enhance their AI app development capabilities by utilizing automation and insightful metrics for evaluations, leading to faster iterations and improved reliability.
4. Check the new trace

A new trace will have been logged. This time, the context adherence score will be higher. Select the trace to see more details.
🎉 Congratulations, you have evaluated a trace and used the results of the evaluation to improve your AI application.

Next steps

How-to guides

SDK reference