Add evaluations to a multi-agent LangGraph application
Learn how to add evaluations to a multi-agent LangGraph chatbot using Galileo
Overview
In this tutorial, you’ll learn how to add evaluations with Galileo to an existing multi-agent LangGraph app. This tutorial is intended for Python LangGraph developers who already have an app and are looking to add evaluation. It assumes you have basic knowledge of:
- Python
- LangGraph
- Setting up a project and metrics in Galileo
By the end of this tutorial, you’ll be able to:
- Add Galileo evaluations to a multi-agent LangGraph app
- View and understand session-level metrics
You can also watch the video walkthrough here.
Background
This tutorial uses an existing banking chatbot app powered by Chainlit and LangGraph. It is a deliberately simple example of a chatbot for a fictitious bank. It is a multi-agent app with a supervisor agent and one additional agent that answers questions about the credit cards offered by the bank. This agent uses dummy credit card documents stored in a Pinecone vector database.
For example, you can ask questions like “What credit cards do you offer?” or “Which card has the lowest annual fee?”
These are the two agents:
- Credit card information agent - This agent provides information on the available credit cards. The credit card documentation that the agent uses is stored in a Pinecone vector database.
- Supervisor agent
Chainlit provides a web front end for a chatbot, managing user interaction and conversation history. The important files in this app are:
- app.py - This contains the main application logic for a Chainlit app. It has an on_chat_start function that is called whenever a new chat is started, and a main function that is called whenever a message is sent.
- src/galileo_langgraph_fsi_agent/agents/supervisor_agent.py - This is a LangGraph supervisor agent that manages the other agents, routing messages where needed. This is configured to use GPT-4.1-mini.
- src/galileo_langgraph_fsi_agent/agents/credit_card_information_agent.py - This is a LangGraph agent that uses a tool to extract information about the available credit cards from Pinecone. This is also configured to use GPT-4.1-mini.
- src/galileo_langgraph_fsi_agent/tools/pinecone_retrieval_tool.py - This is a LangGraph tool that interacts with the Pinecone vector database. It is called by the credit_card_information_agent.
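To make the division of labor between the two agents concrete, here is a minimal toy sketch of the routing idea. It is not the app's code: the real supervisor is a LangGraph agent that relies on an LLM to decide where to route, while this sketch swaps in a simple keyword check. The keyword list and the reply strings are assumptions made up for illustration.

```python
# Toy stand-ins for the two agents. The real agents are LangGraph agents
# backed by an LLM, and the real supervisor does not route by keyword.

CARD_KEYWORDS = ("card", "annual fee")  # hypothetical routing hints


def credit_card_information_agent(question: str) -> str:
    """Stand-in for the agent that would query the Pinecone retrieval tool."""
    return f"[credit card agent] would look up: {question}"


def supervisor_agent(question: str) -> str:
    """Route card questions to the specialist, answer anything else directly."""
    if any(keyword in question.lower() for keyword in CARD_KEYWORDS):
        return credit_card_information_agent(question)
    return f"[supervisor] would answer directly: {question}"


card_reply = supervisor_agent("Which card has the lowest annual fee?")
general_reply = supervisor_agent("What does APR stand for?")
print(card_reply)
print(general_reply)
```

The point of the sketch is only the shape of the flow: every message enters through the supervisor, and the credit card agent is involved only when the question calls for it.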
Before you start
Before you start the tutorial, you will need:
- The starter project - Clone the Galileo SDK-Examples repo. This repo contains both the starting LangGraph app that you will be adding Galileo evaluations to, as well as a final version for reference.
- A Pinecone account and API key - If you don’t have an existing Pinecone account, head to pinecone.io, sign up for a free account, and get an API key.
- An OpenAI API key - This example uses OpenAI as the underlying LLM to run the agents.
- A Galileo API key - To access your Galileo API keys, open the Galileo Console and log in or create an account. From the Settings and Users page you can create a new API key.
Set up the project
The starter project is in the sdk-examples/python/agent/langgraph-fsi-agent/before folder in the cloned repo.
- Open the starter project in your Python IDE of choice.
- Install the dependencies that are defined in the pyproject.toml. Create a virtual environment, and install these dependencies using a tool such as uv:
- Configure your .env file. Copy the .env.example file to .env, and set the values for your OpenAI and Pinecone API keys:
  Replace <Your OpenAI API key> with your OpenAI API key. Replace <Your Pinecone API key> with your Pinecone API key.
- Upload the dummy credit card documentation to Pinecone using the provided helper script:
  This will take a few seconds and a successful run should look like:
- Run the project to test it out:
  The app will be running at localhost:8000, so open it in your browser.
  Ask the bot questions like “What credit cards do you offer?”.
You are now ready to add Galileo evaluations to your app.
Create a new Galileo project
First you need a new Galileo project to log evaluations to.
- Create a new project from the Galileo Console using the + New Project button. Name this project bank-chatbot.
Install the Galileo Python package
To send data to Galileo, you need to use the Galileo Python package.
- Install the Galileo Python package in your virtual environment.
  This installs the Galileo Python package with the optional OpenAI wrapper.
- Add the following Galileo environment variables to your .env file:
  Replace <Your Galileo API key> with your Galileo API key. The project is set to the new project you just created, and the log stream is set to chatbot-logs.
  You don’t need to create the log stream in advance; a new log stream will be created automatically.
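A missing or placeholder key is a common reason for a first run to fail, so it can help to check the environment before starting the app. The following is a minimal sketch, not part of the starter project. The variable names are assumptions; check them against your own .env.example file. The sketch also assumes that a value still starting with "<Your" is an unreplaced placeholder.

```python
import os

# Assumed variable names; confirm them against your .env.example file.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "GALILEO_API_KEY",
    "GALILEO_PROJECT",
    "GALILEO_LOG_STREAM",
)


def missing_env_vars(env, required=REQUIRED_VARS):
    """Return the required names that are unset, blank, or still placeholders."""
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        # Treat the tutorial's "<Your ... API key>" placeholders as unset.
        if not value or value.startswith("<Your"):
            missing.append(name)
    return missing


# os.environ only reflects the .env file after your app has loaded it.
problems = missing_env_vars(os.environ)
if problems:
    print("Missing or placeholder values:", ", ".join(problems))
else:
    print("All expected variables are set.")
```

Passing the environment in as an argument keeps the check easy to try against a plain dictionary as well as os.environ.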
Add logging to Galileo
Next you need to add code to log to Galileo. Galileo has a LangGraph callback handler that can be passed into the agent to automatically log traces for every step in the chain, including agent calls, tool calls, and LLM calls.
You can find a complete version of this code, with all the changes applied, in the sdk-examples/python/agent/langgraph-fsi-agent/after folder in the cloned repo.
Add the logging code
- Add import statements for the Galileo components to the top of the app.py file:
- Start a Galileo session. In the on_chat_start function in app.py, add the following code to create a new logging session:
  This creates a new session named “FSI Agent - {time}” with the current date and time. This also sets the external_id to the current Chainlit session ID. Each separate conversation in Chainlit is a separate session with a unique ID.
- Create a callback handler. After the code you just added, add the following to create the callback handler, and save it in the Chainlit session:
  This creates the callback handler, and saves it against the current user session.
  The Galileo logging handlers use the current thread context to connect to the current Galileo context. This means that for a callback handler to be tied to a session, it needs to be created in the same thread as the session. It can then be accessed from any other thread.
- Pass the callback handler to LangGraph. In the main function, replace this line:
  With the following:
  This extracts the Galileo callback from the user session and adds it to a callbacks collection. This collection is passed to the LangGraph RunnableConfig that is passed when the supervisor agent is used.
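The flow of the steps above can be sketched with standard-library stand-ins. This is an illustration only and does not use the Galileo, Chainlit, or LangGraph APIs: start_session, SketchCallback, the user_session dictionary, and the timestamp format are all hypothetical, and a context variable stands in for the "current Galileo context". The sketch shows why a handler created in the same thread as the session stays tied to that session, even when a different thread later retrieves and uses it.

```python
import contextvars
import threading
from datetime import datetime

# Hypothetical stand-in for the SDK's notion of the current session.
_current_session = contextvars.ContextVar("current_session", default=None)

# Hypothetical stand-in for Chainlit's per-conversation user session store.
user_session: dict = {}


def session_name(now: datetime) -> str:
    """Build a name like "FSI Agent - {time}" (timestamp format is assumed)."""
    return "FSI Agent - " + now.strftime("%Y-%m-%d %H:%M:%S")


def start_session(name: str, external_id: str) -> None:
    """Record a session in the current thread's context."""
    _current_session.set({"name": name, "external_id": external_id})


class SketchCallback:
    """Stand-in handler that binds to the session current at creation time."""

    def __init__(self) -> None:
        self.session = _current_session.get()
        self.events = []

    def on_step(self, step: str) -> None:
        self.events.append(step)


def on_chat_start(chat_id: str) -> None:
    # Session and handler are created in the same thread, in this order.
    start_session(name=session_name(datetime.now()), external_id=chat_id)
    user_session["galileo_callback"] = SketchCallback()


def main(message: str) -> dict:
    # Retrieve the stored handler and pass it along in a callbacks collection.
    config = {"callbacks": [user_session["galileo_callback"]]}
    for callback in config["callbacks"]:
        callback.on_step(f"supervisor received: {message}")
    return config


# Start the chat in one thread and handle a message in another.
starter = threading.Thread(target=on_chat_start, args=("chat-123",))
starter.start()
starter.join()
worker = threading.Thread(target=main, args=("What credit cards do you offer?",))
worker.start()
worker.join()

handler = user_session["galileo_callback"]
print(handler.session["external_id"], handler.events)
```

Because the handler captures its session when it is constructed, the thread that later processes a message only needs a reference to the handler, which is what storing it in the user session provides.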
Run the app
- Run the app.
  Open the app in your browser at localhost:8000, and ask the bot a question. In your terminal you will see references to the Galileo log stream being created, and traces being flushed:
  Leave the app running whilst you view the traces.
View the traces
- View the session in Galileo. Open the Galileo console and select your project. In the Sessions tab you should see a single session created for the conversation.
- Select the single session. It will open in the sessions view showing a flowchart. Select the nodes in this chart to see the input and output.
Add more traces to the session
Sessions can contain multiple traces. For example, a single user conversation with your bot would be a single session, containing multiple traces for the different questions you ask the bot.
- Ask the bot a follow-up question related to credit cards, such as “Which card has no annual fee?”
- Follow this up with a third question that does not involve specific information about the credit cards, such as “What does APR stand for?”
- View the session in the Galileo console.
  This session will have 3 traces. Use the Trace navigation to move between the traces. In the Input and Output you will see the relevant messages.
- Navigate to the last trace. Because the question “What does APR stand for?” does not require the credit card agent, the flowchart doesn’t show this node.
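One way to picture the relationship between a session and its traces is as a simple data model. The sketch below is a hypothetical illustration, not Galileo's data format: the Session and Trace classes and the node names are assumptions chosen to mirror the three questions asked above.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """One question and the (hypothetical) nodes that handled it."""
    user_input: str
    nodes: list


@dataclass
class Session:
    """One conversation, holding a trace per question."""
    name: str
    traces: list = field(default_factory=list)


session = Session(name="FSI Agent - example")
session.traces.append(Trace(
    "What credit cards do you offer?",
    ["supervisor", "credit-card-agent", "pinecone-tool"],
))
session.traces.append(Trace(
    "Which card has no annual fee?",
    ["supervisor", "credit-card-agent", "pinecone-tool"],
))
# A general question: the supervisor answers without the credit card agent.
session.traces.append(Trace("What does APR stand for?", ["supervisor"]))

last_trace = session.traces[-1]
print(len(session.traces), last_trace.nodes)
```

Each new question adds a trace to the same session, and the set of nodes differs from trace to trace depending on which agents were needed.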
Summary
In this tutorial, you learned how to:
- Add Galileo evaluations to a multi-agent LangGraph app
- View and navigate session-level traces
Next steps
Some suggested next steps are: