Get started with the multi-agent banking chatbot sample project powered by LangGraph, with RAG using Pinecone as a vector database
The multi-agent banking chatbot sample project is a demo of a multi-agent chatbot powered by LangGraph, with RAG using Pinecone as a vector database. You can have a conversation with the chatbot, and it will bring back information on your (fictional) credit score, as well as give you details of credit cards available from a fictional bank.
The code for this sample is available in Python and TypeScript, and you can run it to generate more traces and experiment with improving the app based on the evaluations.
The Python version of this app uses Chainlit to host the chatbot in a web UI. The TypeScript version is terminal-based.
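The details live in the sample code itself, but as a rough sketch of the kind of LangGraph setup behind a chatbot like this (assuming the `langgraph` and `langgraph-supervisor` packages; the agent names, tools, and prompts below are illustrative, not the sample's actual source), a supervisor routes each question to a specialist agent:

```python
# Illustrative sketch only - the real agents, tools, and prompts live in the sample repo.
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor

model = ChatOpenAI(model="gpt-4o-mini")

def get_credit_score(user_id: str) -> int:
    """Hypothetical tool: look up the user's (fictional) credit score."""
    return 550

credit_score_agent = create_react_agent(
    model=model,
    tools=[get_credit_score],
    name="credit-score-agent",
    prompt="You answer questions about the user's credit score.",
)

# A second specialist (the credit card RAG agent backed by Pinecone) would be
# built the same way, with a retrieval tool instead of get_credit_score.

supervisor = create_supervisor(
    [credit_score_agent],
    model=model,
    prompt="You are a supervisor routing banking questions to specialist agents.",
).compile()

result = supervisor.invoke(
    {"messages": [{"role": "user", "content": "What is my credit score?"}]}
)
print(result["messages"][-1].content)
```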
The sample project comes with a Log stream pre-populated with a set of traces from sample interactions with the chatbot: some ask relevant questions, others ask questions unrelated to the banking agent's capabilities.
Navigate to the Default Log Stream by making sure it is selected, then selecting the View all logs button.
The Log stream is configured to evaluate the following metrics:
For some of the traces, these metrics are evaluated at 100%, showing the agents are working well for those inputs. For other traces, the metrics report lower values, showing that the chatbot needs some improvement.
Select different rows to see more details, including the input and output data, the metric scores, and the explanations for each score.
Galileo has an Insights Engine that continuously reviews your traces and metrics, and gives suggestions to improve your application. Navigate back to the project page by selecting the Multi-Agent Banking Chatbot from the top navigation bar.
Insights will be showing on the right-hand side:
Review the generated insights, and think about ways to improve the chatbot by tweaking the agent prompts. The insights will likely include something like this:
Summary
The supervisor agent exhibits inconsistent behavior that undermines the multi-agent system’s effectiveness. In a credit score inquiry, the supervisor correctly identified the query type and transferred it to the credit-score-agent, which successfully retrieved the user’s credit score (550) and provided helpful context about the score’s meaning. However, when control returned to the supervisor, it responded with ‘I don’t know’ despite the specialist having successfully completed the task. This creates a frustrating user experience where the system retrieves the requested information but then claims ignorance, potentially making users think the system is broken or unreliable.
Suggestions
Ensure the supervisor agent properly processes and relays the results from specialist agents instead of defaulting to ‘I don’t know’ responses.
To see how you can use these insights to improve the app, get the code and try some different agent prompts.
You can run the sample app to generate more traces, and test out different agent prompts.
To run the code yourself to generate more traces, you will need:
To get metrics calculated in Galileo, you will need:
An integration with an LLM configured. If you don’t have an integration configured, then:
Open the integrations page
Select the menu in the top right, and select Integrations
Add an integration
Select + Add Integration for the LLM you are using and add the relevant details, such as an API key or endpoint.
Clone the SDK examples repo
Navigate to the relevant project folder
Start by navigating to the root folder for the programming language you are using:
The Python code for the sample is in a folder called after. If you want to learn more about adding logging with Galileo to a LangGraph app, check out the add evaluations to a multi-agent LangGraph application cookbook.
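The cookbook covers the integration in depth; the short version (a sketch only, assuming the Galileo Python SDK's LangChain callback handler, so check the cookbook for the exact wiring this sample uses) is that you attach a Galileo callback when invoking the compiled graph:

```python
# Sketch only: attach Galileo's LangChain callback handler when invoking a compiled
# LangGraph app so each run is logged as a trace, with spans for the agent, tool,
# and LLM calls. The import path assumes the current Galileo Python SDK.
from galileo.handlers.langchain import GalileoCallback

def ask(graph, question: str) -> str:
    """Invoke a compiled LangGraph app with Galileo logging attached."""
    result = graph.invoke(
        {"messages": [{"role": "user", "content": question}]},
        config={"callbacks": [GalileoCallback()]},
    )
    return result["messages"][-1].content
```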
The full source code for all of our sample projects is available in the Galileo SDK Examples GitHub repo.
This project uses Pinecone as a vector database to power a RAG agent that retrieves data about the fictional credit cards offered by a bank. Before you can run the app, you will need to upload the credit card documents to Pinecone.
Configure environment variables
In each project folder is a `.env.example` file. Rename this file to `.env` and populate the `PINECONE_API_KEY` value. You can leave the other values for now, as you will populate them later.
Upload the documents
There is a helper script in the `scripts` folder. Run this script to create a new index in Pinecone and upload the documents.
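If you are curious what the script does under the hood, it is roughly the following (a simplified sketch assuming the Pinecone Python SDK and OpenAI embeddings; the index name, embedding model, and document loading are illustrative, so use the script in the repo rather than this snippet):

```python
# Simplified sketch of the upload script: create a Pinecone index, embed each
# credit card document, and upsert the vectors. Index name, dimension, and the
# document source here are illustrative assumptions.
import os
from openai import OpenAI
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
openai_client = OpenAI()

index_name = "credit-cards"  # hypothetical name - the script defines the real one
if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=1536,  # matches text-embedding-3-small
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index(index_name)

# In practice the documents are loaded from files in the repo
documents = {"platinum-card": "The Platinum Card offers ..."}
for doc_id, text in documents.items():
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding
    index.upsert(vectors=[{"id": doc_id, "values": embedding, "metadata": {"text": text}}])
```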
This will take a few seconds, and a successful run should look like:
Install required dependencies
From the project folder, install the required dependencies. For Python, make sure to create and activate a virtual environment before installing the dependencies.
Configure environment variables
In your `.env` file, populate the Galileo values:
| Environment Variable | Value |
| --- | --- |
| `GALILEO_API_KEY` | Your API key |
| `GALILEO_PROJECT` | The name of your Galileo project. This is preset to Multi-Agent Banking Chatbot. |
| `GALILEO_LOG_STREAM` | The name of your Log stream. This is preset to Default Log Stream. |
| `GALILEO_CONSOLE_URL` | Optional. The URL of your Galileo console for custom deployments. For the free tier, you don't need to set this. |
You can find these values on the project page for the Multi-Agent Banking Chatbot sample project in the Galileo Console.
Next, populate the values for your LLM:
| Environment Variable | Value |
| --- | --- |
| `OPENAI_API_KEY` | Your OpenAI API key. If you are using Ollama, set this to `ollama`. If you are using another OpenAI compatible API, then set this to the relevant API key. |
| `OPENAI_BASE_URL` | Optional. The base URL of your OpenAI deployment. Leave this commented out if you are using the default OpenAI API. If you are using Ollama, set this to `http://localhost:11434/v1`. If you are using another OpenAI compatible API, then set this to the relevant URL. |
| `MODEL_NAME` | The name of the model you are using |
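As a rough illustration of how these values typically fit together (the variable names match the tables above; the model name shown is just an example), `OPENAI_BASE_URL` and `MODEL_NAME` let the same code talk to OpenAI, Ollama, or any other OpenAI compatible endpoint:

```python
# Sketch of how such values are consumed: load .env, then build an OpenAI client
# that points at either the default OpenAI API or an OpenAI compatible endpoint
# such as Ollama. The MODEL_NAME fallback value is illustrative.
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],        # "ollama" when using Ollama
    base_url=os.environ.get("OPENAI_BASE_URL"),  # None -> default OpenAI API
)

response = client.chat.completions.create(
    model=os.environ.get("MODEL_NAME", "gpt-4o-mini"),  # e.g. "llama3.2" for Ollama
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```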
Run the project
Run the project with the following command:
If you are using the Python version, the app will be running at localhost:8000, so open it in your browser.
If you are using TypeScript, the app will run in your terminal:
You can ask the agent questions about:
The insights you viewed earlier suggested improving how the supervisor agent processes messages, especially with credit scores. You can try this out to see what issues might occur:
Despite there being an agent to get the user's credit score, it is not always used.
To improve the agent, have a look at the agent prompt defined in the following file:
In this file is the current agent prompt:
This supervisor agent prompt explicitly mentions the credit card agent, but not the credit score agent. You can encourage the supervisor agent to use the credit score agent to get better results:
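For example, a revised prompt might explicitly name both specialists and tell the supervisor to relay their answers. This is an illustrative rewrite rather than the exact prompt from the sample, and the agent names should match those defined in the sample code:

```python
# Illustrative revised supervisor prompt: name both specialist agents and tell the
# supervisor to relay their answers instead of replying "I don't know".
SUPERVISOR_PROMPT = """
You are a supervisor for a banking chatbot, managing two specialist agents:
- credit-card-agent: answers questions about the bank's credit card products.
- credit-score-agent: looks up and explains the user's credit score.

For any question about credit scores, hand off to the credit-score-agent.
For any question about credit cards, hand off to the credit-card-agent.
When a specialist returns an answer, relay that answer to the user rather than
saying you don't know. If a question is unrelated to banking, politely decline.
"""
```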
Try this new prompt out and see how the agent responds.
Once you have asked a few questions, head back to the Galileo Console and examine the new traces. You should see the metrics improving.
Galileo allows you to run experiments against datasets of known data, generating traces in an experiment Log stream and evaluating these for different metrics. Experiments allow you to take a known set of inputs and evaluate different prompts, LLMs, or versions of your apps.
This sample project has a unit test that runs the chatbot against a pre-defined dataset, containing a mixture of sensible and irrelevant questions:
You can use this unit test to evaluate different supervisor agent prompts for your app.
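In rough shape, the idea behind the test looks something like the sketch below. This is illustrative only: the actual test wires the dataset through Galileo's experiment logging, and `ask_chatbot` here is a hypothetical helper standing in for the sample's chatbot entry point.

```python
# Sketch of the idea behind the unit test: run a fixed set of questions (some
# relevant, some deliberately off-topic) through the chatbot so each run produces
# comparable traces. `ask_chatbot` is a hypothetical stand-in for the sample's
# real entry point, and the real test logs to a Galileo experiment.
import pytest

QUESTIONS = [
    "What is my credit score?",                # relevant
    "What credit cards does the bank offer?",  # relevant
    "What's a good recipe for banana bread?",  # deliberately irrelevant
]

@pytest.mark.parametrize("question", QUESTIONS)
def test_chatbot_responds(question):
    from app import ask_chatbot  # hypothetical helper wrapping the compiled graph
    answer = ask_chatbot(question)
    assert isinstance(answer, str) and answer.strip()
```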
Run the unit test
Use the following command to run the unit test:
Evaluate the experiment
The unit test will output a link to the experiment in the Galileo Console:
Follow this link to see the metrics for the experiment Log stream.
Try different supervisor agent prompts
Experiment with different supervisor agent prompts. Edit the supervisor agent prompt in the app, then re-run the experiment through the unit test to see how different supervisor agent prompts affect the metrics.
Compare experiments
If you navigate to the experiments list using the All Experiments link, you will be able to compare the average metric values of each run.
You can then select multiple rows and compare the experiments in detail.
- Learn how to run experiments with multiple data points using datasets and prompt templates.
- Log with full control over sessions, traces, and spans using the Galileo logger.
- Quickly add logging to your code with the log decorator and wrapper.
- Manage logging using the Galileo context manager.
- Learn how to integrate and use OpenAI's API with Galileo's wrapper client. (Python)
- Learn how to use the Galileo `@log` decorator to log functions to traces. (Python)
- Learn how to create log traces and spans manually in your AI apps. (Python)