Multi-agent banking chatbot sample

The multi-agent banking chatbot sample project is a demo of a multi-agent chatbot powered by LangGraph, with RAG using Pinecone as a vector database. You can have a conversation with the chatbot, and it will bring back information on your (fictional) credit score, as well as give you details of credit cards available from a fictional bank.

The project page for the multi-agent banking chatbot with setup instructions and an insight

The code for this sample is available in Python and TypeScript. You can run it to generate more traces and experiment with improving the app based on the evaluations.

The Python version of this app uses Chainlit to host the chatbot in a web UI. The TypeScript version is terminal-based.

Evaluate the app

The sample project comes with a Log stream pre-populated with a set of traces for some sample interactions with the chatbot: some asking relevant questions, some asking questions unrelated to the banking agents' capabilities.

Investigate the Log stream

Navigate to the Default Log Stream by ensuring it is selected, and selecting the View all logs button.

The Log stream is configured to evaluate metrics including Correctness and Instruction Adherence.

For some of the traces, these metrics are evaluated at 100%, showing the agents are working well for those inputs. For other traces, these metrics are reporting lower values, showing the chatbot needs some improvements.

A set of traces with Correctness and Instruction Adherence metrics with a range of values from 33% to 100%

Select different rows to see more details, including the input and output data, the metric scores, and explanations.

Get insights

Galileo has an Insights Engine that continuously reviews your traces and metrics, and gives suggestions to improve your application. Navigate back to the project page by selecting the Multi-Agent Banking Chatbot from the top navigation bar.

Insights will be showing on the right-hand side:

A list of insights

Review the generated insights, and think about ways to improve the chatbot by tweaking the agent prompts. The insights will likely include something like this:

Summary

The supervisor agent exhibits inconsistent behavior that undermines the multi-agent system’s effectiveness. In a credit score inquiry, the supervisor correctly identified the query type and transferred it to the credit-score-agent, which successfully retrieved the user’s credit score (550) and provided helpful context about the score’s meaning. However, when control returned to the supervisor, it responded with ‘I don’t know’ despite the specialist having successfully completed the task. This creates a frustrating user experience where the system retrieves the requested information but then claims ignorance, potentially making users think the system is broken or unreliable.

Suggestions

Ensure the supervisor agent properly processes and relays the results from specialist agents instead of defaulting to ‘I don’t know’ responses.

To see how you can use these insights to improve the app, get the code and try some different agent prompts.

Run the sample app

You can run the sample app to generate more traces, and test out different agent prompts.

Prerequisites

To run the code yourself to generate more traces, you will need:

- Access to an OpenAI compatible API, such as:
  - An OpenAI API key
  - Another OpenAI compatible API, such as Google Vertex
  - Ollama installed locally with a model downloaded
- A Pinecone account. The free Starter tier is more than enough for this project. You will need your Pinecone API key.
- Either Python 3.9 or later, or Node installed
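
If you are using the Python version, a quick way to confirm the interpreter meets the stated minimum is a check like the one below. This is a minimal sketch; `python_is_supported` is a hypothetical helper and is not part of the sample project.

```python
import sys

# Minimum version stated in the prerequisites above.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python version OK")
else:
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or later is required")
```

Run it with the same interpreter you plan to use for the virtual environment.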

To get metrics calculated in Galileo, you will need:

An integration with an LLM configured. If you don’t have an integration configured, then:

1. Open the integrations page. Select the menu in the top right, and select Integrations.
2. Add an integration. Select + Add Integration for the LLM you are using and add the relevant details, such as an API key or endpoint.

Get the code

Clone the SDK examples repo

Terminal

git clone https://github.com/rungalileo/sdk-examples

Navigate to the relevant project folder

Start by navigating to the root folder for the programming language you are using:

cd python/agent/langgraph-fsi-agent/after

The Python code for the sample is in a folder called after. If you want to learn more about adding logging with Galileo to a LangGraph app, check out the add evaluations to a multi-agent LangGraph application cookbook.

The full source code for all of our sample projects is available in the Galileo SDK Examples GitHub repo.

Set up Pinecone

This project uses Pinecone as a vector database to power a RAG agent that retrieves data around the fictional credit cards offered by a bank. Before you can run the app, you will need to upload the documents.

Configure environment variables

Each project folder contains a .env.example file. Rename this file to .env and populate the PINECONE_API_KEY value. You can leave the other values for now, as you will populate them later.
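
Before running the upload script, it can help to confirm that the key is actually populated. The following is a minimal sketch using only the standard library. `parse_env_text` and `missing_keys` are hypothetical helpers, not part of the sample project, and the project itself may load the .env file differently.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any, from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Illustrative content only; in practice you would read your own file, e.g.
# text = pathlib.Path(".env").read_text()
sample = "# Pinecone\nPINECONE_API_KEY=\nGALILEO_API_KEY='abc'\n"
env = parse_env_text(sample)
print("Missing values:", missing_keys(env, ["PINECONE_API_KEY"]))
```

An empty value is treated the same as a missing one, since a freshly renamed .env.example typically has the keys present but blank.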

Upload the documents

There is a helper script in the scripts folder. Run this script to create a new index in Pinecone and upload the documents.

python ./scripts/setup_pinecone.py

This will take a few seconds, and a successful run should look like:

Terminal

Loading documents for credit-card-information folder...
...
✅ Document processing and upload complete!

Run the code

Install required dependencies

From the project folder, install the required dependencies. For Python, make sure to create and activate a virtual environment before installing the dependencies.

pip install -r requirements.txt

Configure environment variables

In your .env file, populate the Galileo values:

Environment Variable	Value
`GALILEO_API_KEY`	Your API key
`GALILEO_PROJECT`	The name of your Galileo project - this is preset to `Multi-Agent Banking Chatbot`
`GALILEO_LOG_STREAM`	The name of your Log stream - this is preset to `Default Log Stream`
`GALILEO_CONSOLE_URL`	Optional. The URL of your Galileo console for custom deployments. For the free tier, you don’t need to set this.

You can find these values on the project page for the multi-agent banking chatbot sample in the Galileo Console.

Next, populate the values for your LLM:

Environment Variable	Value
`OPENAI_API_KEY`	Your OpenAI API key. If you are using Ollama, set this to `ollama`. If you are using another OpenAI compatible API, then set this to the relevant API key.
`OPENAI_BASE_URL`	Optional. The base URL of your OpenAI deployment. Leave this commented out if you are using the default OpenAI API. If you are using Ollama, set this to `http://localhost:11434/v1`. If you are using another OpenAI compatible API, then set this to the relevant URL.
`MODEL_NAME`	The name of the model you are using
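
The combinations in the table above can be checked before starting the app. The sketch below is an assumption-laden illustration, not code from the sample project: `resolve_llm_settings` is a hypothetical helper that reads the three variables from any mapping and reports obvious gaps, such as using Ollama without a base URL.

```python
import os

# Base URL given in the table above for a local Ollama install.
OLLAMA_BASE_URL = "http://localhost:11434/v1"


def resolve_llm_settings(env):
    """Collect the LLM settings from a mapping and list any obvious problems."""
    settings = {
        "api_key": env.get("OPENAI_API_KEY", ""),
        "model": env.get("MODEL_NAME", ""),
        # None means "use the default OpenAI endpoint".
        "base_url": env.get("OPENAI_BASE_URL") or None,
    }
    problems = []
    if not settings["api_key"]:
        problems.append("OPENAI_API_KEY is not set")
    if not settings["model"]:
        problems.append("MODEL_NAME is not set")
    if settings["api_key"] == "ollama" and settings["base_url"] is None:
        problems.append(
            f"OPENAI_BASE_URL should be set for Ollama, for example {OLLAMA_BASE_URL}"
        )
    return settings, problems


settings, problems = resolve_llm_settings(os.environ)
for problem in problems:
    print("Warning:", problem)
```

Passing the environment in as a mapping keeps the check easy to exercise with a plain dictionary instead of real environment variables.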

Run the project

Run the project with the following command:

chainlit run app.py -w

If you are using the Python version, the app will be running at localhost:8000, so open it in your browser.

A demo of the bot responding to being asked what credit cards do you offer. The bot lists 2 cards

If you are using TypeScript, the app will run in your terminal:

You: What credit cards do you offer?
Assistant: Brahe Bank offers the Orbit Basic Credit Card, which features no
  annual fee, variable interest rates (29.9% APR on purchases, 34.9% APR
  on cash advances), and a 0% APR on balance transfers for 12 months. It
  does not include a rewards program but offers standard fraud protection,
  digital card management tools, and contactless payment support.
  
  Eligibility requires being 18 or older with a valid U.S. address and SSN,
  and a fair credit history with a credit score over 500.
  Would you like information on eligibility or application process?

You can ask the agent questions about:

The different credit cards offered by the bank
Your credit score

Improve the app

The insights you viewed earlier suggested improving how the supervisor agent processes messages, especially with credit scores. You can try this out to see what issues might occur:

Terminal

You: What is my credit score?
Assistant: I cannot answer that question.

Despite there being an agent to get the user's credit score, it is not always used.

To improve the agent, have a look at the agent prompt defined in the following file:

src/galileo_langgraph_fsi_agent/agents/supervisor_agent.py

In this file is the current agent prompt:

import os

from langchain_openai import ChatOpenAI
from langgraph_supervisor import create_supervisor

# credit_card_information_agent and credit_score_agent are defined
# elsewhere in the project.
bank_supervisor_agent = create_supervisor(
    model=ChatOpenAI(model=os.environ["MODEL_NAME"], name="Supervisor"),
    # Both agents are registered with the supervisor here...
    agents=[credit_card_information_agent, credit_score_agent],
    # ...but the prompt below only describes one of them.
    prompt=(
        """
        You are a supervisor managing the following agents:
        - a credit card information agent. Assign any tasks related to
          information about credit cards to this agent
        Otherwise, only respond with 'I don't know' or 'I cannot answer
        that question'.
        If you need to ask the user for more information, do so in a 
        concise manner.
        """
    ),
    add_handoff_back_messages=True,
    output_mode="full_history",
    supervisor_name="brahe-bank-supervisor-agent",
).compile()

This supervisor agent prompt explicitly mentions the credit card agent, but not the credit score agent. You can encourage the supervisor agent to use the credit score agent to get better results:

"""
You are a supervisor managing the following agents:
- a credit card information agent. Assign any tasks related to
    information about credit cards to this agent
- a credit score agent. Use this to get the user's credit score.
Otherwise, only respond with 'I don't know' or 'I cannot answer
that question'.
If you need to ask the user for more information, do so in a 
concise manner.
"""

Try this new prompt out and see how the agent responds.

Terminal

You: what is my credit score
Assistant: Your credit score is 550. If you have any other questions or
need further assistance, please let me know.

Once you have asked a few questions, head back to the Galileo Console and examine the new traces. You should see the metrics improving.
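
The original problem was an agent that was registered with the supervisor but never described in its prompt. One way to reduce the chance of that happening again is to generate the agent list in the prompt from a single data structure. The following is a minimal sketch of that idea, not how the sample project is written; `build_supervisor_prompt` and `AGENTS` are hypothetical names.

```python
def build_supervisor_prompt(agents):
    """Build a supervisor prompt that describes every agent in `agents`.

    `agents` is a list of (name, instruction) pairs.
    """
    agent_lines = "\n".join(
        f"- a {name}. {instruction}" for name, instruction in agents
    )
    return (
        "You are a supervisor managing the following agents:\n"
        f"{agent_lines}\n"
        "Otherwise, only respond with 'I don't know' or 'I cannot answer "
        "that question'.\n"
        "If you need to ask the user for more information, do so in a "
        "concise manner."
    )


# Descriptions taken from the prompts shown above.
AGENTS = [
    (
        "credit card information agent",
        "Assign any tasks related to information about credit cards to this agent",
    ),
    ("credit score agent", "Use this to get the user's credit score."),
]

prompt = build_supervisor_prompt(AGENTS)
print(prompt)
```

The resulting string could then be passed as the `prompt` argument shown earlier, so adding a new agent means adding one entry to the list.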

Run the sample app as an experiment

Galileo allows you to run experiments against datasets of known data, generating traces in an experiment Log stream and evaluating these for different metrics. Experiments allow you to take a known set of inputs and evaluate different prompts, LLMs, or versions of your apps.

This sample project has a unit test that runs the chatbot against a pre-defined dataset, containing a mixture of sensible and irrelevant questions:

dataset.json

[
    {"input": "What are the cashback rewards offered by the Orbit Credit Card?"},
    {"input": "What is my credit score?"},
    {"input": "What is the APR for balance transfers on the Orbit Credit Card?"},
    {"input": "What credit cards am I eligible for?"},
    {"input": "What can I do with my credit score?"},
    {"input": "Recommend me a good book."}
    ...
]

You can use this unit test to evaluate different supervisor agent prompts for your app.
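
If you add your own questions to the dataset, a quick structural check can catch malformed rows before an experiment run. This is a minimal sketch assuming each row is an object with a non-empty `input` string, as in the excerpt above. `load_inputs` is a hypothetical helper, and the excerpt shown in this guide is elided, so it is not valid JSON as printed.

```python
import json


def load_inputs(dataset_text):
    """Parse dataset JSON and return the list of input strings."""
    rows = json.loads(dataset_text)
    if not isinstance(rows, list):
        raise ValueError("dataset must be a JSON array")
    inputs = []
    for index, row in enumerate(rows):
        value = row.get("input") if isinstance(row, dict) else None
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"row {index} has no non-empty 'input' string")
        inputs.append(value)
    return inputs


# Two rows copied from the excerpt above.
sample = """
[
    {"input": "What is my credit score?"},
    {"input": "Recommend me a good book."}
]
"""
print(load_inputs(sample))
```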

Run the unit test

Use the following command to run the unit test:

python -m pytest test.py

Evaluate the experiment

The unit test will output a link to the experiment in the Galileo Console:

Terminal

Experiment multi-agent-chatbot-experiment 2025-07-15 at 00:48:11.842 has 
completed and results are available at 
https://app.galileo.ai/project/<id>/experiments/<id>

Follow this link to see the metrics for the experiment Log stream.

The experiment with low correctness scores for most rows

Try different supervisor agent prompts

Experiment with different supervisor agent prompts. Edit the supervisor agent prompt in the app, then re-run the experiment through the unit test to see how different supervisor agent prompts affect the metrics.

Compare experiments

If you navigate to the experiments list using the All Experiments link, you will be able to compare the average metric values of each run.

A list of experiments with the scores increasing as you go up the list

You can then select multiple rows and compare the experiments in detail.

Next steps

Logging with the SDKs

Learn how to log experiments

Learn how to run experiments with multiple data points using datasets and prompt templates

Galileo logger

Log with full control over sessions, traces, and spans using the Galileo logger.

Log decorator

Quickly add logging to your code with the log decorator and wrapper.

Galileo context

Manage logging using the Galileo context manager.

How-to guides

Log Using the OpenAI Wrapper

Learn how to integrate and use OpenAI’s API with Galileo’s wrapper client.

Log Using the @log Decorator

Learn how to use the Galileo @log decorator to log functions to traces

Create Traces and Spans

Learn how to create log traces and spans manually in your AI apps

SDK reference

Python SDK Reference

The Galileo Python SDK reference.

TypeScript SDK Reference

The Galileo TypeScript SDK reference.
