Get started with the simple chatbot sample project
The simple chatbot sample project is a demo of a simple terminal-based chatbot where you can have a back-and-forth conversation with an LLM. The project comes pre-populated with a Log stream containing traces and evaluated metrics, as well as insights to help you improve it.
The code for this sample is available in Python and TypeScript. You can run it with a range of LLM providers to generate more traces, and experiment with improving the app based on the evaluations.
The sample code has 3 variations for the following LLM providers:
The sample project comes with a Log stream pre-populated with a set of traces for some sample interactions with the chatbot - some serious, some asking nonsense questions.
Open the Default Log Stream by ensuring it is selected, then selecting the View all logs button.
The Log stream is configured to evaluate the following metrics:
For some of the traces, these metrics are evaluated at 100%, showing the chatbot is working well for those inputs. For other traces, these metrics are reporting lower values, showing the chatbot needs some improvements.
Select different rows to see more details, including the input and output data, the metric scores, and explanations.
Galileo has an Insights Engine that continuously reviews your traces and metrics, and gives suggestions to improve your application. Navigate back to the project page by selecting the Simple Chatbot from the top navigation bar.
Insights are shown on the right-hand side:
Review the generated insights, and think about ways to improve the chatbot. For example, the system prompt for the chatbot is:
This will likely cause the chatbot to mislead users. The insights will say something like this:
Summary
The system message contains explicit instructions preventing the LLM from expressing uncertainty: ‘Under no circumstances should you respond with “I don’t know”’ and requires it to ‘make educated guesses even when unsure.’ While this worked fine for the straightforward factual question about Italy’s capital, this instruction could be problematic for complex or ambiguous questions where expressing uncertainty would be more appropriate and honest. Forcing confidence could mislead users about the LLM’s actual level of certainty and potentially lead to confident-sounding but incorrect responses.
Suggestions
Consider allowing the LLM to express uncertainty for complex or ambiguous questions where confidence may be inappropriate.
To see how you can use these insights to improve the app, get the code and try some different system prompts.
You can run the sample app to generate more traces, and test out different system prompts.
To run the code yourself to generate more traces, you will need:
To get metrics calculated in Galileo, you will need:
An integration with an LLM configured. If you don’t have an integration configured, then:
Open the integrations page
Select the menu in the top right, and select Integrations
Add an integration
Select + Add Integration for the LLM you are using and add the relevant details, such as an API key or endpoint.
Clone the SDK examples repo
Navigate to the relevant project folder
Start by navigating to the root folder for the programming language you are using:
Then navigate to the folder for the LLM provider you are using:
The full source code for all of our sample projects is available in the Galileo SDK Examples GitHub repo.
Install required dependencies
From the project folder, install the required dependencies. For Python, make sure to create and activate a virtual environment before installing the dependencies.
Configure environment variables
In each project folder is a `.env.example` file. Rename this file to `.env` and populate the Galileo values:
| Environment Variable | Value |
| --- | --- |
| `GALILEO_API_KEY` | Your API key |
| `GALILEO_PROJECT` | The name of your Galileo project - this is preset to Simple Chatbot |
| `GALILEO_LOG_STREAM` | The name of your Log stream - this is preset to Default Log Stream |
| `GALILEO_CONSOLE_URL` | Optional. The URL of your Galileo console for custom deployments. For the free tier, you don't need to set this. |
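For example, your `.env` might contain something like the following (the API key and console URL below are placeholders; the console URL is only needed for custom deployments):

```
GALILEO_API_KEY=your-galileo-api-key
GALILEO_PROJECT="Simple Chatbot"
GALILEO_LOG_STREAM="Default Log Stream"
# GALILEO_CONSOLE_URL=https://your-custom-deployment.example.com
```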
You can find these values on the project page for the simple chatbot sample in the Galileo Console.
Next, populate the values for your LLM:
| Environment Variable | Value |
| --- | --- |
| `OPENAI_API_KEY` | Your OpenAI API key. If you are using Ollama, set this to `ollama`. If you are using another OpenAI compatible API, then set this to the relevant API key. |
| `OPENAI_BASE_URL` | Optional. The base URL of your OpenAI deployment. Leave this commented out if you are using the default OpenAI API. If you are using Ollama, set this to `http://localhost:11434/v1`. If you are using another OpenAI compatible API, then set this to the relevant URL. |
| `MODEL_NAME` | The name of the model you are using |
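For example, you might set these values as follows for the default OpenAI API, or for Ollama running locally (the model names below are only examples; use whichever model you have access to):

```
# Default OpenAI API
OPENAI_API_KEY=sk-your-openai-api-key
# OPENAI_BASE_URL stays commented out for the default OpenAI API
MODEL_NAME=gpt-4o-mini

# Or, for Ollama running locally:
# OPENAI_API_KEY=ollama
# OPENAI_BASE_URL=http://localhost:11434/v1
# MODEL_NAME=llama3.2
```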
Run the project
Run the project with the following command:
The app will run in your terminal, and you can ask the LLM questions and get responses:
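Under the hood, the chatbot logs each chat turn to your Log stream. The following is a minimal sketch of that pattern, not the exact sample code, assuming Galileo's OpenAI wrapper client (covered in the OpenAI wrapper guide linked at the end of this page) and the environment variables configured above:

```python
# Minimal sketch of the pattern the sample uses (not the exact sample code).
# Assumes the galileo and openai Python packages plus python-dotenv, and the
# environment variables described in the tables above.
import os

from dotenv import load_dotenv
from galileo.openai import openai  # Galileo's drop-in wrapper around the OpenAI client

load_dotenv()  # read the GALILEO_* and OPENAI_* values from .env

# The wrapped client behaves like the normal OpenAI client, but each chat
# completion is also logged as a trace to the project and Log stream named
# by GALILEO_PROJECT and GALILEO_LOG_STREAM.
client = openai.OpenAI()  # picks up OPENAI_API_KEY and OPENAI_BASE_URL from the environment

messages = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("You: ")
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model=os.environ["MODEL_NAME"],
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    print(f"Bot: {reply}")
```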
The insights you viewed earlier suggested improving the system prompt. The default system prompt is defined in the following file:
In this file is the current system prompt, as well as a suggested improvement:
Try commenting out the original system prompt, and uncomment the suggestion. Then restart the chatbot and interact with it, asking questions about made-up things to see how it responds.
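For illustration, the change looks roughly like this. The variable name and prompt wording here are placeholders, not the sample's exact code:

```python
# Illustrative only - the sample project defines its own prompt strings.

# Original system prompt (comment this out). It forbids the model from saying
# "I don't know", which the insight flagged as a source of confident-sounding
# but incorrect answers:
# SYSTEM_PROMPT = (
#     "You are a helpful assistant. Under no circumstances should you respond "
#     "with 'I don't know'. Make educated guesses even when unsure."
# )

# Suggested improvement (uncomment this). It lets the model express uncertainty
# instead of guessing:
SYSTEM_PROMPT = (
    "You are a helpful assistant. If you are unsure of an answer, or a question "
    "refers to something that does not appear to exist, say so clearly instead "
    "of guessing."
)
```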
Once you have asked a few questions, head back to the Galileo Console and examine the new traces. You should see the metrics improving.
Galileo allows you to run experiments against datasets of known data, generating traces in an experiment Log stream and evaluating these for different metrics. Experiments allow you to take a known set of inputs and evaluate different prompts, LLMs, or versions of your apps.
This sample project has a unit test that runs the chatbot against a pre-defined dataset, containing a mixture of sensible and nonsense questions:
You can use this unit test to evaluate different system prompts for your app.
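The actual test file and dataset live in the sample code, but the shape of such a test is roughly the following hypothetical sketch. The `chat` entry point and the questions are placeholders, and the real test also records the run as an experiment in Galileo:

```python
# Hypothetical sketch only - the sample's actual test, dataset, and entry point differ.
import pytest

# Stand-in for the pre-defined dataset of sensible and nonsense questions.
QUESTIONS = [
    "What is the capital of Italy?",         # sensible
    "Who is the current king of Atlantis?",  # nonsense
]

@pytest.mark.parametrize("question", QUESTIONS)
def test_chatbot_answers(question):
    from app import chat  # hypothetical entry point into the chatbot

    answer = chat(question)
    # Assertions stay light on purpose: the interesting evaluation happens in
    # Galileo, where the experiment's traces are scored against the metrics.
    assert answer
```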
Run the unit test
Use the following command to run the unit test:
Evaluate the experiment
The unit test will output a link to the experiment in the Galileo Console:
Follow this link to see the metrics for the experiment Log stream.
Try different system prompts
Experiment with different system prompts: edit the system prompt in the app, then re-run the experiment through the unit test to see how each prompt affects the metrics.
Compare experiments
If you navigate to the experiments list using the All Experiments link, you will be able to compare the average metric values of each run.
You can then select multiple rows and compare the experiments in detail.
- Learn how to run experiments with multiple data points using datasets and prompt templates.
- Log with full control over sessions, traces, and spans using the Galileo logger.
- Quickly add logging to your code with the log decorator and wrapper.
- Manage logging using the Galileo context manager.
- Learn how to integrate and use OpenAI’s API with Galileo’s wrapper client.
- Python: Learn how to use the Galileo @log decorator to log functions to traces.
- Python: Learn how to create log traces and spans manually in your AI apps.
- The Galileo Python SDK reference.
- The Galileo TypeScript SDK reference.