When implementing RAG systems, it’s crucial to properly handle document retrieval, context management, and response generation. This guide demonstrates a basic RAG implementation using Galileo’s observability features.

What You’ll Need

  • OpenAI API key
  • Galileo API key
  • Python environment with required packages
  • Basic understanding of RAG concepts

Setup Instructions

1. Set Up Your Environment

Create a .env file with your API keys:

.env
GALILEO_API_KEY=your_galileo_api_key
OPENAI_API_KEY=your_openai_api_key

2. Install Dependencies

Create a requirements.txt with the following pinned versions:

requirements.txt
galileo==0.0.7
openai==1.61.1
python-dotenv==0.19.0
rich==13.7.0
questionary==2.0.1
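
Then install them:

pip install -r requirements.txt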

3. Running and Monitoring

Execute the application:

python app.py

Use Galileo to monitor:

  • Document retrieval performance
    • Chunk relevance
    • Chunk utilization
    • Chunk attribution
    • Completeness
  • System performance metrics

Implementation Guide

Let’s break down the implementation into manageable sections:

1. Setting Up the Environment

First, we’ll set up our imports and initialize our environment:

app.py
import os
from dotenv import load_dotenv
from galileo import log
from galileo.openai import openai
from rich.console import Console
from rich.panel import Panel
from rich.markdown import Markdown
import questionary
import sys

load_dotenv()

# Initialize console for rich output
console = Console()

# Check if Galileo logging is enabled
logging_enabled = os.environ.get("GALILEO_API_KEY") is not None

# Initialize the OpenAI client via Galileo's wrapper so LLM calls are logged automatically
client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

This section:

  • Imports necessary libraries
  • Loads environment variables
  • Sets up rich console output
  • Initializes the OpenAI client with Galileo integration

2. Document Retrieval System

The document retrieval function is wrapped with Galileo’s @log decorator:

app.py
@log(span_type="retriever")
def retrieve_documents(query: str):
    # TODO: Replace with actual RAG retrieval. This stub ignores `query` and
    # returns a fixed document set so the pipeline can run end to end.
    documents = [
        {
            "id": "doc1",
            "text": "Galileo is an observability platform for LLM applications. It helps developers monitor, debug, and improve their AI systems by tracking inputs, outputs, and performance metrics.",
            "metadata": {
                "source": "galileo_docs",
                "category": "product_overview"
            }
        },
        {
            "id": "doc2",
            "text": "RAG (Retrieval-Augmented Generation) is a technique that enhances LLM responses by retrieving relevant information from external knowledge sources before generating an answer.",
            "metadata": {
                "source": "ai_techniques",
                "category": "methodology"
            }
        },
        {
            "id": "doc3",
            "text": "Common RAG challenges include hallucinations, retrieval quality issues, and context window limitations. Proper evaluation metrics include relevance, faithfulness, and answer correctness.",
            "metadata": {
                "source": "ai_techniques",
                "category": "challenges"
            }
        },
        {
            "id": "doc4",
            "text": "Vector databases like Pinecone, Weaviate, and Chroma are optimized for storing embeddings and performing similarity searches, making them ideal for RAG applications.",
            "metadata": {
                "source": "tech_stack",
                "category": "databases"
            }
        },
        {
            "id": "doc5",
            "text": "Prompt engineering is crucial for RAG systems. Well-crafted prompts should instruct the model to use retrieved context, avoid making up information, and cite sources when possible.",
            "metadata": {
                "source": "best_practices",
                "category": "prompting"
            }
        }
    ]
    return documents

Key points:

  • Uses @log decorator with the retriever span type
  • Returns structured document objects
  • Includes metadata for tracking sources
  • Simulates a real document retrieval system
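
To replace the stub with real retrieval (see Next Steps), the same function shape can sit on top of a vector database. The sketch below is illustrative only: it assumes Chroma is installed (pip install chromadb) and that documents have already been embedded into a collection; the collection name "galileo_demo_docs" is hypothetical.

import chromadb
from galileo import log

chroma_client = chromadb.Client()
collection = chroma_client.get_or_create_collection(name="galileo_demo_docs")  # hypothetical name

@log(span_type="retriever")
def retrieve_documents(query: str, n_results: int = 3):
    # Embed the query and fetch the most similar chunks from the collection
    results = collection.query(query_texts=[query], n_results=n_results)
    # Chroma returns columnar lists (one inner list per query); reshape them
    # into the document dictionaries the rest of the pipeline expects
    return [
        {"id": doc_id, "text": text, "metadata": metadata or {}}
        for doc_id, text, metadata in zip(
            results["ids"][0], results["documents"][0], results["metadatas"][0]
        )
    ]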

3. RAG Pipeline Implementation

The core RAG functionality:

app.py
def rag(query: str):
    documents = retrieve_documents(query)

    # Format documents for better readability in the prompt
    formatted_docs = ""
    for i, doc in enumerate(documents):
        formatted_docs += f"Document {i+1} (Source: {doc['metadata']['source']}):\n{doc['text']}\n\n"

    prompt = f"""
    Answer the following question based on the context provided. If the answer is not in the context, say you don't know.

    Question: {query}

    Context:
    {formatted_docs}
    """

    try:
        console.print("[bold blue]Generating answer...[/bold blue]")
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that answers questions based only on the provided context."},
                {"role": "user", "content": prompt}
            ],
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        return f"Error generating response: {str(e)}"

This section:

  • Retrieves relevant documents
  • Formats context for the LLM
  • Constructs a clear prompt
  • Handles API calls and errors
  • Uses the gpt-4o model for responses
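
Taken together, a single call runs retrieval, prompt construction, and generation. For example (the query string here is purely illustrative):

# Run the full pipeline once: retrieve, build the prompt, generate
answer = rag("What challenges do RAG systems commonly face?")
console.print(answer)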

4. Interactive Interface

The main application interface:

app.py
def main():
    console.print(Panel.fit(
        "[bold]RAG Demo[/bold]\nThis demo uses a simulated RAG system to answer your questions.",
        title="Galileo RAG Terminal Demo",
        border_style="blue"
    ))

    # Check environment setup
    if logging_enabled:
        console.print("[green]✅ Galileo logging is enabled[/green]")
    else:
        console.print("[yellow]⚠️ Galileo logging is disabled[/yellow]")

    api_key = os.environ.get("OPENAI_API_KEY")
    if api_key:
        console.print("[green]✅ OpenAI API Key is set[/green]")
    else:
        console.print("[red]❌ OpenAI API Key is missing[/red]")
        sys.exit(1)

    # Main interaction loop
    while True:
        query = questionary.text(
            "Enter your question about Galileo, RAG, or AI techniques:",
            validate=lambda text: len(text) > 0
        ).ask()

        # .ask() returns None if the prompt is cancelled (e.g. Ctrl+C)
        if not query or query.lower() in ['exit', 'quit', 'q']:
            break

        try:
            result = rag(query)
            console.print("\n[bold green]Answer:[/bold green]")
            console.print(Panel(Markdown(result), border_style="green"))

            continue_session = questionary.confirm(
                "Do you want to ask another question?",
                default=True
            ).ask()

            if not continue_session:
                break

        except Exception as e:
            console.print(f"[bold red]Error:[/bold red] {str(e)}")

if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        console.print("\n[bold]Exiting RAG Demo. Goodbye![/bold]")

This section provides:

  • Environment validation
  • Interactive question-answer loop
  • Rich formatting for outputs
  • Graceful error handling
  • Clean exit handling

Key Features

  • Galileo Logging: Track document retrieval and LLM interactions
  • Rich Console Interface: User-friendly terminal interface
  • Error Handling: Graceful handling of API and runtime errors
  • Context Management: Proper formatting of retrieved documents
  • Interactive Experience: Easy-to-use question-answering interface

Next Steps

  • Implement real document retrieval using a vector database
  • Add response streaming for a better user experience (see the sketch after this list)
  • Implement more sophisticated prompt engineering
  • Add evaluation metrics for retrieval quality
  • Integrate advanced Galileo logging features
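
As a starting point for the streaming item above, the chat completion call inside rag() can be switched to the OpenAI streaming API. This is a minimal sketch, assuming client, retrieve_documents, and the prompt construction from app.py are in scope; the printing logic is deliberately simple.

def rag_streaming(query: str) -> str:
    documents = retrieve_documents(query)
    formatted_docs = "\n\n".join(
        f"Document {i+1} (Source: {doc['metadata']['source']}):\n{doc['text']}"
        for i, doc in enumerate(documents)
    )
    prompt = (
        "Answer the following question based on the context provided. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Question: {query}\n\nContext:\n{formatted_docs}"
    )
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that answers questions based only on the provided context."},
            {"role": "user", "content": prompt},
        ],
        stream=True,  # receive the response token by token
    )
    answer = ""
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""  # final chunk may carry no text
        answer += delta
        print(delta, end="", flush=True)  # render tokens as they arrive
    print()
    return answer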