The Out-of-Context Problem in RAG Systems

Picture this scenario: You’ve built a RAG system to answer questions about famous landmarks using your carefully curated knowledge base. A user asks, “When was the Eiffel Tower completed?” Your system retrieves a relevant document:

The Eiffel Tower is an iron lattice tower located in Paris, France.
It was designed by Gustave Eiffel.

The response comes back:

The Eiffel Tower was completed in 1889 for the World's Fair in Paris and is the most visited paid monument in the world.

At first glance, this might seem like a helpful response. It’s detailed, informative, and answers the user’s question. There’s just one problem: most of this information isn’t from your knowledge base. The model has ventured beyond the retrieved context, drawing from its pre-trained knowledge to provide what it thinks is a helpful answer.

This is the out-of-context problem in RAG systems – when a language model generates information not found in the retrieved documents. It’s one of the most challenging issues in RAG implementations, often referred to as “closed-domain hallucination.”

Understanding the Challenge

The root of this problem lies in how modern language models work. These models are trained on vast amounts of data and retain much of that knowledge in their parameters. When asked a question, they naturally try to be helpful by combining:

  • Information from the provided context
  • Their pre-existing knowledge
  • Patterns they’ve learned from similar questions

In our Eiffel Tower example, the model:

  • Used the context correctly (location and designer)
  • Added the completion date (1889) from its training data
  • Included visitor statistics it “knew” from pre-training

While this additional information might be factually correct, it creates several problems:

  • Users can’t verify the source of this information
  • The response mixes verified knowledge base facts with external information
  • There’s no distinction between what came from our documents and what didn’t

This behavior particularly impacts RAG systems because they’re specifically designed to provide information from a controlled set of documents. When the model starts adding external information, it undermines the entire purpose of having a curated knowledge base.

The Path to a Solution

The key to solving this problem lies in understanding that language models will naturally try to be helpful by providing complete answers. They need explicit constraints and clear instructions to override this behavior. Let’s look at how we can transform our example to prevent out-of-context information.

First, here’s how we typically implement a RAG system with minimal constraints:

@log(name="rag_with_hallucination")
def rag_with_hallucination(query: str):
    documents = retrieve_documents(query)
    formatted_docs = format_documents(documents)  # helper that joins documents into a prompt-ready string (inlined in the full implementation below)

    weak_prompt = f"""
    Answer the following question based on the context provided.

    Question: {query}
    Context: {formatted_docs}
    """

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": weak_prompt}
        ]
    )
    return response.choices[0].message.content.strip()

This implementation has several weaknesses:

  • The system message is too general
  • The prompt doesn’t explicitly restrict external knowledge
  • There’s no guidance on handling missing information

Here’s how we can strengthen our implementation:

@log(name="rag_with_constraint")
def rag_with_constraint(query: str):
    documents = retrieve_documents(query)
    formatted_docs = format_documents(documents)

    strong_prompt = f"""
    Answer the following question based STRICTLY on the context provided.
    If the information needed to answer the question is not explicitly contained in the context,
    respond with: "I don't have enough information in the provided context to answer this question."

    DO NOT use any knowledge outside of the provided context.

    Question: {query}
    Context: {formatted_docs}
    """

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that ONLY answers based on the provided context. Never use external knowledge."},
            {"role": "user", "content": strong_prompt}
        ]
    )
    return response.choices[0].message.content.strip()

Now, when we ask about the Eiffel Tower’s completion date, we get a different response:

I don't have enough information in the provided context to answer this question.
The context only mentions that the Eiffel Tower is an iron lattice tower in Paris
and was designed by Gustave Eiffel.

This response might seem less helpful at first, but it’s actually much better because:

  • It’s honest about what information is available
  • It clearly indicates what the source documents tell us
  • It maintains the integrity of our knowledge base

The improvement comes from two key changes (a reusable version is sketched after this list):

  1. A stronger system message that explicitly defines the model’s role and limitations
  2. A structured prompt that:
    • Uses clear, direct language (“STRICTLY”, “DO NOT”)
    • Provides explicit instructions for handling missing information
    • Reinforces the importance of staying within the provided context

Building a Complete Solution

To implement this approach in your own RAG system, you’ll need several components working together. Let’s walk through a complete implementation that demonstrates both the problem and its solution.

Step 1: Setting Up the Environment

First, let’s set up our environment with the necessary imports and configurations:

import os
from dotenv import load_dotenv
from galileo import openai, log, galileo_context
import questionary

load_dotenv()

# Check if Galileo logging is enabled
logging_enabled = os.environ.get("GALILEO_API_KEY") is not None

galileo_context.init(project="out-of-context", log_stream="dev")

# Initialize OpenAI client
client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

This setup includes:

  • Loading environment variables for API keys
  • Setting up Galileo logging for tracking operations
  • Creating an OpenAI client for model interactions

Step 2: Understanding the Document Retriever

The document retriever is designed to demonstrate how incomplete context can lead to out-of-context information:

@log(span_type="retriever")
def retrieve_documents(query: str):
    """
    Simulated document retrieval that intentionally returns incomplete information
    to demonstrate the out-of-context problem.
    """
    # Dictionary of queries and their intentionally incomplete contexts
    incomplete_contexts = {
        "eiffel tower": [
            {
                "content": "The Eiffel Tower is an iron lattice tower located in Paris, France. It was designed by Gustave Eiffel.",
                "metadata": {
                    "id": "doc1",
                    "source": "travel_guide",
                    "category": "landmarks",
                    "relevance": "high"
                }
            }
        ],
        "python language": [
            {
                "content": "Python is a high-level programming language known for its readability and simple syntax.",
                "metadata": {
                    "id": "doc1",
                    "source": "programming_guide",
                    "category": "languages",
                    "relevance": "high"
                }
            }
        ],
        "climate change": [
            {
                "content": "Climate change refers to long-term shifts in temperatures and weather patterns. Human activities have been the main driver of climate change since the 1800s.",
                "metadata": {
                    "id": "doc1",
                    "source": "environmental_science",
                    "category": "global_issues",
                    "relevance": "high"
                }
            }
        ],
        "artificial intelligence": [
            {
                "content": "Artificial intelligence involves creating systems capable of performing tasks that typically require human intelligence.",
                "metadata": {
                    "id": "doc1",
                    "source": "technology_overview",
                    "category": "ai",
                    "relevance": "high"
                }
            }
        ],
        "quantum computing": [
            {
                "content": "Quantum computing uses quantum bits or qubits that can represent multiple states simultaneously.",
                "metadata": {
                    "id": "doc1",
                    "source": "computing_technology",
                    "category": "quantum",
                    "relevance": "high"
                }
            }
        ]
    }

    # Default case for queries not in our predefined list
    default_docs = [
        {
            "content": "This is a generic response with limited information about the query topic.",
            "metadata": {
                "id": "default_doc",
                "source": "general_knowledge",
                "category": "miscellaneous",
                "relevance": "low"
            }
        }
    ]

    # Find the most relevant predefined query
    for key in incomplete_contexts:
        if key in query.lower():
            return incomplete_contexts[key]

    return default_docs

Key points about the retriever (a sample call follows this list):

  • It simulates real-world document retrieval with intentionally incomplete information
  • Uses predefined contexts to demonstrate the out-of-context problem
  • Includes metadata for tracking document sources and relevance
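
To see what the retriever produces, here is an illustrative call that is not part of the demo script. The lowercased query contains the "eiffel tower" key, so the single, intentionally incomplete document is returned:

docs = retrieve_documents("When was the Eiffel Tower completed?")
print(docs[0]["content"])
# The Eiffel Tower is an iron lattice tower located in Paris, France. It was designed by Gustave Eiffel.
print(docs[0]["metadata"]["source"])
# travel_guide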

Step 3: Demonstrating the Problem

Let’s look at how a weak prompt can lead to out-of-context information:

@log(name="rag_with_hallucination")
def rag_with_hallucination(query: str):
    """
    RAG implementation that demonstrates the out-of-context problem by using
    a system prompt that doesn't properly constrain the model.
    """
    documents = retrieve_documents(query)

    # Format documents for better readability in the prompt
    formatted_docs = ""
    for i, doc in enumerate(documents):
        formatted_docs += f"Document {i+1} (Source: {doc['metadata']['source']}):\n{doc['content']}\n\n"

    # This prompt doesn't strongly constrain the model
    weak_prompt = f"""
    Answer the following question based on the context provided.

    Question: {query}

    Context:
    {formatted_docs}
    """

    try:
        print("Generating answer (prone to out-of-context information)...")
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": weak_prompt}
            ],
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        return f"Error generating response: {str(e)}"

Problems with this approach:

  • The weak prompt doesn’t explicitly constrain the model
  • The system message is too generic
  • No explicit instruction to avoid using external knowledge

Step 4: Implementing the Solution

Now, let’s see how to prevent out-of-context information with a stronger prompt:

@log(name="rag_with_constraint")
def rag_with_constraint(query: str):
    """
    RAG implementation that demonstrates how to mitigate the out-of-context problem
    by using a stronger system prompt and explicit instructions.
    """
    documents = retrieve_documents(query)

    # Format documents for better readability in the prompt
    formatted_docs = ""
    for i, doc in enumerate(documents):
        formatted_docs += f"Document {i+1} (Source: {doc['metadata']['source']}):\n{doc['content']}\n\n"

    # This prompt strongly constrains the model
    strong_prompt = f"""
    Answer the following question based STRICTLY on the context provided.
    If the information needed to answer the question is not explicitly contained in the context,
    respond with: "I don't have enough information in the provided context to answer this question."

    DO NOT use any knowledge outside of the provided context.

    Question: {query}

    Context:
    {formatted_docs}
    """

    try:
        print("Generating answer (constrained to context)...")
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that ONLY answers based on the provided context. Never use external knowledge."},
                {"role": "user", "content": strong_prompt}
            ],
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        return f"Error generating response: {str(e)}"

Key improvements (a quick comparison follows this list):

  • Explicit instruction to use only provided context
  • Clear directive to acknowledge when information is missing
  • Stronger system message that reinforces context adherence
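
Before moving to the interactive demo, you can compare the two implementations directly. This is an illustrative snippet rather than part of the demo script, using one of the suggested queries:

query = "When was the Eiffel Tower completed?"
print("Unconstrained:", rag_with_hallucination(query))
print("Constrained:  ", rag_with_constraint(query))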

Step 5: Running the Interactive Demo

The main function provides an interactive way to test and compare both approaches:

@log
def main():
    print("Out-of-Context RAG Demo")
    print("This demo shows how RAG systems can generate out-of-context information and how to prevent it.")

    # Check environment setup
    if logging_enabled:
        print("Galileo logging is enabled")
    else:
        print("Galileo logging is disabled")

    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        print("OpenAI API Key is missing")
        return

    # Example queries that demonstrate the problem
    suggested_queries = [
        "When was the Eiffel Tower completed?",
        "Who created the Python language and when?",
        "What are the main effects of climate change?",
        "When was artificial intelligence first developed?",
        "How many qubits are in the most powerful quantum computer?"
    ]

    print("\nSuggested queries (these will demonstrate the problem):")
    for i, q in enumerate(suggested_queries):
        print(f"{i+1}. {q}")

    # Main interaction loop
    while True:
        try:
            # Get user query
            query = questionary.text(
                "Enter your question (or type a number 1-5 to use a suggested query):",
                validate=lambda text: len(text) > 0
            ).ask()

            if query.lower() in ['exit', 'quit', 'q']:
                break

            # Check if user entered a number for suggested queries
            if query.isdigit() and 1 <= int(query) <= len(suggested_queries):
                query = suggested_queries[int(query)-1]
                print(f"Using query: {query}")

            # Generate both types of responses
            hallucinated_result = rag_with_hallucination(query)
            constrained_result = rag_with_constraint(query)

            # Display the responses
            print("\nUnconstrained Response (Prone to Out-of-Context Information):")
            print(hallucinated_result)

            print("\nConstrained Response (Limited to Context):")
            print(constrained_result)

            # Ask if user wants to continue
            continue_session = questionary.confirm(
                "Do you want to ask another question?",
                default=True
            ).ask()

            if not continue_session:
                break

        except Exception as e:
            print(f"Error: {str(e)}")

if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        print("\nExiting Out-of-Context RAG Demo. Goodbye!")
    finally:
        galileo_context.flush()  # Only flush at the very end

Using this code, you can:

  • Test predefined queries that highlight the problem
  • Compare responses from both approaches
  • See the effectiveness of context constraints

Step 6: Analyzing the Results

When running the demo, you’ll notice:

  • Unconstrained Responses: May include information not present in the context
  • Constrained Responses: Strictly adhere to provided information
  • Completeness vs. Groundedness: A trade-off between complete-sounding answers and answers you can trace back to your documents

Here’s an example comparison:

Query: “When was the Eiffel Tower completed?”

Unconstrained Response:

The Eiffel Tower was completed in 1889 for the World's Fair in Paris.

Constrained Response:

I don't have enough information in the provided context to answer this question.
The context only mentions that the Eiffel Tower is an iron lattice tower in Paris
and was designed by Gustave Eiffel.

The constrained response demonstrates better adherence to the available context, even though it provides less information.

Step 7: Best Practices and Recommendations

To prevent out-of-context information in your RAG system:

  1. Strong Prompting:

    • Be explicit about using only provided context
    • Include clear instructions for handling missing information
    • Use system messages that reinforce context adherence
  2. Context Management:

    • Ensure retrieved documents are relevant and complete
    • Include metadata for tracking document sources
    • Monitor and log context utilization
  3. Response Validation:

    • Compare responses against provided context (a sketch follows this list)
    • Track and measure context adherence
    • Use Galileo metrics to monitor performance
  4. User Experience:

    • Clearly communicate when information is limited
    • Provide transparent source attribution
    • Balance completeness with accuracy
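
For the response-validation practice above, a lightweight grounding check can flag answers that drift away from the retrieved documents before they reach users. The sketch below uses plain word overlap; validate_response and the 0.6 threshold are illustrative assumptions, not part of the demo code or any Galileo metric, and a production system should rely on a proper context-adherence measure instead.

import re

FALLBACK = "I don't have enough information in the provided context"

def validate_response(response: str, documents: list, threshold: float = 0.6) -> bool:
    """
    Heuristic grounding check: returns False when too few of the response's
    words appear anywhere in the retrieved documents.
    """
    # The explicit fallback answer is always acceptable.
    if FALLBACK.lower() in response.lower():
        return True

    context_text = " ".join(doc["content"] for doc in documents).lower()
    context_words = set(re.findall(r"[a-z0-9]+", context_text))
    response_words = re.findall(r"[a-z0-9]+", response.lower())
    if not response_words:
        return True

    overlap = sum(1 for word in response_words if word in context_words) / len(response_words)
    return overlap >= threshold

# Example: check the constrained answer against the documents it was given
docs = retrieve_documents("When was the Eiffel Tower completed?")
answer = rag_with_constraint("When was the Eiffel Tower completed?")
if not validate_response(answer, docs):
    print("Warning: the response may contain out-of-context information.")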