Sometimes a model ignores the retrieved chunks, either generating responses from its internal knowledge or only using a small portion of the retrieved context. This results in responses that do not fully reflect the most relevant information.

Example of the Problem

User Query: “Who discovered penicillin?”

Retrieved Chunks:

  1. “Alexander Fleming was a Scottish bacteriologist.”
  2. “Fleming conducted experiments on mold in the 1920s.”
  3. “His work led to the development of antibiotics.”

Model Response: “Penicillin was discovered in the 1920s by a scientist.”

What Went Wrong in the Response?

  • The response is technically correct, but it omits the name Alexander Fleming, so it fails to attribute the answer to the retrieved chunks.
  • The phrasing is vague and does not leverage the specifics present in the retrieved context.
  • The retrieved chunks contained useful details, but the model made only minimal use of them, leading to low Chunk Utilization.

Example Better Response

Better Model Response: “Scottish bacteriologist Alexander Fleming conducted experiments on mold in the 1920s, leading to the discovery of penicillin and the development of antibiotics (Sources: 1, 2, 3).”

What Could Be The Root Cause?

  • What We Did Wrong:
    • Retrieved too many or too few chunks, leading to excessive or insufficient context.
    • The retrieval system returned relevant chunks, but the model failed to incorporate them effectively.
    • The model relied too heavily on general knowledge instead of retrieved content.
  • How It Showed Up in Metrics:
    • Low Chunk Attribution: Retrieved chunks did not affect the model’s response.
    • High Chunk Attribution but Low Chunk Utilization: The model referenced retrieved data but incorporated only minimal text from it.
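
As a rough illustration of these two signals (not the actual metric implementations), attribution can be approximated as whether a chunk's wording noticeably overlaps the response, and utilization as the share of retrieved words that actually appear in it. A naive word-overlap sketch:

def naive_attribution_and_utilization(response, retrieved_chunks):
    # Naive word-overlap proxies for Chunk Attribution / Chunk Utilization,
    # for illustration only; production metrics are far more sophisticated.
    response_words = set(response.lower().split())
    used_words, total_words = 0, 0
    attributed = False
    for chunk in retrieved_chunks:
        chunk_words = chunk.lower().split()
        overlap = sum(1 for word in chunk_words if word in response_words)
        used_words += overlap
        total_words += len(chunk_words)
        if overlap / max(len(chunk_words), 1) > 0.3:  # chunk noticeably reflected in the response
            attributed = True
    return attributed, used_words / max(total_words, 1)

attributed, utilization = naive_attribution_and_utilization(
    "Penicillin was discovered in the 1920s by a scientist.",
    [
        "Alexander Fleming was a Scottish bacteriologist.",
        "Fleming conducted experiments on mold in the 1920s.",
        "His work led to the development of antibiotics.",
    ],
)
# The low utilization score mirrors the failure described in the example above.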

Improvements and Solutions

1. Improve Retrieval and Chunking Strategy

  • Use adaptive chunking instead of fixed-size chunks to ensure that complete facts are retrieved together.
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,  # Upper bound on chunk size; actual splits adapt to the text
    chunk_overlap=50,  # Overlap preserves context across chunk boundaries
    separators=["\n\n", ". ", " "],  # Prefer logical breaks: paragraphs, then sentences, then words
)

chunks = text_splitter.split_text(long_document)  # long_document: the source text to split
  • Increase retrieval precision by tuning similarity thresholds and using query expansion to better match relevant documents.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Expand the original query with paraphrases to improve recall
query = "climate change impact on oceans"
expanded_queries = [
    query,
    "effects of climate change on marine life",
    "ocean temperature rise effects",
]

# candidate_docs: texts returned by a first-pass retriever (placeholder)
query_embeddings = model.encode(expanded_queries, convert_to_tensor=True)
doc_embeddings = model.encode(candidate_docs, convert_to_tensor=True)

# Keep only documents whose best similarity to any query variant clears the threshold
similarity_scores = util.pytorch_cos_sim(query_embeddings, doc_embeddings)  # shape: (queries, docs)
best_scores = similarity_scores.max(dim=0).values
filtered_docs = [doc for doc, score in zip(candidate_docs, best_scores) if score > 0.7]
  • Reduce redundant or extraneous chunks by dynamically adjusting retrieval depth based on query complexity.
def adjust_retrieval_depth(query, base_depth=5):
    complexity_score = len(query.split())  # Word count as a rough proxy for query complexity
    return min(base_depth + complexity_score // 3, 20)  # Cap the number of retrieved chunks at 20

query = "Explain quantum entanglement in simple terms."
retrieval_depth = adjust_retrieval_depth(query)
retrieved_docs = retrieve_documents(query, depth=retrieval_depth)  # retrieve_documents: your retriever (placeholder)

2. Modify Prompting to Improve Chunk Usage

  • Explicitly instruct the model to utilize all relevant retrieved chunks (a combined prompt sketch follows below):

    Instruction: Ensure your answer is directly supported by all retrieved context. Use named entities and relevant details present in the retrieved text.
    
  • Encourage structured responses that mirror retrieved data closely.

def format_response(retrieved_chunks):
    structured_response = "\n".join(
        [f"- {chunk}" for chunk in retrieved_chunks]
    )
    return f"Based on retrieved information:\n{structured_response}"

response = format_response(relevant_chunks)  # relevant_chunks: list of retrieved chunk strings (placeholder)
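
A minimal sketch of how the instruction and the structured formatting above can be combined into one prompt (retrieved_chunks and the llm client are placeholders, not a specific API):

INSTRUCTION = (
    "Ensure your answer is directly supported by all retrieved context. "
    "Use named entities and relevant details present in the retrieved text."
)

def build_prompt(query, retrieved_chunks):
    # Number each chunk so the model can refer to specific sources
    context = "\n".join(f"[{i+1}] {chunk}" for i, chunk in enumerate(retrieved_chunks))
    return f"{INSTRUCTION}\n\nRetrieved context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("Who discovered penicillin?", retrieved_chunks)
# answer = llm.generate(prompt)  # llm: whatever client/SDK you use (placeholder)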

3. Improve Model Processing of Retrieved Chunks

  • Reformat retrieved chunks before passing them to the model, ensuring they are structured for easy extraction.
def reformat_chunks(chunks):
    return [{"source": f"Doc {i+1}", "content": chunk.strip()} for i, chunk in enumerate(chunks)]

formatted_chunks = reformat_chunks(retrieved_chunks)
  • Apply retrieval-augmented generation (RAG) strategies that force the model to cite specific chunks.
def generate_with_citations(model, query, retrieved_chunks):
    # Number each chunk so the model can cite it as [1], [2], ...
    citations = [f"[{i+1}]: {chunk}" for i, chunk in enumerate(retrieved_chunks)]
    context = "\n".join(citations)
    prompt = (
        "Answer the query using only the retrieved data, and cite the sources you use as [1], [2], ...\n\n"
        f"{context}\n\nQuery: {query}"
    )
    return model.generate(prompt)  # model: any LLM client exposing a generate() method (placeholder)

response = generate_with_citations(model, "Explain black holes.", retrieved_chunks)
  • Use citation markers (e.g., “According to source [1]…”) to ensure traceability.
def insert_citations(response, retrieved_chunks):
    # Only tags chunks whose text appears verbatim in the response;
    # paraphrased content will not be matched by this simple approach.
    for i, chunk in enumerate(retrieved_chunks):
        response = response.replace(chunk, f"{chunk} [Source {i+1}]")
    return response

final_response = insert_citations(model_output, retrieved_chunks)  # model_output: the raw model response
  • Implement an extraction-first approach where the model generates responses directly based on retrieved content before synthesizing an answer.
def extraction_first_approach(retrieved_chunks):
    # extract_key_info: placeholder for a step that pulls key facts from each chunk
    extracted_facts = [extract_key_info(chunk) for chunk in retrieved_chunks]
    synthesized_answer = " ".join(extracted_facts)  # Combine extracted facts into a draft answer
    return synthesized_answer

response = extraction_first_approach(retrieved_chunks)
  • Fine-tune the model to better incorporate retrieved data, prioritizing direct chunk referencing over paraphrasing.
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./model_output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=fine_tuning_dataset,  # Dataset contains explicit chunk references
)

trainer.train()

4. Analyze and Adjust Chunk Attribution

  • Use Chunk Attribution Plus to understand why certain chunks were ignored.
def analyze_chunk_usage(model_output, retrieved_chunks):
    # Simple verbatim-match check; a metric such as Chunk Attribution Plus
    # gives a more reliable view of which chunks actually influenced the output.
    ignored_chunks = [chunk for chunk in retrieved_chunks if chunk not in model_output]
    return ignored_chunks

ignored = analyze_chunk_usage(model_response, retrieved_chunks)  # model_response: the generated answer
  • Optimize post-processing of retrieved documents to highlight key sentences before sending them to the model.
def highlight_key_sentences(text):
    sentences = text.split(". ")
    # Keyword filter as a simplistic stand-in for a real salience or relevance scorer
    important_sentences = [s for s in sentences if "important" in s or "key" in s]
    return "\n".join(important_sentences)

highlighted_text = highlight_key_sentences(retrieved_doc)  # retrieved_doc: a single retrieved document's text
  • Encourage chunk aggregation techniques that combine highly relevant text segments for improved response coherence.
def aggregate_chunks(retrieved_chunks):
    # Assumes chunks are already sorted by relevance; combine the top 3
    return " ".join(retrieved_chunks[:3])

aggregated_context = aggregate_chunks(retrieved_chunks)