
Chunk Relevance measures whether a given chunk of text contains information that could help answer the user’s query in a RAG pipeline.
Chunk Relevance is a binary metric: each chunk is either Relevant or Not Relevant. A chunk is considered Relevant when:
  • The chunk contains at least some information that is useful to answer the query (even partially)
  • The chunk provides a key piece of information (such as the name of an entity or a specific fact) that can be used to find the answer in another chunk
  • Any part of the chunk helps answer the query either partially or completely
A chunk is considered Not Relevant when:
  • The chunk contains no useful information for answering the query
  • The chunk is completely off-topic or only contains background/tangential information with no bearing on the query
  • The content is topically related but doesn’t answer the question even partially
We do not require the chunk to fully answer the query; partial relevance is sufficient. We do not penalize incompleteness: as long as something in the chunk is relevant, we label it Relevant.

Calculation method

Chunk Relevance is computed through a multi-step process:
1. Model Request

Additional evaluation requests are sent to an LLM to analyze each chunk’s relevance to the user query.

2. Independent Evaluation

Each chunk is evaluated independently against the query to determine its relevance, without using outside knowledge or making assumptions.

3. Binary Classification

Each chunk receives a binary classification: Relevant (true) if it provides any useful information, or Not Relevant (false) if it contains no useful information for the query.

4. Result Generation

The evaluation produces both a classification for each chunk and an overall explanation of the reasoning.
Because this metric is computed by prompting an LLM, it requires additional LLM calls, which may affect usage and billing.
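The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the platform's implementation. The `judge` callable stands in for the LLM request. `evaluate_chunk_relevance`, `ChunkRelevanceResult` and the toy `keyword_judge` are hypothetical names introduced here. The keyword judge only exists so the example runs. It cannot reason about bridging chunks the way an LLM judge can.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical judge: returns True if the chunk is Relevant to the query.
# In the actual metric, this decision comes from an LLM request.
Judge = Callable[[str, str], bool]


@dataclass
class ChunkRelevanceResult:
    labels: List[bool]  # one label per chunk: True = Relevant, False = Not Relevant
    explanation: str    # overall summary of the reasoning


def evaluate_chunk_relevance(query: str, chunks: List[str], judge: Judge) -> ChunkRelevanceResult:
    # Independent evaluation: each chunk is judged against the query on its own.
    labels = [judge(query, chunk) for chunk in chunks]
    relevant_ids = [str(i + 1) for i, is_relevant in enumerate(labels) if is_relevant]
    explanation = f"{len(relevant_ids)} of {len(chunks)} chunks judged Relevant"
    explanation += f" (chunks {', '.join(relevant_ids)})." if relevant_ids else "."
    return ChunkRelevanceResult(labels=labels, explanation=explanation)


def keyword_judge(query: str, chunk: str) -> bool:
    # Toy stand-in for the LLM call (hypothetical, illustration only).
    return any(term in chunk.lower() for term in ("paris", "population"))


query = "What is the population of the capital city of France?"
chunks = [
    "The capital city of France is Paris.",
    "Paris has a population of approximately 2.1 million people.",
    "France is a country located in Western Europe.",
]
result = evaluate_chunk_relevance(query, chunks, keyword_judge)
print(result.labels)       # [True, True, False]
print(result.explanation)  # 2 of 3 chunks judged Relevant (chunks 1, 2).
```

In this sketch, the judge is a parameter so the aggregation logic stays separate from however the relevance decision is actually made.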

Understanding chunk relevance

Example Scenario

This example illustrates chunk relevance:
User query: “What is the population of the capital city of France?”
  • Chunk 1: “The capital city of France is Paris.”
  • Chunk 2: “Paris has a population of approximately 2.1 million people.”
  • Chunk 3: “France is a country located in Western Europe.”
Relevance analysis: Chunks 1 and 2 are Relevant because they provide information directly related to answering the query. Chunk 1 provides crucial context (the capital is Paris) that enables the answer to be found, and Chunk 2 provides the actual answer. Chunk 3 is Not Relevant because it only provides general background information about France’s location, which doesn’t help answer the population question.
Chunk Relevance forms the foundation for other retrieval metrics, including Context Precision and Precision @ K.
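The sketch below shows how these metrics can be derived from a ranked list of Chunk Relevance labels. Precision @ K is taken as the share of the top K chunks labeled Relevant. For Context Precision, we assume a rank-weighted definition. It is the mean of Precision @ i over every position i that holds a Relevant chunk, which is common in open-source RAG evaluation tools. This section does not state the exact formulas this product uses, so they may differ. The function names are hypothetical.

```python
from typing import List


def precision_at_k(labels: List[bool], k: int) -> float:
    """Share of the top-k ranked chunks labeled Relevant (divides by k)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return sum(labels[:k]) / k


def context_precision(labels: List[bool]) -> float:
    """Assumed rank-weighted definition: mean Precision@i at each Relevant rank i."""
    scores = [precision_at_k(labels, i + 1) for i, rel in enumerate(labels) if rel]
    return sum(scores) / len(scores) if scores else 0.0


# Labels for the example scenario on this page, in retrieval order.
labels = [True, True, False]
print(precision_at_k(labels, 2))   # 1.0
print(context_precision(labels))   # 1.0

# Same chunks with the Not Relevant one ranked first: the rank-weighted score drops.
print(context_precision([False, True, True]))  # about 0.58
```

Under this assumed definition, ranking Relevant chunks earlier raises the score even when the set of retrieved chunks is unchanged.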

Optimizing your RAG pipeline

Addressing Low Relevance Scores

When many chunks are marked Not Relevant, the retrieval system is likely returning poorly matched content. To improve it:
  • Improve retrieval quality: Refine embedding models, similarity search algorithms, or retrieval parameters to better match queries with relevant content.
  • Optimize chunking strategy: Ensure chunks are semantically coherent and contain complete information units rather than arbitrary text splits.
  • Adjust retrieval parameters: Experiment with different Top K values, similarity thresholds, or reranking strategies to improve relevance.
  • Analyze patterns: Identify common characteristics of Not Relevant chunks to understand why the retrieval system is selecting them.
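One way to act on the retrieval-parameter suggestion is to replay labeled retrievals against candidate cut-offs. The sketch below assumes two things. First, your retriever exposes a similarity score for each chunk. Second, you have stored each chunk's Chunk Relevance label alongside that score. `threshold_sweep` and the sample scores are hypothetical. For each similarity threshold, the sketch reports three values:
  • how many chunks survive the cut-off
  • what share of the surviving chunks are Relevant
  • how many Relevant chunks the cut-off would discard

```python
from typing import Dict, List, Tuple

# (similarity score from the retriever, Chunk Relevance label); illustrative data.
ScoredChunk = Tuple[float, bool]


def threshold_sweep(chunks: List[ScoredChunk], thresholds: List[float]) -> List[Dict[str, float]]:
    relevant_total = sum(rel for _, rel in chunks)
    rows = []
    for t in thresholds:
        # Labels of the chunks the retriever would still return at this cut-off.
        kept = [rel for score, rel in chunks if score >= t]
        relevant_kept = sum(kept)
        rows.append({
            "threshold": t,
            "kept": len(kept),
            "relevant_rate": relevant_kept / len(kept) if kept else 0.0,
            "relevant_dropped": relevant_total - relevant_kept,
        })
    return rows


sample = [(0.91, True), (0.85, True), (0.78, False), (0.74, True), (0.62, False), (0.55, False)]
for row in threshold_sweep(sample, [0.5, 0.7, 0.8]):
    print(row)
```

In this made-up data, raising the threshold from 0.7 to 0.8 lifts the Relevant share to 1.0 but discards one Relevant chunk. This is the same trade-off you face when tuning Top K.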

Best practices

Combine with Other Metrics

Chunk Relevance works alongside Context Precision and Precision @ K for a comprehensive view of retrieval effectiveness.

Optimize Retrieval Strategy

Use relevance scores to guide changes to embedding models, similarity search algorithms, and retrieval parameters.

Monitor Across Queries

Tracking relevance rates across different query types helps identify patterns and improve retrieval system performance.
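Here is a minimal sketch of per-query-type monitoring. It assumes you tag each logged query with a type of your own choosing. The `query_type` field, the function name and the sample records are all hypothetical. The sketch pools the chunk labels within each type and reports the share labeled Relevant.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each record: (query_type, per-chunk Chunk Relevance labels for one query).
# "query_type" is a hypothetical tag attached to your own traffic.
Record = Tuple[str, List[bool]]


def relevance_rate_by_query_type(records: Iterable[Record]) -> Dict[str, float]:
    relevant: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for query_type, labels in records:
        relevant[query_type] += sum(labels)  # count of Relevant chunks
        total[query_type] += len(labels)     # count of all retrieved chunks
    return {qt: relevant[qt] / total[qt] for qt in total if total[qt] > 0}


records = [
    ("factual", [True, True, False]),
    ("factual", [True, False, False]),
    ("comparison", [False, False, True, False]),
]
print(relevance_rate_by_query_type(records))  # {'factual': 0.5, 'comparison': 0.25}
```

This approach pools chunks rather than queries, so queries that retrieve more chunks carry more weight. Averaging per-query rates instead would weight every query equally.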
Chunk Relevance is intentionally lenient: we mark a chunk Relevant if it provides any useful information, even if incomplete. This ensures that chunks with partial answers or bridging information are not incorrectly marked Not Relevant.