Chunk Utilization measures the fraction of text in each retrieved chunk that had an impact on the model’s response in a RAG pipeline.

Chunk Utilization is a continuous metric ranging from 0 to 1:

  • 0 (Low Utilization): none of the chunk's content was utilized
  • 0.5 (Mid Utilization): only half of the chunk's content influenced the response
  • 1 (High Utilization): the entire chunk's content influenced the response

A chunk with low utilization contains “extraneous” text that did not affect the final response, indicating potential inefficiencies in your chunking strategy.

Calculation Method

1. Model Architecture

We use a fine-tuned in-house Galileo evaluation model based on a transformer encoder architecture.

2. Multi-metric Computation

The same model computes Chunk Adherence, Chunk Completeness, Chunk Attribution, and Chunk Utilization in a single inference call.

3. Token-level Analysis

For each token in the provided context, the model outputs a utilization probability indicating whether that token influenced the response.

4. Score Calculation

Chunk Utilization is computed as the fraction of the chunk's tokens that have a high utilization probability.

5. Model Training

The model is trained on carefully curated RAG datasets and optimized to closely align with the RAG Plus metrics.
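The token-level scoring in steps 3 and 4 can be sketched as follows. This is a minimal illustration, not Galileo's actual implementation: the per-token probabilities would come from the evaluation model, and the 0.5 cutoff is an assumed threshold for "high" probability.

```python
def chunk_utilization(token_probs, threshold=0.5):
    """Fraction of tokens whose utilization probability exceeds the
    threshold. Both the input probabilities and the 0.5 threshold
    are illustrative assumptions for this sketch."""
    if not token_probs:
        return 0.0
    used = sum(1 for p in token_probs if p >= threshold)
    return used / len(token_probs)

# Two of four tokens judged to have influenced the response:
print(chunk_utilization([0.9, 0.8, 0.1, 0.2]))  # → 0.5
```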

Optimizing Your RAG Pipeline

Addressing Low Utilization Scores

Low Chunk Utilization scores could indicate one of two scenarios:

Oversized Chunks: Your chunks are longer than they need to be

  • Check if Chunk Relevance is also low, which confirms this scenario
  • Solution: Tune your retriever to return shorter, more focused chunks
  • Benefits: Improved system efficiency, lower cost, and reduced latency

Ineffective LLM Utilization: The LLM generator model is failing to incorporate all relevant information

  • Check if Chunk Relevance is high, which confirms this scenario
  • Solution: Explore a different LLM that may leverage the relevant information more effectively
  • Benefits: Better response quality and more efficient use of retrieved information
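The two scenarios above amount to a simple decision rule over the utilization and relevance scores. The helper below is a hypothetical sketch, and the 0.5 cutoffs are assumed values you would tune for your own pipeline:

```python
def diagnose_low_utilization(utilization, relevance, cutoff=0.5):
    """Map a (utilization, relevance) score pair to a suggested action.
    The 0.5 cutoff is an illustrative assumption, not a recommended value."""
    if utilization >= cutoff:
        return "utilization looks healthy"
    if relevance < cutoff:
        # Scenario 1: oversized chunks
        return "oversized chunks: tune the retriever to return shorter chunks"
    # Scenario 2: relevance is high but the generator is not using it
    return "ineffective LLM utilization: try a different generator model"
```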

Best Practices

Optimize Chunk Size

Experiment with different chunking strategies to find the optimal chunk size that maximizes utilization without sacrificing relevance.
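One simple way to run such an experiment is to sweep chunk sizes with a fixed-window chunker and compare the resulting Chunk Utilization scores per configuration. The word-window splitter below is a baseline strategy assumed for illustration; production systems often use sentence- or semantics-aware splitters instead.

```python
def chunk_text(text, chunk_size, overlap=0):
    """Split text into fixed-size word windows with optional overlap.
    A deliberately simple baseline chunking strategy for experiments."""
    words = text.split()
    step = max(chunk_size - overlap, 1)
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), step)]

# Sweep candidate chunk sizes; score each configuration's chunks
# with your utilization metric and keep the best trade-off.
doc = " ".join(f"word{i}" for i in range(20))
for size in (4, 8):
    print(size, len(chunk_text(doc, size, overlap=2)))
```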

Monitor Across Models

Compare Chunk Utilization scores across different LLMs to identify which models most efficiently use retrieved information.

Combine with Other Metrics

Use Chunk Utilization alongside Chunk Relevance and Chunk Attribution for a complete picture of retrieval effectiveness.

Analyze Patterns

Look for patterns in low-utilization chunks to identify specific content types or formats that your system processes inefficiently.

When optimizing for Chunk Utilization, balance efficiency with comprehensiveness. Extremely high utilization might indicate chunks that are too small and lack sufficient context for the model.