Precision @ K measures the percentage of relevant chunks among the top K retrieved chunks, where K is a specific rank position.
Low Precision
Few or no chunks in the top K are relevant to the query
High Precision
Most or all chunks in the top K are relevant to the query
Calculation method
Precision @ K is computed based on Chunk Relevance scores for the top K chunks:
Chunk Relevance Calculation
First, Chunk Relevance is computed for each retrieved chunk, producing a binary classification (Relevant or Not Relevant) for each chunk.
Rank Ordering
The retrieved chunks are ordered by their rank position (as logged in the retriever span), with position 1 being the highest-ranked chunk.
Top K Selection
The top K chunks are selected based on their rank order, where K is the specified rank position (e.g., K=3 means the top 3 chunks).
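Putting these three steps together, the computation can be sketched in a few lines of Python. This is an illustrative sketch only, not Galileo's implementation: the function name, the binary labels, and the handling of retrievals shorter than K are all assumptions.

```python
def precision_at_k(relevance_labels: list[bool], k: int) -> float:
    """Fraction of relevant chunks among the top K retrieved chunks.

    relevance_labels must be ordered by rank position, with index 0 holding
    the highest-ranked chunk (rank 1), as logged in the retriever span.
    """
    if k <= 0:
        raise ValueError("K must be a positive rank position")
    top_k = relevance_labels[:k]  # Top K selection by rank order
    # If fewer than K chunks were retrieved, this divides by the number actually
    # retrieved; dividing by K instead is an equally reasonable convention.
    return sum(top_k) / len(top_k) if top_k else 0.0


# Ranks 1-5: relevant, not relevant, relevant, relevant, not relevant
labels = [True, False, True, True, False]
print(precision_at_k(labels, k=3))  # 2 of the top 3 chunks are relevant -> ~0.67
```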
Understanding Precision @ K
How Precision @ K Helps Optimize Top K
Example scenario: When retrieving 10 chunks (Top K = 10), Context Precision is 40%, meaning only 4 out of 10 chunks are relevant. This suggests reducing Top K.
Using Precision @ K: Evaluating Precision @ 4 shows it is only 40%, meaning that for 60% of examples the useful chunks sit in ranks 5-10. Precision @ 7, however, is 90%, indicating that for 90% of examples the most relevant chunks fall within the top 7.
Optimization decision: Reducing Top K from 10 to 7 captures the relevant chunks for most queries while reducing unnecessary retrieval and processing.
Precision @ K differs from Context Precision: Precision @ K evaluates precision at a specific rank K, which helps assess ranking quality, while Context Precision considers all retrieved chunks and measures overall noise in retrieval.
Choosing K
Guidance for Selecting K
Start with small K values: Begin with small values such as K = 1, 3, or 5 to understand how well the very top-ranked chunks support high-quality responses.
Analyze Precision @ multiple K values: Evaluate Precision @ K across a range of K values (for example, 1, 3, 5, 10) to see where the metric plateaus. Points where precision stops improving significantly often indicate a good upper bound for K (see the sketch after this list).
Balance recall and efficiency: Larger K values may improve recall by including more relevant chunks, but at the cost of more noise, higher latency, and higher token usage. The chosen K should balance these tradeoffs for the specific application.
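The sketch below illustrates this kind of multi-K sweep, assuming per-query relevance labels have already been collected from evaluation runs. The example data and helper function are hypothetical, not part of Galileo's API.

```python
from statistics import mean

def precision_at_k(labels, k):
    top_k = labels[:k]
    return sum(top_k) / len(top_k) if top_k else 0.0

# Hypothetical rank-ordered relevance labels for three queries, 10 chunks each.
logged_queries = [
    [True, True, False, True, False, False, True, False, False, False],
    [True, False, True, True, False, True, False, False, False, False],
    [False, True, True, False, True, False, False, True, False, False],
]

# Sweep K and look for where average precision stops changing meaningfully;
# that plateau is often a reasonable upper bound for Top K.
for k in (1, 3, 5, 10):
    avg = mean(precision_at_k(labels, k) for labels in logged_queries)
    print(f"Precision @ {k}: {avg:.2f}")
```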
Optimizing your RAG pipeline
Addressing Low Precision @ K Scores
Optimize Top K value: Evaluate Precision @ K metrics at different K values to find the optimal number of chunks to retrieve. Reduce K if precision remains high at lower values.
Improve ranking quality: If Precision @ K is low but higher K values show better precision, focus on improving ranking/reranking to move relevant chunks earlier.
Enhance retrieval quality: Refine embedding models, similarity search algorithms, or retrieval parameters to better match queries with relevant content.
Implement reranking: Use a reranking model to improve the order of retrieved chunks, ensuring the most relevant ones appear in the top K positions.
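For the reranking suggestion above, one common approach is a cross-encoder reranker. The sketch below assumes the sentence-transformers library is installed; the model name, query, and chunks are illustrative, and your pipeline's retriever and reranker may differ.

```python
from sentence_transformers import CrossEncoder

# Example cross-encoder checkpoint; swap in whichever reranker you use.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, chunks: list[str], top_k: int) -> list[str]:
    # Score every (query, chunk) pair and keep the top_k highest-scoring chunks,
    # so the most relevant content lands in the top K positions.
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

retrieved = [
    "Chunk about pricing tiers",
    "Unrelated changelog entry",
    "Chunk about billing cycles",
]
print(rerank("How is billing calculated?", retrieved, top_k=2))
```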
Comparing Precision @ K and Context Precision
Understanding Metric Combinations
High Precision @ K, High Context Precision: The retrieval system is performing well overall. The top K positions contain mostly relevant chunks (good ranking), and the overall retrieved set has minimal noise. This indicates both effective ranking and high-quality retrieval.
High Precision @ K, Low Context Precision: The top K positions contain mostly relevant chunks (good ranking), but the overall retrieved set has significant noise. This indicates that while the ranking algorithm prioritizes relevant content effectively, the retrieval system is bringing back too many irrelevant chunks beyond the top K. Consider reducing Top K or improving retrieval quality.
Low Precision @ K, High Context Precision: While the overall retrieval contains mostly relevant chunks (low noise), the ranking is poor. Relevant chunks are distributed throughout the retrieved set rather than concentrated in the top K positions. This suggests the retrieval system finds relevant content but needs better ranking or reranking.
Low Precision @ K, Low Context Precision: Both metrics indicate problems. The retrieval system has high noise (many irrelevant chunks) and poor ranking (relevant chunks are not in top positions). This suggests fundamental issues with both retrieval quality and ranking that need to be addressed.
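To see which combination applies, both metrics can be computed from the same rank-ordered relevance labels. The sketch below treats Context Precision simply as the share of relevant chunks across all retrieved chunks, matching the "noise in retrieval" description above; the data and function names are illustrative.

```python
def precision_at_k(labels, k):
    top_k = labels[:k]
    return sum(top_k) / len(top_k) if top_k else 0.0

def context_precision(labels):
    # Simplified here as the share of relevant chunks across ALL retrieved chunks.
    return sum(labels) / len(labels) if labels else 0.0

# Example: good ranking (top 3 all relevant) but a noisy overall retrieval,
# i.e. the "High Precision @ K, Low Context Precision" combination.
labels = [True, True, True, False, False, False, False, False, False, False]
print(f"Precision @ 3:     {precision_at_k(labels, 3):.2f}")   # 1.00 -> high
print(f"Context Precision: {context_precision(labels):.2f}")   # 0.30 -> low
```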
Best practices
Determine Optimal Top K
Evaluating Precision @ K at multiple K values helps find the optimal number of chunks to retrieve for each use case.
Monitor Ranking Quality
Tracking Precision @ K helps ensure retrieval systems rank relevant chunks appropriately in the top positions.
Combine with Context Precision
Precision @ K works alongside Context Precision for a comprehensive view of retrieval effectiveness at both specific ranks and overall.
Analyze Across Queries
Evaluating Precision @ K across different query types helps identify patterns and optimize retrieval strategies accordingly.
When optimizing for Precision @ K, the goal is to find the right balance: a K value that’s high enough to capture all relevant chunks but low enough to avoid retrieving too many irrelevant chunks. Evaluating Precision @ K metrics at different K values helps make data-driven decisions about the Top K parameter.
Creating multiple Precision @ K variants in Galileo
Configuring Multiple K Values
Define separate metric configurations: Create distinct Precision @ K configurations in Galileo for each K value of interest (such as K = 1, 3, 5, 10) so they appear as separate metrics in experiments and Log streams.
Use code-based metric customization: For each additional K value, create a new code-based metric in Galileo, copy the prefilled scorer code from the preset Precision @ K metric, and update the value of K in the code. This allows multiple Precision @ K variants to share the same logic while differing only in the K parameter.
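As a rough illustration of the "differ only in K" idea, a parameterized scorer might look like the sketch below. This is not the prefilled scorer code that Galileo's preset Precision @ K metric provides; it only shows that each variant can share the same scoring logic while changing a single constant.

```python
K = 7  # The only value that changes between Precision @ 1, @ 3, @ 5, @ 10 variants.

def score_precision_at_k(relevance_labels: list[bool], k: int = K) -> float:
    """Fraction of relevant chunks among the top k retrieved chunks."""
    top_k = relevance_labels[:k]
    return sum(top_k) / len(top_k) if top_k else 0.0
```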