Overview
Action Completion determines whether the agent successfully accomplished all of the user’s goals. An agent completes an action when:
- The agent provides a complete answer in the case of a question
- The agent provides a confirmation of successful action in the case of a request
- The response is coherent and factually accurate
- The response comprehensively addresses every aspect of the user’s request
- The response avoids contradicting tool outputs
- The response summarizes all relevant parts returned by tools
Action Completion at a glance
Property | Description |
---|---|
Name of Metric | Action Completion |
Metric Category | Agentic Metrics |
Use this metric for | Measuring whether the agent successfully accomplished the user’s goal |
Can be applied to | Session |
LLM/Luna Support | Supported with both LLM + Luna models |
Protect Runtime Protection | No |
Constants | None - Uses dynamic evaluation |
Usage Context | Agentic workflows, multi-step tasks, tool-using assistants |
Value Type | Confidence score denoted as a percentage. |
Input/Output Requirements | Requires agent responses and user goals for evaluation |
When to Use This Metric
Agentic Workflows: When an AI agent must decide on a course of action and select tools to accomplish tasks.
Multi-step Tasks: When completing a user's request requires multiple steps or decisions.
Tool-using Assistants: When evaluating if the assistant successfully used the right tools and accomplished the intended goals.
Calculation method
If the response does not achieve an Action Completion score of 100%, at least one judge considered the agent to have failed to accomplish every user goal.
1. Additional Requests
Multiple requests are sent to an LLM using a carefully designed chain-of-thought prompt that adheres to the definition above.
2. Judgment Responses
The LLM generates multiple distinct responses, each containing:
- An explanation
- A final judgment: “Yes” (goal accomplished) or “No” (goal not accomplished)
3. Score Computation
Action Completion Score = (Number of “Yes” Responses) / (Total Number of Responses)
4. Explanation Surfacing
One explanation is surfaced, chosen to align with the majority judgment among the responses.
This metric requires multiple LLM calls to compute, which may impact usage and billing.
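The aggregation in steps 2–4 can be sketched as follows. This is a minimal illustration of the scoring arithmetic only, not the actual implementation; the `Judgment` type and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Judgment:
    """One LLM judge response: an explanation plus a Yes/No verdict (hypothetical type)."""
    explanation: str
    verdict: str  # "Yes" (goal accomplished) or "No" (goal not accomplished)

def action_completion_score(judgments: list[Judgment]) -> tuple[float, str]:
    """Return the score (fraction of "Yes" verdicts) and one explanation
    aligned with the majority judgment, mirroring steps 3 and 4."""
    yes_count = sum(1 for j in judgments if j.verdict == "Yes")
    score = yes_count / len(judgments)
    majority = "Yes" if yes_count >= len(judgments) - yes_count else "No"
    # Surface the first explanation that agrees with the majority verdict.
    explanation = next(j.explanation for j in judgments if j.verdict == majority)
    return score, explanation
```

For example, two “Yes” verdicts out of three judgments yield a score of roughly 0.67, and the surfaced explanation comes from one of the “Yes” responses.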
Score interpretation
Expected Score: 100% - A perfect score indicates the agent successfully accomplished all user goals with complete, accurate, and comprehensive responses.
What different scores mean
- 0.0 - 0.3 (Poor): Agent completely failed to accomplish user goals, provided incomplete answers, or contradicted tool outputs. Common causes include insufficient tool usage, incomplete responses, or factual inaccuracies.
- 0.4 - 0.7 (Fair): Agent made progress toward user goals but didn’t fully address all aspects of the request. Areas for improvement include ensuring comprehensive coverage of all user requirements and better tool utilization.
- 0.8 - 1.0 (Excellent): Agent successfully accomplished all user goals with complete, accurate, and comprehensive responses. Best practices include thorough tool usage, complete answer provision, and proper confirmation of successful actions.
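The bands above can be expressed as a small helper. The documented ranges leave gaps (e.g. 0.31–0.39); this sketch assumes each gap belongs to the adjacent lower-bounded band, which is an assumption, not part of the metric definition.

```python
def interpret_score(score: float) -> str:
    """Map an Action Completion score (0.0-1.0) to the documented band.
    Gap boundaries between bands are resolved by assumption."""
    if score <= 0.3:
        return "Poor"       # failed to accomplish goals, incomplete, or contradictory
    if score <= 0.7:
        return "Fair"       # partial progress; not all aspects addressed
    return "Excellent"      # all goals accomplished completely and accurately
```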
How to improve Action Completion scores
To optimize your agent’s performance and ensure high Action Completion scores, focus on comprehensive goal accomplishment and complete response generation.
Common issues and solutions
Issue | Cause | Solution |
---|---|---|
Incomplete responses | Agent stops before addressing all user requirements | Implement comprehensive response generation and ensure all user goals are explicitly addressed |
Tool output contradictions | Agent ignores or contradicts information from tools | Ensure agent properly summarizes and incorporates all relevant tool outputs without contradiction |
Missing confirmations | Agent doesn’t confirm successful actions | Add explicit confirmation steps for action-based requests |
Factual inaccuracies | Agent provides incorrect information | Implement fact-checking mechanisms and ensure responses align with tool outputs |
Best practices for optimization
- Track Progress Over Time: Monitor Action Completion scores across different versions of your agent to identify trends and ensure continuous improvements in task completion capabilities.
- Analyze Failure Patterns: When Action Completion scores are low, examine specific steps or scenarios where agents fail to meet user goals. Use this analysis to identify and address systematic issues.
- Combine with Other Metrics: Use Action Completion alongside other agentic metrics, such as Action Advancement, to get a comprehensive view of your assistant’s effectiveness and identify areas for improvement.
- Test Edge Cases: Create evaluation datasets that include complex, multi-step tasks to thoroughly assess your agent’s ability to handle challenging scenarios and advance user goals effectively.
When optimizing for Action Completion, ensure you’re not sacrificing other important aspects like safety, factual accuracy, or user experience in pursuit of task completion.
Comparison to other metrics
Property | Action Completion | Action Advancement | Tool Selection |
---|---|---|---|
Metric Category | Agentic Metrics | Agentic Metrics | Agentic Metrics |
Use this metric for | Measuring goal accomplishment | Measuring progress toward goals | Measuring tool choice quality |
Best for | Final outcome evaluation | Progress tracking | Tool usage optimization |
LLM/Luna Support | Yes | Yes | Yes |
Protect Runtime Protection | No | No | No |
Value Type | Percentage (0%-100%) | Percentage (0%-100%) | Percentage (0%-100%) |
Limitations | Requires multiple LLM calls | May not capture final success | Doesn’t measure execution quality |