Understand how to measure whether your agent actually accomplished a user’s goals
Property | Description |
---|---|
Name of Metric | Action Completion |
Metric Category | Agentic Metrics |
Use this metric for | Measuring whether the agent successfully accomplished the user’s goal |
Can be applied to | Session |
LLM/Luna Support | Supported with both LLM + Luna models |
Protect Runtime Protection | No |
Constants | None - Uses dynamic evaluation |
Usage Context | Agentic workflows, multi-step tasks, tool-using assistants |
Value Type | Confidence score expressed as a percentage (0%-100%) |
Input/Output Requirements | Requires agent responses and user goals for evaluation |
Additional Requests
Judgment Responses
Score Computation
Explanation Surfacing
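
The four steps named above suggest a judge-and-aggregate pipeline: several evaluation requests, one judgment per request, a combined score, and an explanation shown to the user. The sketch below illustrates only the last two steps. It assumes each judgment is a yes/no verdict with a rationale and that the score is the share of positive verdicts; the `Judgment` type, both function names, and the aggregation rule are hypothetical and are not taken from the product's implementation.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One judge response (hypothetical shape, assumed for illustration)."""
    completed: bool    # did this judge decide the user's goal was accomplished?
    explanation: str   # the judge's rationale


def action_completion_score(judgments: list[Judgment]) -> float:
    """Return the share of positive judgments as a percentage (0-100)."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    positive = sum(j.completed for j in judgments)
    return 100.0 * positive / len(judgments)


def surfaced_explanation(judgments: list[Judgment]) -> str:
    """Return the first explanation that agrees with the majority verdict.

    Ties are treated as a positive majority (an arbitrary choice for this sketch).
    """
    if not judgments:
        raise ValueError("at least one judgment is required")
    majority = 2 * sum(j.completed for j in judgments) >= len(judgments)
    return next(j.explanation for j in judgments if j.completed == majority)


judgments = [
    Judgment(True, "The agent booked the flight and confirmed it."),
    Judgment(True, "All requested steps were carried out."),
    Judgment(False, "The seat preference was never addressed."),
    Judgment(True, "The final answer matches the tool output."),
]
print(action_completion_score(judgments))   # 75.0
print(surfaced_explanation(judgments))
```

Under these assumptions, three positive verdicts out of four yield a score of 75%, and the surfaced explanation comes from a judge that sided with the majority.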
Issue | Cause | Solution |
---|---|---|
Incomplete responses | Agent stops before addressing all user requirements | Have the agent check its final response against each user requirement and explicitly address every one |
Tool output contradictions | Agent ignores or contradicts information from tools | Ensure agent properly summarizes and incorporates all relevant tool outputs without contradiction |
Missing confirmations | Agent doesn’t confirm successful actions | Add explicit confirmation steps for action-based requests |
Factual inaccuracies | Agent provides incorrect information | Implement fact-checking mechanisms and ensure responses align with tool outputs |
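
The "Missing confirmations" and "Tool output contradictions" rows can be addressed together by building the agent's reply directly from the tool result. The following is a minimal sketch of that idea, not a prescribed integration: `run_with_confirmation` and the `book_table` tool are hypothetical names introduced only for illustration.

```python
from typing import Callable


def run_with_confirmation(action_name: str, action: Callable[[], dict]) -> str:
    """Run a tool action and return a reply that states the outcome explicitly."""
    try:
        result = action()
    except Exception as exc:
        # Report the failure instead of silently ending the turn.
        return f"I could not complete '{action_name}': {exc}"
    # Quote the tool's own output so the reply cannot contradict it.
    details = ", ".join(f"{key}={value}" for key, value in result.items())
    return f"Done: '{action_name}' succeeded ({details})."


def book_table() -> dict:
    """Hypothetical tool call standing in for a real reservation API."""
    return {"reservation_id": "R-1042", "time": "19:00"}


print(run_with_confirmation("book table", book_table))
```

Because the confirmation text is derived from the tool output rather than generated freely, the reply both confirms the action and stays consistent with what the tool returned.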

Property | Action Completion | Action Advancement | Tool Selection |
---|---|---|---|
Metric Category | Agentic Performance | Agentic Performance | Agentic Performance |
Use this metric for | Measuring goal accomplishment | Measuring progress toward goals | Measuring tool choice quality |
Best for | Final outcome evaluation | Progress tracking | Tool usage optimization |
LLM/Luna Support | Yes | Yes | Yes |
Protect Runtime Protection | No | No | No |
Value Type | Percentage (0%-100%) | Percentage (0%-100%) | Percentage (0%-100%) |
Limitations | Requires multiple LLM calls | May not capture final success | Doesn’t measure execution quality |