Action Completion
Understand how to measure and optimize the effectiveness of your AI agent’s actions
Action Completion determines whether the assistant successfully accomplished all of the user’s goals.
To accomplish a user’s goal, the assistant must:
- Provide a complete answer in the case of a question.
- Provide a confirmation of successful action in the case of a request.
Additionally, the response must:
- Be coherent and factually accurate.
- Comprehensively address every aspect of the user’s request.
- Avoid contradicting tool outputs.
- Summarize all relevant parts returned by tools.
Calculation Method
Action Completion is calculated in the four steps below. A score below 100% means that at least one judge concluded the assistant failed to accomplish every user goal.
Additional Requests
Multiple requests are sent to an LLM (e.g., OpenAI’s GPT-4o) using a carefully designed chain-of-thought prompt that adheres to the definition above.
Judgment Responses
The LLM generates multiple distinct responses, each containing:
- An explanation.
- A final judgment: “Yes” (goal accomplished) or “No” (goal not accomplished).
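As a rough illustration of the request and judgment steps, the sketch below calls a judge several times and collects one explanation and one verdict per call. The `Judgment` type, the `collect_judgments` helper, and the stubbed `fake_judge` are hypothetical names introduced here for illustration only; the actual prompt, response format, and number of calls are not specified in this document.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Judgment:
    explanation: str
    verdict: str  # expected to be "Yes" or "No"


def collect_judgments(
    judge: Callable[[str], Judgment], prompt: str, n: int = 3
) -> List[Judgment]:
    """Call a (hypothetical) judge n times and reject malformed verdicts."""
    results = []
    for _ in range(n):
        judgment = judge(prompt)
        if judgment.verdict not in ("Yes", "No"):
            raise ValueError(f"unexpected verdict: {judgment.verdict!r}")
        results.append(judgment)
    return results


# Stub standing in for a real LLM call (assumption: no network access here).
def fake_judge(prompt: str) -> Judgment:
    return Judgment("The assistant confirmed the booking.", "Yes")


judgments = collect_judgments(
    fake_judge, "Did the assistant accomplish every user goal?", n=3
)
```

In practice the stub would be replaced by a function that sends the chain-of-thought prompt to the LLM and parses its reply.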
Score Computation
Action Completion Score = (Number of “Yes” Responses) / (Total Number of Responses)
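The formula above can be sketched in a few lines. The function name `action_completion_score` and the sample verdicts are illustrative assumptions, not part of any product API.

```python
def action_completion_score(verdicts):
    """Fraction of "Yes" verdicts among all judge responses."""
    if not verdicts:
        raise ValueError("at least one judge response is required")
    yes_count = sum(1 for verdict in verdicts if verdict == "Yes")
    return yes_count / len(verdicts)


# Three of five hypothetical judges answered "Yes".
score = action_completion_score(["Yes", "Yes", "No", "Yes", "No"])
print(score)  # 0.6
```

A score of 1.0 therefore requires every judge to answer “Yes”.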
Explanation Surfacing
One of the generated explanations is displayed alongside the score. It is always chosen from the responses that agree with the majority judgment.
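One possible way to pick an explanation that agrees with the majority is sketched below. The `surface_explanation` helper is hypothetical, and two of its choices are assumptions this document does not specify: a tie is treated as “No”, and the first matching explanation is returned.

```python
from collections import Counter


def surface_explanation(responses):
    """Return one explanation whose verdict matches the majority verdict.

    responses: list of (explanation, verdict) tuples, verdict "Yes" or "No".
    """
    counts = Counter(verdict for _, verdict in responses)
    # Assumption: a tie falls back to "No"; the source does not define ties.
    majority = "Yes" if counts["Yes"] > counts["No"] else "No"
    for explanation, verdict in responses:
        if verdict == majority:
            return explanation
    raise ValueError("no responses to choose from")


chosen = surface_explanation([
    ("The refund was never confirmed.", "No"),
    ("The assistant answered both questions.", "Yes"),
    ("All requested actions were confirmed.", "Yes"),
])
print(chosen)  # The assistant answered both questions.
```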
This metric requires multiple LLM calls to compute, which may impact usage and billing.
Understanding Action Completion
When to Use This Metric
The Action Completion metric is particularly valuable in the following scenarios:
- Agentic Workflows: When an AI agent must decide on a course of action and select tools to accomplish tasks.
- Multi-step Tasks: When completing a user’s request requires multiple steps or decisions.
- Tool-using Assistants: When evaluating if the assistant used available tools effectively.
This metric helps determine whether the assistant chose appropriate actions and made meaningful progress toward fulfilling the user’s request.
Best Practices
To optimize your assistant’s performance and ensure high Action Completion scores, consider the following best practices:
Track Progress Over Time
Monitor Action Completion scores across different versions of your agent to identify trends and ensure continuous improvements in task completion capabilities.
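A minimal sketch of version-over-version tracking, assuming you have already exported one average Action Completion score per agent version. The `score_deltas` helper, the version labels, and the score values are made up for illustration.

```python
def score_deltas(scores_by_version):
    """Return (version, score, change_from_previous) rows in insertion order."""
    rows = []
    previous = None
    for version, score in scores_by_version.items():
        delta = None if previous is None else round(score - previous, 4)
        rows.append((version, score, delta))
        previous = score
    return rows


# Hypothetical averages for three agent versions.
history = {"v1": 0.60, "v2": 0.75, "v3": 0.70}
rows = score_deltas(history)
for version, score, delta in rows:
    print(version, score, delta)
```

A negative change, as between the last two versions here, flags a regression worth investigating before release.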
Analyze Failure Patterns
When Action Completion scores are low, examine specific steps or scenarios where agents fail to meet user goals. Use this analysis to identify and address systematic issues.
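One way to look for systematic issues is to group scores by scenario and list the weakest groups first. The sketch below assumes each evaluated trace carries a scenario label; the `low_scoring_scenarios` helper, the labels, the scores, and the 0.5 threshold are all illustrative assumptions.

```python
from collections import defaultdict


def low_scoring_scenarios(records, threshold=0.5):
    """Return scenarios whose mean score is below threshold, worst first.

    records: iterable of (scenario_label, action_completion_score) pairs.
    """
    scores = defaultdict(list)
    for scenario, score in records:
        scores[scenario].append(score)
    means = {name: sum(vals) / len(vals) for name, vals in scores.items()}
    failing = [name for name, mean in means.items() if mean < threshold]
    return sorted(failing, key=means.get)


# Hypothetical per-trace results.
records = [
    ("refund", 0.2),
    ("refund", 0.4),
    ("booking", 1.0),
    ("search", 0.6),
    ("search", 0.2),
]
failing = low_scoring_scenarios(records)
print(failing)  # ['refund', 'search']
```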
Combine with Other Metrics
Use Action Completion alongside other agentic metrics, such as Action Advancement, to get a comprehensive view of your assistant’s effectiveness and identify areas for improvement.
Test Edge Cases
Create evaluation datasets that include complex, multi-step tasks to thoroughly assess your agent’s ability to handle challenging scenarios and advance user goals effectively.
When optimizing for Action Completion, ensure you’re not sacrificing other important aspects like safety, factual accuracy, or user experience in pursuit of task completion.