Action Completion determines whether the assistant successfully accomplished all of the user’s goals.

To accomplish a user’s goal, the assistant must:

  • Provide a complete answer in the case of a question.
  • Provide a confirmation of successful action in the case of a request.

Additionally, the response must:

  • Be coherent and factually accurate.
  • Comprehensively address every aspect of the user’s request.
  • Avoid contradicting tool outputs.
  • Summarize all relevant parts returned by tools.

Calculation Method

A score below 100% means that at least one judge considered the assistant to have failed to accomplish every user goal.

Action Completion is calculated by:

1. Additional Requests

Multiple requests are sent to an LLM (e.g., OpenAI’s GPT-4o) using a carefully designed chain-of-thought prompt that adheres to the definition above.

2. Judgment Responses

The LLM generates multiple distinct responses, each containing:

  • An explanation.
  • A final judgment: “Yes” (goal accomplished) or “No” (goal not accomplished).

3. Score Computation

Action Completion Score = (Number of “Yes” Responses) / (Total Number of Responses)

4. Explanation Surfacing

One of the generated explanations is displayed alongside the score, always chosen to align with the majority judgment.

This metric requires multiple LLM calls to compute, which may impact usage and billing.
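
The score computation and explanation surfacing described above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the `Judgment` type and the `action_completion` function are hypothetical names, the judge responses are assumed to be already collected, and ties are resolved as “No” because the source does not say how ties are handled.

```python
from typing import List, NamedTuple, Tuple


class Judgment(NamedTuple):
    """One judge response (hypothetical shape): an explanation plus a verdict."""
    explanation: str
    verdict: str  # "Yes" (goal accomplished) or "No" (goal not accomplished)


def action_completion(judgments: List[Judgment]) -> Tuple[float, str]:
    """Return (score, surfaced explanation) for a non-empty list of judgments."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    yes_count = sum(1 for j in judgments if j.verdict == "Yes")
    # Score = number of "Yes" responses / total number of responses.
    score = yes_count / len(judgments)
    # Majority verdict; a tie falls back to "No" (assumption, not from the source).
    majority = "Yes" if yes_count > len(judgments) - yes_count else "No"
    # Surface the first explanation that agrees with the majority verdict.
    explanation = next(j.explanation for j in judgments if j.verdict == majority)
    return score, explanation


# Illustrative judge responses (made up for this example).
judgments = [
    Judgment("The order was cancelled and the cancellation was confirmed.", "Yes"),
    Judgment("The refund status was never reported to the user.", "No"),
    Judgment("Both requests were handled and confirmed.", "Yes"),
]
score, explanation = action_completion(judgments)
print(f"{score:.2f}", explanation)
```

With two “Yes” verdicts out of three, the score is 2/3 and the surfaced explanation comes from one of the “Yes” responses.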

Understanding Action Completion

When to Use This Metric

The Action Completion metric is particularly valuable in the following scenarios:

  • Agentic Workflows: When an AI agent must decide on a course of action and select tools to accomplish tasks.
  • Multi-step Tasks: When completing a user’s request requires multiple steps or decisions.
  • Tool-using Assistants: When evaluating if the assistant used available tools effectively.

This metric helps determine whether the assistant chose appropriate actions and made meaningful progress toward fulfilling the user’s request.

Best Practices

To optimize your assistant’s performance and ensure high Action Completion scores, consider the following best practices:

Track Progress Over Time

Monitor Action Completion scores across different versions of your agent to identify trends and ensure continuous improvements in task completion capabilities.
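
One way to track this is to compare the mean score per agent version and flag drops between consecutive versions. The sketch below assumes the scores have already been exported as plain floats per version; the function names, the version labels, and the `tolerance` value are hypothetical and chosen only for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple


def mean_score_by_version(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the Action Completion scores recorded for each version."""
    return {version: mean(vals) for version, vals in scores.items() if vals}


def regressions(
    ordered_versions: List[str],
    means: Dict[str, float],
    tolerance: float = 0.02,
) -> List[Tuple[str, str]]:
    """Return (previous, current) pairs where the mean dropped by more than tolerance."""
    flagged = []
    for prev, curr in zip(ordered_versions, ordered_versions[1:]):
        if means[curr] < means[prev] - tolerance:
            flagged.append((prev, curr))
    return flagged


# Made-up scores for three hypothetical agent versions.
scores = {
    "v1": [1.0, 0.6, 0.8],
    "v2": [1.0, 1.0, 0.8],
    "v3": [0.6, 0.8, 0.8],
}
means = mean_score_by_version(scores)
print(regressions(["v1", "v2", "v3"], means))
```

Here the mean rises from v1 to v2 and falls from v2 to v3, so only the second transition is flagged.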

Analyze Failure Patterns

When Action Completion scores are low, examine specific steps or scenarios where agents fail to meet user goals. Use this analysis to identify and address systematic issues.
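
As a rough sketch of this kind of analysis, failing runs can be grouped by scenario to see where failures cluster. The record fields (`scenario`, `score`) and the `failure_counts` helper are assumptions made for this example; a run counts as failing when its score is below 100%, meaning at least one judge answered “No”.

```python
from collections import Counter
from typing import Dict, List, Tuple


def failure_counts(records: List[Dict], threshold: float = 1.0) -> List[Tuple[str, int]]:
    """Count runs scoring below threshold, grouped by scenario, most frequent first."""
    counts = Counter(r["scenario"] for r in records if r["score"] < threshold)
    return counts.most_common()


# Made-up evaluation records for illustration.
records = [
    {"scenario": "refund", "score": 0.6},
    {"scenario": "booking", "score": 1.0},
    {"scenario": "refund", "score": 0.8},
    {"scenario": "booking", "score": 0.4},
    {"scenario": "cancellation", "score": 1.0},
]
print(failure_counts(records))
```

Scenarios that dominate this list are natural candidates for a closer look at the surfaced explanations.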

Combine with Other Metrics

Use Action Completion alongside other agentic metrics, such as Action Advancement, to get a comprehensive view of your assistant’s effectiveness and identify areas for improvement.

Test Edge Cases

Create evaluation datasets that include complex, multi-step tasks to thoroughly assess your agent’s ability to handle challenging scenarios and advance user goals effectively.

When optimizing for Action Completion, ensure you’re not sacrificing other important aspects like safety, factual accuracy, or user experience in pursuit of task completion.