Action Completion

Action Completion determines whether the agent successfully accomplished all of the user’s goals.

To accomplish a user’s goal, the agent must:

Provide a complete answer in the case of a question.
Provide a confirmation of successful action in the case of a request.

Additionally, the response must:

Be coherent and factually accurate.
Comprehensively address every aspect of the user’s request.
Avoid contradicting tool outputs.
Summarize all relevant parts returned by tools.

Calculation method

A score below 100% means that at least one judge concluded the agent did not accomplish every user goal. Action Completion is calculated as follows:

Additional Requests

Multiple requests are sent to an LLM (e.g., OpenAI’s GPT-4o) using a carefully designed chain-of-thought prompt that adheres to the definition above.

Judgment Responses

The LLM generates multiple distinct responses, each containing:

An explanation.
A final judgment: “Yes” (goal accomplished) or “No” (goal not accomplished).

Score Computation

Action Completion Score = (Number of “Yes” Responses) / (Total Number of Responses)
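The formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the product's implementation: the function name `action_completion_score` and the sample judgments are hypothetical.

```python
from typing import Sequence


def action_completion_score(judgments: Sequence[str]) -> float:
    """Return the fraction of judge responses whose final judgment is "Yes".

    `judgments` is a hypothetical list of final judgments ("Yes" or "No"),
    one per LLM judge response.
    """
    if not judgments:
        raise ValueError("at least one judgment is required")
    yes_count = sum(1 for j in judgments if j.strip().lower() == "yes")
    return yes_count / len(judgments)


# Hypothetical example: five judge responses, four of them "Yes".
score = action_completion_score(["Yes", "Yes", "No", "Yes", "Yes"])
print(f"{score:.0%}")  # prints "80%"
```

Here a single “No” out of five responses is enough to pull the score below 100%, which matches the interpretation given at the start of this section.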

Explanation Surfacing

One of the generated explanations is displayed alongside the score, always chosen to align with the majority judgment among the responses.
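Choosing an explanation that agrees with the majority judgment could look roughly like the sketch below. The `JudgeResponse` type and `surface_explanation` function are hypothetical names. The tie-breaking rule and the choice of the first matching explanation are assumptions, since the description above does not specify either.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JudgeResponse:
    """Hypothetical container for one judge response."""
    explanation: str
    judgment: str  # "Yes" or "No"


def surface_explanation(responses: List[JudgeResponse]) -> str:
    """Return an explanation from the side holding the majority judgment."""
    if not responses:
        raise ValueError("at least one response is required")
    yes = [r for r in responses if r.judgment == "Yes"]
    no = [r for r in responses if r.judgment != "Yes"]
    # Assumption: ties go to "Yes"; the source does not define tie handling.
    majority = yes if len(yes) >= len(no) else no
    # Assumption: take the first explanation on the majority side.
    return majority[0].explanation


responses = [
    JudgeResponse("The refund was issued and confirmed.", "Yes"),
    JudgeResponse("The agent never confirmed the refund.", "No"),
    JudgeResponse("No confirmation of the refund was given.", "No"),
]
print(surface_explanation(responses))
```

With two “No” judgments against one “Yes”, the surfaced explanation comes from a “No” response.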

This metric requires multiple LLM calls to compute, which may impact usage and billing.

Understanding action completion

When to Use This Metric

The Action Completion metric is one of the most direct measures of whether an agent is useful to its users. It is particularly valuable in the following scenarios:

Agentic Workflows: When an AI agent must decide on a course of action and select tools to accomplish tasks.
Multi-step Tasks: When completing a user’s request requires multiple steps or decisions.
Tool-using Assistants: When evaluating if the assistant successfully used the right tools.

Action Completion shows whether the agent accomplished all of the user’s goals. By tracking it over time, teams can identify patterns where agents fall short, analyze why certain scenarios lead to failures, and focus on targeted fixes.

Best practices

To optimize your agent’s performance and ensure high Action Completion scores, consider the following best practices:

Track Progress Over Time

Monitor Action Completion scores across different versions of your agent to identify trends and ensure continuous improvements in task completion capabilities.
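As a rough sketch of version-over-version tracking, per-trace scores can be averaged by agent version. The record layout (version label, score) and the sample values are hypothetical. In practice the scores would come from wherever your evaluation results are stored.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def mean_score_by_version(
    records: Iterable[Tuple[str, float]]
) -> Dict[str, float]:
    """Average Action Completion scores per agent version.

    `records` is a hypothetical iterable of (version, score) pairs.
    """
    grouped = defaultdict(list)
    for version, score in records:
        grouped[version].append(score)
    return {version: mean(scores) for version, scores in grouped.items()}


# Hypothetical scores for two agent versions.
records = [("v1", 0.5), ("v1", 0.75), ("v2", 1.0), ("v2", 0.75)]
for version, avg in mean_score_by_version(records).items():
    print(version, avg)
```

Comparing the per-version averages gives a quick signal of whether a new release improved or regressed task completion before digging into individual traces.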

Analyze Failure Patterns

When Action Completion scores are low, examine specific steps or scenarios where agents fail to meet user goals. Use this analysis to identify and address systematic issues.

Combine with Other Metrics

Use Action Completion alongside other agentic metrics, such as Action Advancement, to get a comprehensive view of your assistant’s effectiveness and identify areas for improvement.

Test Edge Cases

Create evaluation datasets that include complex, multi-step tasks to thoroughly assess your agent’s ability to handle challenging scenarios and advance user goals effectively.

When optimizing for Action Completion, ensure you’re not sacrificing other important aspects like safety, factual accuracy, or user experience in pursuit of task completion.
