Action Advancement measures whether an assistant successfully accomplishes or makes progress toward at least one user goal in a conversation.

An assistant successfully advances a user’s goal when it:

  1. Provides a complete or partial answer to the user’s question
  2. Requests clarification or additional information to better understand the user’s needs
  3. Confirms that a requested action has been successfully completed

For an interaction to count as advancing the user’s goal, the assistant’s response must be:

  • Factually accurate
  • Directly addressing the user’s request
  • Consistent with any tool outputs used
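These criteria are applied by an LLM judge rather than by string matching. As an illustration only (the exact prompt used by the metric is not reproduced here), a judge prompt encoding the definition above might look like this:

```python
# Illustrative judge prompt only; the metric's internal prompt is not published in these docs.
JUDGE_PROMPT_TEMPLATE = """You are judging whether an AI assistant advanced the user's goal.

The assistant advances a goal if it:
  1. provides a complete or partial answer to the user's question,
  2. asks a clarifying question to better understand the user's needs, or
  3. confirms that a requested action completed successfully.

Only count the response if it is factually accurate, directly addresses the user's
request, and is consistent with any tool outputs.

Think step by step. Write a short explanation, then a final line containing exactly
"yes" or "no".

Conversation:
{conversation}
"""
```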

Calculation Method

If the Action Advancement score is below 100%, at least one evaluation judged that the assistant failed to make progress on any user goal.

Action Advancement is calculated by:

1. Model Request: Multiple evaluation requests are sent to an LLM evaluator (e.g., OpenAI’s GPT-4o-mini) to analyze the assistant’s progress toward user goals.

2. Prompt Engineering: A specialized chain-of-thought prompt guides the model to evaluate whether the assistant made progress on user goals, based on the metric’s definition.

3. Evaluation Process: Each evaluation analyzes the interaction and produces both a detailed explanation and a binary judgment (yes/no) on goal advancement.

4. Score Calculation: The final Action Advancement score is computed as the percentage of positive (‘yes’) judgments out of all evaluation responses. For example, four ‘yes’ judgments out of five evaluations yields a score of 80%.

We display one of the generated explanations alongside the score, always choosing one that aligns with the majority judgment.

This metric requires multiple LLM calls to compute, which may impact usage and billing.
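The steps above amount to a scoring loop. The sketch below is a simplified illustration, not the platform’s implementation: the number of judge calls, the response parsing, and the condensed prompt are all assumptions, and it uses the OpenAI Python SDK with gpt-4o-mini purely as an example evaluator.

```python
# Simplified sketch of the scoring loop; judge count, parsing, and prompt are assumptions.
from openai import OpenAI

client = OpenAI()
NUM_EVALUATIONS = 5  # assumed; the actual number of judge calls is not documented here

# Condensed version of the judge prompt sketched earlier.
JUDGE_PROMPT = (
    "Decide whether the assistant advanced at least one user goal in the conversation "
    "below. Explain briefly, then end with a final line containing exactly 'yes' or 'no'.\n\n"
    "{conversation}"
)


def score_action_advancement(conversation: str) -> tuple[float, str]:
    """Return (score, explanation); score is the fraction of 'yes' judgments."""
    judgments: list[tuple[bool, str]] = []
    for _ in range(NUM_EVALUATIONS):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": JUDGE_PROMPT.format(conversation=conversation)}],
        )
        text = response.choices[0].message.content.strip()
        # The last line carries the binary judgment; everything before it is the explanation.
        *explanation_lines, verdict = text.splitlines()
        judgments.append((verdict.strip().lower() == "yes", "\n".join(explanation_lines)))

    yes_count = sum(1 for is_yes, _ in judgments if is_yes)
    score = yes_count / len(judgments)  # e.g., 4 of 5 'yes' judgments -> 0.8 (80%)
    majority_is_yes = yes_count * 2 >= len(judgments)
    # Display an explanation that agrees with the majority judgment.
    explanation = next(expl for is_yes, expl in judgments if is_yes == majority_is_yes)
    return score, explanation
```

Averaging several independent judgments reduces the variance of any single LLM verdict, which is why the metric is reported as a percentage rather than a single yes/no.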

Understanding Action Advancement

When to Use This Metric

Action Advancement is particularly valuable for evaluating:

Agentic Workflows: When an AI agent must decide on actions and select appropriate tools.

Multi-step Tasks: When completing a user’s request requires multiple steps or decisions.

Tool-using Assistants: When evaluating if the assistant used available tools effectively.

This metric helps you determine whether the assistant chose appropriate actions and made meaningful progress toward fulfilling the user’s request.

Best Practices

Track Progress Over Time

Monitor Action Advancement scores across different versions of your agent to ensure improvements in task completion capabilities.
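For example (illustrative data only), a lightweight way to compare mean scores across versions:

```python
# Hypothetical scores per agent version; values are illustrative only.
from statistics import mean

scores_by_version = {
    "agent-v1.0": [0.6, 0.8, 0.4, 1.0],
    "agent-v1.1": [0.8, 1.0, 0.8, 1.0],
}

for version, scores in scores_by_version.items():
    print(f"{version}: mean Action Advancement = {mean(scores):.0%}")
```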

Analyze Failure Patterns

When Action Advancement scores are low, examine the specific steps where agents fail to make progress to identify systematic issues.

Combine with Other Metrics

Use Action Advancement alongside other agentic metrics to get a comprehensive view of your assistant’s effectiveness.

Test Edge Cases

Create evaluation datasets that include complex, multi-step tasks to thoroughly assess your agent’s ability to advance user goals.
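A sketch of what such edge-case entries might look like (field names, tasks, and tool names are hypothetical, not a required schema):

```python
# Hypothetical multi-step evaluation cases; field names and tool names are illustrative.
edge_case_dataset = [
    {
        "task": "Book the cheapest refundable flight to Berlin next Tuesday and add it to my calendar",
        "expected_steps": ["search_flights", "filter_refundable", "book_flight", "create_calendar_event"],
    },
    {
        "task": "Summarize last quarter's support tickets and open a follow-up issue for the top complaint",
        "expected_steps": ["query_tickets", "summarize_tickets", "create_issue"],
    },
]
```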

When optimizing for Action Advancement, ensure you’re not sacrificing other important aspects like safety, factual accuracy, or user experience in pursuit of task completion.