If the Action Advancement score is less than 100%, it means at least one evaluator determined the assistant failed to make progress on any user goal.

Action Advancement is calculated by:
1. Model Request: Multiple evaluation requests are sent to an LLM evaluator (e.g., OpenAI's GPT-4o mini) to analyze the assistant's progress toward user goals.
2. Prompt Engineering: A specialized chain-of-thought prompt guides the model to evaluate whether the assistant made progress on user goals based on the metric's definition.
3. Evaluation Process: Each evaluation analyzes the interaction and produces both a detailed explanation and a binary judgment (yes/no) on goal advancement.
4. Score Calculation: The final Action Advancement score is computed as the percentage of positive ("yes") responses out of all evaluation responses.
We display one of the generated explanations alongside the score, always choosing one that aligns with the majority judgment.
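In code, the scoring step reduces to a majority count over the evaluator responses. The sketch below illustrates how the percentage and the displayed explanation could be derived; the response field names are hypothetical, and this is an illustration of the calculation described above rather than the platform's actual implementation.

```python
from collections import Counter

# Hypothetical evaluator outputs: each response carries a free-text
# explanation and a binary judgment on goal advancement. The field
# names here are illustrative, not a specific SDK's schema.
responses = [
    {"explanation": "The assistant booked the requested flight.", "judgment": "yes"},
    {"explanation": "The assistant asked a clarifying question that moves the task forward.", "judgment": "yes"},
    {"explanation": "The assistant restated the request without acting on it.", "judgment": "no"},
]

def action_advancement_score(responses):
    """Percentage of 'yes' judgments across all evaluator responses."""
    yes_count = sum(1 for r in responses if r["judgment"] == "yes")
    return 100.0 * yes_count / len(responses)

def displayed_explanation(responses):
    """Choose an explanation that aligns with the majority judgment."""
    majority, _ = Counter(r["judgment"] for r in responses).most_common(1)[0]
    return next(r["explanation"] for r in responses if r["judgment"] == majority)

score = action_advancement_score(responses)      # 66.7 for the sample above
explanation = displayed_explanation(responses)
print(f"Action Advancement: {score:.1f}% ({explanation})")
```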
This metric requires multiple LLM calls to compute, which may impact usage and billing.
Monitor Action Advancement scores across different versions of your agent to ensure improvements in task completion capabilities.
Analyze Failure Patterns
When Action Advancement scores are low, examine the specific steps where agents fail to make progress to identify systematic issues.
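One way to do this is to aggregate per-step judgments across traces and rank steps by their advancement rate. The sketch below assumes you have exported step-level evaluation records; the `step_name` and `advanced` fields are hypothetical placeholders, not a specific export format.

```python
from collections import defaultdict

# Hypothetical step-level records exported from your evaluation runs.
step_evals = [
    {"trace_id": "t1", "step_name": "search_flights", "advanced": True},
    {"trace_id": "t1", "step_name": "book_flight", "advanced": False},
    {"trace_id": "t2", "step_name": "search_flights", "advanced": True},
    {"trace_id": "t2", "step_name": "book_flight", "advanced": False},
]

def advancement_by_step(step_evals):
    """Advancement rate per step, to surface systematic failure points."""
    counts = defaultdict(lambda: [0, 0])  # step_name -> [advanced, total]
    for record in step_evals:
        counts[record["step_name"]][0] += int(record["advanced"])
        counts[record["step_name"]][1] += 1
    return {step: 100.0 * adv / total for step, (adv, total) in counts.items()}

# Steps with the lowest rates are the most likely systematic issues.
for step, rate in sorted(advancement_by_step(step_evals).items(), key=lambda kv: kv[1]):
    print(f"{step}: {rate:.0f}% advancement")
```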
Combine with Other Metrics
Use Action Advancement alongside other agentic metrics to get a comprehensive view of your assistant’s effectiveness.
Test Edge Cases
Create evaluation datasets that include complex, multi-step tasks to thoroughly assess your agent’s ability to advance user goals.
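As a starting point, an edge-case dataset can pair each multi-step task with the outcomes you expect the agent to reach. The structure below is a hypothetical example; adapt the fields to whatever format your evaluation tooling expects.

```python
# Hypothetical multi-step tasks for stress-testing goal advancement.
edge_case_dataset = [
    {
        "task": "Find the cheapest refundable flight to Berlin next week "
                "and add it to my calendar",
        "expected_outcomes": ["flight options retrieved", "refundable fare selected",
                              "calendar event created"],
    },
    {
        "task": "Summarize the three most recent support tickets and open a "
                "follow-up ticket for any that remain unresolved",
        "expected_outcomes": ["tickets listed", "summaries produced",
                              "follow-up ticket created when needed"],
    },
]
```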
When optimizing for Action Advancement, ensure you’re not sacrificing other important aspects like safety, factual accuracy, or user experience in pursuit of task completion.