Overview

Action Completion determines whether the agent successfully accomplished all of the user’s goals.
Action Completion addresses common pain points of agent performance by measuring whether AI agents actually help users achieve their end goals rather than merely producing responses. Action Completion is successful when all of the following are true:
  • The agent provides a complete answer in the case of a question
  • The agent provides a confirmation of successful action in the case of a request
  • The response is coherent and factually accurate
  • The response comprehensively addresses every aspect of the user’s request
  • The response avoids contradicting tool outputs
  • The response summarizes all relevant parts returned by tools
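The criteria above behave as an all-or-nothing checklist: a response passes only if every item holds. A minimal sketch of that idea (the `CompletionRubric` class and its field names are illustrative, not part of any Galileo API):

```python
from dataclasses import dataclass

# Hypothetical rubric mirroring the criteria above; names are illustrative.
@dataclass
class CompletionRubric:
    complete_answer: bool          # full answer, in the case of a question
    action_confirmed: bool         # confirmation, in the case of a request
    coherent_and_accurate: bool    # coherent and factually accurate
    addresses_all_aspects: bool    # every aspect of the request covered
    no_tool_contradictions: bool   # does not contradict tool outputs
    summarizes_tool_outputs: bool  # summarizes all relevant tool results

    def passes(self) -> bool:
        # Action Completion succeeds only when every criterion holds.
        return all(vars(self).values())

rubric = CompletionRubric(True, True, True, True, True, False)
print(rubric.passes())  # False: tool outputs were not fully summarized
```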

Action Completion at a glance

| Property | Description |
| --- | --- |
| Name of Metric | Action Completion |
| Metric Category | Agentic Metrics |
| Use this metric for | Measuring whether the agent successfully accomplished the user’s goal |
| Can be applied to | Session |
| LLM/Luna Support | Supported with both LLM + Luna models |
| Protect Runtime Protection | No |
| Constants | None - uses dynamic evaluation |
| Usage Context | Agentic workflows, multi-step tasks, tool-using assistants |
| Value Type | Confidence score denoted as a percentage |
| Input/Output Requirements | Requires agent responses and user goals for evaluation |

When to Use This Metric

Action Completion is the single best measure of whether an agent is truly useful. It is particularly valuable in the following scenarios:
Agentic Workflows: When an AI agent must decide on a course of action and select tools to accomplish tasks.
Multi-step Tasks: When completing a user's request requires multiple steps or decisions.
Tool-using Assistants: When evaluating if the assistant successfully used the right tools and accomplished the intended goals.

Calculation method

1. Additional Requests

Multiple requests are sent to an LLM using a carefully designed chain-of-thought prompt that adheres to the definition above.

2. Judgment Responses

The LLM generates multiple distinct responses, each containing:
  • An explanation
  • A final judgment: “Yes” (goal accomplished) or “No” (goal not accomplished)

3. Score Computation

Action Completion Score = (Number of “Yes” Responses) / (Total Number of Responses)

Any score below 100% indicates that at least one judge considered the agent to have failed to accomplish every user goal.

4. Explanation Surfacing

Galileo displays one generated explanation alongside the score, chosen to align with the majority judgment, to aid troubleshooting.

This metric requires multiple LLM calls to compute, which may impact usage and billing.

Score interpretation

Expected Score: 100% - A perfect score indicates the agent successfully accomplished all user goals with complete, accurate, and comprehensive responses.

What different scores mean:

  • 0.0 - 0.3 (Poor): Agent completely failed to accomplish user goals, provided incomplete answers, or contradicted tool outputs. Common causes include insufficient tool usage, incomplete responses, or factual inaccuracies.
  • 0.4 - 0.7 (Fair): Agent made progress toward user goals but didn’t fully address all aspects of the request. Areas for improvement include ensuring comprehensive coverage of all user requirements and better tool utilization.
  • 0.8 - 1.0 (Excellent): Agent successfully accomplished all user goals with complete, accurate, and comprehensive responses. Best practices include thorough tool usage, complete answer provision, and proper confirmation of successful actions.
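When monitoring scores programmatically, the bands above can be mapped to labels with a small helper (a sketch; the cutoffs between the published ranges, e.g. whether 0.35 is Poor or Fair, are an assumption of this example):

```python
def interpret_score(score: float) -> str:
    # Thresholds mirror the bands above; behavior at in-between values
    # (e.g. 0.35) is an assumption, as the document lists ranges with gaps.
    if score <= 0.3:
        return "Poor"
    if score <= 0.7:
        return "Fair"
    return "Excellent"

for s in (0.2, 0.55, 0.9):
    print(s, interpret_score(s))
```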

How to improve Action Completion scores

To optimize your agent’s performance and ensure high Action Completion scores, focus on comprehensive goal accomplishment and complete response generation.

Common issues and solutions:

| Issue | Cause | Solution |
| --- | --- | --- |
| Incomplete responses | Agent stops before addressing all user requirements | Implement comprehensive response generation and ensure all user goals are explicitly addressed |
| Tool output contradictions | Agent ignores or contradicts information from tools | Ensure the agent properly summarizes and incorporates all relevant tool outputs without contradiction |
| Missing confirmations | Agent doesn’t confirm successful actions | Add explicit confirmation steps for action-based requests |
| Factual inaccuracies | Agent provides incorrect information | Implement fact-checking mechanisms and ensure responses align with tool outputs |
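Two of these issues, missing confirmations and unsummarized tool outputs, can be caught with a lightweight pre-flight check on the agent's final response. The heuristics and phrase lists below are assumptions of this sketch, not part of how the metric is computed:

```python
def response_checks(response: str, tool_facts: list[str]) -> list[str]:
    """Flag likely Action Completion failures before a response is returned."""
    issues = []
    text = response.lower()
    # Missing confirmation: action requests should state success explicitly.
    # (Illustrative keyword list; real agents would use a stronger check.)
    if not any(word in text for word in ("booked", "confirmed", "completed", "done")):
        issues.append("no explicit confirmation of the action")
    # Tool-output coverage: every relevant tool fact should be reflected.
    for fact in tool_facts:
        if fact.lower() not in text:
            issues.append(f"tool output not summarized: {fact!r}")
    return issues

resp = "Your flight is booked for May 3."
print(response_checks(resp, ["May 3", "seat 14C"]))
```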

Best practices for optimization:

  • Track Progress Over Time: Monitor Action Completion scores across different versions of your agent to identify trends and ensure continuous improvements in task completion capabilities.
  • Analyze Failure Patterns: When Action Completion scores are low, examine specific steps or scenarios where agents fail to meet user goals. Use this analysis to identify and address systematic issues.
  • Combine with Other Metrics: Use Action Completion alongside other agentic metrics, such as Action Advancement, to get a comprehensive view of your assistant’s effectiveness and identify areas for improvement.
  • Test Edge Cases: Create evaluation datasets that include complex, multi-step tasks to thoroughly assess your agent’s ability to handle challenging scenarios and advance user goals effectively.
When optimizing for Action Completion, ensure you’re not sacrificing other important aspects like safety, factual accuracy, or user experience in pursuit of task completion.

Comparison to other metrics

| Property | Action Completion | Action Advancement | Tool Selection |
| --- | --- | --- | --- |
| Metric Category | Agentic Performance | Agentic Performance | Agentic Performance |
| Use this metric for | Measuring goal accomplishment | Measuring progress toward goals | Measuring tool choice quality |
| Best for | Final outcome evaluation | Progress tracking | Tool usage optimization |
| LLM/Luna Support | Yes | Yes | Yes |
| Protect Runtime Protection | No | No | No |
| Value Type | Percentage (0%-100%) | Percentage (0%-100%) | Percentage (0%-100%) |
| Limitations | Requires multiple LLM calls | May not capture final success | Doesn’t measure execution quality |
If you would like to dive deeper or start implementing Action Completion, check out the following resources:

How-to guides: