Agentic metrics help you measure how well your AI agents perform complex, multi-step tasks, especially when those agents need to use tools, make decisions, or interact with external systems. These metrics are helpful for anyone building advanced AI assistants, workflow automation, or any system where the AI acts on behalf of a user. Use agentic metrics when you want to:
  • Track whether your agent is making meaningful progress toward its goals.
  • Detect and diagnose errors that occur when your agent uses tools or APIs.
  • Ensure your agent is choosing the best tools or actions for each situation.
Below is a quick reference table of all agentic performance metrics:
| Name | Description | When to use | Example use case |
| --- | --- | --- | --- |
| Action advancement | Measures how effectively each action advances toward the goal. | When assessing whether an agent is making meaningful progress in multi-step tasks. | A travel planning agent that needs to book flights, hotels, and activities in the correct sequence. |
| Action completion | Determines whether the agent successfully accomplished all of the user’s goals. | To assess whether an agent completed the desired goal. | A coding agent that is seeking to close engineering tickets. |
| Agent efficiency | Determines if an agent provides a precise answer or resolution to every user ask, with an efficient path. | To assess if an agent is taking the most efficient path to a solution. | A complex multi-agent chatbot that needs a fast response. |
| Agent flow | Measures the correctness and coherence of an agentic trajectory by validating it against user-specified natural language tests. | To assess a multi-agent system, or a system with multiple tools. | An internal process agent that needs to follow strict process rules. |
| Conversation quality | A binary metric that assesses whether a chatbot interaction left the user feeling satisfied and positive or frustrated and dissatisfied. | When building customer-facing chatbots. | A health insurance chatbot. |
| Intent change | Measures a significant shift in the user’s primary conversational goal or workflow during a session, relative to their initial stated intent. | To get a holistic view of an entire user session and understand which capabilities the user interacts with. | A multi-purpose chatbot for a bank. |
| Tool error | Detects errors or failures during the execution of tools. | When implementing AI agents that use tools and you want to track error rates. | A coding assistant that uses external APIs to run code and must handle and report execution errors appropriately. |
| Tool selection quality | Evaluates whether the agent selected the most appropriate tools for the task. | When optimizing agent systems for effective tool usage. | A data analysis agent that must choose the right visualization or statistical method based on the data type and user question. |
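
As a rough illustration of the kind of signal the tool error metric tracks, the sketch below computes an error rate over a hand-written agent trace. The `ToolCall` structure, the `tool_error_rate` function, and the sample trace are all hypothetical and are not part of any SDK; the actual metric may be computed quite differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCall:
    """One tool invocation in an agent trace (hypothetical structure)."""
    tool_name: str
    error: Optional[str] = None  # None means the call succeeded


def tool_error_rate(calls: list[ToolCall]) -> float:
    """Return the fraction of tool calls that failed (0.0 for an empty trace)."""
    if not calls:
        return 0.0
    failed = sum(1 for call in calls if call.error is not None)
    return failed / len(calls)


# Illustrative trace for a travel planning agent: one of four calls fails.
trace = [
    ToolCall("search_flights"),
    ToolCall("book_hotel", error="HTTP 503 from booking API"),
    ToolCall("book_hotel"),
    ToolCall("reserve_activity"),
]

print(f"tool error rate: {tool_error_rate(trace):.2f}")  # tool error rate: 0.25
```

A simple ratio like this only counts failures. It says nothing about whether the agent picked the right tool or recovered sensibly after the error, which is what metrics such as tool selection quality and action advancement are meant to capture.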

Next steps