Understand and evaluate the performance of AI agents using Galileo’s agentic metrics
Name | Description | When to use | Example use case |
---|---|---|---|
Action advancement | Measures how effectively each action advances toward the goal. | When assessing whether an agent is making meaningful progress in multi-step tasks. | A travel planning agent that needs to book flights, hotels, and activities in the correct sequence. |
Action completion | Determines whether the agent successfully accomplished all of the user’s goals. | To assess whether an agent completed the desired goal. | A coding agent tasked with closing engineering tickets. |
Agent efficiency | Determines whether the agent gives a precise answer or resolution to every user request and reaches it by an efficient path. | To assess whether an agent takes the most efficient path to a solution. | A complex multi-agent chatbot that needs to respond quickly. |
Agent flow | Measures the correctness and coherence of an agentic trajectory by validating it against user-specified natural language tests. | To assess a multi-agent system, or a system with multiple tools. | An internal process agent that needs to follow strict process rules. |
Conversation quality | A binary metric that assesses whether a chatbot interaction left the user satisfied and positive or frustrated and dissatisfied. | When building customer-facing chatbots. | A health insurance chatbot. |
Intent change | Measures a significant shift in the user’s primary conversational goal or workflow during a session, relative to their initial stated intent. | To get a holistic view of an entire user session and understand which capabilities the user interacts with. | A multi-purpose chatbot for a bank. |
Tool error | Detects errors or failures during the execution of tools. | When implementing AI agents that use tools and you want to track error rates. | A coding assistant that uses external APIs to run code and must handle and report execution errors appropriately. |
Tool selection quality | Evaluates whether the agent selected the most appropriate tools for the task. | When optimizing agent systems for effective tool usage. | A data analysis agent that must choose the right visualization or statistical method based on the data type and user question. |
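
To make the idea of tracking tool error rates concrete, here is a minimal Python sketch. It does not use Galileo's SDK and is not how the Tool error metric is computed. The `ToolCall` record and the `tool_error_rate` helper are hypothetical names invented for illustration, assuming each tool invocation in a trace records an error message when it fails.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCall:
    """One tool invocation in an agent trace (hypothetical structure)."""
    tool_name: str
    error: Optional[str] = None  # None means the call succeeded


def tool_error_rate(calls: list[ToolCall]) -> float:
    """Return the fraction of tool calls that failed (0.0 for an empty trace)."""
    if not calls:
        return 0.0
    failed = sum(1 for call in calls if call.error is not None)
    return failed / len(calls)


# Illustrative trace: one of four tool calls failed.
trace = [
    ToolCall("run_code"),
    ToolCall("run_code", error="TimeoutError"),
    ToolCall("search_docs"),
    ToolCall("run_code"),
]
rate = tool_error_rate(trace)
print(f"tool error rate: {rate:.2f}")
```

A per-trace rate like this is only a rough proxy. A metric that detects tool errors may also need to recognize failures that a tool reports inside an otherwise successful response.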