Agent Flow is a binary metric that checks if an agent’s behavior satisfies all user-defined natural language conditions.
To use this metric, you will need to create a copy and edit the prompt to provide your natural language tests.
Agent Flow at a glance
| Property | Description |
|---|---|
| Name | Agent Flow |
| Category | Agentic AI |
| Can be applied to | Session |
| LLM-as-a-judge Support | ✅ |
| Luna Support | ❌ |
| Protect Runtime Protection | ❌ |
| Value Type | Boolean shown as a percentage confidence score |
When to use this metric
Agents with multiple possible paths: Agent Flow can evaluate an agentic application that has multiple possible paths where you know the expected behavior for each user response. You can validate that the agent performs the expected behavior.
Agents with specific intention views: Agent Flow can validate specific interaction rules, for example that the agent asks for confirmation before completing a purchase.
Agents with unconditional behaviors: Agent Flow can check for unconditional behaviors, such as verifying that the agent always calls the authentication tool during a conversation.
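To make the two example rules above concrete, the sketch below expresses them as deterministic checks over a list of tool-call names. This is only an illustration of what the conditions mean: Agent Flow itself evaluates natural language tests with an LLM judge, and the tool names (`authenticate`, `ask_confirmation`, `complete_purchase`) and the flat list-of-names session format are assumptions made for this example.

```python
from typing import List


def always_calls(tool_calls: List[str], tool: str) -> bool:
    """Unconditional rule: `tool` is called at least once in the session."""
    return tool in tool_calls


def called_before(tool_calls: List[str], first: str, then: str) -> bool:
    """Ordering rule: no call to `then` happens before a call to `first`."""
    seen_first = False
    for name in tool_calls:
        if name == first:
            seen_first = True
        elif name == then and not seen_first:
            return False
    return True


def agent_flow_passes(tool_calls: List[str]) -> bool:
    """Binary result: passes only if every condition holds."""
    return all([
        # Hypothetical tool names, used for illustration only.
        always_calls(tool_calls, "authenticate"),
        called_before(tool_calls, "ask_confirmation", "complete_purchase"),
    ])


session = ["authenticate", "search_products", "ask_confirmation", "complete_purchase"]
print(agent_flow_passes(session))  # True
```

A session that completes a purchase without first asking for confirmation, or that never calls the authentication tool, fails the whole check. This mirrors the all-conditions-must-hold nature of the metric.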
Score interpretation
Expected Score: 80%-100%.
Score scale: 0% to 100%, with a marker at 60% and bands labeled Poor, Fair, and Excellent.
Configure Agent Flow
This metric needs to be manually customized to include your own natural language tests.
1
Create a copy of the Agent Flow metric
From the Metrics Hub, select the Agent Flow metric. You will get a popup asking you to duplicate the metric. Select Duplicate metric to create a copy.

2
Locate the user defined tests section
Locate the user defined tests section in the prompt.
3
Customize the prompt by adding your user-defined tests
This prompt needs to be customized based on your application and the inputs and outputs you expect. Replace {{ Add your tests here }} with a numbered list of natural language tests that can be used to evaluate the agent's behavior. These can include:
- Expected tool or agent calls, using the tool or agent names
- Conditions on tool or agent calling (e.g. if tool x is called, don’t call agent y)
- Expectations around the input or output parameters to tools and agents
- Limitations on the number of tool or agent calls
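If you prefer to draft the tests outside the UI before pasting them in, the kinds of tests listed above can be assembled into a numbered list as in the sketch below. The helper names, the stand-in template, and the tests themselves (including `tool_x`, `agent_y`, and the `query` parameter) are hypothetical. The real prompt lives in your copy of the metric, and this is not a Galileo API.

```python
from typing import List

# Placeholder text as it appears in the metric prompt.
PLACEHOLDER = "{{ Add your tests here }}"


def render_tests(tests: List[str]) -> str:
    """Format the tests as a numbered list, one test per line."""
    return "\n".join(f"{i}. {test}" for i, test in enumerate(tests, start=1))


def fill_prompt(prompt_template: str, tests: List[str]) -> str:
    """Replace the placeholder with the numbered tests."""
    if PLACEHOLDER not in prompt_template:
        raise ValueError("placeholder not found in prompt template")
    return prompt_template.replace(PLACEHOLDER, render_tests(tests))


# Hypothetical tests, one for each kind of condition listed above.
tests = [
    "The agent calls tool_x at least once.",
    "If tool_x is called, agent_y is not called.",
    "Every call to tool_x includes a non-empty query input parameter.",
    "The agent makes at most 3 tool calls in total.",
]

# Stand-in for the user defined tests section of the prompt (assumption).
template = "User defined tests:\n{{ Add your tests here }}\n"
print(fill_prompt(template, tests))
```

Writing each test as one self-contained sentence that names the tool or agent explicitly makes it easier to evaluate each condition on its own.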
For example, for an agent with the tools list_by_target_muscle_for_exercised, list_by_body_part_for_exercised, and list_of_bodyparts_for_exercised, some user tests might be:
4
Save the metric
Save the metric, then turn it on for your Log stream.