- Identify responses where the model is unsure or likely to make mistakes.
- Improve user trust by surfacing confidence scores or warnings.
- Analyze which prompts or situations are most challenging for your model.
| Name | Description | Supported Nodes | When to Use | Example Use Case |
|---|---|---|---|---|
| Prompt Perplexity | Evaluates how difficult or unusual the prompt is for the model to process. | LLM span | When you want to identify prompts that may confuse the model or lead to lower-quality responses. | Detecting outlier prompts in a customer support chatbot to improve prompt engineering. |
| Uncertainty | Measures the model’s confidence in its generated response. | LLM span | When you want to understand how certain the model is about its answers. | Flagging responses where the model is unsure, so a human can review them before sending to a user. |
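
Both metrics are typically derived from token-level log-probabilities, which many model APIs can return alongside the generated text. The sketch below is a minimal, provider-agnostic illustration (the function names and the sample log-prob values are hypothetical, not tied to any specific API): prompt perplexity is the exponential of the negative mean log-probability of the prompt tokens, and a simple uncertainty score is one minus the geometric mean of the response token probabilities.

```python
import math
from typing import List

def perplexity(logprobs: List[float]) -> float:
    """Perplexity = exp(-mean log-probability). Higher values mean the
    token sequence was more surprising (harder) for the model."""
    if not logprobs:
        raise ValueError("logprobs must be non-empty")
    return math.exp(-sum(logprobs) / len(logprobs))

def uncertainty(logprobs: List[float]) -> float:
    """A simple confidence-based uncertainty in [0, 1]: one minus the
    geometric mean of the per-token probabilities. 0 = fully confident."""
    if not logprobs:
        raise ValueError("logprobs must be non-empty")
    geometric_mean_prob = math.exp(sum(logprobs) / len(logprobs))
    return 1.0 - geometric_mean_prob

# Illustrative per-token log-probs, as many APIs report them.
prompt_logprobs = [-0.8, -2.1, -1.5, -0.3]     # hypothetical prompt tokens
response_logprobs = [-0.1, -0.05, -1.9, -0.2]  # hypothetical response tokens

print(f"Prompt perplexity:    {perplexity(prompt_logprobs):.2f}")
print(f"Response uncertainty: {uncertainty(response_logprobs):.2f}")
```

In practice, you would feed in the log-probs recorded for each LLM span and flag spans whose perplexity or uncertainty exceeds a threshold, for example routing them to human review as in the example use cases above.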