- Ensure your AI’s responses match your brand’s voice and tone.
- Check that generated content is clear, concise, and appropriate for your audience.
- Quantitatively measure the quality of generated text compared to human-written references.

| Name | Description | Supported Nodes | When to Use | Example Use Case |
|---|---|---|---|---|
| Tone | Evaluates the emotional tone and style of the response. | Trace (root input/output only) | When the style and tone of AI responses matter for your brand or user experience. | A luxury brand’s customer service chatbot that must maintain a sophisticated, professional tone consistent with the brand image. |
| BLEU & ROUGE | Standard NLP metrics that score generated text by its n-gram overlap with reference text. Available only in experiments, because they require ground truth in your dataset. | LLM span | When you want to quantitatively assess the similarity between generated and reference texts. | Evaluating the quality of machine-translated or summarization outputs against human-written references. |
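
To give a feel for what BLEU and ROUGE measure, the sketch below computes a simplified n-gram precision (the core of BLEU) and n-gram recall (ROUGE-N) for one candidate against one reference. It is an illustration only, not the implementation used by the evaluator: the function names are hypothetical, tokenization is a plain whitespace split, and full BLEU additionally combines several n-gram orders and applies a brevity penalty.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count the n-grams in a lowercased, whitespace-tokenized string."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_stats(candidate, reference, n=1):
    """Return (matched, candidate_total, reference_total) n-gram counts.

    Matches are clipped: an n-gram is counted at most as many times
    as it appears in the reference.
    """
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    matched = sum((cand & ref).values())  # Counter intersection takes the min count
    return matched, sum(cand.values()), sum(ref.values())


def ngram_precision(candidate, reference, n=1):
    """BLEU-style clipped precision: share of candidate n-grams found in the reference."""
    matched, cand_total, _ = overlap_stats(candidate, reference, n)
    return matched / cand_total if cand_total else 0.0


def ngram_recall(candidate, reference, n=1):
    """ROUGE-N-style recall: share of reference n-grams found in the candidate."""
    matched, _, ref_total = overlap_stats(candidate, reference, n)
    return matched / ref_total if ref_total else 0.0


if __name__ == "__main__":
    generated = "the cat sat"
    ground_truth = "the cat sat on the mat"
    print(ngram_precision(generated, ground_truth))  # every generated word is in the reference
    print(ngram_recall(generated, ground_truth))     # only half of the reference is covered
```

Precision rewards output that contains little the reference does not, while recall rewards output that covers the reference, which is why BLEU is commonly associated with translation and ROUGE with summarization.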