AI agents sometimes deviate from their assigned tasks, leading to non-compliant or incomplete responses. This often happens when instructions are unclear or when randomness in generation is too high.

What Went Wrong?

  • Instructions to the agent were too vague or ambiguous.
  • The LLM was given too much latitude in how to execute the task, for example through a high sampling temperature or loosely constrained prompts.

How It Showed Up in Metrics:

  • Low Instruction Adherence: The agent did not follow specific task instructions.
  • High Tool Errors: The agent invoked tools incorrectly because it misread or ignored the instructions.

Improvements and Solutions

1. Use Galileo’s Metrics to Diagnose the Issue

  • Track Instruction Adherence to measure how well the agent follows provided instructions.
  • Identify Tool Errors caused by instruction misinterpretation; a small triage sketch follows this list.
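
For example, once metric values are exported from an evaluation run, low-adherence runs can be triaged with a few lines of code. The following is a minimal sketch, not a Galileo API: the field names (instruction_adherence, tool_error) and the 0.7 threshold are illustrative placeholders for whatever your run actually produces.

```python
# Illustrative only: field names and threshold are hypothetical placeholders
# for metrics exported from an evaluation run, not a specific Galileo schema.
from typing import Iterable

ADHERENCE_THRESHOLD = 0.7  # assumed cutoff; tune for your use case


def flag_problem_runs(records: Iterable[dict]) -> list[dict]:
    """Return records with low instruction adherence or tool errors."""
    flagged = []
    for record in records:
        low_adherence = record.get("instruction_adherence", 1.0) < ADHERENCE_THRESHOLD
        tool_error = record.get("tool_error", False)
        if low_adherence or tool_error:
            flagged.append(record)
    return flagged


runs = [
    {"id": "run-1", "instruction_adherence": 0.92, "tool_error": False},
    {"id": "run-2", "instruction_adherence": 0.41, "tool_error": True},
]
print(flag_problem_runs(runs))  # run-2 needs a prompt or tooling fix
```
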

2. Strengthen Instruction Adherence

  • Use system messages to set explicit constraints on agent behavior (see the sketch after this list).
  • Provide few-shot examples, or tightly specified zero-shot instructions, to guide response generation.
  • Adjust prompts based on Galileo’s Instruction Adherence insights.
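
The sketch below shows one way to combine an explicit system message with a few-shot example. It assumes the common role/content chat-message format; the rules, the ACME example, and the message contents are illustrative, so adapt them to your agent framework.

```python
# A minimal sketch of a constrained system message plus a few-shot example.
# The message format follows the common "role"/"content" chat convention;
# adapt it to whatever client or framework your agent uses.
messages = [
    {
        "role": "system",
        "content": (
            "You are a financial research assistant. "
            "Follow these rules exactly:\n"
            "1. Use only the documents provided in the prompt.\n"
            "2. Answer in at most five bullet points.\n"
            "3. If the documents do not contain the answer, say so; do not guess."
        ),
    },
    # Few-shot example showing the expected shape of a compliant answer.
    {"role": "user", "content": "Summarize ACME Corp's revenue trend from the attached report."},
    {
        "role": "assistant",
        "content": "- Revenue grew 12% year over year (source: attached report, p. 3).",
    },
    # The real task follows the same pattern as the example above.
    {"role": "user", "content": "Summarize Apple's financial growth over five years. Use financial reports only."},
]
```
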

3. Adjust Prompting for Compliance

  • Be direct and structured: “Summarize Apple’s financial growth over five years. Use financial reports only.”
  • Reduce the sampling temperature to lower randomness; a minimal example follows this list.
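
As a minimal illustration, the call below lowers the sampling temperature and restates the constraint in a system message. It assumes the OpenAI Python SDK (openai>=1.0) purely for concreteness; the model name is a placeholder, and any provider that exposes a temperature parameter works the same way.

```python
# A minimal sketch assuming the OpenAI Python SDK (openai>=1.0); swap in your
# provider's client if different. The model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use the model backing your agent
    temperature=0.2,      # low temperature reduces randomness and off-task drift
    messages=[
        {"role": "system", "content": "Use only the provided financial reports."},
        {
            "role": "user",
            "content": "Summarize Apple's financial growth over five years. Use financial reports only.",
        },
    ],
)
print(response.choices[0].message.content)
```
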

4. Validate and Iterate Using Galileo

  • Track Instruction Adherence scores in Galileo.
  • Flag responses that deviate from expected behavior and iterate on prompts.
  • Compare iterations using Tool Selection Quality and Instruction Adherence metrics; a small comparison sketch follows this list.
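
A lightweight way to compare prompt iterations is to diff their aggregate scores. The sketch below is illustrative only: the numbers are placeholders standing in for values exported from your evaluation runs, and the metric keys simply mirror the metric names used above.

```python
# Illustrative comparison of two prompt iterations. Scores are placeholder
# values standing in for metrics exported from your evaluation runs.
iterations = {
    "prompt_v1": {"instruction_adherence": 0.58, "tool_selection_quality": 0.71},
    "prompt_v2": {"instruction_adherence": 0.86, "tool_selection_quality": 0.90},
}

baseline, candidate = iterations["prompt_v1"], iterations["prompt_v2"]
for metric in ("instruction_adherence", "tool_selection_quality"):
    delta = candidate[metric] - baseline[metric]
    status = "improved" if delta > 0 else "regressed"
    print(f"{metric}: {baseline[metric]:.2f} -> {candidate[metric]:.2f} ({status})")
```
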