This metric is particularly valuable for uncovering hallucinations where the model is ignoring instructions, which can lead to responses that don’t meet user requirements or business rules.Here’s a scale that shows the relationship between Instruction Adherence and the potential impact on your AI system:
01
Low Adherence
The model ignored its instructions when generating its response.
High Adherence
The model followed its instructions when generating its response.
Instruction Adherence is computed through a multi-step process:
1
Model Evaluation
The system sends multiple evaluation requests to OpenAI’s GPT4o model to analyze whether the response follows the provided instructions.
2
Analysis Process
A specialized chain-of-thought prompt guides the model through a detailed evaluation of how well the response adheres to the specific instructions given.
3
Multiple Assessments
The system requests and collects multiple distinct responses to ensure a robust evaluation through consensus.
4
Result Generation
Each evaluation produces both a detailed explanation of the reasoning and a binary judgment (yes/no) on instruction adherence.
5
Score Calculation
The final score is computed as the ratio of positive (‘yes’) responses to the total number of evaluation responses.
We also surface one of the generated explanations, always choosing one that aligns with the majority judgment among the responses.
This metric is computed by prompting an LLM multiple times, and thus requires additional LLM calls to compute, which may impact usage and billing.
Write clear, specific instructions without ambiguity or contradictions to improve adherence rates.
Prioritize Critical Instructions
Place the most important instructions prominently in your prompt and consider repeating them for emphasis.
Monitor Across Models
Compare Instruction Adherence scores across different LLMs to identify which models best follow your specific instructions.
Implement Feedback Loops
Use low-adherence examples to refine your prompts and create test cases for future prompt iterations.
When optimizing for Instruction Adherence, balance strict adherence with allowing the model some flexibility. Overly rigid instructions may limit the model’s ability to provide helpful responses in edge cases.