Some models struggle to confidently generate responses, leading to hesitation, incomplete answers, or repeated disclaimers.

For example, consider this prompt and response:

Prompt: "Tell me about climate change."

Model Response: “Well, there are many aspects to climate change. Some people think it’s caused by humans, and others think it’s just natural. It’s hard to say exactly.”

What Went Wrong?

  • The prompt did not provide enough context for confident decision-making
  • The model allowed too much randomness in token selection
  • The prompt was ambiguous about what kind of response it expected

How It Showed Up in Metrics:

  • High Uncertainty: The model hesitated in its response
  • High Prompt Perplexity: The model struggled to predict the next token
  • Mid-range Instruction Adherence: The model understood the instructions but lacked decisiveness
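
These metrics are surfaced by Galileo. As a rough standalone check, you can also ask the OpenAI API for token log probabilities and compute a confidence proxy yourself. A minimal sketch, where the averaging scheme is an illustrative proxy and not Galileo's actual Uncertainty metric:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me about climate change."}],
    logprobs=True,  # return a log probability for each generated token
)

# Mean negative log probability of the generated tokens:
# higher values indicate a less confident (more hesitant) generation.
token_logprobs = [t.logprob for t in response.choices[0].logprobs.content]
mean_nll = -sum(token_logprobs) / len(token_logprobs)
print(f"Mean negative log probability: {mean_nll:.3f}")
```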

Improvements and Solutions

The following improvements show how to change a simple prompt script like the example below:

```python
import os
from galileo.openai import openai
from dotenv import load_dotenv

load_dotenv()

client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

prompt = "Tell me about climate change."
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content.strip())
```

1. Provide Stronger Context in Prompts

Include explicit guiding statements, for example:

prompt = """Tell me about climate change. 
  Assume the following is true: 
  1. Climate change is driven by human activity. 
  2. Provide an explanation based on scientific evidence."""

This should reduce the Uncertainty and Prompt Perplexity metrics in Galileo.
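
Dropped into the base script, only the prompt changes; the request itself stays the same:

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],  # prompt now includes the guiding statements
)
print(response.choices[0].message.content.strip())
```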

2. Adjust Model Sampling Parameters

Lower temperature to make the model more deterministic, for example:

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me about climate change."}],
    temperature=0.3,  # reduced temperature for more predictable responses
)
```

Use top-k sampling to limit options and prevent hesitation, for example:

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me about climate change."}],
    # This is how top-k is typically set. The OpenAI Chat Completions API
    # does not accept a top_k parameter, but many other model APIs do.
    top_k=50,  # limits token selection to the 50 most likely choices
)
```

Lowering the temperature and decreasing top_k both make outputs more deterministic, which generally improves Instruction Adherence.
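
Because the OpenAI API does not expose top-k, the snippet above is illustrative only. If you serve an open-weights model yourself, you can set top-k directly. A minimal sketch using the Hugging Face transformers library, where gpt2 is an arbitrary stand-in for whatever model you serve:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Tell me about climate change.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,    # sampling must be enabled for top_k to take effect
    top_k=50,          # restrict sampling to the 50 most likely tokens
    temperature=0.3,   # low temperature for more deterministic output
    max_new_tokens=100,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```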

3. Modify Prompt Structure

Use direct phrasing to force a single, clear response, for example:

<CodeGroup>
```python example.py
prompt = "Provide a single, definitive explanation of climate change in two sentences."
```

```typescript example.ts
const prompt = "Provide a single, definitive explanation of climate change in two sentences.";
```
</CodeGroup>

Avoid prompts that allow multiple, equally valid answers. For example, reduce ambiguity by naming the source whose view you want:

<CodeGroup>
```python example.py
prompt = "What are the primary causes of climate change according to scientists?"
```

```typescript example.ts
const prompt = "What are the primary causes of climate change according to scientists?";
```
</CodeGroup>

4. Apply Uncertainty-Based Filtering

Automatically reject responses whose Uncertainty score exceeds a set threshold.
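
There is no single API call for this. The sketch below reuses the client from the base script (assuming the wrapped client passes logprobs through) and approximates the score from token log probabilities, retrying until a response clears the threshold; the proxy and the 0.6 cutoff are illustrative assumptions, not Galileo's actual Uncertainty metric:

```python
import math

UNCERTAINTY_THRESHOLD = 0.6  # illustrative cutoff; tune against your own data
MAX_ATTEMPTS = 3

def uncertainty_proxy(choice) -> float:
    """Rough stand-in for an Uncertainty score: one minus the mean
    per-token probability, computed from the returned log probabilities."""
    logprobs = [t.logprob for t in choice.logprobs.content]
    mean_prob = sum(math.exp(lp) for lp in logprobs) / len(logprobs)
    return 1.0 - mean_prob  # 0 = fully confident, 1 = maximally uncertain

def generate_with_filtering(prompt: str) -> str | None:
    for _ in range(MAX_ATTEMPTS):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            logprobs=True,
        )
        choice = response.choices[0]
        if uncertainty_proxy(choice) <= UNCERTAINTY_THRESHOLD:
            return choice.message.content.strip()
    return None  # every attempt was too uncertain; fall back or escalate

print(generate_with_filtering("Tell me about climate change."))
```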

📷 Uncertainty metrics before optimization
📷 Uncertainty metrics after optimization
📷 Response trigger configuration
📷 Hesitation pattern analysis