
SQL Adherence assesses whether the generated SQL query semantically aligns with the intent of the natural language query provided.

Metric definition

SQL Adherence — A binary metric that evaluates whether the generated SQL query accurately reflects the user’s intent as expressed in the natural language request.
  • Type: Binary
    • 1 (Valid): The query semantically matches the intention of the natural language query and answers it accurately.
    • 0 (Invalid): The query doesn’t answer the natural language query accurately.
This metric focuses purely on the “meaning” and “intent” of the SQL query, independent of syntactic correctness or schema validity. It validates that the operations performed (filtering, sorting, aggregating) match what the user requested.
This metric requires the original natural language query to be provided. Without the NL query, semantic alignment cannot be evaluated and the metric will not produce meaningful results.
Here’s the scale that shows the relationship between SQL Adherence and potential impact on your AI system. The scale runs 0–100 and is derived from binary judgments converted into a confidence score:
  • 0 (Non-Adherent): The SQL query doesn't accurately answer the natural language request.
  • 100 (Fully Adherent): The SQL query semantically matches user intent and answers it accurately.
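As a rough illustration of how repeated binary judgments can be folded into a 0–100 confidence score, here is a minimal sketch. The aggregation (a simple mean) and the function name are assumptions for illustration, not the product's exact implementation:

```python
from statistics import mean

def adherence_confidence(binary_judgments: list[int]) -> float:
    """Convert repeated 0/1 adherence judgments into a 0-100 confidence score.

    Illustrative aggregation only: a simple mean scaled to 0-100. The actual
    scoring pipeline may weight or calibrate judgments differently.
    """
    if not binary_judgments:
        raise ValueError("at least one judgment is required")
    return 100.0 * mean(binary_judgments)

# Example: three evaluator calls judge the SQL adherent, one does not.
print(adherence_confidence([1, 1, 1, 0]))  # 75.0
```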

Calculation method

SQL Adherence is computed through a multi-step evaluation process:
  1. Model Request: One or more evaluation requests are sent to an LLM evaluator to analyze the semantic alignment between the natural language query and the generated SQL.
  2. Prompt Engineering: A specialized chain-of-thought prompt guides the model to evaluate whether the SQL accurately reflects user intent across selection, filtering, aggregation, and logical relationships.
  3. Evaluation Process: The evaluator analyzes the query for semantic correctness, checking column-selection alignment, filtering logic (under- or over-filtering), aggregation functions, and proper use of ORDER BY/LIMIT clauses.
  4. Score Calculation: Based on the evaluation, a binary score is assigned: 1 (Valid) if the query semantically matches user intent, or 0 (Invalid) if semantic mismatches are detected.
This metric is computed by prompting an LLM and may require multiple LLM calls to compute, which can impact usage and billing.
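The sketch below shows the general shape of this flow, assuming a generic `call_llm` helper you supply. The prompt wording, helper name, and verdict parsing are illustrative assumptions, not the evaluator's actual prompt or API:

```python
def build_adherence_prompt(nl_query: str, sql_query: str) -> str:
    # Chain-of-thought style instructions: reason step by step, then give a verdict.
    return (
        "You are evaluating whether a SQL query matches the user's intent.\n"
        f"Natural language request: {nl_query}\n"
        f"Generated SQL: {sql_query}\n"
        "Check column selection, filtering, aggregation/grouping, and ORDER BY/LIMIT.\n"
        "Think step by step, then end with a final line 'VERDICT: 1' (adherent) "
        "or 'VERDICT: 0' (non-adherent)."
    )

def score_sql_adherence(nl_query: str, sql_query: str, call_llm) -> int:
    """Return 1 if the generated SQL semantically matches the NL request, else 0.

    `call_llm` is a placeholder for whatever LLM client you use; it takes a
    prompt string and returns the model's text response.
    """
    response = call_llm(build_adherence_prompt(nl_query, sql_query))
    return 1 if "VERDICT: 1" in response else 0
```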

Supported nodes

  • LLM span
Required inputs:
  • The natural language query posed by the user
  • The generated SQL query (output)
Optional inputs:
  • SQL dialect, schema information, and domain knowledge hints
This metric does NOT check syntactic correctness or schema validity—it focuses purely on semantic alignment with user intent. Use SQL Correctness for syntactic and schematic validation.
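The required and optional inputs listed above might be packaged along the following lines before being handed to the evaluator; the field names and values here are illustrative assumptions, not a fixed schema:

```python
evaluation_inputs = {
    # Required: the user's natural language question.
    "natural_language_query": "Show me the top 5 customers by total sales",
    # Required: the SQL produced by the LLM span being evaluated.
    "generated_sql": (
        "SELECT customer_id, SUM(amount) AS total_sales "
        "FROM orders GROUP BY customer_id "
        "ORDER BY total_sales DESC LIMIT 5"
    ),
    # Optional context that helps the evaluator judge intent more accurately.
    "sql_dialect": "postgresql",
    "schema": "orders(customer_id, amount, order_date)",
    "domain_hints": ["'total sales' refers to SUM(amount) in the orders table"],
}
```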

What constitutes adherent (1)

  • Semantic Correctness: The SQL accurately reflects user intent and retrieves the exact required data.
  • Completeness: The query includes all necessary components to fulfill the request.
  • Minimal Redundancy (preferred): Uses the most efficient/standard formulation, though returning the identical result set is the core requirement.
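To illustrate the minimal-redundancy point, both queries below would typically be judged adherent to "List customers located in Canada" because they return the identical result set; the first is simply the preferred, standard formulation. Table and column names are hypothetical:

```python
nl_query = "List customers located in Canada"

# Preferred, standard formulation.
sql_standard = "SELECT name FROM customers WHERE country = 'Canada'"

# More roundabout but semantically equivalent: it yields the same result set,
# so it still satisfies the core adherence requirement.
sql_redundant = (
    "SELECT name FROM customers "
    "WHERE country IN (SELECT country FROM customers WHERE country = 'Canada')"
)
```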

What constitutes non-adherent (0)

Semantic Mismatch (Incorrect Logic)

The executable SQL doesn’t match user intent, including:
  • Wrong column/table referenced for the requested data
  • Incorrect filtering/condition (wrong WHERE/HAVING operator or value)
  • Incorrect aggregation/grouping (wrong aggregate function, missing GROUP BY)
  • Incorrect ordering/limiting (ORDER BY/LIMIT error)

Ambiguity Not Resolved

  • The system failed to interpret an ambiguous query
  • No attempt was made to prompt for clarification when needed

Missing Components

  • The SQL omits clauses required to fulfill user intent
  • Missing a necessary JOIN to connect related tables
  • Missing a WHERE clause for a filtered request

Example use cases

  • Validating that a natural language to SQL assistant correctly interprets user questions.
  • Ensuring data analytics queries return the data users actually requested.
  • Quality assurance for business intelligence tools before presenting query results.
  • Comparing different LLM models or prompts for semantic accuracy in SQL generation.
Example: A data analytics assistant where a user asks “Show me the top 5 customers by total sales” — SQL Adherence validates that the query correctly orders by sales amount in descending order, limits to 5 results, and selects customer information rather than product data.
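To make that example concrete, here is one adherent and one non-adherent candidate query for the same request. Table and column names are hypothetical:

```python
nl_query = "Show me the top 5 customers by total sales"

# Adherent (1): aggregates sales per customer, sorts descending, limits to 5,
# and selects customer information.
adherent_sql = (
    "SELECT c.customer_id, c.name, SUM(o.amount) AS total_sales "
    "FROM customers c JOIN orders o ON o.customer_id = c.customer_id "
    "GROUP BY c.customer_id, c.name "
    "ORDER BY total_sales DESC LIMIT 5"
)

# Non-adherent (0): sorts ascending and selects product data, so it returns
# the 5 lowest-selling products rather than the top 5 customers.
non_adherent_sql = (
    "SELECT p.product_id, SUM(o.amount) AS total_sales "
    "FROM products p JOIN orders o ON o.product_id = p.product_id "
    "GROUP BY p.product_id "
    "ORDER BY total_sales ASC LIMIT 5"
)
```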

Best practices

Specify SQL dialect and schema

Providing the SQL dialect and schema information helps the evaluator understand dialect-specific syntax and table relationships for more accurate assessment.

Include domain hints

Provide domain knowledge hints (e.g., “premium customer means tier = ‘gold’”) for more accurate intent matching.
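For instance, a hint like the one below lets the evaluator accept a tier-based filter for "premium customers" instead of flagging it as a semantic mismatch. The names and the PostgreSQL-style date handling are illustrative assumptions:

```python
nl_query = "How many premium customers signed up last month?"
domain_hints = ["premium customer means tier = 'gold'"]

# With the hint, the tier filter is judged adherent; without it, the evaluator
# has no way to know that 'premium' maps to tier = 'gold'.
generated_sql = (
    "SELECT COUNT(*) FROM customers "
    "WHERE tier = 'gold' "
    "AND signup_date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month' "
    "AND signup_date < DATE_TRUNC('month', CURRENT_DATE)"
)
```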

Combine with SQL Correctness

Use alongside SQL Correctness to validate both semantic intent and syntactic/schematic validity.

Iterate with CLHF

Use continuous learning via human feedback to improve the evaluator’s understanding of your domain-specific queries.