SQL Efficiency acts as a heuristic query cost analyzer, assessing whether a generated SQL query is structured efficiently or poses an availability risk to the database.
Metric definition
SQL Efficiency — A binary metric that evaluates whether the generated SQL query avoids known performance anti-patterns and resource exhaustion risks.- Type: Binary
- 1 (Performant): The query is structured efficiently and poses no obvious availability risk.
- 0 (Non-Performant): The query uses known anti-patterns or poses a resource exhaustion risk.
0100
Inefficient
Query uses anti-patterns or poses resource exhaustion risks.Efficient
Query is well-structured with no obvious performance issues.Scale is 0–100 and is derived from binary judgments converted into a confidence score.
Calculation method
SQL Efficiency is computed through a comprehensive code review process:Model Request
One or more evaluation requests are sent to an LLM evaluator to analyze the SQL query for performance anti-patterns.
Prompt Engineering
A specialized chain-of-thought prompt guides the model to evaluate the query for efficiency issues including cross-product detection, resource exhaustion risks, index suppression, and structural flaws.
Evaluation Process
The evaluator analyzes the query for known anti-patterns such as Cartesian joins, unbounded recursive CTEs, non-SARGable predicates, inefficient outer join filtering, and correlated subqueries.
This metric is computed by prompting an LLM and may require multiple LLM calls to compute, which can impact usage and billing.
Supported nodes
- LLM span
- The generated SQL query (output)
- Optional context such as the natural language query
- Schema information for understanding table relationships
What constitutes efficient (1)
- Avoids Anti-Patterns: No known, statically detectable performance anti-patterns.
- No Resource Exhaustion: No obvious structures like Cartesian joins or unbounded recursive CTEs.
- Efficient Filtering: Filters correctly placed (e.g., ON clause for LEFT JOIN conditions).
- Index-Friendly: Predicates structured to allow index utilization.
What constitutes inefficient (0)
Unconstrained Cartesian Joins
Multiple tables listed without linking conditions, creating massive data explosions:Unsafe Recursive CTEs
Recursive Common Table Expressions without proper termination conditions:Inefficient Outer Join Filtering
Placing filters for outer-joined tables in WHERE instead of ON:Index-Suppressing Predicates (Non-SARGable)
Wrapping columns in functions prevents index usage:Other Anti-Patterns
- SELECT *: Retrieving all columns when only specific ones are needed
- Correlated Subqueries: Subqueries referencing outer query columns (N+1 problem)
- Implicit Type Casting: Comparing mismatched types (e.g., VARCHAR to unquoted integers)
- Misplaced HAVING: Using HAVING for non-aggregated column filters
- Unnecessary DISTINCT: Deduplicating already-unique columns
- Inefficient UNION: Using UNION when UNION ALL suffices
Example use cases
- Pre-flight validation of generated SQL before execution on production databases.
- Identifying potential DoS risks in user-facing Text-to-SQL applications.
- Training and improving SQL generation models to avoid anti-patterns.
- Code review automation for data analytics pipelines.
Best practices
Specify SQL dialect and schema
Providing the SQL dialect and schema information helps the evaluator understand dialect-specific anti-patterns and table relationships for optimization analysis.
Combine with other Text-to-SQL metrics
Use alongside SQL Correctness and SQL Adherence for comprehensive validation of generated queries.
Iterate with CLHF
Use continuous learning via human feedback to tune detection for your specific database patterns.
Review flagged queries
Manually review inefficient queries to understand patterns and improve your SQL generation prompts.